
Sphinx4

What is Sphinx?

Sphinx is an open-source project from Carnegie Mellon University for working with spoken language. I primarily use it for speech recognition.

Read more about it on the CMU Sphinx project page.

There is also a four-page PDF overview of the Sphinx-4 system.
There are several versions of Sphinx. The latest, Sphinx-4, is written in Java, and it is the one I walk through below.

Getting Sphinx

You can get the Sphinx code or binaries from SourceForge. If you’re feeling adventurous, download the source or check it out from Subversion; if you just want to use the engine for speech recognition, download the binaries, which come as jar files. This post steps through the binary route.

When you go to the SourceForge link above, select sphinx4, then download the sphinx4 bin file. Once you download and unzip it, you’ll see a few jars in the bin folder, a demo folder, some documentation, and a lib folder (among other things). The lib folder contains the sphinx4 jar. You want the entire lib folder, because it has everything you need for some basic speech recognition.

Running Sphinx

Get an IDE. You can use whichever IDE you like (IntelliJ, NetBeans), but I will step you through Eclipse, which is free. You can read about how to get Eclipse in my previous post. In Eclipse, create a new Java project (File->New->Java Project) and give it a name (I called mine sphinx4). Eclipse creates a src folder for you. Copy the lib folder from the sphinx4 download you just unzipped and paste it into the root folder of the project. Then go into the demo folder, copy the wavfile folder, and paste it into the src folder of your Eclipse project.

There’s one more file you need. The jsapi.jar file is required, but it isn’t included as a ready-made jar. Because of licensing restrictions it can’t simply be downloaded, so the lib folder contains a jsapi.exe file instead. Run it, and the jsapi.jar file will appear in the same folder as jsapi.exe. On Linux, run the jsapi.sh file for the same result. If that doesn’t work, search for the jar online and you should be able to find it. If all else fails, let me know and I’ll help you get it. It must be in your lib folder before we move on.

With the wavfile folder in the src folder and the lib folder under your project root (with jsapi.jar in the lib folder), you can start linking in the jars you need for some simple speech recognition. Expand the lib folder and you’ll see the following jar files in it:

js.jar
jsapi.jar
sphinx4.jar
TIDIGITS_.jar

Right-click on the sphinx4 jar->Build Path->Add to Build Path. This adds (links) the jar to your build path, which allows the IDE to use code from the jar in your project. Do the same for all of the jars above.
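
If you want to confirm that the jars really ended up on the build path before going further, a quick check along these lines may help. This is only a sketch: the class ClasspathCheck and its method are hypothetical names of mine, and the two Sphinx class names are the ones imported by the demo code, so adjust them if your version differs.

```java
// Sketch: report whether classes from the linked jars can be loaded.
final class ClasspathCheck {

    /** Returns true if the named class can be found on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the imports used by the Sphinx-4 demo code.
        String[] expected = {
            "edu.cmu.sphinx.recognizer.Recognizer",
            "edu.cmu.sphinx.util.props.ConfigurationManager"
        };
        for (String name : expected) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it as a Java application inside the same project; any line that says MISSING points at a jar that has not been added to the build path yet.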

When that is done, your folder structure should look something like this:

 

[Screenshot snapshot1.png: the project folder structure in Eclipse]

Notice that there are a few wav files, a .gram file, and a config.xml file. You’ll need to open the config.xml file (right-click on the file->Open With->Text Editor; otherwise it’ll open in an XML editor that is hard to read). Find the part of the file that looks like this:

<component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
<property name="dictionary" value="dictionary"/>
<property name="grammarLocation"
value="resource:/demo.sphinx.wavfile.WavFile!/demo/sphinx/wavfile/"/>
<property name="grammarName" value="digits"/>
<property name="logMath" value="logMath"/>
</component>

It’s about halfway into the file. You need to make some changes here. Instead, it should look like this (you can paste this in, or just remove the demo.sphinx from the first part and the /demo/sphinx from the second part of the value line). If you paste it, make sure the quotation marks are plain straight quotes, because curly quotes will make the XML parser fail:

<component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
<property name="dictionary" value="dictionary"/>
<property name="grammarLocation"
value="resource:/wavfile.WavFile!/wavfile/"/>
<property name="grammarName" value="digits"/>
<property name="logMath" value="logMath"/>
</component>

Save the file (Ctrl+S). Now you’re ready to run Sphinx and recognize some simple speech.
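
The grammarLocation value has two parts separated by an exclamation mark: a class name (wavfile.WavFile) and a path (/wavfile/). A typo in either part is easy to make, so here is a small sketch that splits such a value and checks whether the class part can be loaded. ResourceLocation is a hypothetical helper of mine, not part of Sphinx, and it only mirrors the two-part layout described above rather than Sphinx’s own parsing code.

```java
// Sketch only: splits a grammarLocation value into the two parts the tutorial edits.
final class ResourceLocation {
    final String className; // part before '!', e.g. wavfile.WavFile
    final String path;      // part after '!', e.g. /wavfile/

    private ResourceLocation(String className, String path) {
        this.className = className;
        this.path = path;
    }

    static ResourceLocation parse(String value) {
        final String prefix = "resource:/";
        int bang = value.indexOf('!');
        // Reject values without the prefix, without '!', or with an empty class part.
        if (!value.startsWith(prefix) || bang <= prefix.length()) {
            throw new IllegalArgumentException("Expected resource:/CLASS!PATH but got: " + value);
        }
        return new ResourceLocation(value.substring(prefix.length(), bang),
                                    value.substring(bang + 1));
    }

    /** True if the class part can actually be loaded from the classpath. */
    boolean classIsLoadable() {
        try {
            Class.forName(className, false, ResourceLocation.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ResourceLocation loc = parse("resource:/wavfile.WavFile!/wavfile/");
        System.out.println("class=" + loc.className + " path=" + loc.path
                + " loadable=" + loc.classIsLoadable());
    }
}
```

If loadable comes back false when you run this inside your project, the class name in config.xml probably does not match the package of your WavFile.java.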

Go ahead and open the wav file named 12345.wav and listen to it. The spoken words are just that: one two three four five. If all goes well, that’s what Sphinx should recognize.

You can run the program by right-clicking on the WavFile.java file (located in the src/wavfile folder)->Run As->Java Application. This runs the recognizer. After a few seconds, you’ll see the text “one two three four five” show up in the Console view of Eclipse. If Eclipse complains that the declared package does not match the expected package, change the package declaration at the top of WavFile.java to package wavfile; and run it again. If you got that far, nice work: you have performed some speech recognition with Sphinx.

 

{ 61 } Comments

  1. ajmagnifico | May 12, 2008 at 4:01 pm | Permalink

    Good work! I was able to get the WavFile working on Linux using NetBeans.

    At first, I ran into a NullPointerException, choking the program right there. This line:

    URL configURL = WavFile.class.getResource("config.xml");

    was returning null, because it couldn’t actually find the “config.xml” file in the directory my WavFile.class compiled class file was being run from. This may just be a NetBeans idiosyncrasy. I’m not sure why the .wav and other files ended up being placed in a folder different from the .class file.

    I figured out where NetBeans was placing everything after the compile, and I changed the package declaration at the top of the WavFile.java file to read:

    package wavfile;

    This placed the .class file in the same directory as the config.xml file, and voila! “one two three four five”

  2. admin | May 14, 2008 at 2:49 pm | Permalink

    I’m glad to hear that things worked out. Something about the working path in NetBeans was probably the culprit. When you type WavFile.class it references from the folder where the class file is located, so the config.xml file should have been in there, but you knew that. It was probably putting class files into a compiled folder and not moving the config.xml because it wasn’t importing anything from it. I’m glad to hear you got it working.
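
    Since Class.getResource returns null instead of throwing when the file is not where the class loader looks, a small guard makes this failure much easier to read than a NullPointerException later on. The following is a sketch under that assumption; ResourceLookup and requireResource are hypothetical names, not part of Sphinx.

```java
import java.net.URL;

// Sketch: fail early, with a readable message, when a classpath resource is missing.
final class ResourceLookup {

    /**
     * Looks up a resource relative to the package of the given class
     * (that is how Class.getResource resolves names without a leading slash).
     */
    static URL requireResource(Class<?> anchor, String name) {
        URL url = anchor.getResource(name);
        if (url == null) {
            throw new IllegalStateException("Cannot find '" + name + "' next to "
                    + anchor.getName() + " on the classpath; check that the IDE copied it"
                    + " to the same output folder as the compiled class");
        }
        return url;
    }
}
```

    In the demo this would be used as requireResource(WavFile.class, "config.xml") in place of the bare getResource call.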

  3. Mark | June 3, 2008 at 8:33 pm | Permalink

    I got “one two three four five”.
    Thanks a lot!
    Right now, I want to integrate Asterisk, Festival, and Sphinx. Any suggestion?

  4. admin | June 4, 2008 at 1:30 pm | Permalink

    Integrating those three things is no easy task! I know little about Festival and even less about Asterisk. I did download the latter. Neither looks too hard to get running on its own, but since they are written in different programming languages, integrating the three will be tricky. Give me some time to get to know Asterisk and Festival and I’ll get back to you.

  5. Mark | June 5, 2008 at 10:39 pm | Permalink

    You’re right!
    Especially sphinx4+Asterisk.
    Now I am learning Perl.

  6. rizwan | December 20, 2009 at 11:44 pm | Permalink

    hi,

    it was a nice tutorial, thanks for uploading.

    I just want to know whether any of you has successfully integrated Asterisk with Sphinx 4?

    Please help me do that. I would be very thankful.

    Regards,

  7. Kundan | December 22, 2009 at 3:19 am | Permalink

    I got this error. Can anyone help me fix it?

    kundan@dev-desktop:~/wavefile$ java FileRecognizer 12345.wav
    file:/home/kundan/wavefile/12345.wav
    URL:file:/home/kundan/wavefile/config.xml
    Loading Recognizer…

    cm:edu.cmu.sphinx.util.props.ConfigurationManager@1
    Recognise: Recognizer: recognizer State: Deallocated
    Recognizer : Recognizer: recognizer State: Ready
    reader: null
    Exception in thread “main” java.lang.NoClassDefFoundError: com/jcraft/jogg/SyncState
    at org.tritonus.sampled.file.jorbis.JorbisAudioFileReader.getAudioFileFormat(JorbisAudioFileReader.java:73)
    at org.tritonus.share.sampled.file.TAudioFileReader.getAudioInputStream(TAudioFileReader.java:366)
    at org.tritonus.share.sampled.file.TAudioFileReader.getAudioInputStream(TAudioFileReader.java:283)
    at javax.sound.sampled.AudioSystem.getAudioInputStream(AudioSystem.java:1128)
    at FileRecognizer.main(FileRecognizer.java:101)
    Caused by: java.lang.ClassNotFoundException: com.jcraft.jogg.SyncState
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at sun.misc.Launcher$ExtClassLoader.findClass(Launcher.java:229)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:303)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:316)
    … 5 more

  8. Kundan | December 22, 2009 at 3:20 am | Permalink

    My code is:

    /*
    * Copyright 1999-2004 Carnegie Mellon University.
    * Portions Copyright 2004 Sun Microsystems, Inc.
    * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
    * All Rights Reserved. Use is subject to license terms.
    *
    * See the file “license.terms” for information on usage and
    * redistribution of this file, and for a DISCLAIMER OF ALL
    * WARRANTIES.
    *
    */

    import edu.cmu.sphinx.frontend.util.StreamDataSource;

    import edu.cmu.sphinx.recognizer.Recognizer;

    import edu.cmu.sphinx.result.Result;

    import edu.cmu.sphinx.util.props.ConfigurationManager;
    import edu.cmu.sphinx.util.props.PropertyException;

    import java.io.File;
    import java.io.IOException;
    import java.net.URL;

    import javax.sound.sampled.AudioFileFormat;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.UnsupportedAudioFileException;

    /**
    * A simple Sphinx-4 application that decodes a .WAV file containing
    * connnected-digits audio data. The audio format
    * itself should be PCM-linear, with the sample rate, bits per sample,
    * sign and endianness as specified in the config.xml file.
    * “${file_prompt}”
    *
    * Set up the default eclipse jre to be the one included
    *
    * Classpath lib order
    * JRE System Lib (custom 5.x version)
    * jl1.0.jar
    * tritonus.jar
    * tritonus_share.jar
    * tritonus_remaining.jar
    * tritonus_mp3.jar
    * sphinx4.jar
    * tools.jar
    * jsapi.jar
    * junit4.1.jar
    * javalayer.jar
    * corpora
    * sphinx4
    *
    */
    public class FileRecognizer {

    /** Set to true when the input had to be converted to the target format. */
    public boolean convertedFile = false;

    /**
    * Main method for running the WavFile demo.
    */
    public static void main(String[] args) {
    try {

    URL audioFileURL;

    if (args.length > 0) {
    audioFileURL = new File(args[0]).toURI().toURL();
    System.out.println(audioFileURL);
    } else {
    //if the ${file_prompt} isn't in the program arguments, it'll go with this:
    audioFileURL = FileRecognizer.class.getResource("");
    }
    URL configURL = FileRecognizer.class.getResource("config.xml");
    System.out.println("URL:"+configURL);

    System.out.println("Loading Recognizer…\n");

    ConfigurationManager cm = new ConfigurationManager(configURL);
    System.out.println("cm:"+cm);
    Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
    System.out.println("Recognise: "+recognizer);
    /* allocate the resource necessary for the recognizer */
    recognizer.allocate();
    System.out.println("Recognizer : "+recognizer);

    // System.out.println("Decoding " + audioFileURL.getFile());
    // System.out.println(AudioSystem.getAudioFileFormat(audioFileURL));

    StreamDataSource reader = (StreamDataSource) cm.lookup("streamDataSource");
    System.out.println("reader: "+reader);

    AudioInputStream ais = AudioSystem.getAudioInputStream(audioFileURL);
    System.out.println("ais: "+ais);
    FileRecognizer wavFile = new FileRecognizer();
    System.out.println(wavFile);
    // Convert it to the proper format
    AudioFormat targetFormat =

    new AudioFormat(16000f,
    16, // sample size in bits
    1, // mono
    true, // signed
    true);
    System.out.println(targetFormat);

    //new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 16000, 16, 1, 2, 16000, false);
    AudioInputStream convertedAis = wavFile.convertAudioInputStream(ais, targetFormat);
    File newFile = null;
    if (wavFile.convertedFile)
    {
    newFile = wavFile.writeConvertedFile(convertedAis, audioFileURL.toString());
    audioFileURL = newFile.toURI().toURL();
    ais = AudioSystem.getAudioInputStream(audioFileURL);
    }

    /* set the stream data source to read from the audio file */
    reader.setInputStream(ais, audioFileURL.getFile());

    /* decode the audio file */
    Result result = recognizer.recognize();

    /* print out the results */
    if (result != null) {
    System.out.println("\nRESULT: " +
    result.getBestFinalResultNoFiller() + "\n");
    } else {
    System.out.println("Result: null\n");
    }

    if (newFile != null)
    newFile.delete();

    } catch (IOException e) {
    System.err.println("Problem when loading WavFile: " + e);
    e.printStackTrace();
    } catch (PropertyException e) {
    System.err.println("Problem configuring WavFile: " + e);
    e.printStackTrace();
    }
    // catch (InstantiationException e) {System.err.println("Problem creating WavFile: " + e); e.printStackTrace();}
    catch (UnsupportedAudioFileException e) {
    System.err.println("Audio file format not supported: " + e);
    e.printStackTrace();
    }
    }

    private AudioInputStream convertAudioInputStream(AudioInputStream sourceAis, AudioFormat targetFormat) {
    AudioFormat baseFormat = sourceAis.getFormat();
    AudioFormat intermediateFormat;
    AudioInputStream convertedAis = sourceAis;

    // First convert the encoding, if necessary
    if (!baseFormat.getEncoding().equals(targetFormat.getEncoding())) {
    intermediateFormat = new AudioFormat(
    targetFormat.getEncoding(),
    baseFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), baseFormat.getChannels(),
    baseFormat.getChannels() * 2, baseFormat.getSampleRate(),
    false);
    convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
    //this.writeConvertedFile(convertedAis, "C:\\encoding.wav");
    baseFormat = intermediateFormat;
    sourceAis = convertedAis;
    convertedFile = true;
    }

    // Then convert the sample rate
    if (baseFormat.getSampleRate() != targetFormat.getSampleRate()) {
    intermediateFormat = new AudioFormat(
    baseFormat.getEncoding(),
    targetFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), baseFormat.getChannels(),
    baseFormat.getChannels() * 2, targetFormat.getSampleRate(),
    false);
    convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
    //this.writeConvertedFile(convertedAis, "C:\\sample.wav");
    baseFormat = intermediateFormat;
    sourceAis = convertedAis;
    convertedFile = true;
    }

    // Then convert the number of channels
    if (baseFormat.getChannels() > targetFormat.getChannels()) {
    intermediateFormat = new AudioFormat(
    baseFormat.getEncoding(),
    baseFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), targetFormat.getChannels(),
    targetFormat.getChannels() * 2, baseFormat.getSampleRate(),
    false);
    convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
    //this.writeConvertedFile(convertedAis, "C:\\channels.wav");
    baseFormat = intermediateFormat;
    sourceAis = convertedAis;
    convertedFile = true;
    }
    return convertedAis;
    }

    private File writeConvertedFile(AudioInputStream sourceAis, String fileName)
    {
    File tempfile = null;
    fileName = "tempwavfile.wav";
    //fileName = fileName.substring(6, fileName.length()-4) + "_new.wav";

    try
    {
    //This just takes an audio stream, writes it to disk, then plays it the way TALL usually does.
    //it’s a test to see if the input stream is readable by the Java audio providers like Tritonus
    //System.out.println(fileName);
    tempfile = new File(fileName);
    AudioSystem.write(sourceAis, AudioFileFormat.Type.WAVE, tempfile);
    }
    catch (Exception e)
    {
    System.out.println(e);
    }
    return tempfile;
    }

    }
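
    A note on the conversion code above: it passes channels * 2 as the frame size, which only holds for 16-bit samples. The sketch below shows how the frame size follows from the sample size and channel count, using the same 16 kHz, 16-bit, mono, signed, big-endian target as the code above. TargetFormat is a hypothetical helper name, and the target values are copied from this comment, not from any particular config.xml.

```java
import javax.sound.sampled.AudioFormat;

// Sketch: derive the PCM frame size instead of hard-coding channels * 2.
final class TargetFormat {

    /** Bytes per frame for linear PCM: whole bytes per sample times number of channels. */
    static int frameSizeInBytes(int sampleSizeInBits, int channels) {
        return ((sampleSizeInBits + 7) / 8) * channels;
    }

    /** The target used in the code above: 16 kHz, 16-bit, mono, signed, big-endian. */
    static AudioFormat sphinxTarget() {
        return new AudioFormat(16000f, 16, 1, true, true);
    }

    public static void main(String[] args) {
        AudioFormat target = sphinxTarget();
        System.out.println(target + " frame size: " + target.getFrameSize() + " bytes");
    }
}
```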

  9. admin | December 29, 2009 at 12:55 am | Permalink

    The exception looks like an audio provider problem, like your ogg vorbis audio provider jar isn’t in your classpath. Add that (link to the site in the MP3s in Sphinx post) and you should be good to go.

  10. pradeep | January 26, 2010 at 10:10 pm | Permalink

    i tried configuring it. but not working.

    it gives a error like this

    java.lang.ExceptionInInitializerError
    Caused by: java.lang.RuntimeException: Uncompilable source code – package edu.cmu.sphinx.frontend.util does not exist
    at wavfile.WavFile.<clinit>(WavFile.java:15)
    Could not find the main class: wavfile.WavFile. Program will exit.
    Exception in thread “main” Java Result: 1
    BUILD SUCCESSFUL (total time: 0 seconds)

    pls help me if you can

  11. pradeep | January 27, 2010 at 6:27 am | Permalink

    now my error is in

    Exception in thread “main” java.lang.Error: Unresolved compilation problem:

    at wavfile.WavFile.main(WavFile.java:28)

    can someone help me out

  12. pradeep | January 27, 2010 at 7:13 am | Permalink

    thanks.. its working after one day of configuration..

    i want to know how to develop it to the words. only numbers are recognising. and like 4 out of 20 numbers are only correct also.

  13. sagar | January 30, 2010 at 11:50 pm | Permalink

    Hey, I did as you said above, but I’m having a problem. The jsapi.exe file runs in Eclipse, but the jsapi.jar file does not appear. If I copy the jar file from lib (by extracting it there in the lib folder and then copying it into Eclipse), it cannot be added to the build path. Please can you help me out with this? I know very little about this.

  14. admin | January 31, 2010 at 12:12 am | Permalink

    First, you just run the jsapi.exe file to extract the jar. Then you put the jar in a folder somewhere in your eclipse project. Then in eclipse, you may need to right-click on the folder you put it in and hit “refresh”. Then go to project->properties->build path->and add the jar.

  15. sagar | January 31, 2010 at 12:14 am | Permalink

    Are there any changes that we have to make in WavFile.java? It’s showing me an error there:

    Description: The declared package "demo.sphinx.wavfile" does not match the expected package "wavfile"
    Resource: WavFile.java
    Path: /speechreg/src/wavfile
    Location: line 13
    Type: Java Problem

  16. admin | January 31, 2010 at 12:20 am | Permalink

    This just means that your package layout is different from what the file declares. Try changing the package declaration at the top of WavFile.java from demo.sphinx.wavfile to just wavfile, and it might work.

  17. sagar | February 2, 2010 at 10:32 am | Permalink

    Thanks for your help. As you said, I tried removing demo.sphinx and keeping wavfile as it is, and it still hasn’t worked.

  18. Virendra | February 9, 2010 at 1:26 pm | Permalink

    I have done the changes in the configuration as above but still after running I am getting this exception please help me out..

    Exception in thread “main” java.lang.RuntimeException: java.io.FileNotFoundException: D:\sphinx4-1.0beta3-src\speech\config.xml (The system cannot find the file specified)
    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:61)
    at edu.cmu.sphinx.demo.wavfile.WavFile.main(WavFile.java:40)

  19. wolfgar | February 10, 2010 at 6:54 am | Permalink

    Nice job, but after doing all steps Im getting this error message:
    Exception in thread “main” Property Exception component:’grammarLocation’ property:’grammarLocation’ – Can’t locate resource:/wavfile.WavFile
    edu.cmu.sphinx.util.props.InternalConfigurationException: java.lang.ClassNotFoundException: wavfile.WavFile
    …..
    Im newbie to eclipse and sphinx so it is possible Im missing something, Ive checked file locations and everything seems to be in right place.
    Thx for any advice!

  20. naxo | February 11, 2010 at 8:15 am | Permalink

    pradeep:

    now my error is in:
    Exception in thread “main” java.lang.Error: Unresolved compilation problem:
    at wavfile.WavFile.main(WavFile.java:28)

    Please, help me.. how you fix your problem??

  21. naxo | February 11, 2010 at 9:57 am | Permalink

    I’m sorry.. already fixed my problem.. the location in conf.xml was wrong

    resource:/wavefile.WavFile!/wavfile/ // wrong :(
    resource:/wavfile.WavFile!/wavfile/ // ok

  22. pradeep | February 25, 2010 at 9:11 am | Permalink

    @naxo

    is it working now. it should work. but have to learn how to make acoustic model. otherwise the accuracy is pretty low

  23. Blitzkrieg | March 16, 2010 at 11:13 pm | Permalink

    Hi, Anybody found how to integrate sphinx-4 with asterisk ?

    Thanks!!!!

  24. olfa | March 29, 2011 at 6:15 am | Permalink

    hi,
    i don’t find sphinx4.jar

  25. admin | April 1, 2011 at 2:50 am | Permalink

    Check the CMU sphinx website, it has more up-to-date information and jars. I have to admit that my post on sphinx4 is somewhat out of date. I hope to post an update soon.

  26. techstu123 | June 7, 2011 at 2:50 am | Permalink

    Hi, i got the following error:

    Problem when loading WavFile: java.io.IOException: Error while parsing line 102 of file:/D:/Books%20&%20Notes/MCA/SEM%205/Project/sphinx4/bin/wavfile/config.xml: Open quote is expected for attribute “{1}” associated with an element type “name”.
    java.io.IOException: Error while parsing line 102 of file:/D:/Books%20&%20Notes/MCA/SEM%205/Project/sphinx4/bin/wavfile/config.xml: Open quote is expected for attribute “{1}” associated with an element type “name”.
    at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:70)
    at edu.cmu.sphinx.util.props.ConfigurationManager.loader(ConfigurationManager.java:383)
    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:115)
    at wavfile.WavFile.main(WavFile.java:61)

    What could be wrong?I really need this for a project.Please help.

  27. admin | June 7, 2011 at 2:53 am | Permalink

    Look at your config.xml file. It looks like something on line 102 is causing the error.

  28. techstu123 | June 7, 2011 at 3:02 am | Permalink

    My mistake, I didn’t correct the quotes after copying. It worked, thank you :)
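
    For anyone else who pastes XML from a web page: typographic quotes can be replaced with plain ones before the file reaches the parser. A minimal sketch, assuming the only damage is curly single and double quotes; QuoteFixer is a hypothetical helper name.

```java
// Sketch: replace typographic quotes with the plain ASCII quotes XML expects.
final class QuoteFixer {

    static String straightenQuotes(String text) {
        return text
                .replace('\u201C', '"')   // left double quotation mark
                .replace('\u201D', '"')   // right double quotation mark
                .replace('\u2018', '\'')  // left single quotation mark
                .replace('\u2019', '\''); // right single quotation mark
    }

    public static void main(String[] args) {
        String pasted = "property name=\u201CgrammarName\u201D value=\u201Cdigits\u201D";
        System.out.println(straightenQuotes(pasted));
    }
}
```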

  29. stutech123 | June 7, 2011 at 5:23 am | Permalink

    THANKS A LOT!

  30. stutech123 | June 7, 2011 at 5:24 am | Permalink

    i’ve been searchin from a long time how to make it work,its working now :)

  31. msmoon | June 28, 2011 at 11:40 pm | Permalink

    I know this is a fairly old thread but I REALLY need help getting this error fixed. Thank you for any and all help in advance :)

    Exception in thread “main” java.lang.NullPointerException
    at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:74)
    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:58)
    at robottest.Robottest.main(Robottest.java:28)

    74: InputStream is = url.openStream();

    58: rawPropertyMap = new SaxLoader(url, globalProperties).load();

    28: cm = new ConfigurationManager(Robottest.class.getResource("robottest.config.xml"));

  32. msmoon | June 29, 2011 at 6:34 am | Permalink

    s/n: I am using netbeans to build and compile and my configuration file is there.

  33. admin | June 29, 2011 at 6:54 am | Permalink

    @msmoon – Try using an absolute path to your config file. It’s probably a netbeans relative path issue.
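
    One way to use an absolute path is to build the URL from a File, the same File.toURI().toURL() route the FileRecognizer code earlier in this thread uses for the audio file. This is a sketch; ConfigLocator is a hypothetical helper name, and the path in the usage note is only a placeholder.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.net.MalformedURLException;
import java.net.URL;

// Sketch: turn a file system path into a URL, failing clearly if the file is absent.
final class ConfigLocator {

    static URL fromFile(String path) throws FileNotFoundException, MalformedURLException {
        File file = new File(path).getAbsoluteFile();
        if (!file.isFile()) {
            throw new FileNotFoundException("No config file at " + file);
        }
        return file.toURI().toURL();
    }
}
```

    The result can be passed to new ConfigurationManager(url), for example with ConfigLocator.fromFile("/path/to/robottest.config.xml").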

  34. msmoon | June 29, 2011 at 10:54 am | Permalink

    Thanks for the fast response. Unfortunately that did not correct the error.

    /home/dekita/NetBeansProjects/robottest/build.xml:13: The following error occurred while executing this line:
    /home/dekita/NetBeansProjects/robottest/robottest.config.xml:11: Unexpected element “{}config” {antlib:org.apache.tools.ant}config

    13:
    11:
    ?????
    I even tried spelling out the word configuration, taking out the configuration.

  35. msmoon | June 29, 2011 at 10:58 am | Permalink

    13:
    11:

    sorry, it didnt show up(tried to copy and paste from virtual box to Mac)

    Thank you again!

  36. msmoon | June 29, 2011 at 10:59 am | Permalink

    line 13 is the
    and line 11 is the

  37. msmoon | June 29, 2011 at 11:01 am | Permalink

    ok, one more time..I apologize again for this 13 is import and 11 is just the config line

  38. admin | July 1, 2011 at 5:00 am | Permalink

    Any chance you can paste in the lines in your build.xml files? Paste in the surrounding lines, as well.

  39. Romika | September 22, 2011 at 9:00 am | Permalink

    Hi, I’m doing a project on Sphinx 4 with NetBeans. My project is about recognizing two numbers, and when I say “next” it adds that number to the text area. My problem is that it keeps recognizing continuously even when I’m not saying anything. So please help me set up the config file; I am using wsjLoader. Please reply soon.

  40. admin | September 22, 2011 at 10:02 pm | Permalink

    Romika, I sent you an email, but the short answer is that there is a while loop that just picks up microphone input continuously. You need to remove the while loop. For an example, look in HelloWorld.java.

  41. Romika | September 23, 2011 at 3:33 am | Permalink

    Hi, thanks for the answer, but it does not work. My problem is that I have to speak two numbers first and then say “next” to display those two numbers as one whole number in the text area, then speak two more numbers and say “next” to display that number in the text area, and so on. I am sending you my code; please look at it and reply soon, because I have to demonstrate it tomorrow. Please reply as early as possible.

    package demo.sphinx;

    import java.awt.event.ActionEvent;
    import javax.swing.JFrame;
    import javax.swing.JTextField;
    import javax.swing.JTextArea;

    import edu.cmu.sphinx.frontend.util.Microphone;
    import edu.cmu.sphinx.linguist.dictionary.Pronunciation;
    import edu.cmu.sphinx.linguist.dictionary.Word;
    import edu.cmu.sphinx.recognizer.Recognizer;
    import edu.cmu.sphinx.result.Result;
    import edu.cmu.sphinx.util.props.ConfigurationManager;
    import edu.cmu.sphinx.util.props.PropertyException;
    import java.awt.ComponentOrientation;
    import java.awt.Dimension;
    import java.awt.GridBagConstraints;
    import java.awt.GridBagLayout;
    import java.awt.GridLayout;
    import java.awt.event.ActionEvent;
    import javax.swing.DefaultListModel;
    import javax.swing.JList;
    import javax.swing.JLabel;
    import javax.swing.JButton;
    import javax.swing.JFrame;
    import javax.swing.JOptionPane;
    import javax.swing.JPanel;
    import javax.swing.JTabbedPane;
    import javax.swing.JScrollPane;
    import javax.swing.JSplitPane;
    import java.awt.event.ActionListener;
    import java.io.IOException;
    import java.util.ArrayList;
    import javax.swing.JTable;
    import javax.swing.JTextArea;
    public class Testing extends JFrame implements ActionListener{

    JList ul,ll,fi;
    DefaultListModel d1,d2,d3;
    JLabel msg,lblul,lblfi;
    JButton btnst,btnso,btnclr;
    JTable tb1;
    JTextArea ta;
    JPanel mainpane,mainpane1,fc1,h1,fp;
    JSplitPane sp,sp1;
    JTabbedPane main;
    JScrollPane listll;
    Recognizer recognizer;
    ConfigurationManager cm;
    Result result;
    String resultText;
    Testing()
    {
    mainpane=new JPanel();
    mainpane1=new JPanel();
    main=new JTabbedPane();
    sp=new JSplitPane();
    btnst=new JButton("Start");
    btnso=new JButton("Stop");
    btnclr=new JButton("Show");
    btnst.addActionListener(this);
    btnso.addActionListener(this);
    btnclr.addActionListener(this);
    ta=new JTextArea(10,10);
    ta.setEnabled(true);
    ta.setEditable(true);

    sp1=new JSplitPane(JSplitPane.VERTICAL_SPLIT);
    add(sp);
    sp.setLeftComponent(sp1);
    sp.setRightComponent(main);
    sp1.setTopComponent(mainpane);
    sp1.setBottomComponent(mainpane1);
    mainpane1.add(new JLabel("HELPSpeak :Start to start the Microphone."
    + "L For Lower Limit.U For Upper Limit.F for Frequency."
    + "N for Next.C for Clear."
    + "Stop to stop the Microphone."));
    mainpane1.add(ta);
    mainpane.setLayout(new GridBagLayout());
    GridBagConstraints c = new GridBagConstraints();
    d1=new DefaultListModel();
    d2=new DefaultListModel();
    d3=new DefaultListModel();
    ul = new JList(d2);
    ll = new JList(d1);
    fi = new JList(d3);
    JScrollPane listul = new JScrollPane(ul);
    listll = new JScrollPane(ll);
    JScrollPane listfi = new JScrollPane(fi);
    listul.setPreferredSize(new Dimension(60, 200));
    listll.setPreferredSize(new Dimension(60, 200));
    listfi.setPreferredSize(new Dimension(60, 200));
    c.gridx = 0;
    c.gridy = 0;
    mainpane.add(new JLabel("Lower Limit "),c);
    c.gridx = 1;
    c.gridy = 0;
    mainpane.add(new JLabel("Upper Limit "),c);
    c.gridx = 2;
    c.gridy = 0;
    mainpane.add(new JLabel("Frequency"),c);
    c.gridx = 0;

    c.gridy = 1;
    mainpane.add(listll,c);
    c.gridx = 1;
    c.gridy = 1;
    mainpane.add(listul,c);
    c.gridx = 2;
    c.gridy = 1;
    mainpane.add(listfi,c);
    c.gridx = 0;

    c.gridy = 2;
    mainpane.add(btnst,c);
    c.gridx = 1;
    c.gridy = 2;
    mainpane.add(btnso,c);
    c.gridx = 2;
    c.gridy = 2;
    mainpane.add(btnclr,c);

    }
    public void actionPerformed(ActionEvent ae)
    {
    String str=ae.getActionCommand();
    String digit="";
    if(str.equals("Start"))
    {
    ta.append("hi");

    }
    //ta.append(digit);
    if(str.equals("Show"))
    {
    btnclr.setEnabled(false) ;
    main.addTab("Histogram",new Histogram());
    main.addTab("Frequency Polygon",new FrequencyPolygon());
    }

    }

    public void Voice()
    {
    try{
    ConfigurationManager cm = new ConfigurationManager(Testing.class.getResource("gui.config.xml"));

    Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
    Microphone microphone = (Microphone) cm.lookup("microphone");
    recognizer.allocate();
    if (microphone.startRecording()) {

    while(true){

    Result result = recognizer.recognize();

    if (result != null) {
    String resultText = result.getBestFinalResultNoFiller();
    ta.append("You said: " + resultText + "\n");
    } else {
    ta.append("I can't hear what you said.\n");
    }
    }
    } else {
    ta.append("Cannot start microphone.");
    recognizer.deallocate();
    System.exit(1);
    }
    } catch (IOException e) {
    System.err.println("Problem when loading HelloWorld: " + e);
    e.printStackTrace();
    } catch (PropertyException e) {
    System.err.println("Problem configuring HelloWorld: " + e);
    e.printStackTrace();
    } catch (InstantiationException e) {
    System.err.println("Problem creating HelloWorld: " + e);
    e.printStackTrace();
    }
    }

    public static void main(String[] args)
    {
    Testing t=new Testing();
    t.setVisible(true);
    t.setSize(500,500);
    t.setTitle("Statistical Voice Calculator for Graph Plotting");
    t.setExtendedState(JFrame.MAXIMIZED_BOTH);
    t.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    t.Voice();

    }

    }

  42. admin | September 23, 2011 at 3:45 am | Permalink

    I would need your config.xml file and your grammar. You should just be using a grammar that lists all the numbers separated by |. That will restrict the recognized input to numbers.
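
    To show what such a grammar looks like, here is a sketch that builds a JSGF rule with the alternatives separated by |. The grammar name "digits" and the rule name "digit" are placeholders I picked for illustration, and DigitGrammar is a hypothetical helper; use whatever names your config.xml expects.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: build a JSGF grammar whose single public rule is a list of alternatives.
final class DigitGrammar {

    static String build(String grammarName, String ruleName, List<String> words) {
        return "#JSGF V1.0;\n\n"
                + "grammar " + grammarName + ";\n\n"
                + "public <" + ruleName + "> = ( " + String.join(" | ", words) + " );\n";
    }

    public static void main(String[] args) {
        List<String> digits = Arrays.asList("ZERO", "ONE", "TWO", "THREE", "FOUR",
                "FIVE", "SIX", "SEVEN", "EIGHT", "NINE");
        // Placeholder names: adjust to match grammarName in your own config.xml.
        System.out.print(build("digits", "digit", digits));
    }
}
```

    The printed text can be saved as a .gram file next to config.xml.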

  43. Romika | September 23, 2011 at 10:09 am | Permalink

    hi,i’m sending you the config and grammar files you asked for.please reply soon

    gui.config.xml

    [The XML markup was lost when this comment was posted; only these text values survive.]

    accuracyTracker
    speedTracker
    memoryTracker

    microphone
    premphasizer
    windower
    fft
    melFilterBank
    dct
    liveCMN
    featureExtraction

    microphone
    speechClassifier
    speechMarker
    nonSpeechDataFilter
    premphasizer
    windower
    fft
    melFilterBank
    dct
    liveCMN
    featureExtraction

    grammar file:=

    #JSGF V1.0;

    /**
    * JSGF Grammar for Hello World example
    */

    grammar Test_1;
    public = ( NEXT | BACK | CLEAR | CURVE | HISTOGRAM | UPPER | LOWER | FREQUENCY | SHOW );
    public = ( ONE | TWO | THREE | FOUR | FIVE | SIX | SEVEN | EIGHT | NINE | ZERO );

  44. Romika | September 23, 2011 at 10:24 am | Permalink

    Hi, I’m sending you the config file again…

    accuracyTracker
    speedTracker
    memoryTracker

    microphone
    premphasizer
    windower
    fft
    melFilterBank
    dct
    liveCMN
    featureExtraction

    microphone
    speechClassifier
    speechMarker
    nonSpeechDataFilter
    premphasizer
    windower
    fft
    melFilterBank
    dct
    liveCMN
    featureExtraction

  45. admin | September 23, 2011 at 11:13 am | Permalink

    I still think it’s a problem with the while loop. It will recognize speech until there is an apparent pause, then it transcribes that, then does it all over again. Can you add some action events to your code, with one button that starts the microphone and another that turns it off when you are done? That’s the way it’s usually done.
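
The start and stop buttons suggested above could drive a background loop along the lines of this sketch. To keep it runnable without Sphinx on the classpath, the recognizer is represented by a plain `Supplier<String>`. The class `RecognitionLoop` and the names `recognizeOnce` and `onResult` are hypothetical and not part of Sphinx4.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: repeats a blocking "recognize once" call on a background thread. */
public class RecognitionLoop {
    private final Supplier<String> recognizeOnce; // stand-in for one recognizer.recognize() call
    private final Consumer<String> onResult;      // e.g. appends the text to a text area
    private boolean running;                      // guarded by "this"
    private Thread worker;                        // guarded by "this"; null when no loop is active

    public RecognitionLoop(Supplier<String> recognizeOnce, Consumer<String> onResult) {
        this.recognizeOnce = recognizeOnce;
        this.onResult = onResult;
    }

    /** Intended to be called from the start button's ActionListener. */
    public synchronized void start() {
        running = true;
        if (worker != null) {
            return; // a loop is still active and will see the flag
        }
        worker = new Thread(this::loop, "recognition-loop");
        worker.setDaemon(true);
        worker.start();
    }

    /** Intended for the stop button; takes effect once the current call returns. */
    public synchronized void stop() {
        running = false;
    }

    /** Waits up to the given time for the loop to finish; returns true if it has. */
    public boolean awaitStopped(long millis) {
        Thread t;
        synchronized (this) {
            t = worker;
        }
        if (t == null) {
            return true;
        }
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t.isAlive();
    }

    private void loop() {
        while (true) {
            synchronized (this) {
                if (!running) {
                    worker = null;
                    return;
                }
            }
            String text = recognizeOnce.get();
            if (text != null && !text.trim().isEmpty()) {
                onResult.accept(text); // empty results (e.g. silence) are skipped
            }
        }
    }
}
```

With Sphinx4, the supplier would presumably wrap `recognizer.recognize()` and `getBestFinalResultNoFiller()` as in the code posted earlier, with `microphone.startRecording()` called before `start()` and `microphone.stopRecording()` after `stop()`. Any update to a Swing component from `onResult` is best passed through `SwingUtilities.invokeLater`, since the callback runs off the event thread.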

  46. Romika | September 23, 2011 at 11:21 am | Permalink

    Hi, thanks for the reply. Can you send me the code for that? We tried to do exactly that, but it didn’t work out. We tried every way we could think of, but none of them works. My problem is that if I don’t say anything, it automatically outputs 8, and it doesn’t recognize 6 and 7. We even tried removing the while loop, but then it recognizes only a single digit.

  47. madhulatha | November 3, 2011 at 1:50 am | Permalink

    Hi,
    I’ve tried to execute the program, but I’m getting the following errors. Please help me.

    Loading Recognizer as defined in 'file:/C:/Program%20Files/Java/speechtotext/sphinx4/bin/wavfile/config.xml'...

    Exception in thread "main" java.lang.NoSuchFieldError: engineListeners
    at com.sun.speech.engine.recognition.BaseRecognizer.fireRecognizerSuspended(BaseRecognizer.java:922)
    at com.sun.speech.engine.recognition.BaseRecognizer.dispatchSpeechEvent(BaseRecognizer.java:1262)
    at com.sun.speech.engine.SpeechEventUtilities.postSpeechEvent(SpeechEventUtilities.java:201)
    at com.sun.speech.engine.SpeechEventUtilities.postSpeechEvent(SpeechEventUtilities.java:132)
    at com.sun.speech.engine.recognition.BaseRecognizer.postRecognizerSuspended(BaseRecognizer.java:912)
    at com.sun.speech.engine.recognition.BaseRecognizer.commitChanges(BaseRecognizer.java:358)
    at edu.cmu.sphinx.jsapi.JSGFGrammar.commitChanges(JSGFGrammar.java:536)
    at edu.cmu.sphinx.jsapi.JSGFGrammar.createGrammar(JSGFGrammar.java:243)
    at edu.cmu.sphinx.linguist.language.grammar.Grammar.allocate(Grammar.java:101)
    at edu.cmu.sphinx.linguist.flat.FlatLinguist.allocate(FlatLinguist.java:229)
    at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.allocate(SimpleBreadthFirstSearchManager.java:603)
    at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:67)
    at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:157)
    at wavfile.WavFile.main(WavFile.java:46)

  48. admin | November 3, 2011 at 2:16 am | Permalink

    It means that “engineListeners” is not defined in your config.xml file. Can you search the text of your config file and find engineListeners? You can look at some of the demos in sphinx for some other sample config.xml files.
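
Independently of the config file, `java.lang.NoSuchFieldError` is a linkage error that the JVM raises when a class was compiled against a different version of another class than the one loaded at run time. It may therefore also be worth checking whether two different copies of the speech classes are on the classpath. The sketch below prints where a class was loaded from. The class `WhichJar` is hypothetical, and the two default class names are simply taken from the stack trace above.

```java
import java.security.CodeSource;

public class WhichJar {

    /** Returns the jar or directory a class is loaded from, or a short note if unknown. */
    public static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(no code source; probably a JDK class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not found on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
                "com.sun.speech.engine.recognition.BaseRecognizer",
                "com.sun.speech.engine.SpeechEventUtilities"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the classes turn out to come from different jars, or from a jar other than the one shipped in the sphinx4 lib folder, that mismatch is a plausible cause of the error.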

  49. madhulatha | November 28, 2011 at 11:34 pm | Permalink

    hi,

    This is our config.xml file for the wavFile demo:

    audioFileDataSource
    preemphasizer
    dither
    windower
    fft
    melFilterBank
    dct
    batchCMN
    featureExtraction

    We didn’t find the engine listeners.

  50. admin | November 29, 2011 at 12:11 am | Permalink

    Okay, the problem might be in the Java code that is getting the elements from the .xml file. I would suggest modeling your code after the latest sphinx4 code. In fact, get the latest sphinx4 with subversion, then run the edu.cmu.sphinx.demo.hellongram.HelloNGram file. That shows you how to access a microphone directly and how the config xml file should be.

  51. madhulatha | December 1, 2011 at 9:59 pm | Permalink

    Hi,

    We have tried using the sphinx4-1.0beta6 version and executed the hellongram program, but we are getting the following error:

    Exception in thread "main" Property exception component:'trigramModel' property:'location' - Can't locate resource:/edu/cmu/sphinx/demo/hellongram/hellongram.trigram.lm
    edu.cmu.sphinx.util.props.InternalConfigurationException: Can't locate resource:/edu/cmu/sphinx/demo/hellongram/hellongram.trigram.lm
    at edu.cmu.sphinx.util.props.ConfigurationManagerUtils.getResource(ConfigurationManagerUtils.java:483)
    at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.newProperties(SimpleNGramModel.java:93)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.newProperties(LexTreeLinguist.java:311)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:204)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)
    at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)
    at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)
    at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)
    at hellongram.HelloNGram.main(HelloNGram.java:38)

  52. admin | December 1, 2011 at 11:18 pm | Permalink

    Did you just download the jar, or check it out from svn? I didn’t have any problems checking it out directly in eclipse with svn. You can download the trigram language model (along with everything else that HelloNGram needs) directly from the svn repository:
    https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinx4/src/apps/edu/cmu/sphinx/demo/hellongram/
    And you can use your own language model, just open the config xml file and change where it points to.
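
The error message above suggests that the language model is being looked up as a classpath resource, so a quick check is whether that path resolves at all from your project. The sketch below does only that. The class `ResourceCheck` is hypothetical, and the default path is copied from the error message.

```java
public class ResourceCheck {

    /** True if the absolute resource path (starting with "/") resolves on the classpath. */
    public static boolean onClasspath(String absolutePath) {
        return ResourceCheck.class.getResource(absolutePath) != null;
    }

    public static void main(String[] args) {
        String path = args.length > 0
                ? args[0]
                : "/edu/cmu/sphinx/demo/hellongram/hellongram.trigram.lm";
        System.out.println(path + (onClasspath(path)
                ? " was found on the classpath"
                : " is NOT on the classpath"));
    }
}
```

Run it with the same classpath as the demo. If the file is reported missing, either copy the model into the matching package folder or change the location in the config file to wherever the model actually is.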

  53. madhulatha | December 2, 2011 at 1:43 am | Permalink

    I have just downloaded the sphinx4-1.0beta6 bin, unzipped it, and found the hellongram folder in the src folder.
    It has the following files:
    HelloNGram.java, HelloNGram.config, HelloNGram.manifest, HelloNgram.test, HelloNgram.trigram.lm

  54. admin | December 2, 2011 at 1:49 am | Permalink

    Right, so follow the instructions on the CMU Sphinx site to run it:
    http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/hellongram/README.html

    I just downloaded it and it ran just fine. But this goes back to your original problem with your config. My suggestion is that you look at the HelloNGram.config file and use that as a basis for your config file.

  55. madhulatha | December 2, 2011 at 2:06 am | Permalink

    What you described is how to run the jar in the bin directory, but how do I run the program in the src\apps folder?

  56. admin | December 2, 2011 at 2:36 am | Permalink

    For that, you need to get the source code. I suggest downloading eclipse, adding eclipse subversive (see previous post about eclipse) and then checking out the sphinx4 code from https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinx4/. You can follow similar instructions as in this post. If you download and extract eclipse, run it, install subversive, then check out the project you will have all the sphinx4 code and you should be able to run the HelloNGram.java file directly from eclipse.

  57. madhulatha | December 2, 2011 at 11:06 pm | Permalink

    I have tried it using Subversive, but I’m still getting the same error:
    Exception in thread "main" Property exception component:'trigramModel' property:'location' - Can't locate resource:/edu/cmu/sphinx/demo/hellongram/hellongram.trigram.lm
    edu.cmu.sphinx.util.props.InternalConfigurationException: Can't locate resource:/edu/cmu/sphinx/demo/hellongram/hellongram.trigram.lm
    at edu.cmu.sphinx.util.props.ConfigurationManagerUtils.getResource(ConfigurationManagerUtils.java:483)
    at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.newProperties(SimpleNGramModel.java:93)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.newProperties(LexTreeLinguist.java:311)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:204)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)
    at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)
    at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)
    at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)
    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)
    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)
    at hellongram.HelloNGram.main(HelloNGram.java:38)

  58. madhulatha | December 3, 2011 at 3:19 am | Permalink

    Hey, I got the program to run. It was a problem in config.xml :)

  59. madhulatha | December 4, 2011 at 11:28 pm | Permalink

    Hi,
    my program runs in Eclipse, but the speech is not recognised and there’s no output.

  60. Amit | January 9, 2012 at 5:07 am | Permalink

    I want to create an application which takes a .wav (or any other standard audio file format) and converts it to text.

    For speech recognition I have decided to use sphinx4, and I have tried to run the demo Transcriber.jar provided with sphinx. It’s good, but it only works for a specific grammar (written in .gram and .gxml files). How can I develop a similar program for US English?

    How do I proceed? Where do I get the language model for US English that can be used with Sphinx4?

    Any step-by-step tutorial, blog post, or answer will be of great help.

  61. admin | January 9, 2012 at 5:18 am | Permalink

    You can either make your own, or get an existing one. You can make a simple trigram language model with the online tool by CMU: http://www.speech.cs.cmu.edu/tools/lmtool-adv.html
    You can use SRILM, IRSTLM, or whatever else to train an n-gram language model on a corpus of English text and output it in ARPA format, which is readable by Sphinx. There are language models that go with the corresponding acoustic models that you can download (like Hub4 and WSJ). Those can be found at the bottom of the list here: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
    Make sure the pronunciation dictionaries, etc, are all there too. If you download the acoustic model and language model, then you should have everything you need. Just make sure the config.xml points to the right places.
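
Before pointing the config at a downloaded or trained model, it can help to confirm that the file really is a text ARPA model. An ARPA file starts with a `\data\` section that lists one `ngram N=count` line per n-gram order. The sketch below reads just that header. The class `ArpaHeader` is hypothetical, and the file name is whatever you pass on the command line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class ArpaHeader {

    /** Reads the "ngram N=count" lines of the \data\ section: order -> number of n-grams. */
    public static Map<Integer, Long> readCounts(BufferedReader in) {
        Map<Integer, Long> counts = new LinkedHashMap<>();
        boolean inData = false;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.equals("\\data\\")) {
                    inData = true;
                    continue;
                }
                if (!inData || line.isEmpty()) {
                    continue; // skip comments before \data\ and blank lines
                }
                if (!line.startsWith("ngram ")) {
                    break; // the first n-gram section (e.g. \1-grams:) ends the header
                }
                String[] kv = line.substring("ngram ".length()).split("=");
                counts.put(Integer.parseInt(kv[0].trim()), Long.parseLong(kv[1].trim()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 accepts any byte sequence, so odd characters in the model do not abort the read.
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            System.out.println(readCounts(in));
        }
    }
}
```

An empty result means no `\data\` header was found, which usually indicates a binary model or a different file format. A trigram model should report counts for orders 1, 2, and 3.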
