Sphinx4

What is Sphinx?

Sphinx is an open source project by Carnegie Mellon University that deals with Natural Language Processing. I primarily use it for speech recognition.

Read more about it on the CMU Sphinx project page.

There is also a four-page PDF overview of the Sphinx-4 system.
There are several versions, the latest being written in Java, which is what I’m going to walk through below.

Getting Sphinx

You can get the Sphinx code or binaries from SourceForge. If you’re feeling really lucky, get the source or check it out from Subversion, but if you just want to use the engine for speech recognition, download the binaries, which come packaged as jars. We’ll step through that in this post.

When you go to the above SourceForge link, select sphinx4, then download the sphinx4 bin file. Once you download and unzip it, you’ll see a few jars in the bin folder, a demo folder, some documentation, and a lib folder (among other things). The lib folder has the sphinx4 jar in it. What you want is the entire lib folder, because it has everything you need to do some speech recognition.

Running Sphinx

Get an IDE. You can use whatever IDE you want (IntelliJ, NetBeans), but I will step you through Eclipse, which is free. You can read about how to get Eclipse in my previous post. In Eclipse, create a new Java project (File->New->Java Project) and give it a name (I called mine sphinx4). You’ll see that it made a src folder. Copy the lib folder from the sphinx4 download you just unzipped and paste it into the root folder of the project. Then go into the demo folder, copy the wavfile folder, and paste it into the src folder of your Eclipse project.

There’s one more file you need. The jsapi.jar file is necessary, but it doesn’t show up anywhere in the download. There is a legal issue with distributing the jar file directly, so in the lib folder you’ll see a jsapi.exe file instead. Run that and the jsapi.jar file will appear in the same folder as the jsapi.exe file. On Linux, run the jsapi.sh file and it should have the same result. If you can’t get it, Google for it and you should be able to find it. If all else fails, let me know and I’ll help you get it. It must be in your lib folder before we move on.

With the wavfile folder in the src folder and the lib folder under your project root (and with the jsapi.jar file in the lib folder), you can start to link in the jars that you will need to do some simple speech recognition. Expand the lib folder and you’ll see the following jar files in it:

js.jar
jsapi.jar
sphinx4.jar
TIDIGITS_.jar

Right-click on the sphinx4 jar->Build Path->Add to Build Path. This adds (links) the jar to your build path, which allows the IDE to use code from the jar in your project. Do the same for all of the jars above.
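
Before linking the jars, it can help to confirm that they are all actually in the lib folder (especially jsapi.jar, which has to be extracted first). Here is a minimal sketch of such a check. The class name LibCheck and the default folder name "lib" are my own choices for illustration, and the TIDIGITS jar is left out of the list because its full file name is not spelled out above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which expected jars are missing from a lib folder. */
final class LibCheck {

    /** Returns every name in required that has no file of that name in libDir. */
    static List<String> missingJars(File libDir, List<String> required) {
        String[] present = libDir.list(); // null when libDir is not a directory
        List<String> names = (present == null)
                ? new ArrayList<String>()
                : Arrays.asList(present);
        List<String> missing = new ArrayList<String>();
        for (String jar : required) {
            if (!names.contains(jar)) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: run from the project root, or pass the lib folder as the first argument.
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        List<String> required = Arrays.asList("js.jar", "jsapi.jar", "sphinx4.jar");
        List<String> missing = missingJars(libDir, required);
        if (missing.isEmpty()) {
            System.out.println("All expected jars found in " + libDir.getPath());
        } else {
            System.out.println("Missing from " + libDir.getPath() + ": " + missing);
        }
    }
}
```

If jsapi.jar shows up as missing, run jsapi.exe (or jsapi.sh) again and refresh the lib folder in the IDE.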

When that is done, your folder structure should look something like this:

[Image: snapshot1.png]

Notice that there are a few wav files, a .gram file, and a config.xml file. You’ll need to open the config.xml file (right-click on the file->Open With->Text Editor; otherwise it’ll open in an XML editor that is hard to understand). Find the part of the file that looks like this:

<component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
<property name="dictionary" value="dictionary"/>
<property name="grammarLocation"
value="resource:/demo.sphinx.wavfile.WavFile!/demo/sphinx/wavfile/"/>
<property name="grammarName" value="digits"/>
<property name="logMath" value="logMath"/>
</component>

It’s about halfway into the file. You need to make some changes here. Instead, it should look like this (you can paste this in, or just remove the demo.sphinx. from the first part and the /demo/sphinx from the second part of the middle line):

<component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
<property name="dictionary" value="dictionary"/>
<property name="grammarLocation"
value="resource:/wavfile.WavFile!/wavfile/"/>
<property name="grammarName" value="digits"/>
<property name="logMath" value="logMath"/>
</component>

Save the file (Ctrl-S). Now you’re ready to run Sphinx and recognize some simple speech.
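
To see why both halves of the grammarLocation value have to change together, it helps to split the value apart. The sketch below assumes the value has the shape resource:/<class name>!<path>, where the class name must match the package of your WavFile class and the path points at the folder holding the .gram file. That reading of the format, the regular expression, and the ResourceLocation class are my own illustration, not code from Sphinx.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that splits a "resource:/Class!/path/" value into its two parts. */
final class ResourceLocation {

    // Assumed shape: resource:/<fully.qualified.ClassName>!<path on the classpath>
    private static final Pattern FORMAT = Pattern.compile("resource:/([.\\w]+?)!(.*)");

    final String className;
    final String path;

    private ResourceLocation(String className, String path) {
        this.className = className;
        this.path = path;
    }

    /** Parses a location string, or throws if it does not have the assumed shape. */
    static ResourceLocation parse(String location) {
        Matcher m = FORMAT.matcher(location);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a resource location: " + location);
        }
        return new ResourceLocation(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        ResourceLocation loc = parse("resource:/wavfile.WavFile!/wavfile/");
        // Prints: class=wavfile.WavFile path=/wavfile/
        System.out.println("class=" + loc.className + " path=" + loc.path);
    }
}
```

If the class part does not match the package declared at the top of WavFile.java, or the path part does not match the folder under src, the grammar cannot be located.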

Go ahead and open up the wav file named 12345.wav and listen to it. Notice that the spoken words are just that: one two three four five. If all goes well, that’s what Sphinx should recognize.

You can run the program by right-clicking on the WavFile.java file (located in the src/wavfile folder)->Run As->Java Application. This should run the recognizer. After a few seconds, you’ll see the text “one two three four five” show up in the Console portion of Eclipse. If you got that far, nice work. You were able to perform some speech recognition with Sphinx.
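
If the run dies with a NullPointerException instead, one thing worth checking is whether config.xml ended up next to the compiled WavFile class. The demo looks it up with WavFile.class.getResource("config.xml"), which resolves the name relative to the class’s package and returns null when nothing is there. Here is a minimal sketch of a friendlier lookup. The ResourceCheck class and its require method are hypothetical names for illustration; only Class.getResource is the real API.

```java
import java.net.URL;

/** Hypothetical helper: looks up a resource next to a class and fails with a readable message. */
final class ResourceCheck {

    /** Returns the URL of name relative to anchor's package, or throws if it is absent. */
    static URL require(Class<?> anchor, String name) {
        URL url = anchor.getResource(name); // null when the resource is not on the classpath
        if (url == null) {
            throw new IllegalStateException(
                    name + " was not found next to " + anchor.getName()
                    + "; check that it is copied to the same output folder as the class file");
        }
        return url;
    }

    public static void main(String[] args) {
        try {
            // In the demo you would pass WavFile.class here instead of ResourceCheck.class.
            URL config = require(ResourceCheck.class, "config.xml");
            System.out.println("Found " + config);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A clear message like this points at the file layout right away, rather than leaving you with a bare NullPointerException further down.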

24 Comments

  1. ajmagnifico:

    Good work! I was able to get the WavFile working on Linux using NetBeans.

    At first, I ran into a NullPointerException, choking the program right there. This line:

    URL configURL = WavFile.class.getResource("config.xml");

    was returning null, because it couldn’t actually find the “config.xml” file in the directory my WavFile.class compiled class file was being run from. This may just be a NetBeans idiosyncrasy. I’m not sure why the .wav and other files ended up being placed in a folder different from the .class file.

    I figured out where NetBeans was placing everything after the compile, and I changed the package declaration at the top of the WavFile.java file to read:

    package wavfile;

    This placed the .class file in the same directory as the config.xml file, and voila! “one two three four five”

  2. admin:

    I’m glad to hear that things worked out. Something about the working path in NetBeans was probably the culprit. When you type WavFile.class it references from the folder where the class file is located, so the config.xml file should have been in there, but you knew that. It was probably putting class files into a compiled folder and not moving the config.xml because it wasn’t importing anything from it. I’m glad to hear you got it working.

  3. Mark:

    I got “one two three four five”.
    Thanks a lot!
    Right now, I want to integrate Asterisk, Festival, and Sphinx. Any suggestion?

  4. admin:

    Integrating those three things is no easy task! I know little about Festival and even less about Asterisk. I did download the latter. Neither look too hard to get running alone, but since they are different programming languages, the integration of the three will be tricky. Give me some time to get to know Asterisk and Festival and I’ll get back to you.

  5. Mark:

    You right!
    Special sphinx4+Asterisk.
    Now I am learning Perl.

  6. bakuzen » Blog Archive » MP3 Conversion in Sphinx4:

    […] recognizes the file and prints the results. Assuming you have the Sphinx4 jar file linked in (see this post if you don’t

  7. rizwan:

    hi,

    it was a nice tutorial, thanks for uploading.

    I just want to know that has any one of you had successfully integrated Asterisk with Sphinx 4..?

    Plz help me doing that..i would be very thankful to you..

    Regards,

  8. Kundan:

    I got this error.Can anyone help me to remove it.

    kundan@dev-desktop:~/wavefile$ java FileRecognizer 12345.wav
    file:/home/kundan/wavefile/12345.wav
    URL:file:/home/kundan/wavefile/config.xml
    Loading Recognizer…

    cm:edu.cmu.sphinx.util.props.ConfigurationManager@1
    Recognise: Recognizer: recognizer State: Deallocated
    Recognizer : Recognizer: recognizer State: Ready
    reader: null
    Exception in thread "main" java.lang.NoClassDefFoundError: com/jcraft/jogg/SyncState
    at org.tritonus.sampled.file.jorbis.JorbisAudioFileReader.getAudioFileFormat(JorbisAudioFileReader.java:73)
    at org.tritonus.share.sampled.file.TAudioFileReader.getAudioInputStream(TAudioFileReader.java:366)
    at org.tritonus.share.sampled.file.TAudioFileReader.getAudioInputStream(TAudioFileReader.java:283)
    at javax.sound.sampled.AudioSystem.getAudioInputStream(AudioSystem.java:1128)
    at FileRecognizer.main(FileRecognizer.java:101)
    Caused by: java.lang.ClassNotFoundException: com.jcraft.jogg.SyncState
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at sun.misc.Launcher$ExtClassLoader.findClass(Launcher.java:229)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:303)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:316)
    … 5 more

  9. Kundan:

    My code is:

    /*
    * Copyright 1999-2004 Carnegie Mellon University.
    * Portions Copyright 2004 Sun Microsystems, Inc.
    * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
    * All Rights Reserved. Use is subject to license terms.
    *
    * See the file “license.terms” for information on usage and
    * redistribution of this file, and for a DISCLAIMER OF ALL
    * WARRANTIES.
    *
    */

    import edu.cmu.sphinx.frontend.util.StreamDataSource;

    import edu.cmu.sphinx.recognizer.Recognizer;

    import edu.cmu.sphinx.result.Result;

    import edu.cmu.sphinx.util.props.ConfigurationManager;
    import edu.cmu.sphinx.util.props.PropertyException;

    import java.io.File;
    import java.io.IOException;
    import java.net.URL;

    import javax.sound.sampled.AudioFileFormat;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.UnsupportedAudioFileException;

    /**
    * A simple Sphinx-4 application that decodes a .WAV file containing
    * connnected-digits audio data. The audio format
    * itself should be PCM-linear, with the sample rate, bits per sample,
    * sign and endianness as specified in the config.xml file.
    * “${file_prompt}”
    *
    * Set up the default eclipse jre to be the one included
    *
    * Classpath lib order
    * JRE System Lib (custom 5.x version)
    * jl1.0.jar
    * tritonus.jar
    * tritonus_share.jar
    * tritonus_remaining.jar
    * tritonus_mp3.jar
    * sphinx4.jar
    * tools.jar
    * jsapi.jar
    * junit4.1.jar
    * javalayer.jar
    * corpora
    * sphinx4
    *
    */
    public class FileRecognizer {

    /**
    * Main method for running the WavFile demo.
    *
    *
    */
    public boolean convertedFile = false;

    public static void main(String[] args) {
    try {

    URL audioFileURL;

    if (args.length > 0) {
    audioFileURL = new File(args[0]).toURI().toURL();
    System.out.println(audioFileURL);
    } else {
    //if the ${file_prompt} isn't in the program arguments, it'll go with this:
    audioFileURL = FileRecognizer.class.getResource("");
    }
    URL configURL = FileRecognizer.class.getResource("config.xml");
    System.out.println("URL:"+configURL);

    System.out.println("Loading Recognizer...\n");

    ConfigurationManager cm = new ConfigurationManager(configURL);
    System.out.println("cm:"+cm);
    Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
    System.out.println("Recognise: "+recognizer);
    /* allocate the resource necessary for the recognizer */
    recognizer.allocate();
    System.out.println("Recognizer : "+recognizer);

    // System.out.println("Decoding " + audioFileURL.getFile());
    // System.out.println(AudioSystem.getAudioFileFormat(audioFileURL));

    StreamDataSource reader = (StreamDataSource) cm.lookup("streamDataSource");
    System.out.println("reader: "+reader);

    AudioInputStream ais = AudioSystem.getAudioInputStream(audioFileURL);
    System.out.println("ais: "+ais);
    FileRecognizer wavFile = new FileRecognizer();
    System.out.println(wavFile);
    // Convert it to the proper format
    AudioFormat targetFormat =

    new AudioFormat(16000f,
    16, // sample size in bits
    1, // mono
    true, // signed
    true);
    System.out.println(targetFormat);

    //new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 16000, 16, 1, 2, 16000, false);
    AudioInputStream convertedAis = wavFile.convertAudioInputStream(ais, targetFormat);
    File newFile = null;
    if (wavFile.convertedFile)
    {
    newFile = wavFile.writeConvertedFile(convertedAis, audioFileURL.toString());
    audioFileURL = newFile.toURI().toURL();
    ais = AudioSystem.getAudioInputStream(audioFileURL);
    }

    /* set the stream data source to read from the audio file */
    reader.setInputStream(ais, audioFileURL.getFile());

    /* decode the audio file */
    Result result = recognizer.recognize();

    /* print out the results */
    if (result != null) {
    System.out.println("\nRESULT: " +
    result.getBestFinalResultNoFiller() + "\n");
    } else {
    System.out.println("Result: null\n");
    }

    if (newFile != null)
    newFile.delete();

    } catch (IOException e) {
    System.err.println("Problem when loading WavFile: " + e);
    e.printStackTrace();
    } catch (PropertyException e) {
    System.err.println("Problem configuring WavFile: " + e);
    e.printStackTrace();
    }
    // catch (InstantiationException e) {System.err.println("Problem creating WavFile: " + e); e.printStackTrace();}
    catch (UnsupportedAudioFileException e) {
    System.err.println("Audio file format not supported: " + e);
    e.printStackTrace();
    }
    }

    private AudioInputStream convertAudioInputStream(AudioInputStream sourceAis, AudioFormat targetFormat) {
    AudioFormat baseFormat = sourceAis.getFormat();
    AudioFormat intermediateFormat;
    AudioInputStream convertedAis = sourceAis;

    // First convert the encoding, if necessary
    if (!baseFormat.getEncoding().equals(targetFormat.getEncoding())) {
    intermediateFormat = new AudioFormat(
    targetFormat.getEncoding(),
    baseFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), baseFormat.getChannels(),
    baseFormat.getChannels() * 2, baseFormat.getSampleRate(),
    false);
    convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
    //this.writeConvertedFile(convertedAis, "C:\\encoding.wav");
    baseFormat = intermediateFormat;
    sourceAis = convertedAis;
    convertedFile = true;
    }

    // Then convert the sample rate
    if (baseFormat.getSampleRate() != targetFormat.getSampleRate()) {
    intermediateFormat = new AudioFormat(
    baseFormat.getEncoding(),
    targetFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), baseFormat.getChannels(),
    baseFormat.getChannels() * 2, targetFormat.getSampleRate(),
    false);
    convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
    //this.writeConvertedFile(convertedAis, "C:\\sample.wav");
    baseFormat = intermediateFormat;
    sourceAis = convertedAis;
    convertedFile = true;
    }

    // Then convert the number of channels
    if (baseFormat.getChannels() > targetFormat.getChannels()) {
    intermediateFormat = new AudioFormat(
    baseFormat.getEncoding(),
    baseFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), targetFormat.getChannels(),
    targetFormat.getChannels() * 2, baseFormat.getSampleRate(),
    false);
    convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
    //this.writeConvertedFile(convertedAis, "C:\\channels.wav");
    baseFormat = intermediateFormat;
    sourceAis = convertedAis;
    convertedFile = true;
    }
    return convertedAis;
    }

    private File writeConvertedFile(AudioInputStream sourceAis, String fileName)
    {
    File tempfile = null;
    fileName = "tempwavfile.wav";
    //fileName = fileName.substring(6, fileName.length()-4) + "_new.wav";

    try
    {
    //This just takes an audio stream, writes it to disk, then plays it the way TALL usually does.
    //it’s a test to see if the input stream is readable by the Java audio providers like Tritonus
    //System.out.println(fileName);
    tempfile = new File(fileName);
    AudioSystem.write(sourceAis, AudioFileFormat.Type.WAVE, tempfile);
    }
    catch (Exception e)
    {
    System.out.println(e);
    }
    return tempfile;
    }

    }

  10. admin:

    The exception looks like an audio provider problem, like your ogg vorbis audio provider jar isn’t in your classpath. Add that (link to the site in the MP3s in Sphinx post) and you should be good to go.

  11. pradeep:

    i tried configuring it. but not working.

    it gives a error like this

    java.lang.ExceptionInInitializerError
    Caused by: java.lang.RuntimeException: Uncompilable source code - package edu.cmu.sphinx.frontend.util does not exist
    at wavfile.WavFile.<clinit>(WavFile.java:15)
    Could not find the main class: wavfile.WavFile. Program will exit.
    Exception in thread "main" Java Result: 1
    BUILD SUCCESSFUL (total time: 0 seconds)

    pls help me if you can

  12. pradeep:

    now my error is in

    Exception in thread "main" java.lang.Error: Unresolved compilation problem:

    at wavfile.WavFile.main(WavFile.java:28)

    can someone help me out

  13. pradeep:

    thanks.. its working after one day of configuration..

    i want to know how to develop it to the words. only numbers are recognising. and like 4 out of 20 numbers are only correct also.

  14. sagar:

    hey i did as you said above but i m havin a problem…the jsapi.exe file runs in eclipse bt the jsapi.jar file does not appear….if i copy the jar file from lib (by extracting the jar file there in the lib folder and then copying it into eclipse) it does not build path….plz can you help me out with this….i know very little about this….

  15. admin:

    First, you just run the jsapi.exe file to extract the jar. Then you put the jar in a folder somewhere in your eclipse project. Then in eclipse, you may need to right-click on the folder you put it in and hit “refresh”. Then go to project->properties->build path->and add the jar.

  16. sagar:

    is there any changes that we have to make in wavefile.java….its showing me an error there

    Description The declared package "demo.sphinx.wavfile" does not match the expected package
    "wavfile"
    Resource WavFile.java
    Path /speechreg/src/wavfile
    Location line 13
    Type Java Problem

  17. admin:

    This just means that your package layout seems to be different from what it should be. It looks like you could remove the demo.sphinx and leave wavfile and it might work.

  18. sagar:

    thx for yr help…as you said i hd tried the same thing removing the demo.sphinx and keeping wavfile as it is and still it hasnt worked…

  19. Virendra:

    I have done the changes in the configuration as above but still after running I am getting this exception please help me out..

    Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: D:\sphinx4-1.0beta3-src\speech\config.xml (The system cannot find the file specified)
    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:61)
    at edu.cmu.sphinx.demo.wavfile.WavFile.main(WavFile.java:40)

  20. wolfgar:

    Nice job, but after doing all steps Im getting this error message:
    Exception in thread "main" Property Exception component:'grammarLocation' property:'grammarLocation' - Can't locate resource:/wavfile.WavFile
    edu.cmu.sphinx.util.props.InternalConfigurationException: java.lang.ClassNotFoundException: wavfile.WavFile
    …..
    Im newbie to eclipse and sphinx so it is possible Im missing something, Ive checked file locations and everything seems to be in right place.
    Thx for any advice!

  21. naxo:

    pradeep:

    now my error is in:
    Exception in thread "main" java.lang.Error: Unresolved compilation problem:
    at wavfile.WavFile.main(WavFile.java:28)

    Please, help me.. how you fix your problem??

  22. naxo:

    I’m sorry.. already fixed my problem.. into conf.xml had a location wrong

    resource:/wavefile.WavFile!/wavfile/" // :(
    resource:/wavfile.WavFile!/wavfile/" // ok

  23. pradeep:

    @naxo

    is it working now. it should work. but have to learn how to make acoustic model. otherwise the accuracy is pretty low

  24. Blitzkrieg:

    Hi, Anybody found how to integrate sphinx-4 with asterisk ?

    Thanks!!!!
