Note

I’ve been getting a lot of comments, questions, and emails lately, which is great, but this week is finals week in my master’s program, so I’m quite busy. I’ll have time to look at your comments in a week or two.

Python in Eclipse

There are many IDEs available for any language, and Python has plenty to choose from. However, I’ve been programming in Eclipse for Java, JSP/Servlets, Flex, and PHP for a while now and have found it to be a solid IDE for those languages. I’ve found it to be quite good for Python as well. If you already have something you like, then use that, of course. But if you are a computational linguist who is new to programming and realizes it is time to get comfortable with the computational side of things, this is where you can start.

First, download Eclipse. You can either search Google for “eclipse” and choose what you want, or you can click here. I would recommend Eclipse for PHP Developers. Why? Because it has built-in web tools that might come in handy later if you really get into Python. Look to the right of that option and choose your operating system. Clicking that link should take you to the download site, which will choose the closest download mirror for you. The download is about 138 MB.

Once you’ve downloaded it, unzip it. If all is well, you should be able to just run it. In Windows, double-click the eclipse file inside your eclipse folder. In Linux, you may have to open a terminal, navigate to the folder, and type: ./eclipse

If that worked, you should see the Eclipse Galileo splash screen. If it doesn’t load, the problem could be anything. One possibility is that you downloaded the wrong build for your OS. Another is your Java version (Eclipse needs Java to run). I don’t know what version it was built with, but my Eclipse works fine and I am using java 1.6.16. You can check your Java version in Linux (or Windows) by opening a terminal (command window) and typing: java -version
If you type that and it says it can’t find java, then you don’t have it installed, so install it. I won’t go over that here. If you need help with that, shoot me an email or leave a comment.

Now that Eclipse is open, you’re halfway there. Click on Help->Install New Software->Add. Type “Python” in the Name box and:
http://pydev.org/updates/
in the Location box, then click OK. You’ll see a drop-down box to the left of the Add button. Click on it and find the Python entry you just added. Select the box next to “Pydev” and click Next. You may be asked to choose a mirror site (any one will do), and you’ll need to accept a license agreement. The download takes a few minutes, and Eclipse will ask whether you want to restart when it is done. Yes, do restart Eclipse.

Now you have the ability to program in Python, but we’re not quite there yet. With Eclipse open, click on File->New->Other. Scroll to Pydev and expand the tree. Then choose Pydev Project and click Next. Type in a name for your project; anything will do. This will create a project folder for your code under the name you give. Notice that you can’t go on yet because you don’t have an interpreter. Click on the link to configure the interpreter, then click Auto Config and then OK. You’ll see it spend some time looking through your computer for libraries. This means you won’t have to set your PYTHONPATH variable yourself; Eclipse takes care of that (assuming you already installed the NLTK).

When it’s done, click Next until it creates your project. You’ll see your project folder on the left. Expand your project folder and find the src folder. Right-click on the src folder and select New->Pydev Module. Skip the package name and just put in a name for the file (e.g., “test”). You’ll notice that it can fill in template code for classes, etc., but you can just choose “none” and click OK. You should now see a new file.

Now let’s test it. First, type:
print("Hello World")

And press the green play button on top. It will ask how you want to run the file. Scroll down and choose “Python Run”. You can also set it to save the file automatically whenever you click run. Then click OK, and “Hello World” should appear in the output below.

To see if the nltk works, type:
import nltk
nltk.probability.demo()

And run it again. It should show some probabilities in the output. Now you know that you can use the nltk. You can create as many files and classes as you’d like, and Eclipse makes it easy to move between them. Happy programming. Feel free to ask questions of any kind.

Python NLTK

The Python Natural Language Toolkit (NLTK) has a lot to offer the DIY NLPer. It has a parser, a POS tagger, lambda calculus, a chunker, a classifier, a tokenizer, even a WordNet interface, and much, much more. It’s loaded, and it’s not terribly difficult to use provided you know some Python and at least a little bit about NLP.

First, you need Python, the programming language in which the toolkit is developed. Most Linux distros have Python installed, but if you don’t have it you can download it from www.python.org. If you don’t know how to do that, you’ll be hard-pressed to use the toolkit anyway, so spend some time learning Python before you go crazy with the toolkit.

If you’re beyond that and ready for the toolkit, you can go to www.nltk.org and download it. I tried a few different things and ended up just downloading the zip file and extracting it. Then go into the extracted directory with your console and type (as root or with sudo):

python setup.py install

and you’re almost done. Run python by just typing:

python

and you’ll see the python command-line interpreter interface. Type:

import nltk
nltk.probability.demo()

and you should see some output with some frequency distributions. There are more tutorials on how to use the tools individually on the www.nltk.org website.

One more thing. You might want to install some of the optional packages, like numpy. Go back to the same download site as the nltk and grab what you want. Open a console, go to where you downloaded the file, then run:

tar -xvf numpy[ver]

Then go into your numpy directory and run (again, as root/sudo):

python setup.py install

It will take some time because it is also compiling a lot of C code. Best of luck. I’ll post more as I learn more about it.
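If you would rather check both installs from a script than from the interactive prompt, something like the following should do. It is only a sketch: the helper name check_package is my own invention, and it relies on the common (but not guaranteed) convention that a package exposes a __version__ attribute.

```python
import importlib


def check_package(name):
    """Return the version string of an importable package, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every package defines __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for pkg in ("nltk", "numpy"):
        version = check_package(pkg)
        print(pkg, "->", version if version else "not installed")
```

If a package shows up as "not installed", repeat the setup.py step for it.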

KWIC Dictionary

There are many different versions of “Key-Word in Context” (KWIC) dictionaries out there, but for the most part they simply take a search string and line it up in context as found in a given corpus. For example, I searched for “beef” in the Online BLC KWIC Concordance Dictionary and got the following results:

     1  remises of our supplying less expensive beef and management know-how in running the
     2  At the time, I proposed to supply dried beef continuously, but you made a counter of
     3  antage of your supplying less expensive beef is considered, and we most reluctantly
     4  cities in Japan; and 2) that you supply beef to the restaurant chain.
     5  lp sell chicken and pork in addition to beef.
     6  0 tons of Nebraska USDA choice corn-fed beef.

Some KWIC dictionaries are rich in features. For example, you can change the justification of the search string, show more words around each result, etc. Is it useful? That depends on what you need to do. For me, I’ve been studying German and haven’t been able to see certain things in context. For example, I want to understand the difference, in context, between those prepositions that can take the accusative or dative case depending on what you’re trying to say. I want to see examples, but can’t find any real German KWIC online (that responds faster than 10 minutes). So, I wrote a DIY KWIC.

What it requires: a URL and a search string. The search string you understand. The URL is going to be the corpus. When you click “Go” it will actually go to the URL, parse the HTML, and find links. It will follow several of those links and get the text from those pages as well. It then combines the text from all of those pages into a single corpus and searches it for your search string. Just open up the link to the right called “KWIC”, click “Go” with what’s there, and see what happens. It’s not feature rich, nor is it pretty, but it gets the job done. I’m happy for feedback. Now, when I want to study German, I just throw in a Wikipedia article in German and search for a string.
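For the curious, the alignment step at the heart of any KWIC tool fits in a few lines of Python. This is a rough sketch and not the code behind my page: it assumes the corpus text has already been fetched and stripped of HTML, and the function name kwic and its width parameter are made up for the example.

```python
import re


def kwic(text, keyword, width=40):
    """Return one line per occurrence of keyword, with the keyword in the same column."""
    # Collapse whitespace so line breaks in the corpus do not disturb the alignment.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-justify the left context so every keyword starts at column `width`.
        lines.append(left.rjust(width) + match.group(0) + right)
    return lines


if __name__ == "__main__":
    corpus = "We proposed to supply dried beef continuously. You supply beef to the chain."
    for line in kwic(corpus, "beef", width=25):
        print(line)
```

Padding the left context to a fixed width is what makes the keyword line up down the middle of the output, just like the “beef” example above.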

Disclaimer: This is only to be used for personal purposes. It parses any website given, so you are responsible for the URL you search. You are not able to copy text, so it is read-only. The primary purpose is to help you with language study.

Acoustic Model Creation using SphinxTrain

Before you look at this, you can peruse the official SphinxTrain documentation at the CMU website. It’s not for the faint-hearted, but if you’re a programmer and know how to get around Linux, then use it instead. Even if you’re interested in each step of how to do this, you may want to consider a much easier way….

Getting the programs

First, you need sphinx3 (I had to go back a few versions, or else it wouldn’t work) and SphinxTrain (this is the nightly build location; there isn’t an official release). Again, I assume you’re using Linux as root. Once downloaded, un-tar them:

>tar -xvf sphinx3…
>tar -xvf SphinxTrain-

(Another assumption: you have gcc and g++, the C and C++ compilers, installed on your machine.) After they expand into their respective folders, go into the sphinx3 folder and run the config script:
>./configure
If there are no errors, it should have made a make file. Make sure you’re root and run the command:
>make
This will take a while. You will also need to run
>make install

Now move into the SphinxTrain folder and perform the following steps:
>./configure
>make

No need to run “make install” for SphinxTrain

Creating your project/task folder

Okay, now you need to make a project folder. For example’s sake, I’ll call our project myam (>mkdir myam), and it needs to be in the same folder that SphinxTrain and sphinx3 are in. Then navigate into myam and run this command:
>../SphinxTrain/scripts_pl/setup_SphinxTrain.pl -task myam

Notice that myam is the name of the task and is also the name of your folder. It doesn’t have to be, but it makes things easier later.

Collecting your data

Put all of these files, unless otherwise specified, into your myam/etc folder.

First, you need the audio files that you want to use as your model of speech. I happened to have about 160 wav files, each of which is a single-sentence utterance. For example, if you listened to the first one, it might say “a player threw the ball to me” and that is all. Therefore, you need a bunch of single-sentence audio files, preferably in wav or raw format. Put all of your audio files into the myam/wav folder.

Next, you need a control file. It’s just a text file that has the name of each of your audio files, one per line (note that there are no file extensions). Name it myam_train.fileids (you MUST name it [name]_train.fileids, where [name] is the name of your task, if you’re not using myam). For example:

0001
0002
0003
0004

Next, you need a transcription file that has the transcript of everything uttered, where each line has a single file’s utterance on it. It MUST correspond to your control file. For example, if I look at my control file, it says 0001 on the first line, therefore the first line of my corpus file will be “A player threw the ball to me” because that’s the transcript of 0001.wav. The corpus file, another text file named myam.corpus, should have as many lines as your control file. Remove any punctuation. For example:

a player threw the ball to me
does he like to swim out to sea
how many fish are in the water
you are a good kind of person

Corresponds to my 0001, 0002, 0003, and 0004 files in that order.
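If you have a lot of audio files, you may not want to type the control file by hand. Here is a rough Python sketch that builds it from the contents of the wav folder. The function names are my own, and it assumes every file ending in .wav belongs in the training set. Sorting the names matters, because the corpus file has to list its utterances in the same order.

```python
import os


def build_fileids(wav_dir):
    """Return the sorted base names (no extension) of the .wav files in wav_dir."""
    names = []
    for filename in os.listdir(wav_dir):
        base, ext = os.path.splitext(filename)
        if ext.lower() == ".wav":
            names.append(base)
    return sorted(names)


def write_fileids(wav_dir, fileids_path):
    """Write one base name per line to fileids_path and return the list of names."""
    ids = build_fileids(wav_dir)
    with open(fileids_path, "w") as out:
        for name in ids:
            out.write(name + "\n")
    return ids


if __name__ == "__main__":
    # Paths follow the myam example; adjust them to your own task name.
    print(write_fileids("wav", os.path.join("etc", "myam_train.fileids")))
```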

What if I don’t have any transcripts of my audio files? Well, you’ll have to get some. NLP has to start somewhere, which means some people have to deal with manually creating data to train from. There is also a vast amount of data on the Internet where you can find audio/transcript bundles, some at the LDC (but that requires a membership).

Anyway, you now have your folder of audio files, a control file, and a transcription file. You still have a long way to go. You still need a main dictionary, which lists each word and the phonemes that make up the word, a filler dictionary, and a phone list. Lucky for you, CMU has an online tool that does the dictionary part for free: http://www.speech.cs.cmu.edu/tools/lmtool-adv.html.

This website asks for several files, but you really only need one, and that’s the transcript file (myam.corpus). Browse for your transcription file under the “Sentence corpus file:” field. Then click “Compile Knowledge Base” and wait a few seconds for the results. Download the sentence file and call it myam_train.transcription (notice that this file differs from your corpus file only in that it has start and end sentence tags <s> and </s> and everything is upper-case). Download the dictionary file and call it myam.dic. Download the LM file and call it myam.lm. You’ll only need the first two for SphinxTrain, but the LM file is handy to have for other things. Put all files in your myam/etc folder.

You will next need a filler dictionary. You can get specific here with different filler sounds, but we’ll just put together a base one. Make a file called myam.filler and paste this into it:

<s>     SIL
<sil>   SIL
</s>    SIL

That leaves one last file, your phone file. This tells the trainer which phonemes are part of your training set. You should only have the phonemes you need, no more, no less. How do you find the phonemes you need? Open up your myam.dic file. You’ll see words and then you’ll see a breakdown of how those words are pronounced. For example, in my dic file I have:

ACTING    AE K T IH NG

The AE, K, T, IH, NG are all phonemes that make up the word acting. You’ll need a list of all the phonemes used, without duplicates. You can either just follow the next steps and have the errors tell you which phonemes are missing, or you can go to the page I made that will extract the phonemes for you:

http://bakuzen.com/extractphoneme.php

Be gentle. I just threw it together as I put together this post. It takes the dictionary file (myam.dic) that was generated by the CMU site and displays all the unique phonemes. The only problem is…they aren’t all completely unique. You may have to go through and take out duplicates. I don’t know why, but some of them aren’t counted as duplicates by the PHP array_unique function. Anyway, copy the phoneme list into a file called myam.phone.
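If you’d rather do the extraction locally, here is a small Python sketch of the same idea. It is not the code behind my page: the function name is made up, it assumes each dictionary line is a word followed by whitespace-separated phonemes, and it adds SIL on the assumption that the phone list must also cover the phone used in the filler dictionary.

```python
def extract_phones(dic_lines):
    """Collect the unique phonemes used in CMU-style dictionary lines."""
    phones = set()
    for line in dic_lines:
        fields = line.split()
        # fields[0] is the word (e.g. ACTING); everything after it is a phoneme.
        phones.update(fields[1:])
    # Assumption: the filler dictionary maps to SIL, so the phone list needs it too.
    phones.add("SIL")
    return sorted(phones)


if __name__ == "__main__":
    with open("etc/myam.dic") as dic, open("etc/myam.phone", "w") as out:
        for phone in extract_phones(dic):
            out.write(phone + "\n")
```

Because the phonemes go into a set, there is nothing left to de-duplicate by hand afterwards.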

That’s it for file collecting. A recap:

  1. All wav audio files into the myam/wav folder
  2. The rest will be in the myam/etc folder
    1. myam.dic
    2. myam.filler
    3. myam.phone
    4. myam_train.fileids
    5. myam_train.transcription
    6. feat.params
    7. sphinx_train.cfg

NOTE!!! Double check the following…..

  1. your .dic, .filler, .phone, and .transcription files have everything capitalized. If not, you can capitalize everything with Kate in Linux or PSPad in Windows (or a similar program)
  2. you have an empty line at the bottom of each file
  3. you have the same number of lines in the .transcription file as you do in the .fileids file
  4. make sure your .phone file has no duplicate entries
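Most of that checklist can be automated. The following Python sketch checks the line counts, the capitalization of the transcription, duplicate phones, and the trailing newline. The function name and the wording of the messages are my own, and it only knows about the rules listed above, so a clean result does not guarantee that the trainer will be happy.

```python
import re


def check_training_files(fileids_text, transcription_text, phone_text):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    ids = fileids_text.splitlines()
    utterances = transcription_text.splitlines()
    if len(ids) != len(utterances):
        problems.append("line counts differ: %d fileids vs %d transcriptions"
                        % (len(ids), len(utterances)))
    # Ignore the <s> and </s> tags, which stay lower-case.
    words_only = re.sub(r"</?s>", "", transcription_text)
    if words_only != words_only.upper():
        problems.append("transcription contains lower-case text")
    phones = phone_text.split()
    if len(phones) != len(set(phones)):
        problems.append("phone list contains duplicates")
    for label, text in (("fileids", fileids_text),
                        ("transcription", transcription_text),
                        ("phone", phone_text)):
        if not text.endswith("\n"):
            problems.append(label + " file does not end with a newline")
    return problems


if __name__ == "__main__":
    def read(path):
        with open(path) as handle:
            return handle.read()

    for problem in check_training_files(read("etc/myam_train.fileids"),
                                        read("etc/myam_train.transcription"),
                                        read("etc/myam.phone")):
        print("WARNING:", problem)
```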

You have some configuring to do now. Open up myam/etc/sphinx_train.cfg with an editor (>kate myam/etc/sphinx_train.cfg). It looks like a fairly daunting file, but there won’t be much you have to change here. First, notice that $CFG_DB_NAME = “myam” or whatever you set your task name to be. Many other properties in this file hinge on that name. That’s why we named the .dic, .phone, and other files the way we did. Also notice that $CFG_BASE_DIR is set to the directory where your task folder exists. If you ever move the folder, you’ll need to change this path. The next property, $CFG_SPHINXTRAIN_DIR, is set to the relative path where your SphinxTrain folder is, just in case it needs something from it.

Now, on to editing a few things. First, you’ll want to find the line that has the property $CFG_WAVFILE_EXTENSION in it. To the right of the = sign is the file extension of your audio files. This is appended onto each of the filenames in your myam_train.fileids file. I set mine to 'wav', and you need to be sure the single quotes are there, too. I also had to set $CFG_WAVFILE_TYPE = 'mswav' since my wav files were created in Windows (by someone else). I forgot to set this at first, and it never gives an error; the training just hangs and doesn’t do anything. Save your changes and close the editor.

Creating the model

NOW you get to create the acoustic model. First, navigate to your myam folder and then run this command:

>./scripts_pl/make_feats.pl -ctl etc/myam_train.fileids

This creates feature files from the wav files and stores them in the myam/feat directory. It should move through the files fairly quickly. It went through my 160 files, each averaging about 10 words, in a few seconds.

If there were no errors, you can move on to the last part. Run this command from the myam folder:

>./scripts_pl/RunAll.pl

This is where the magic happens, and it could (really should) take several minutes depending on how much data you have. It first checks that the data you have are usable, and then it goes through the different phases of acoustic model training. It logs errors to the myam/logdir folder, and it creates an easier-to-read html error log in your myam folder, named myam.html (or the name of your task + .html). The bottom of the file has your latest log information. You will probably have several errors and warnings, but if there was no “fatal error” then your training should be complete.

Making the model usable for Sphinx4

I really just copied an acoustic model jar file, like the WSJ one, renamed it to zip, and created a similar file structure. CMU has a tutorial that is very helpful in putting together your sphinx4 acoustic model, and I refer you to that for further help. Once you get your folder structure created (just to test it out, make it the same as CMU’s structure), you’ll need this file structure:

cd_continuous_8gau/means
cd_continuous_8gau/mixture_weights
cd_continuous_8gau/variances
cd_continuous_8gau/transition_matrices
dict/cmudict.0.6d
dict/fillerdict
etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ci.mdef

There are several files in cd_continuous_8gau, etc, and dict. The files that belong in the cd_continuous_8gau folder can be found in your myam/model_parameters/eimodel.ci_cont/ folder (the names correspond). The dict folder wants any dictionary files. Add your myam.dic and your myam.filler dictionaries to it. The etc directory uses two files found in the myam/model_architecture/ directory. The mdef file will be your myam.alltriphones.mdef file, and the ci.mdef file will be your myam.ci.mdef file. Copy each file into the correct folder.

Now you need to create a file in the directory that holds the etc, dict, and cd_continuous_8gau folders. The file needs to be named model.props and you need to add these properties to it:

description = any description of your model file
modelClass = edu.cmu.sphinx.model.acoustic.EI_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
modelLoader = edu.cmu.sphinx.model.acoustic.EI_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader

isBinary = true
featureType = 1s_c_d_dd
vectorLength = 39
sparseForm = false

numberFftPoints = 512
numberFilters = 40
gaussians = 8
minimumFrequency = 130
maximumFrequency = 6800
sampleRate = 16000

dataLocation = cd_continuous_8gau
modelDefinition = etc/myam.ci.mdef

Save the file and close the editor. You will also need Model.class, ModelLoader.class, and PropertiesDumper.class. You can either get those from an existing jar, or go to the site I referred you to on how to create them correctly.

Now navigate into the etc folder where your mdef files are. Create a file called variables.def and put this info into it:

set exptname = myam
set vector_length = 13
set dictionary = $base_dir/lists/myam.dic
set fillerdict = $base_dir/lists/myam.filler
set statesperhmm = 3
set skipstate = no
set gaussiansperstate = 8
set feature = 1s_c_d_dd
set n_tied_states = 4000
set agc = none
set cmn = current
set varnorm = no

Notice the myam.dic and myam.filler. Be sure to use the link I provided for more information. Save the file and close.

Now, if you want to do it the easy way, go back to your first folder (if you followed the CMU way, the folder “edu”) and create a zip file out of it. Rename the zip file to a jar file extension and you now have an acoustic model. The rest is linking it into Sphinx4 via Eclipse and setting up the information in your config.xml file. There will be three places to do that: the dictionary, the loader, and the acoustic model definitions. Refer to my original post on sphinx4 on how to deal with the config file. I had to play with it for a while before I got everything to work correctly, but it was thrilling to see my own first acoustic model work in Sphinx4.
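Since a jar is just a zip archive with a different extension, the zip-and-rename step can also be scripted. Below is a minimal Python sketch using the standard zipfile module. The function name make_jar and the output name myam.jar are only examples, and it assumes the jar is written somewhere outside the folder being zipped.

```python
import os
import zipfile


def make_jar(source_dir, jar_path):
    """Zip source_dir (including the top folder itself) into jar_path."""
    parent = os.path.dirname(os.path.abspath(source_dir))
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the parent so entries start with "edu/...".
                arcname = os.path.relpath(os.path.abspath(full_path), parent)
                jar.write(full_path, arcname.replace(os.sep, "/"))
    return jar_path


if __name__ == "__main__":
    make_jar("edu", "myam.jar")
```

Keeping the top-level folder name inside the archive is what preserves the package structure that the modelClass and modelLoader properties point to.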

If you have any trouble, feel free to leave a comment with your question and I’ll see what I can do to help you through it. There is also a great site called http://voxforge.org/ that is a go-to site for DIY NLP people out there. A site like that may make my site obsolete one day, but I’ll still be around for those of you who aren’t as programming savvy as those folks typically are. It’s an excellent site and I encourage you to look there for data you can use in acoustic model creation, help on problems you run into, and also to contribute by adding data you have, giving insights on aspects of NLP, or helping people by answering questions they may have.

LiveMocha

I haven’t posted for a while because I’ve been applying for graduate school. That’s all I have to say about that.

But that’s not all. I’ve also been studying German. A recent issue of PC Magazine listed “100 useful websites”, and among them was a website for people who wish to learn foreign languages. It’s called…

http://www.livemocha.com

It’s fairly new, but has been around long enough to get a very diverse userbase. Here’s how it works.

Livemocha is a site that couples social networking with language lessons. You can go to the site and focus exclusively on the lessons they offer, or you can go just to chat and ask language questions of other people who know your target language. When you sign up, it asks you what languages you are learning and what languages you know. It then suggests friends to add to your friends list: people who either know the language you’re trying to learn or are trying to learn a language that you know. You can add people to your friends list and either send them messages like email or chat with them online (audio or typed). But that’s not all. As you take the Livemocha language lessons, you will be presented with 40 flashcards, drilled on those flashcards (it will test your listening, reading, and fill-in-the-sentence abilities), and then asked to take what you learned and practice typing in your target language. You can then send what you typed to friends on your list, who can comment on how well you did or how to improve. You are also recorded via microphone, and the recording is sent to friends of your choice, who coach you on pronunciation.

For example…

I am a native English speaker and I also speak advanced Japanese. I have people on my friends list who are learning English or Japanese who send me their submissions and ask me questions about the two languages that I help answer.

A typical session for me goes like this…. I log into Livemocha.com and continue my German lesson. It takes me through the 40 flash cards, then drills me, then I am asked to write about something specific (e.g., “Based on what you learned in this lesson, write about what you did over the weekend. Describe 3 events that took place and who was with you,” etc.). Then I spend some time forming German sentences to answer the question of what I did over the weekend, making sure I include everything it asks. When I submit, it asks me which friends I want to notify so they can rate what I wrote. Then I am taken to a speaking activity where I read already written German text into a microphone and then choose which friends I want to notify about that so they can comment. That process takes about 30 minutes. Then, in the next day or two, I get emails telling me that people have commented on my writing and speaking items. I can go look at their comments anytime. Then, when I have time, I can catch certain friends online and practice my German by chatting with them.

Sounds easy? It is. Did I mention it’s….free.

What’s the catch? There are advertisements, but they aren’t too bad. I click on ones that interest me sometimes in order to help keep the website going. The other catch is that if you receive help, you should also offer help to others. When you comment on other people’s submissions, be specific and helpful about what they need to do to improve. I am lucky enough to have a very nice German woman who I send all of my submissions to and she is nitpicky about how I pronounce things. She sends me to websites to learn about certain German grammar that I struggle with. It’s very effective, and very affordable. Good luck, and happy language learning.

PS- It’s still in Beta form, so be patient; it is a continual work in progress.

MP3 Conversion in Sphinx4

It’s actually not that specific; this is how you can do MP3 conversion in any Java program. I’m just going to show you how to do it in the context of Sphinx4. The tool that does the job is called Tritonus, a series of Java jars that can convert between several different audio formats.

Why bother? Because Sphinx4 requires 16-bit mono files in wav format. It can’t recognize anything else. Sometimes all you get to work with are MP3s, and that means you have to convert them. There are a lot of programs out there (free ones, even) that you can use to convert between various formats, but Sphinx4 is a little bit pickier than an easy conversion allows. For example, Audacity is a common program you would use to manipulate or convert an audio file, but for some reason its output is difficult for Sphinx4 to read even though it sounds just fine through any audio player. Tritonus does it right and…it can do it on the fly. It can detect the format of an audio file and, if all is set up properly, can convert to the coveted 16-bit format needed by Sphinx. With Tritonus there’s no need to convert everything beforehand. That’s why.
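If you’re not sure whether a wav file you already have is in the right shape, you can check its header before blaming Sphinx4. Here is a quick Python sketch using the standard wave module. The function name is made up, and the 16000 Hz default is simply the sample rate used in my own model and conversion code, so pass a different value if your model expects something else.

```python
import wave


def is_sphinx_ready(wav_path, expected_rate=16000):
    """Return True if the file is 16-bit, mono PCM at expected_rate Hz."""
    with wave.open(wav_path, "rb") as wav:
        return (wav.getsampwidth() == 2       # 2 bytes = 16 bits per sample
                and wav.getnchannels() == 1   # mono
                and wav.getframerate() == expected_rate)


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print(path, "OK" if is_sphinx_ready(path) else "needs conversion")
```

This only inspects the header. It won’t tell you why a file that looks right still fails, which is exactly the kind of trouble I ran into with Audacity’s output.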

Now for the how. If you’ve followed my posts, by now you should have been able to recognize a simple file, understand how the config file works, and change some things around to meet your needs. It’s actually not too difficult. All you need is a file in the proper format to recognize, a config file, and a Java program that uses the Sphinx4 jar. Assuming you have the first two, the Java file can be pretty basic and look something like this:

import java.net.URL;

import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

import edu.cmu.sphinx.frontend.util.StreamDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class FileRecognizer {
    public static void main(String[] args) {
        try {
            // Both resources are looked up relative to this class on the classpath.
            URL audioFileURL = FileRecognizer.class.getResource("insert path to audio file here");
            URL configURL = FileRecognizer.class.getResource("config.xml");
            ConfigurationManager cm = new ConfigurationManager(configURL);
            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
            /* allocate the resource necessary for the recognizer */
            recognizer.allocate();
            StreamDataSource reader = (StreamDataSource) cm.lookup("streamDataSource");
            AudioInputStream ais = AudioSystem.getAudioInputStream(audioFileURL);
            reader.setInputStream(ais, audioFileURL.getFile());
            Result result = recognizer.recognize();
            if (result != null) {
                System.out.println("\nRESULT: " +
                        result.getBestFinalResultNoFiller() + "\n");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

At a glance, this Java class file has the main method, a line that gets the path of the file you want to recognize, reads information from the config file, then recognizes the file and prints the results. Assuming you have the Sphinx4 jar file linked in (see this post if you don’t know what I’m talking about), this should compile and work just fine.

Now the trick is to be able to read in an audio file and have it be converted to the proper format on the fly. The steps to be able to do this are similar to using Sphinx4. You have to link in the proper jars (and in the proper order) and know how to import and utilize them.

First, you need to go to the Tritonus plugins website and download a few things.

  1. tritonus.jar
  2. tritonus_share.jar
  3. tritonus_remaining.jar
  4. tritonus_mp3.jar

Note that these jars probably have version numbers attached to them (for example, tritonus_mp3 might be tritonus_mp3-0.3.6.jar), which is fine. Download the latest of each jar into (or move them into) the lib folder you created inside your sphinx4 directory. (If you don’t know what I’m talking about, refer back to this post.) Now, in order to link them in using Eclipse, go to Project->Properties->Java Build Path->Libraries->Add Jars (add all 4 of them), then go to the Order and Export tab and arrange them in the order shown above (tritonus.jar, share, remaining, mp3).

You’re not out of the woods yet. You still need some more jars to get the job done. You may need to specially link in tools.jar, which can be found in the lib folder of your Java installation (it ships with the JDK rather than the plain JRE). For some reason I had to do that to get it to work properly. The other jar you’ll need is called javalayer.jar. You can find it with a Google search. It’s a javazoom project and for some reason is necessary for Sphinx4 to work with Tritonus to do anything with mp3s. Download it, put it into the lib folder, and link it in like you did before.

That should do it for getting the necessary plugins. Now you need to use them in the code. It’s not too tough. First, import the right stuff:

import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

Then, there are some steps that you’ll need to follow. After you get the AudioInputStream for the file you want to recognize, you’ll need to tell it what to change it into and then actually do the change:

AudioFormat targetFormat =
    new AudioFormat(16000f, // sample rate in Hz
        16,    // sample size in bits
        1,     // mono
        true,  // signed
        true); // big-endian

AudioInputStream convertedAis = wavFile.convertAudioInputStream(ais, targetFormat);

The last line is a call to a function that will run through a few things and then return the converted AudioInputStream. If all goes well, that’s what you’ll be sending through to the recognizer. This is the function convertAudioInputStream (NOTE!!! I didn’t make this function. I was kindly shown this by Robbie Haertel so he gets credit for all of this):

// Note: convertedFile is assumed to be a boolean field of the enclosing class.
private AudioInputStream convertAudioInputStream(AudioInputStream sourceAis, AudioFormat targetFormat) {
    AudioFormat baseFormat = sourceAis.getFormat();
    AudioFormat intermediateFormat;
    AudioInputStream convertedAis = sourceAis;

    // First convert the encoding, if necessary
    if (!baseFormat.getEncoding().equals(targetFormat.getEncoding())) {
        intermediateFormat = new AudioFormat(
                targetFormat.getEncoding(),
                baseFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), baseFormat.getChannels(),
                baseFormat.getChannels() * 2, baseFormat.getSampleRate(), // frame size assumes 2 bytes per sample
                false);
        convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
        //this.writeConvertedFile(convertedAis, "C:\\encoding.wav");
        baseFormat = intermediateFormat;
        sourceAis = convertedAis;
        convertedFile = true;
    }

    // Then convert the sample rate
    if (baseFormat.getSampleRate() != targetFormat.getSampleRate()) {
        intermediateFormat = new AudioFormat(
                baseFormat.getEncoding(),
                targetFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), baseFormat.getChannels(),
                baseFormat.getChannels() * 2, targetFormat.getSampleRate(),
                false);
        convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
        //this.writeConvertedFile(convertedAis, "C:\\sample.wav");
        baseFormat = intermediateFormat;
        sourceAis = convertedAis;
        convertedFile = true;
    }

    // Then convert the number of channels (only ever down, e.g. stereo to mono)
    if (baseFormat.getChannels() > targetFormat.getChannels()) {
        intermediateFormat = new AudioFormat(
                baseFormat.getEncoding(),
                baseFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), targetFormat.getChannels(),
                targetFormat.getChannels() * 2, baseFormat.getSampleRate(),
                false);
        convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
        //this.writeConvertedFile(convertedAis, "C:\\channels.wav");
        baseFormat = intermediateFormat;
        sourceAis = convertedAis;
        convertedFile = true;
    }
    return convertedAis;
}

Of course, on the reader.setInputStream call you send it the convertedAis instead of the original ais. That should do the trick.
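One detail in that function is easy to miss: every intermediate format passes the channel count times 2 as the frame size, which only holds for 16-bit samples. Here is a minimal sketch of a more general helper, using only javax.sound.sampled. The class name PcmFormats and its methods are made up for illustration; they are not part of Robbie’s code or of Sphinx4. It also asks AudioSystem whether a conversion is supported before you try it, and that answer depends on which providers (tritonus, javalayer) are on your classpath.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper for illustration; not part of the original FileRecognizer code.
class PcmFormats {

    // Bytes per frame = bytes per sample (rounded up) * number of channels.
    static int frameSize(int sampleSizeInBits, int channels) {
        return ((sampleSizeInBits + 7) / 8) * channels;
    }

    // Builds a signed PCM format whose frame size matches its sample size.
    // For PCM the frame rate equals the sample rate.
    static AudioFormat signedPcm(float sampleRate, int sampleSizeInBits,
                                 int channels, boolean bigEndian) {
        return new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                sampleRate, sampleSizeInBits, channels,
                frameSize(sampleSizeInBits, channels), sampleRate,
                bigEndian);
    }

    // True if some installed provider can convert source into target.
    static boolean canConvert(AudioFormat source, AudioFormat target) {
        return AudioSystem.isConversionSupported(target, source);
    }

    public static void main(String[] args) {
        AudioFormat cdQuality = signedPcm(44100f, 16, 2, false);
        AudioFormat sphinxInput = signedPcm(16000f, 16, 1, true);
        System.out.println("stereo 16-bit frame size: " + cdQuality.getFrameSize());
        System.out.println("conversion supported: " + canConvert(cdQuality, sphinxInput));
    }
}
```

If canConvert comes back false for one of the steps, that is a hint that the jar providing that conversion is missing or not linked in.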

If only it were that easy.

I found that this sometimes didn’t work, for reasons I never fully pinned down. It turns out that when you work with audio providers (like tritonus), things can get in the way and streaming audio becomes difficult. For those of you who had the same problem, I found a work-around hack that has never failed me. After I create the new AudioInputStream, aptly named convertedAis, I write it out to disk as a new audio file (in this case, the converted wav file). Then I read it in from scratch as if the conversion never happened. As soon as I recognize the file, I delete it so no one is the wiser. This can be useful in its own right, because you can now use Sphinx4 not only to convert your MP3 files and recognize them on the fly, but also to write the converted wav files to disk (giving them good names, of course) and use them later for whatever you want. You can write your own program to convert whole folders of files.
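The write-then-reread hack can be sketched roughly like this. This is not the attached FileRecognizer.java, just a minimal illustration under a few assumptions: the class and method names (WavRoundTrip, writeAndReopen, oneSecondOfSilence) are invented, and a second of generated silence stands in for the real convertedAis so the example runs without any mp3 or Sphinx4 jars.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical sketch of the work-around; names are invented for illustration.
class WavRoundTrip {

    // Writes the stream to a wav file, then reopens that file as a fresh stream.
    static AudioInputStream writeAndReopen(AudioInputStream convertedAis, File wavOut)
            throws IOException, UnsupportedAudioFileException {
        AudioSystem.write(convertedAis, AudioFileFormat.Type.WAVE, wavOut);
        return AudioSystem.getAudioInputStream(wavOut);
    }

    // Stand-in for a converted stream: one second of 16 kHz, 16-bit, mono silence.
    static AudioInputStream oneSecondOfSilence() {
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        byte[] pcm = new byte[16000 * format.getFrameSize()];
        // The third argument is the stream length in sample frames.
        return new AudioInputStream(new ByteArrayInputStream(pcm), format, 16000);
    }

    public static void main(String[] args) throws Exception {
        File temp = File.createTempFile("converted", ".wav");
        try (AudioInputStream reopened = writeAndReopen(oneSecondOfSilence(), temp)) {
            System.out.println("reopened format: " + reopened.getFormat());
            // The reader.setInputStream call and the recognition would go here.
        } finally {
            // Delete the temporary file once recognition is done.
            temp.delete();
        }
    }
}
```

To keep the converted files around instead, pass a File with a meaningful name and skip the delete.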

You can now see how adding in other jars from tritonus will give you the option of converting files from and into different formats. Have fun. I attached my final version of the FileRecognizer.java file complete with conversion, writing the file out to disk, recognizing it, then deleting it.

LREC

The International Language Resource and Evaluation Conference took place at the end of May this year in Marrakesh, Morocco. I was able to go for the same research we did on second language proficiency testing. We presented a poster in one of the poster sessions and had a lot of interested people ask many questions.

There was a big difference between the conference goers here and the ones at CALICO. The CALICO conference drew mostly educators looking for ways to improve language teaching in the classroom, whereas LREC focused more on natural language processing. There were more software engineers and linguists than educators. The talks ranged from very in-depth statistical theory to corpora. I mostly sat in on what people were doing with machine translation or the Japanese language.

Now a word on corpora. For some naive reason, I thought that we had a pretty good amount of corpora for most purposes, like POS tagging, word chunking, parsing, etc. But, from this conference, I found that many organizations are working on new corpora all the time. They range from general corpora like the Wall Street Journal spoken English to more specific corpora like the utterances of drunk people. Corpora are huge in NLP, whether it’s statistical NLP or otherwise. The big corpora repositories are the LDC in the United States and ELRA in Europe. There are a few in Asia, as well. The problem is that most useful corpora aren’t freely available. You can either 1. contribute or 2. pay for membership to get corpora. They will give corpora away for free, but not typically to a hobbyist individual. They like to let universities use the data and they like to know why. That doesn’t mean the individual can’t have fun; he/she just has to be more creative.

Big companies like Microsoft presented some things at the conference, as well. Companies use NLP more and more these days even if they aren’t specifically an NLP company like, say, Nuance. Microsoft can use NLP in MS Word. I worked for a company where we developed a part of speech tagger to automatically tag new dialogs so someone wouldn’t have to go in and do it by hand, something that didn’t necessarily affect the end user. Cell phone companies, car companies, and many different software companies are using NLP more and more. This conference may not have the bleeding edge of NLP technology of our time, but it is a great conference for seeing what’s going on in the field and possibly finding a job doing NLP.

Word-Level Forced Alignment in Sphinx4

If you’re not sure what forced alignment is, I posted previously on what it is and how to do it in sphinx2 here. I’ve been working feverishly to find a way to do a phoneme-level alignment like sphinx2 can do, but I haven’t been able to without spending many hours deep in the code. Maybe our friends at CMU will make that available to us sometime in the future. For now, we have word-level alignment, and if you must have phoneme-level alignment, refer to my original post.

It’s not much more work than setting up sphinx4 and then getting the right result. There is a ForcedAlignerGrammar that you should use along with the DynamicFlatLinguist. That means some changes to your config file. Add the following:

<component name="forcedGrammar" type="edu.cmu.sphinx.linguist.language.grammar.ForcedAlignerGrammar">
    <property name="dictionary" value="dictionaryWSJ"/>
    <property name="referenceText" value=""/>
    <property name="addSilenceWords" value="true"/>
    <property name="addFillerWords" value="false"/>
</component>

<!-- This might already be in your config file -->

<component name="dynamicFlatLinguist"
           type="edu.cmu.sphinx.linguist.dflat.DynamicFlatLinguist">
    <property name="logMath" value="logMath"/>
    <property name="grammar" value="forcedGrammar"/>
    <property name="acousticModel" value="wsj"/>
    <property name="wordInsertionProbability"
              value="${wordInsertionProbability}"/>
    <property name="silenceInsertionProbability"
              value="${silenceInsertionProbability}"/>
    <property name="languageWeight" value="${languageWeight}"/>
    <property name="unitManager" value="unitManager"/>
</component>

You can download my full config file here. By the way, I set the log level to INFO. If you don’t like all the output, you can set it back to WARNING.

As for the code, you need to change a few things in the FileRecognizer class we made before. You have to bring in a lot of classes from the configuration manager because you have to allocate a few things by hand at certain times to get things working correctly. At least, this was my experience. You’ll have to add these lines next to where you get the recognizer from the configuration manager:

grammar = (ForcedAlignerGrammar) cm.lookup("forcedGrammar");
ling = (DynamicFlatLinguist) cm.lookup("dynamicFlatLinguist");
ling.allocate();

Notice that the grammar is the ForcedAlignerGrammar that we added to the config file. Be sure that class is being imported.

The next change is setting up the grammar in the call to the recognizer. Recall that the point here isn’t to transcribe spoken audio. The term “forced alignment” means you take the audio and the transcript and you find the timestamps where the audio aligns with the text. Therefore, you need to tell the recognizer what text to align the audio file with. In my case, I have an audio file that says “are you done”, so I need to tell the grammar and the recognizer what the reference text is. You set the reference text on the grammar and pass it to the recognizer again as you make the actual call to do the recognition:

grammar.setReferenceText("are you done");
recognizer.allocate();
Result result = recognizer.recognize("are you done");

Note also here that you allocate the recognizer just before you use it. The last thing you need to do is get the result that has the timestamps. That can be done with this line:

System.out.println(result.getTimedBestResult(true, true));

The two boolean parameters are for fillers (silences) and if you want the word token first. That is, the first one is true if you want to see where the pauses or silences start and finish between the words, which can be useful. The second one should probably be set to true because you want to know what word the timestamps are referring to.

When I run my recognition with “are you done” as the reference text, this is what I get:

<s>(0.0,0.32) are(0.32,0.53) <sil>(0.53,0.62) you(0.62,0.81) done(0.81,1.1) <sil>(1.1,1.47 )
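If you want to do something with those timestamps rather than just print them, you can pull the string apart yourself. Here is a minimal sketch that assumes the output always looks like the line above: a token followed by (start,end) in parentheses, with fillers such as <s> and <sil> wrapped in angle brackets. The class TimedResultParser and its Segment type are my own invention, not part of Sphinx4, and other Sphinx4 versions may format this string differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the timed result string; not part of Sphinx4.
class TimedResultParser {

    // One aligned token with its start and end time as printed by the recognizer.
    static final class Segment {
        final String token;
        final double start;
        final double end;

        Segment(String token, double start, double end) {
            this.token = token;
            this.start = start;
            this.end = end;
        }

        // Assumption: fillers and silences look like <s> or <sil>.
        boolean isFiller() {
            return token.startsWith("<") && token.endsWith(">");
        }
    }

    // token(start,end), tolerating stray spaces inside the parentheses.
    private static final Pattern SEGMENT =
            Pattern.compile("(\\S+?)\\(\\s*([0-9.]+)\\s*,\\s*([0-9.]+)\\s*\\)");

    // Returns every token in order, fillers included.
    static List<Segment> parse(String timedResult) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = SEGMENT.matcher(timedResult);
        while (m.find()) {
            segments.add(new Segment(m.group(1),
                    Double.parseDouble(m.group(2)),
                    Double.parseDouble(m.group(3))));
        }
        return segments;
    }

    // Returns only the real words, dropping fillers and silences.
    static List<Segment> wordsOnly(String timedResult) {
        List<Segment> words = new ArrayList<>();
        for (Segment s : parse(timedResult)) {
            if (!s.isFiller()) {
                words.add(s);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String output = "<s>(0.0,0.32) are(0.32,0.53) <sil>(0.53,0.62) "
                + "you(0.62,0.81) done(0.81,1.1) <sil>(1.1,1.47 )";
        for (Segment s : wordsOnly(output)) {
            System.out.println(s.token + " " + s.start + " to " + s.end);
        }
    }
}
```

In real code you would pass in the string returned by result.getTimedBestResult(true, true) instead of the hard-coded sample.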

The final java code can be found here. I know I retype the reference text and that’s not good programming practice. I’m not going to tell you where the file needs to go or any other eclipse-specific thing because if you’re wanting to do forced alignment, you probably know what you’re doing. But, if you have any questions even about that, please let me know. Good luck!

The Case for Standards

Standards are a huge issue in the computational linguistics world. At CALICO and LREC there were some big discussions about standards, more so at CALICO. There was talk about standardizing XML schemas, or at least some common format, so everyone could read each other’s data. No one should have proprietary software, according to most people there. Well, here is a practical post on a famous computer science blog about standards: Martian Headsets.