MP3 Conversion in Sphinx4

It’s actually not that specific; this is how you can do MP3 conversion in any Java program. I’m just going to show you how to do it in the context of Sphinx4. The tool that does the job is called Tritonus, a set of Java jars that can convert between several different audio formats.

Why bother? Because Sphinx4 requires 16-bit mono files in wav format. It can’t recognize anything else. Sometimes all you get to work with are MP3s, and that means you have to convert them. There are a lot of programs out there (free ones, even) that can convert between various formats, but Sphinx4 is pickier than a simple conversion allows for. For example, Audacity is a common program for manipulating or converting audio files, but for some reason Sphinx4 has trouble reading its output even though it sounds just fine in any audio player. Tritonus does it right and… it can do it on the fly. It can detect the format of an audio file and, if everything is set up properly, convert it to the coveted 16-bit format Sphinx needs. With Tritonus there’s no need to convert everything beforehand. That’s why.
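To make the "16-bit mono" requirement concrete, here is a minimal sketch of a check you could run on a format before handing audio to the recognizer. The class and method names are my own for illustration, not part of Sphinx4 or Tritonus; it only uses the standard javax.sound.sampled classes.

```java
import javax.sound.sampled.AudioFormat;

class SphinxFormatCheck {

    /** True when the format is signed 16-bit mono PCM, the layout this post says Sphinx4 wants. */
    static boolean isSixteenBitMonoPcm(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    public static void main(String[] args) {
        // Two hand-built formats: CD-style stereo and 16 kHz mono speech.
        AudioFormat cdQuality = new AudioFormat(44100f, 16, 2, true, false);
        AudioFormat speech = new AudioFormat(16000f, 16, 1, true, false);

        System.out.println(isSixteenBitMonoPcm(cdQuality)); // false
        System.out.println(isSixteenBitMonoPcm(speech));    // true
    }
}
```

For a real file you would pass in the format reported by AudioSystem.getAudioInputStream(file).getFormat() rather than one built by hand.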

Now for the how. If you’ve followed my posts, by now you should be able to recognize a simple file, understand how the config file works, and change some things around to meet your needs. It’s actually not too difficult. All you need is a file in the proper format to recognize, a config file, and a Java program that uses the Sphinx4 jar. Assuming you have the first two, the Java file can be pretty basic and look something like this (note that the blog software I use tends to mangle indentation, so sorry if the formatting looks off!):

import java.net.URL;

import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

import edu.cmu.sphinx.frontend.util.StreamDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class FileRecognizer {
    public static void main(String[] args) {
        try {
            URL audioFileURL = FileRecognizer.class.getResource("insert path to audio file here");
            URL configURL = FileRecognizer.class.getResource("config.xml");
            ConfigurationManager cm = new ConfigurationManager(configURL);
            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
            /* allocate the resources necessary for the recognizer */
            recognizer.allocate();
            StreamDataSource reader = (StreamDataSource) cm.lookup("streamDataSource");
            AudioInputStream ais = AudioSystem.getAudioInputStream(audioFileURL);
            reader.setInputStream(ais, audioFileURL.getFile());
            Result result = recognizer.recognize();
            if (result != null) {
                System.out.println("\nRESULT: " +
                        result.getBestFinalResultNoFiller() + "\n");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

At a glance, this Java class has a main method that gets the path of the file you want to recognize, reads the config file, then recognizes the file and prints the result. Assuming you have the Sphinx4 jar file linked in (see this post if you don’t know what I’m talking about), this should compile and work just fine.

Now the trick is to read in an audio file and have it converted to the proper format on the fly. The steps are similar to setting up Sphinx4: you have to link in the proper jars (in the proper order) and know how to import and use them.

First, you need to go to the Tritonus plugins website and download a few things.

  1. tritonus.jar
  2. tritonus_share.jar
  3. tritonus_remaining.jar
  4. tritonus_mp3.jar

Note that these jars probably have version numbers attached to them (for example, tritonus_mp3 might be tritonus_mp3-0.3.6.jar), which is fine. Download the latest of each jar into (or move them into) the lib folder you created inside your sphinx4 directory. (If you don’t know what I’m talking about, refer back to this post.) Now, to link them in using Eclipse, go to Project->Properties->Java Build Path->Libraries->Add Jars (add all 4 of them), then go to the Order and Export tab and arrange them in the order shown above (tritonus.jar, share, remaining, mp3).

You’re not out of the woods yet. You still need some more jars to get the job done. You may need to specially link in tools.jar, which can be found in the lib folder of your Java install (it ships with the JDK rather than the plain JRE). For some reason I had to do that to get it to work properly. The other jar you’ll need is called javalayer.jar. You can find it with a Google search. It’s a JavaZoom project and for some reason is necessary for Sphinx4 to work with Tritonus to do anything with MP3s. Download it, put it in the lib folder, and link it in like you did before.
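If you want to confirm that the jars were actually picked up, one option is to list the Java Sound service providers visible on the classpath. This is a rough diagnostic sketch, not something from the original setup: the class and method names are mine, and it assumes the Tritonus jars register themselves through the standard META-INF/services mechanism, in which case I’d expect org.tritonus entries to show up next to the JDK’s built-in ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

import javax.sound.sampled.spi.AudioFileReader;
import javax.sound.sampled.spi.FormatConversionProvider;

class ListAudioProviders {

    /** Collects the class names of every registered provider for the given Java Sound SPI type. */
    static <T> List<String> providerNames(Class<T> spi) {
        List<String> names = new ArrayList<>();
        for (T provider : ServiceLoader.load(spi)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // File readers decide which file types AudioSystem can open.
        System.out.println("File readers:");
        for (String name : providerNames(AudioFileReader.class)) {
            System.out.println("  " + name);
        }
        // Format converters decide which conversions AudioSystem can perform.
        System.out.println("Format converters:");
        for (String name : providerNames(FormatConversionProvider.class)) {
            System.out.println("  " + name);
        }
    }
}
```

Run it with the same build path as your recognizer project. If nothing from Tritonus appears in either list, the jars are probably not linked in (or not in the right order).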

That should do it for getting the necessary plugins. Now you need to use them in the code. It’s not too tough. First, import the right stuff:

import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

Then, there are some steps that you’ll need to follow. After you get the AudioInputStream for the file you want to recognize, you’ll need to tell it what to change it into and then actually do the change:

AudioFormat targetFormat =
        new AudioFormat(16000f, // sample rate in Hz
                16,    // sample size in bits
                1,     // mono
                true,  // signed
                true); // big-endian

AudioInputStream convertedAis = wavFile.convertAudioInputStream(ais, targetFormat);

The last line calls a function that runs through a few conversion steps and then returns the converted AudioInputStream (wavFile here is just the object that holds that function). If all goes well, that’s what you’ll be sending through to the recognizer. This is the function convertAudioInputStream (NOTE!!! I didn’t write this function. It was kindly shown to me by Robbie Haertel, so he gets credit for all of this):

// Note: convertedFile is assumed to be a boolean field of the enclosing class.
private AudioInputStream convertAudioInputStream(AudioInputStream sourceAis, AudioFormat targetFormat) {
    AudioFormat baseFormat = sourceAis.getFormat();
    AudioFormat intermediateFormat;
    AudioInputStream convertedAis = sourceAis;

    // First convert the encoding, if necessary
    if (!baseFormat.getEncoding().equals(targetFormat.getEncoding())) {
        intermediateFormat = new AudioFormat(
                targetFormat.getEncoding(),
                baseFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), baseFormat.getChannels(),
                baseFormat.getChannels() * 2, baseFormat.getSampleRate(), // frame size assumes 2 bytes per sample
                false); // little-endian
        convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
        //this.writeConvertedFile(convertedAis, "C:\\encoding.wav");
        baseFormat = intermediateFormat;
        sourceAis = convertedAis;
        convertedFile = true;
    }

    // Then convert the sample rate
    if (baseFormat.getSampleRate() != targetFormat.getSampleRate()) {
        intermediateFormat = new AudioFormat(
                baseFormat.getEncoding(),
                targetFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), baseFormat.getChannels(),
                baseFormat.getChannels() * 2, targetFormat.getSampleRate(),
                false);
        convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
        //this.writeConvertedFile(convertedAis, "C:\\sample.wav");
        baseFormat = intermediateFormat;
        sourceAis = convertedAis;
        convertedFile = true;
    }

    // Then reduce the number of channels (e.g. stereo to mono)
    if (baseFormat.getChannels() > targetFormat.getChannels()) {
        intermediateFormat = new AudioFormat(
                baseFormat.getEncoding(),
                baseFormat.getSampleRate(), baseFormat.getSampleSizeInBits(), targetFormat.getChannels(),
                targetFormat.getChannels() * 2, baseFormat.getSampleRate(),
                false);
        convertedAis = AudioSystem.getAudioInputStream(intermediateFormat, sourceAis);
        //this.writeConvertedFile(convertedAis, "C:\\channels.wav");
        baseFormat = intermediateFormat;
        sourceAis = convertedAis;
        convertedFile = true;
    }
    return convertedAis;
}

Of course, in the reader.setInputStream call you pass convertedAis instead of the original ais. That should do the trick.
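If you’d like to try the one-property-at-a-time idea without any MP3s or Tritonus jars around, here is a cut-down sketch that runs on plain PCM generated in memory and relies only on the converters built into a standard JDK. The class and method names are made up for this example, it skips the encoding step (the source is already PCM), and it is not the function from this post, just the same staged approach in miniature.

```java
import java.io.ByteArrayInputStream;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

class StagedConversionDemo {

    /** Builds one second of silent 44.1 kHz, 16-bit, stereo PCM entirely in memory. */
    static AudioInputStream stereoSource() {
        AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);
        byte[] pcm = new byte[44100 * format.getFrameSize()];
        return new AudioInputStream(new ByteArrayInputStream(pcm), format, 44100);
    }

    /** Changes one property per step: first the sample rate, then the channel count. */
    static AudioInputStream convertInStages(AudioInputStream source, float targetRate, int targetChannels) {
        AudioFormat base = source.getFormat();
        AudioInputStream current = source;

        // Step 1: resample, keeping every other property the same.
        if (base.getSampleRate() != targetRate) {
            AudioFormat step = new AudioFormat(base.getEncoding(), targetRate,
                    base.getSampleSizeInBits(), base.getChannels(),
                    base.getFrameSize(), targetRate, base.isBigEndian());
            current = AudioSystem.getAudioInputStream(step, current);
            base = step;
        }

        // Step 2: change the channel count; the frame size shrinks or grows with it.
        if (base.getChannels() != targetChannels) {
            int frameSize = targetChannels * (base.getSampleSizeInBits() / 8);
            AudioFormat step = new AudioFormat(base.getEncoding(), base.getSampleRate(),
                    base.getSampleSizeInBits(), targetChannels,
                    frameSize, base.getSampleRate(), base.isBigEndian());
            current = AudioSystem.getAudioInputStream(step, current);
        }
        return current;
    }

    public static void main(String[] args) {
        AudioInputStream converted = convertInStages(stereoSource(), 16000f, 1);
        // Prints the format the recognizer would end up receiving.
        System.out.println(converted.getFormat());
    }
}
```

AudioSystem.getAudioInputStream throws an IllegalArgumentException when no installed provider supports a step, which is a handy signal that a jar is missing.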

If only it were that easy.

I found that this sometimes didn’t work, for reasons I never pinned down. It turns out that when you work with audio providers (like Tritonus), things can get in the way and streaming the audio becomes difficult. For those of you who hit the same problem, I found a workaround hack that has never failed me. After I create the new AudioInputStream, aptly named convertedAis, I write it out to disk as a new audio file (in this case, the converted wav file). Then I read it in from scratch as if the conversion never happened. As soon as I’ve recognized the file, I delete it so no one is the wiser. This can be useful in its own right: you can use Sphinx4 not only to convert your MP3 files and recognize them on the fly, but also to write the converted wav files to disk (giving them good names, of course) and use them later for… whatever you want. You could even write your own program to convert whole folders of files.
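The write-then-reread trick can be sketched like this. It is not the attached FileRecognizer.java, only an illustration with names I picked (WavRoundTrip, writeTempWav), and it uses an in-memory silent stream as a stand-in for convertedAis so that it runs without Sphinx4 or Tritonus.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;

import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

class WavRoundTrip {

    /** Writes the stream to a temporary WAV file and returns that file. */
    static File writeTempWav(AudioInputStream converted) throws IOException {
        File temp = File.createTempFile("converted", ".wav");
        AudioSystem.write(converted, AudioFileFormat.Type.WAVE, temp);
        return temp;
    }

    public static void main(String[] args) throws IOException, UnsupportedAudioFileException {
        // Stand-in for convertedAis: half a second of silent 16 kHz, 16-bit mono PCM.
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        byte[] pcm = new byte[8000 * format.getFrameSize()];
        AudioInputStream convertedAis =
                new AudioInputStream(new ByteArrayInputStream(pcm), format, 8000);

        File temp = writeTempWav(convertedAis);
        // Read the file back from scratch, as if the conversion never happened.
        try (AudioInputStream fresh = AudioSystem.getAudioInputStream(temp)) {
            // In the recognizer, fresh is what you would pass to reader.setInputStream(...)
            // before calling recognizer.recognize().
            System.out.println(fresh.getFormat() + ", frames: " + fresh.getFrameLength());
        } finally {
            // Delete the temporary file once recognition is done.
            temp.delete();
        }
    }
}
```

To keep the converted file instead, write to a File with a meaningful name rather than a temp file and skip the delete.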

You can now see how adding in other jars from tritonus will give you the option of converting files from and into different formats. Have fun. I attached my final version of the FileRecognizer.java file complete with conversion, writing the file out to disk, recognizing it, then deleting it.

4 Comments

  1. Ross:

    Could we use Tritonus to on the fly convert aiff files? (Whatever Mac’s proprietary audio codec is)? :) Boo!

  2. admin:

    From what I understand, you can convert TO aiff files, but not from them. I’ll give it a try just for fun in the next few days. The audio provider (jar file) you need from tritonus is tritonus_aos.jar

  3. Jim:

    Hi - I’ve been reading your blog posts on NLP, sphinx in particular, and found them to be very helpful.

    However, I’m finding that I get very good transcriptions with the audioFileDataSource and horrible ones with the streamDataSource like you have used here. This doesn’t make sense to me, shouldn’t they be identical?

    Thanks

  4. admin:

    Yeah, they should be the same. Something in the underlying code makes things not transfer well. Sorry, I don’t have a reasonable explanation. But the workaround works and I’m happy with it!
