Moses Machine Translation
Moses is, as stated in the title, a machine translation system. It’s an open-source system that you can download and use to translate, potentially, from any language to any language. That’s saying a lot, but there’s also a lot to do in order to get it working.
The Moses website is actually quite good, so before you try my instructions, follow the instructions there: http://www.statmt.org/moses/
But if you still don’t know what to do, I might be able to help. I’m going to step through getting the source code from an SVN repository via Eclipse. You can also get the source code from the website as a tar file, or you can just download the binary and run it without compiling, assuming your computer can handle the binary.
The Environment: You could use any editor and SVN client, but, like I said, I will use Eclipse. Eclipse was written for Java, but I find it, though not perfect, pretty functional with other languages, including C/C++, in which Moses is written. So, you will need a version of Eclipse that has the plugin for C/C++. If you don’t know how to install it, just download a fresh Eclipse with it already included here:
http://www.eclipse.org/cdt/downloads.php
The other thing Eclipse will need is a way to access SVN. If you know of one, add it and make it work. Or, you can just install Subclipse. Once you have Eclipse up and running with C/C++ capabilities, go to Help->Install New Software->Add (type Subclipse for the name and the URL is: http://subclipse.tigris.org/svn/subclipse/tags/subclipse/1.6.5) then just continue on until it downloads and installs it. You’ll need to restart Eclipse for it to take effect.
Now, with C/C++ and SVN capabilities in Eclipse, you are ready to get your hands dirty with Moses.
Getting Moses: In Eclipse, go to File->New->Other->SVN->Checkout Projects from SVN. Create a new location and click Next. Add the following:
https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder
and hit Next. It will take a second to download the information. There are many branches that people are working on (I currently use the config-switching branch for several reasons), but if you just want to get in and play, you can just click on “trunk” and then click Next. Lucky for you, the people who maintain Moses include Eclipse project files, so it is quite easy to get set up with Eclipse. You may want to change the project name, or you can leave it as trunk; it doesn’t really matter. Make sure it sets it up as a C/C++ project, or things won’t work out right later. It should be an empty C++ project. Then click Finish. It’ll take a few minutes to download. Once it’s done, all is not quite ready.
To Build: I went to Project and deselected “Build Automatically” so I could tell it when to build (compile).
There are a few things to be done before we can build. The makefiles that tell Eclipse what to compile aren’t even there yet; we have to generate them. It’s quite easy, however. Open up a console and navigate to the directory where your Moses code is. Then run:
./regenerate-makefiles.sh
This will take a few seconds and will generate your makefiles based on how your computer is configured. The only problem I ran into was this error: "possibly undefined macro: AC_PROG_LIBTOOL". I was able to fix it easily by installing libtool (in Ubuntu: sudo apt-get install libtool), and then I tried again and it worked fine.
Now, this next step will separate those who just want to get their homework done from those who really want to use Moses. You have your makefiles, but you also need to tell Moses where to find a few things. If you already have the phrase tables, that is, the trained data the Moses (statistical) machine translation engine translates with, then you can simply type
./configure
and let it run. If you want to train your own models, you’ll need to install either SRILM or IRSTLM, or both. These are separate language modeling toolkits that do a lot of the data building necessary to make Moses work (why reinvent the wheel?). Moses is nice enough to be compatible with different kinds, so pick the one you want. Installation for both can be tricky (I found IRSTLM much easier), but doable. Perhaps in a later post I’ll explain how to install them. Until then, be happy with the little bit that Moses comes with.
Now, go back to Eclipse, right-click on your project and hit Refresh. Then click on Project->Properties->C/C++ Build. I deselected “Generate Makefiles automatically”, clicked on “Workspace”, selected the root workspace folder and clicked OK (something like ${workspace_loc:/moses} showed in the Build directory field, where moses was the name of my project). This tells it to look in the moses folder for a Makefile, which we generated earlier.
Now, press Ctrl+B and it will take a few minutes to build. It’s compiling C and C++ code using make, so Eclipse really isn’t doing much but calling it for you. You can click on the Console tab in the lower part of Eclipse to see what it’s doing at any given time.
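Since Eclipse is only calling make here, the same build can also be run straight from a terminal. The following is a rough sketch, not an official build procedure: the function names missing_tools and build_moses are made up for illustration, and the list of tools it checks (libtoolize, autoconf, automake, make) is my assumption about what regenerate-makefiles.sh needs, going by the AC_PROG_LIBTOOL error mentioned earlier.

```shell
# Print the name of every tool from the argument list that is not on PATH.
# (Hypothetical helper, not part of Moses.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Run the three build steps from this post inside the given source directory.
# $1 = directory where the Moses source was checked out.
build_moses() {
    # Assumed prerequisites; adjust the list for your system.
    missing=$(missing_tools libtoolize autoconf automake make)
    if [ -n "$missing" ]; then
        printf 'missing build tools:\n%s\n' "$missing" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$1" && ./regenerate-makefiles.sh && ./configure && make )
}
```

With the functions loaded into a shell, build_moses followed by the path to your checkout either stops with a list of missing tools or runs the same three steps Eclipse would otherwise trigger for you.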
To Run: When it’s done compiling, you can give it a test. Have fun. Just kidding, this is how you try it out: notice that you have a new list called “Binaries” in your project explorer. Expand that and you’ll see everything that was just compiled. Right-click on moses and run as local C/C++ Application. Then it will run, but not really. It just spits out the help information because you provided no command line arguments. The problem is, we don’t have any phrase tables for it to use to actually do translation. The moses website provides a sample one for testing to see if your compile worked. You can download it here:
http://www.statmt.org/moses/download/sample-models.tgz
Now, by no means is this going to be what you use to actually do some translating. This is just a tiny sample that utilizes the Moses MT system, but with a very small amount of training data and only select phrases to translate. Sorry, if you want data, you’ll have to make your own using some parallel corpora (something I hope to discuss later).
Once you have downloaded the sample-models.tgz file, open a command window, navigate to where it is, and run:
tar -xzf sample-models.tgz
Then go into the new sample-models directory that it just made, and from there into the phrase-model directory. This is what you need. Open up the moses.ini file with an editor and change the line under [ttable-file] to the path where the phrase table currently is (in my case it was on my desktop under Desktop/sample-models/phrase-model/phrase-table), then save it and close.
We’re getting close. Now, you need to note where the moses.ini file is on your computer. Go to Eclipse and click on the green “Run” button (it looks like a “Play” button); make sure you hit the down-arrow part, and then click on Run Configurations. Under C/C++ Application, you’ll see moses (it’s there because you tried to run it before). Select it and go to the Arguments tab on the right. Then type in the following:
-f {path to the moses.ini file}
In my case it was something like:
-f /home/something/Desktop/sample-models/phrase-model/moses.ini
That’s all you need. Now, click “Apply” and then “Run”, and you’ll notice that it gets into motion. After a few seconds of running, it stops. At this point it is waiting for input. As this is a German-English sample phrase table, you can type:
das ist ein kleines haus
Press Enter, and you should see the translation:
this is a small house
That’s it. You’ve successfully used Moses to translate something. Congratulations. Now, actually using Moses in a big way is up to you. You can look into the boost library and work with multi-threading, or you can get SRILM and train your own models to feed to Moses to do your own translations. There is a lot you can do, so check the website and see what’s available.
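If you’d rather not go through the Eclipse run configuration every time, the run step can be sketched from a terminal as well. This is only an illustration under a few assumptions: ini_section and translate_line are names I made up, the path to the compiled moses binary is passed in because it depends on your build, and the only Moses option used is the -f argument described above.

```shell
# Print the non-empty, non-comment lines under one [section] of an ini file,
# handy for checking what [ttable-file] points at before running.
# $1 = section name without brackets, $2 = path to the ini file.
ini_section() {
    awk -v want="[$1]" '
        /^\[/ { in_section = ($0 == want); next }   # entering some section
        in_section && NF && $0 !~ /^#/ { print }    # keep real content only
    ' "$2"
}

# Feed a single sentence to the decoder on standard input.
# $1 = moses binary, $2 = path to moses.ini, $3 = sentence to translate.
translate_line() {
    printf '%s\n' "$3" | "$1" -f "$2"
}
```

For example, ini_section ttable-file followed by the path to your moses.ini shows the line you edited earlier, and translate_line with the path to your moses binary, the path to moses.ini and 'das ist ein kleines haus' sends that sentence to the decoder without waiting at the interactive prompt.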
Appendix: You may need some other things, so I included them here without descriptive steps:
If you need tcl (srilm uses tcl):
sudo apt-get install tcl tcllib tcl-dev
TCL_INCLUDE, TCL_LIBRARY: set to whatever is needed to find the Tcl header files and library. If Tcl is not available, set NO_TCL=X and leave the above variables empty.
I had to copy the /usr/include/tcl8.5/ files to the srilm/misc/src dir.
Also, exclude LanguageModelRandLM from the compile!
I also put the srilm directory in the same directory as moses.
To get boost:
http://cl.aist-nara.ac.jp/~eric-n/ubuntu-nlp/dists/jaunty/all/
/etc/ld.so.conf
boost:
sudo apt-get install libboost-date-time-dev libboost-date-time1.34.1 libboost-dev libboost-doc libboost-filesystem-dev libboost-filesystem1.34.1 libboost-graph-dev libboost-graph1.34.1 libboost-iostreams-dev libboost-iostreams1.34.1 libboost-program-options-dev libboost-program-options1.34.1 libboost-python-dev libboost-python1.34.1 libboost-regex-dev libboost-regex1.34.1 libboost-signals-dev libboost-signals1.34.1 libboost-test-dev libboost-test1.34.1 libboost-thread-dev libboost-thread1.34.1
Use flag for compiler: -std=c++0x
http://www.52nlp.com/moses-support-digest-moses-compilation-problem-on-fedora-11/
In sphinx, I removed the randlm-related flags from the compile.
http://www.statmt.org/moses_steps.html
binarize: (make sure you use the .gz compressed version)
processPhraseTable
processLexicalTable