LREC
The International Conference on Language Resources and Evaluation (LREC) took place at the end of May this year in Marrakesh, Morocco. I was able to go because of the same research we did on second language proficiency testing. We presented a poster in one of the poster sessions, and a lot of interested people stopped by to ask questions.
There was a big difference between the conference goers here and the ones at CALICO. The CALICO conference drew mostly educators looking for ways to improve language teaching in the classroom, whereas LREC focused more on natural language processing. There were more software engineers and linguists than educators. The talks ranged from very in-depth statistical theory to corpora. I mostly sat in on talks about machine translation or the Japanese language.
Now a word on corpora. For some naive reason, I thought that we already had plenty of corpora for most purposes, like POS tagging, chunking, parsing, etc. But at this conference, I found that many organizations are working on new corpora all the time. They range from general corpora like the Wall Street Journal spoken English to more specific corpora like the utterances of drunk people. Corpora are huge in NLP, whether it’s statistical NLP or otherwise. The big corpora repositories are the LDC in the United States and ELRA in Europe. There are a few in Asia, as well. The problem is that most useful corpora aren’t freely available. You can either (1) contribute or (2) pay for membership to get corpora. The repositories will give corpora away for free, but not typically to an individual hobbyist. They like to let universities use the data, and they like to know why. That doesn’t mean the individual can’t have fun; he or she just has to be more creative.
Big companies like Microsoft presented at the conference, as well. Companies use NLP more and more these days, even if they aren’t a dedicated NLP company like, say, Nuance. Microsoft can use NLP in MS Word. I worked for a company where we developed a part-of-speech tagger to automatically tag new dialogs so that someone wouldn’t have to go in and do it by hand, which is something that didn’t necessarily affect the end user. Cell phone companies, car companies, and many different software companies are using NLP more and more. This conference may not showcase the bleeding edge of NLP technology, but it is a great conference for seeing what’s going on in the field and possibly finding a job doing NLP.