
Install, Train and Test the Moses Machine Translation Toolkit

Following are a few easy steps, based on my experience, for installing the Moses toolkit.

I did this on a 64-bit Ubuntu (Linux) system; the steps should carry over to other installations by analogy.

1) Download the Moses binary package from http://www.statmt.org/moses/?n=Moses.Packages. In my case I used http://www.statmt.org/~jie/linux/moses-2.1-1/moses_2.1-1_amd64.deb


2) Install it with the following command, run from the directory where you saved the downloaded .deb file (dpkg needs a local file, not a URL)
 sudo dpkg -i moses_2.1-1_amd64.deb

This installs Moses and all the other relevant tools, such as GIZA++ and the language modeling toolkit (IRSTLM), in the "/opt" directory. You can copy it to wherever you want or leave it where it is.

3) Now go to the /moses folder in /opt and move all the contents of /giza++-v1.0.7 to a new folder /tools inside /moses. You can skip this, but then you need to change the -external-bin-dir path in the training script below, so for the sake of this introduction don't skip it.
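Step 3 can be done from the shell roughly as follows. This is only a minimal sketch: the function name setup_tools_dir is illustrative, and the defaults assume the /opt/moses layout and the giza++-v1.0.7 folder name mentioned above. Writing under /opt usually needs root privileges, so you would likely run it from a root shell.

```shell
#!/usr/bin/env bash
# Sketch: move the GIZA++ files into a "tools" folder inside the Moses install.
# The defaults below are assumptions based on the layout described in this post.

setup_tools_dir() {
    local moses_root="${1:-/opt/moses}"
    local giza_dir="${2:-giza++-v1.0.7}"
    local src="$moses_root/$giza_dir"
    local dst="$moses_root/tools"

    if [ ! -d "$src" ]; then
        echo "GIZA++ folder not found: $src" >&2
        return 1
    fi

    mkdir -p "$dst" || return 1
    # Move every file from the GIZA++ folder into tools/.
    mv "$src"/* "$dst"/
}
```

Calling setup_tools_dir with no arguments uses /opt/moses; pass another root as the first argument if you copied the installation somewhere else.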

4) Creating the language model: Your environment is now set up. Create a language model from the target-language corpus using the following shell script. You are in your working directory, say YF.

/opt/moses/irstlm-5.70.04/bin/add-start-end.sh \
   < YF/monolingualFilepath.hin \
   > LMV1.fl

export IRSTLM=/opt/moses/irstlm-5.70.04; 

/opt/moses/irstlm-5.70.04/bin/build-lm.sh \
   -i LMV1.fl                  \
   -t ./tmp  -p -s improved-kneser-ney -o LMV2.fl

/opt/moses/irstlm-5.70.04/bin/compile-lm  \
   --text=yes \
   LMV2.fl.gz \
   MYLM.lm

/opt/moses/bin/build_binary \
   MYLM.lm \
   MyLm.lm.bin.hi

If you don't want to worry much about what is happening, name your monolingual input file for LM creation monolingualFilepath.hin and put it in the directory where this script will be run, i.e. YF. If you do understand what each step does, go ahead and name the files as you like.

5) The step above creates several new files. Move them all to a new folder named /lm. Your language model is ready; the version you will use is MyLm.lm.bin.hi.

Check the binarized LM you built for Hindi, i.e. MyLm.lm.bin.hi, with:
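Moving the files can be scripted along these lines. This is a sketch under assumptions: the function name collect_lm_files is illustrative, the file names are the ones written by the script in step 4, and the default destination YF/lm assumes you run it from the folder that contains YF.

```shell
#!/usr/bin/env bash
# Sketch: gather the LM files produced in step 4 into an lm/ folder.
# File names follow the step 4 script; the destination default is an assumption.

collect_lm_files() {
    local lm_dir="${1:-YF/lm}"
    local f

    mkdir -p "$lm_dir" || return 1
    for f in LMV1.fl LMV2.fl.gz MYLM.lm MyLm.lm.bin.hi; do
        # Skip anything that was not produced (for example after a partial run).
        if [ -e "$f" ]; then
            mv "$f" "$lm_dir"/
        fi
    done

    # Succeed only if the binarized model, which training needs, is present and non-empty.
    [ -s "$lm_dir/MyLm.lm.bin.hi" ]
}
```

Run it from the directory where the step 4 script wrote its output; a non-zero return value means the binarized model is missing.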

echo "यह हिंदी वाक्य है  क्या ?" | /opt/moses/bin/query MyLm.lm.bin.hi 

It prints some numbers for each word; if you see them, you are done with LM creation.

6) Training the English to Hindi SMT system: Keep your bilingual training corpus in a folder called /corpus inside /YF and name the files traincorpus.hin and traincorpus.eng. Then run the following script to train an English to Hindi translation system.

/opt/moses/scripts/training/train-model.perl \
    -cores 4 \
    -root-dir YF \
    -corpus YF/corpus/traincorpus -f eng -e hin \
    -alignment grow-diag-final-and \
    -reordering msd-bidirectional-fe \
    -lm 0:3:YF/lm/MyLm.lm.bin.hi:8 \
    -external-bin-dir /opt/moses/tools >& YF/trainingLog.out
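Before launching the training command above, it may help to confirm that the two corpus files are line-aligned, since a parallel corpus pairs sentence N of one file with sentence N of the other. The sketch below only compares line counts; the function name check_parallel_corpus is illustrative, and the defaults assume the YF/corpus/traincorpus naming used in this post.

```shell
#!/usr/bin/env bash
# Sketch: check that both sides of the parallel corpus have the same number of lines.
# Defaults mirror the -corpus, -f and -e values used in the training command.

check_parallel_corpus() {
    local stem="${1:-YF/corpus/traincorpus}"
    local src="$stem.${2:-eng}"
    local tgt="$stem.${3:-hin}"
    local n_src n_tgt

    if [ ! -f "$src" ] || [ ! -f "$tgt" ]; then
        echo "Missing corpus file: $src or $tgt" >&2
        return 1
    fi

    # Arithmetic expansion strips any padding that wc adds on some systems.
    n_src=$(( $(wc -l < "$src") ))
    n_tgt=$(( $(wc -l < "$tgt") ))

    if [ "$n_src" -ne "$n_tgt" ]; then
        echo "Line count mismatch: $src has $n_src, $tgt has $n_tgt" >&2
        return 1
    fi
    echo "OK: $n_src sentence pairs"
}
```

A matching line count does not prove the sentences are correctly paired, but a mismatch is a sure sign that something is wrong with the corpus.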


7) Training will take some time, depending on the size of the training data. Once it has created moses.ini, we can say that the training is done. Now you can try the trained system by executing the following command -

/opt/moses/bin/moses -f YF/model/moses.ini 

It will ask for the English sentence and will output the Hindi. 
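If you would rather translate a whole file than type sentences one by one, the decoder can read its input from a file and write the output to another. The following is a sketch only: the function name translate_file and the MOSES_BIN variable are illustrative, and the default binary path is the one used above.

```shell
#!/usr/bin/env bash
# Sketch: translate a file of sentences (one per line) in a single run.
# MOSES_BIN is a hypothetical override; the default is the path used in this post.

translate_file() {
    local ini="$1" input="$2" output="$3"
    local moses_bin="${MOSES_BIN:-/opt/moses/bin/moses}"

    if [ ! -s "$ini" ]; then
        echo "moses.ini not found (is training finished?): $ini" >&2
        return 1
    fi

    # The decoder reads sentences from stdin and writes translations to stdout.
    "$moses_bin" -f "$ini" < "$input" > "$output"
}
```

For example, translate_file YF/model/moses.ini input.eng output.hin would write one Hindi line per English input line, where input.eng and output.hin are placeholder file names.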
And we are DONE!









