Multi-Lingual Phrase-Based Statistical Machine Translation for Arabic-English

Ahmed Bastawisy and Mohamed Elmahdy
Computer Science Department
German University in Cairo, Cairo, Egypt
ahmed.bastawisy@student.guc.edu.eg, mohamed.elmahdy@guc.edu.eg

Proceedings of Recent Advances in Natural Language Processing, pages 86–89, Varna, Bulgaria, Sep 4–6 2017. https://doi.org/10.26615/978-954-452-049-6_013

Abstract

In this paper, we implement a multi-lingual Statistical Machine Translation (SMT) system for Arabic-English translation. Arabic text can be categorized into standard and dialectal Arabic, and these two forms differ significantly. Different mono-lingual and multi-lingual hybrid SMT approaches are compared. Mono-lingual systems always achieve good translation accuracy on one Arabic form and poor accuracy on the other. Multi-lingual SMT models that are trained with pooled parallel MSA/dialectal data achieve better accuracy across both forms. However, since the available parallel MSA data are much larger than the dialectal data, multi-lingual models are biased towards MSA. In this work, we propose a multi-lingual combination of different mono-lingual systems using an Arabic form classifier. The outcome of the classifier directs the system to use the appropriate model (standard, dialectal, or mixture). Testing the different SMT systems shows that the proposed classifier-based SMT system outperforms the mono-lingual and data-pooled multi-lingual systems.

1 Introduction

The Arabic language is the largest living Semitic language. Arabic is spoken by more than 350 million people around the world. It is also one of the six official languages of the United Nations, and the first official language of the twenty-two countries known as the Arab world. Arabic is also used as a second language by more than 1.2 billion people.

Modern Standard Arabic (MSA) is currently considered the formal Arabic variety across all Arabic-speaking people. MSA is used in news broadcasts, newspapers, formal speech, books, movie subtitling, and whenever the target audience or readers come from different nationalities. However, MSA is not the natural language of everyday communication or of social networks. In those settings, dialectal Arabic is usually used.

A major problem in all Arabic Natural Language Processing tasks, and in Statistical Machine Translation (SMT) in particular, is the existence of the Arabic dialects. There are significant syntactic, morphological, and lexical differences between MSA and the different Arabic dialects. That is why they are sometimes considered completely different languages (Soudi et al., 2012; Elmahdy et al., 2012).

Considerable effort has been devoted to improving Arabic-English SMT, but most of it has focused on MSA rather than dialectal Arabic. This is mainly because the vast majority of available parallel Arabic data are MSA, whilst only sparse and limited parallel data are available for dialectal Arabic (Alqudsi et al., 2014).

To tackle the sparsity of dialectal Arabic parallel data, many previous works have normalized dialectal words and phrases into their corresponding MSA equivalents. This normalization, or pivoting, is basically a rule-based approach to paraphrasing dialectal words into MSA, and it allows existing MSA SMT systems to be reused (Salloum and Habash, 2013; Sawaf, 2010).

Instead of relying on normalization or pivoting, Zbib et al. (2012) collected extra dialectal Arabic parallel data and combined it with existing MSA data. Their results showed that this pooling technique improved translation accuracy for dialectal Arabic. However, MSA translation accuracy decreased slightly.

Because of the complex morphology of Arabic, some prior work, such as (Lee, 2004), focused on MSA morphological analysis to improve Arabic SMT.

The aim of this work is to build a multi-lingual Arabic SMT system that supports MSA as well as dialectal Arabic. A second goal is that adding dialectal Arabic should not affect MSA translation accuracy. Moreover, since available MSA data are always larger than dialectal data, the system should not be biased towards MSA.

In this paper, we propose training three different Arabic SMT models: one model for MSA, another for dialectal Arabic, and a hybrid model trained with a pool of parallel Arabic-English data for both MSA and dialectal Arabic. A pre-classifier is built to choose the appropriate model for each input.

2 Translation Models

Throughout this work, all translation models were built using the Giza aligner and the Moses SMT engine (Koehn et al., 2007). Three translation models have been created: an MSA-English model, a dialectal-English model, and a hybrid-English model.

To train the MSA-English translation model, a parallel dataset of 26M words was taken from the ISI Arabic-English Automatically Extracted Parallel Text corpus (Munteanu and Marcu, 2007). An independent MSA-English evaluation set of 300K words was used to tune the model. An MSA-English test set of 300K words is used to evaluate MSA-English translation accuracy.

To train the dialectal-English translation model, a parallel dataset of 2.7M words was taken from the Arabic-Dialect/English Parallel Text corpus (Raytheon BBN Technologies et al., 2012). Note the large difference between the sizes of the available MSA and dialectal data. An independent dialectal-English evaluation set of 300K words was used to tune the model. A dialectal-English test set of 300K words is used to evaluate dialectal-English translation accuracy.

The hybrid translation model has been trained by pooling the MSA and dialectal training sets, giving 26M MSA words and 2.7M dialectal words. Model tuning was performed using the two evaluation sets of MSA and dialectal Arabic.

A statistical tri-gram language model is trained for English. The language model training set consists of 688M words from the 2011 and 2012 News Crawl articles described in (Sofia, 2013). The English language model is used to estimate the prior probability in all of the proposed SMT techniques.

The three translation models have been tested with the three testing sets (MSA, dialectal, and MSA+dialectal). As shown in Table 1, the MSA model achieved BLEU scores of 34.8, 2.6, and 18.7 on the MSA, dialectal, and MSA+dialectal testing sets respectively, so it clearly performs poorly on dialectal Arabic data. The dialectal model achieved 4.1, 15.9, and 10.0 on the MSA, dialectal, and MSA+dialectal sets respectively, so it performs better on dialectal data and poorly on MSA data. The hybrid model achieved acceptable accuracy across both MSA and dialectal Arabic, with 33.2, 12.3, and 22.8 BLEU on the MSA, dialectal, and MSA+dialectal sets respectively. The hybrid model appears to be somewhat biased towards MSA: its accuracy decreased by 4.6% relative to the MSA baseline model on MSA data, but by 22.6% relative to the dialectal baseline model on dialectal data.

Translation model    Test data type
                     MSA     Dialectal    MSA+Dialectal
MSA                  34.8    2.6          18.7
Dialectal            4.1     15.9         10.0
Hybrid               33.2    12.3         22.8

Table 1: BLEU score for the different SMT systems on MSA, dialectal, and MSA+dialectal data.

3 Classification-Based Translation

Before adding the classifier, the MSA and dialectal Arabic-English SMT systems each performed poorly on the variety they were not trained on, whereas the hybrid system trained with both MSA and dialectal data achieved better accuracy overall. The aim of classification-based translation is to further improve accuracy across both dialectal Arabic and MSA, and to overcome the bias problem of the hybrid model.

Two classification techniques have been used. The first technique classifies input Arabic text into two classes, Standard and Dialectal, and translates each class with the appropriate system. The second technique classifies input Arabic text into three classes, Standard, Hybrid, and Dialectal, and again uses the appropriate system for each class.

A tri-gram MSA language model is built for classification. More than 355M words from the Arabic Gigaword corpus (Parker et al., 2011) were used to train it. Every input sentence is scored by the MSA language model. Sentences with a high log likelihood are classified as MSA, whilst sentences with a low log likelihood are classified as dialectal.

3.1 First Classification Technique

In this technique, text segments are classified into two categories: MSA or dialectal. A two-pass optimization search was performed to find the optimal language model scoring threshold between the MSA and dialectal classes.

In the first pass, a coarse search was performed by varying the classification threshold from 0.0 to -10.0 with a step of 1.0. Classification accuracy is evaluated at each iteration. The initial optimal threshold was found to be -4.0, which resulted in a classification accuracy of 95.58% on the evaluation sets of MSA and dialectal Arabic.

In the second pass, a fine search was performed around the initial -4.0 threshold, varying the threshold from -3.0 to -5.0 with a step of 0.1. Figure 1 shows the classifier's accuracy in this fine search, where the threshold x is decremented by 0.1 at each iteration (x = x - 0.1). As shown in the graph, the optimal threshold is -3.7, which resulted in a classification accuracy of 96.64%. Thus, a threshold of -3.7 has been used.

Figure 1: Fine tuning graph for MSA/dialectal classification threshold.

In this technique, the classifier splits the test set into two file groups. The first group contains the segments classified as MSA, which scored above the threshold of -3.7. The second group contains the segments classified as dialectal Arabic, which scored at or below the threshold of -3.7.

After that, each classified Arabic text file was translated with the corresponding SMT system, and all translations were then evaluated with the BLEU score.

3.2 Second Classification Technique

In this technique, instead of a sharp threshold between the MSA and dialectal classes, we have created a window around the optimal threshold. Any sentence with a score that lies in this window is assigned to a third class, labeled the mixture class. It is assumed that any sentence in this class (very close to the threshold) might contain a mixture of dialectal and MSA words, which is common on social media, for instance. The optimal window has been found to range from -2.7 to -5.45. The three classes in this case are Dialectal, MSA, and mixture.

The test set is classified into three file groups. The first group contains the MSA sentences, which scored above the window upper bound of -2.7. The second group contains the sentences classified as Hybrid (mixture) Arabic, which scored within the window from -2.7 to -5.45. The third group contains the sentences classified as dialectal, which scored below the window lower bound of -5.45.

After that, each classified Arabic text file was translated with the corresponding SMT system, and all translations were then evaluated with the BLEU score.

4 Experimental Results

The two classification-based translation techniques have been tested on a test set that combines the testing sets of MSA (300K words) and dialectal Arabic (300K words).

The first classification technique achieved a BLEU score of 29.1, outperforming the hybrid model with a relative increase in accuracy of 27.6%, as shown in Table 2.

The second classification technique achieved a BLEU score of 29.0, outperforming the hybrid model with a relative increase of 27.2%.

As shown in Table 2, both techniques have significantly improved translation accuracy on the combined test set in comparison to all three baseline systems. This means that introducing a pre-classification stage might be a helpful step in improving the performance of Arabic machine translation systems.

The BLEU score of the first classification technique is slightly better than that of the second, with an absolute difference of 0.1. This suggests that it is enough to classify input Arabic text into two categories instead of three.

Technique             BLEU    Relative
Hybrid Model          22.8    baseline
Classifier-based 1    29.1    +27.6%
Classifier-based 2    29.0    +27.2%

Table 2: Translation accuracy on MSA+dialectal parallel data for the hybrid model, classifier-based technique 1, and classifier-based technique 2.

5 Conclusions

This paper has focused mainly on enhancing the accuracy of SMT across MSA and dialectal Arabic. Three baseline Arabic-English SMT systems were built: MSA, dialectal, and hybrid. The MSA system resulted in very low accuracy on dialectal data, whilst the dialectal system resulted in low accuracy on MSA data. The hybrid system achieved a better average accuracy across both MSA and dialectal data.

In order to classify input text into the correct variety of Arabic (dialectal or MSA), two classification techniques have been proposed. The first technique classifies the testing data into two categories, one to be translated with the MSA model and the other with the dialectal model. The second technique classifies the testing data into three classes, one to be translated with the MSA model, one with the hybrid model, and one with the dialectal model.

Both techniques have significantly improved translation accuracy on a balanced testing set that contains equal amounts of MSA and dialectal data. The first technique resulted in a slightly better BLEU score than the second.

References

Arwa Alqudsi, Nazlia Omar, and Khalid Shaker. 2014. Arabic machine translation: a survey. Artificial Intelligence Review 42(4):549–572.

Mohamed Elmahdy, Rainer Gruhn, and Wolfgang Minker. 2012. Novel Techniques for Dialectal Arabic Speech Recognition. Springer-Verlag New York, 1 edition.

Philipp Koehn, Hieu Hoang, et al. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL).

Young-Suk Lee. 2004. Morphological analysis for statistical machine translation. In Proceedings of HLT-NAACL 2004: Short Papers. Association for Computational Linguistics, pages 57–60.

Dragos Stefan Munteanu and Daniel Marcu. 2007. ISI Arabic-English Automatically Extracted Parallel Text LDC2007T08. Web Download. Philadelphia: Linguistic Data Consortium.

Robert Parker et al. 2011. Arabic Gigaword Fifth Edition (LDC2011T11). Linguistic Data Consortium.

Raytheon BBN Technologies, Linguistic Data Consortium, and Sakhr Software. 2012. Arabic-Dialect/English Parallel Text (LDC2012T09). Linguistic Data Consortium.

Wael Salloum and Nizar Habash. 2013. Dialectal Arabic to English machine translation: Pivoting through Modern Standard Arabic. In HLT-NAACL, pages 348–358.

Hassan Sawaf. 2010. Arabic dialect handling in hybrid machine translation. In Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA), Denver, Colorado.

Sofia. 2013. News Crawl (articles from 2011 and 2012). Web Download. Shared Task: Machine Translation.

Abdelhadi Soudi, Ali Farghaly, Gunter Neumann, and Rabih Zbib. 2012. Challenges for Arabic Machine Translation. Natural Language Processing 9. John Benjamins.

Rabih Zbib, Erika Malchiodi, Jacob Devlin, David Stallard, Spyros Matsoukas, Richard Schwartz, John Makhoul, Omar F Zaidan, and Chris Callison-Burch. 2012. Machine translation of Arabic dialects. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pages 49–59.
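Illustrative sketch of the pre-classifier. The threshold search and the two routing rules described in Section 3 can be sketched in a few lines of Python. This is not the authors' implementation. It assumes that each sentence already has a score from the MSA language model (the paper does not say how the log likelihood is normalized, so the scores are treated as opaque floats), and every function name and the toy data are hypothetical. Only the numbers -3.7, -2.7, and -5.45, the coarse grid from 0.0 to -10.0 with step 1.0, and the fine grid of width 1.0 on each side with step 0.1 come from the paper.

```python
from typing import List, Sequence, Tuple

MSA, HYBRID, DIALECTAL = "msa", "hybrid", "dialectal"


def classify_two_way(score: float, threshold: float = -3.7) -> str:
    """Technique 1: MSA above the threshold, dialectal at or below it."""
    return MSA if score > threshold else DIALECTAL


def classify_three_way(score: float, upper: float = -2.7,
                       lower: float = -5.45) -> str:
    """Technique 2: scores inside [lower, upper] go to the hybrid class."""
    if score > upper:
        return MSA
    if score < lower:
        return DIALECTAL
    return HYBRID


def accuracy(scores: Sequence[float], labels: Sequence[str],
             threshold: float) -> float:
    """Fraction of sentences whose two-way class matches the gold label."""
    hits = sum(classify_two_way(s, threshold) == y
               for s, y in zip(scores, labels))
    return hits / len(scores)


def threshold_grid(start: float, stop: float, step: float) -> List[float]:
    """Descending grid from start to stop (inclusive), rounded against drift."""
    n = int(round((start - stop) / step))
    return [round(start - i * step, 2) for i in range(n + 1)]


def best_threshold(scores: Sequence[float], labels: Sequence[str],
                   grid: Sequence[float]) -> float:
    # max() keeps the first of several equally accurate thresholds.
    return max(grid, key=lambda t: accuracy(scores, labels, t))


def two_pass_search(scores: Sequence[float],
                    labels: Sequence[str]) -> Tuple[float, float]:
    """Coarse pass (step 1.0), then a fine pass (step 0.1) around its optimum."""
    coarse = best_threshold(scores, labels, threshold_grid(0.0, -10.0, 1.0))
    fine = best_threshold(scores, labels,
                          threshold_grid(coarse + 1.0, coarse - 1.0, 0.1))
    return coarse, fine


# Invented toy scores and gold labels, for illustration only.
TOY_SCORES = [-1.0, -2.5, -3.2, -3.6, -3.8, -4.5, -6.0, -7.2]
TOY_LABELS = [MSA] * 4 + [DIALECTAL] * 4

if __name__ == "__main__":
    print(two_pass_search(TOY_SCORES, TOY_LABELS))
    print([classify_three_way(s) for s in TOY_SCORES])
```

In a full pipeline, the label returned for each sentence would select which of the three translation models receives that sentence. The paper does not describe how the window bounds of technique 2 were searched, so the sketch takes them as given defaults rather than deriving them.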