ITM Web of Conferences 40, 03026 (2021) https://doi.org/10.1051/itmconf/20214003026 ICACC-2021

Various Approaches of Machine Translation for Marathi to English Language

Nilesh Shirsath (1,3,∗), Aniruddha Velankar (2,∗∗), Ranjeet Patil (3,∗∗∗), and Dr. Shilpa Shinde (3,∗∗∗∗)

1 Department of Computer Engineering, Ramrao Adik Institute of Technology, India

∗ e-mail: Nileshshirsath2389@gmail.com
∗∗ e-mail: aniruddhav25@gmail.com
∗∗∗ e-mail: patilranjeet3699@gmail.com
∗∗∗∗ e-mail: shilpa.shinde@rait.ac.in

© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

Abstract. Machine Translation (MT) is a generic term for computerised systems that generate translations from one natural language to another, with or without human intervention. Text may be used to examine knowledge, and turning that information into pictures helps people to communicate and acquire information. A lot of work has been conducted on translating English to Hindi, Tamil, Bangla and other languages. The important parts of translation are to provide translated sentences with correct words and proper grammar. A comprehensive review of 10 primary publications used in research is presented. Two separate approaches are proposed to translate basic Marathi phrases to English: one uses a rule-based approach and the other uses a neural machine translation approach. While designed primarily for the Marathi-English language pair, the design can be applied to other language pairs with a similar structure.

1 Introduction

Machine Translation (MT) is a common name for computerized systems that generate translations from one natural language into another, with or without human assistance. It is part of Natural Language Processing (NLP), in which text is translated from the source language to the target language while preserving the meaning of the phrase. Humans can use machine translation systems to help them render text and speech into another language. The program can run without any human intervention. MT is conventionally used to translate large quantities of material, including terms that could not otherwise be interpreted. The performance of MT can differ considerably; MT programs need "training" to improve the quality of the outcome in the relevant domain and language pair.

MT can be classified into different categories:

• Rule-based Systems: use a combination of language and grammar rules
• Statistical Systems: learn to translate by analyzing large amounts of data
• Neural Machine Translation (NMT): learns to translate through one large neural network (multiple processing devices modeled on the brain)

Different MT providers, like Google Translator, Yandex Translator etc., provide translation tools that raise ethical considerations for customers, such as:

• Confidentiality - There is no confidentiality in the content translated by online MT platforms such as Google and Microsoft translators. Translated data is stored by the owners of the platform and may later be reused.
• Notifying the Client about MT Use - Whether a translation company should notify customers about the use of MT for their projects is a point of debate in the industry. Many are in favour of informing the customer of the use of MT, while others may not disclose it. If you have questions about MT use, be sure to ask your provider.

1.1 Objective

The main objective of the project is to implement a tool which translates Marathi sentences to English without changing the meaning of the sentence, using a rule-based system and a neural system.

1.2 Motivation

With steady progress in the field of technology, the growth of the Internet has also increased at a tremendous rate. With globalization, English has effectively become the global language. There are approximately 71 million Marathi-speaking individuals, and Marathi literature includes a variety of works. However, the Marathi language is comprehensible to only a very small group of people, so a system is proposed which takes a Marathi sentence as input and translates it into English, which is a widely understood language.

2 RELATED WORK

A survey of the research done for data summarization and of the currently existing systems gives the following results.

A. Transmuter: An Approach to Rule-based English to Marathi Machine Translation [1]

The basic method used to implement this framework is Rule Based Machine Translation. The model is built on a generalised approach based on the categories/domains to which a word belongs. The target-language text, with the proper spelling of words, was created based on basic traversals of the parse tree. The architecture is partly applied in the context of a Machine Translation method.

In order to achieve better quality translations, the number of rules formed for target language generation is high. The consistency of the translation with this method depends on the size of the grammar knowledge base. If the size exceeds a certain threshold, the consistency can decrease due to contradictory rules.

B. Script Translation System For Devanagari To English [2]

The proposed scheme translates more than one Devanagari word to English using the rule-based approach. This is accomplished by identifying the different parts of speech in Marathi phrases, tokenizing the input, looking up each word's English meaning in a bilingual dictionary, and obtaining a clear translation.

Rule-based machine translation involves generating a number of rules and handling their exceptions as well. Compared to dictionary-based methods that perform word-to-word translations, rule-based machine translation provides better translation quality. Given the number of rules to be used in the scheme, flawless translations for each and every sentence cannot be achieved.

C. Part of Speech Tagger for Marathi Language [3]

The rule-based part-of-speech tagger uses a set of handwritten rules to assign all potential tags to words. The system uses a morphological analyzer to identify the root word and compares it to the corpus to assign appropriate tags. Where a word has more than one suffix, grammar rules are used to reduce ambiguity. Dictionaries are required in order to assign appropriate tags to each word.

A basic set of meaningful rules helps avoid ambiguity. Because of the lack of a corpus for statistical analysis, POS tagging is difficult for the Marathi language. When there is a broad range of meaningful rules to resolve ambiguity, the rule-based POS tagger achieves greater accuracy. Additional meaningful rules need to be provided to improve the performance of the system.

D. Inflection Rules for English to Marathi Translations [4]

When it comes to getting the right translation, inflection is crucial. Inflection is the process of changing the meaning of a word by adding the proper suffix to it according to the structure of the sentence. The implementation of inflection for English to Marathi translation is presented in this paper.

The inflection of nouns, pronouns, verbs, and adjectives in a sentence is determined by the other words and their qualities. The rules for inflecting these parts of speech are presented in this paper.

E. Marathi to English Neural Machine Translation with Near perfect corpus and transformers [5]

To translate Marathi sentences into English, the system uses the Neural Machine Translation (NMT) method. It focuses on a transformer-based architecture. All of the transformers' results were comparable to Google Cloud API-V2, and the system also outperformed the Google API in terms of MAE and RMSE.

From the results and examples it is observed that the proposed transformer-based model was able to outperform Google Translation with a limited but almost correct parallel corpus. The system produced satisfactory results by making use of a word piece tokenizer, but in order to improve the performance a sentence piece tokenizer must be implemented and the results need to be compared.

F. Challenges in Rule based machine translation from Marathi to English [6]

In the field of machine translation, translation divergence is a difficult problem to solve. For proper understanding and identification of divergence issues in machine translation, a thorough investigation is needed. Rule Based Machine Translation is a time-consuming process. The development of a large number of rules necessitates a great deal of human effort. To accomplish these translations, a set of rules must be created.

The authors explain the different forms of divergence patterns in the Marathi and English language pair. In addition, these divergence patterns must be identified and classified. The consistency of this method is determined by the size of the grammatical knowledge base. The size and depth of the knowledge base grow as more exceptional cases are treated. As the knowledge base grows, so does the precision, up to a certain point. If the size exceeds a certain threshold, the accuracy can suffer as a result of contradictory rules. As a result, a scheme relying on this method must strike a balance between precision and the number of exceptions it can accommodate.

G. A survey on LSTM memristive neural network architectures and application [7]

The first generation of machine translation methods used dictionary-based methods to do word-to-word translations. Its flaws prompted the development of the second generation, which used rule-based and transfer-based techniques.

It has been found that rule-based machine translation necessitates the development of a large number of rules as well as the treatment of their exceptions. The system is feasible up to a point, but this approach would have higher translation accuracy. The emphasis of this paper is on Marathi to English translation based on rules.

The Marathi to English translation system will aid in the automation of the process of translating documents and scripts, as well as the reduction of manual translation work. In some sentence translations, there may be some ambiguity. The translation rules, on the other hand, will be framed in such a way that generic sentences or sentences from other domains will be translated.

H. A Baseline Neural Machine Translation System for Indian Languages [8]

The research provides a Neural Machine Translation system for Indian languages that is both simple and effective. It establishes a firm baseline for future research by demonstrating the viability of numerous language pairs.

Even if there aren't enough resources, this method, which uses cutting-edge machine learning techniques, produces competitive outcomes for many language pairs. The authors investigate the multilingual teaching scenario for Indian languages, employing a variety of tried-and-true strategies to get competitive results. These tasks entail calculating embeddings for later classification or sentiment analysis tasks. Indian languages will benefit from the ability to transfer learning from well-tested embeddings, such as those for English.

I. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches [9]

An encoder and a decoder are frequently used in neural machine translation models. From a variable-length input text, the encoder extracts a fixed-length representation, from which the decoder creates an accurate translation.

This paper analyses the properties of neural machine translation using two models: the RNN Encoder–Decoder and a newly designed gated recursive convolutional neural network. The authors show that neural machine translation performs rather well on short phrases with few unknown terms, but that its performance rapidly degrades as the sentence length grows. The paper also demonstrates that the suggested gated recursive convolutional network automatically learns the grammatical structure of a phrase.

J. Neural Machine Translation using Recurrent Neural Network [10]

The constructed system integrates Recurrent Neural Network models to achieve the maximum accuracies in 10 epochs. It was found that using many models at the same time resulted in a better model with improved overall accuracy for the system.

According to the findings of the studies, bidirectional Recurrent Neural Networks with encoding algorithms have a higher accuracy than the other Recurrent Neural Network models. Single Recurrent Neural Network models, such as the basic RNN, only RNN with encoding, only bidirectional RNN, and RNN with encoder and decoder, have lower accuracy than RNN models with encoder and decoder.

K. BLEU: a Method for Automatic Evaluation of Machine Translation [11]

According to the paper, BLEU will speed up the MT R&D cycle by allowing researchers to quickly zero in on useful modelling concepts. A new statistical investigation of BLEU's association with human judgement for translation into English from four distinct languages (Arabic, Chinese, French, and Spanish) representing three different language families supports this viewpoint.

The strength of BLEU is that it has a strong correlation with human assessments, since it averages out individual sentence judgement errors over a test corpus rather than attempting to discover the exact human judgement for each sentence: quantity leads to quality. Finally, because MT can be thought of as the generation of natural language from a textual environment, BLEU might be used to assess MT tasks.

L. Sequence to Sequence Learning with Neural Networks [12]

On a large-scale MT challenge, it was demonstrated that a big deep LSTM with a limited vocabulary and essentially no assumptions about problem structure can outperform a traditional SMT-based system with an unlimited vocabulary. Given the success of the simple LSTM technique on MT, it should work well on a variety of other sequence learning problems as long as there is adequate training data.

It was determined that finding a problem encoding with the highest number of short-term dependencies is critical, since it makes the learning problem considerably easier. The authors were unable to train a standard RNN on the non-reversed translation problem using this method, but they believed that when the source sentences were reversed, a standard RNN should be easily trainable.

2.1 Limitations of Existing System

• As the Marathi language is less researched and less advanced on the grammar and translation front, the existing systems fail to incorporate all the rules necessary for translation.
• In the machine learning approach, only a few translation models are implemented, and these do not provide satisfactory and correct translations in the testing phases due to the lack of data.

3 METHODOLOGY

3.1 Proposed Work

3.1.1 Rule Based Machine Translation

A system is proposed using rule-based machine translation. It follows a sequential procedure that takes Marathi sentences as input and applies machine translation operations such as tokenization, tagging etc. On successful implementation, it tries to predict the most accurate translation of the given sentence.

3.1.2 Neural Machine Translation

An encoder-decoder model will be used, which analyzes a corpus of data, learns the patterns in it, and generates the translations.

3.2 Techniques

3.2.1 Rule Based Machine Translation

Rule-based translation mostly depends on different built-in linguistic rules and millions of bilingual dictionaries for each language pair. Translations are done on vast and revolutionary linguistic rules. Automatic translation systems are based on linguistic knowledge about the source and target languages, essentially derived from dictionaries and grammars (unilingual, bilingual or multilingual) covering the key semantic, morphological and syntactic regularities of each language.

An RBMT method produces output sentences (in some target language) for input sentences (in some source language) based on morphological, syntactic, and semantic study of both the source language and the target language involved in a specific translation task.

3.2.2 Neural Machine Translation

In an NMT model, a single system can be trained directly on source and target text. Unlike other systems, NMT works cohesively to maximize its performance, and it also uses vector representations for words and internal state. The NMT model uses a bidirectional recurrent neural network, also called an encoder, to process a source sentence into vectors for a second recurrent neural network, called the decoder, which predicts words in the target language. This process, while differing from phrase-based models in method, proves to be comparable in speed and accuracy.

3.3 Design of the System

3.3.1 Rule Based Machine Translation

Figure 1. System Design of Rule Based Machine Translation

1. Source Text: Takes Marathi sentences as input text.
2. Tokenization: The sentence is broken down into token words.
3. POS tagging: The part of speech of each token is determined.
4. Stemming: The root word is stemmed out of each token for translation.
5. Translation: The translation of Marathi words is found using a bilingual dictionary.
6. Morphological Generation: Grammatically correct words are generated according to the suffixes.
7. Sentence Reordering: The morphologically generated words are reordered according to the written grammar rules.
8. Target Text: The translated English sentence is obtained.

3.3.2 Neural Machine Translation

Figure 2. System Design of Neural Machine Translation

1. Input: The source sentence to be translated, converted into an integer-encoded form.
2. Embedding: In this layer each integer-encoded word is mapped to a vector, which acts as input for the next layer. There are various word embedding techniques which map (embed) a word into a fixed-length vector.
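The Input and Embedding steps of the neural design can be illustrated with a minimal sketch. It assumes a whitespace tokenizer, romanized Marathi tokens, and a randomly initialised lookup table; the function names, the `<pad>`/`<unk>` entries and the vector size are hypothetical and are not taken from the paper. In a trained NMT system the table values would be learned rather than random.

```python
import random

def build_vocab(sentences):
    """Assign an integer id to every distinct token (ids 0 and 1 are reserved)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Input step: turn a sentence into its integer-encoded form."""
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.split()]

def make_embedding_table(vocab_size, dim, seed=0):
    """One fixed-length vector per vocabulary id (random here, learned in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

def embed(ids, table):
    """Embedding step: look up the vector for each integer id."""
    return [table[i] for i in ids]

# Toy usage with a romanized sentence ("mi shalet jato", roughly "I go to school").
vocab = build_vocab(["mi shalet jato"])
table = make_embedding_table(len(vocab), dim=4)
ids = encode("mi shalet jato", vocab)
vectors = embed(ids, table)
```

Every word, whatever its length, ends up as a vector of the same size, which is what lets the next layer consume the sentence as a uniform sequence.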
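The eight rule-based stages listed in Section 3.3.1 can be sketched end to end on a single toy sentence. Everything below is an illustrative assumption rather than the authors' implementation: the romanized tokens, the three-entry lexicon keyed on toy stems, the two suffix rules, and the tag-rank reordering are all hypothetical. For brevity the sketch reads the part-of-speech tag from the lexicon after stemming, whereas the paper lists POS tagging before stemming.

```python
# Hypothetical toy resources: stem -> (English word, POS tag).
LEXICON = {"mi": ("I", "PRON"), "shal": ("school", "NOUN"), "ja": ("go", "VERB")}
# Hypothetical suffix rules: suffix -> grammatical feature.
SUFFIXES = {"et": "LOC", "to": "PRES"}
# Target order for a simple subject-object-verb to subject-verb-object reordering.
TAG_RANK = {"PRON": 0, "VERB": 1, "NOUN": 2, "UNK": 3}

def tokenize(sentence):
    """Stage 2: break the sentence into token words."""
    return sentence.lower().split()

def stem(token):
    """Stage 4: strip a known suffix and return (root, feature)."""
    if token in LEXICON:
        return token, None
    for suffix, feature in SUFFIXES.items():
        root = token[: -len(suffix)]
        if token.endswith(suffix) and root in LEXICON:
            return root, feature
    return token, None

def translate(sentence):
    words = []
    for token in tokenize(sentence):
        root, feature = stem(token)
        # Stages 3 and 5: tag and dictionary lookup (unknown words pass through).
        english, tag = LEXICON.get(root, (token, "UNK"))
        # Stage 6: morphological generation driven by the stripped suffix.
        if feature == "LOC":
            english = "to " + english
        words.append((english, tag))
    # Stage 7: reorder with a stable sort on the tag rank.
    words.sort(key=lambda pair: TAG_RANK[pair[1]])
    # Stage 8: target text.
    return " ".join(english for english, _ in words)

result = translate("mi shalet jato")
```

With these toy rules, `translate("mi shalet jato")` yields `"I go to school"`. A real system would need a far larger rule set, which is exactly the scaling problem the related work describes.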
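Since the BLEU metric reviewed in item K of the related work is a natural way to compare the two proposed translators, a simplified version is sketched here. This is an assumption-laden illustration, not the evaluation used in the paper: it scores one candidate against a single reference, applies no smoothing, and returns 0 whenever any n-gram order has no match, whereas BLEU as published is computed over a whole test corpus and may use several references.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates that are not longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the modified n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)

perfect = bleu("I go to school", "I go to school")
partial = bleu("I go school", "I go to school", max_n=2)
unrelated = bleu("I go to school", "he went home yesterday")
```

An exact match scores 1.0, an output with no shared words scores 0.0, and the shortened candidate falls in between because it loses on both bigram precision and the brevity penalty.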