ITM Web of Conferences 40, 03026 (2021)
https://doi.org/10.1051/itmconf/20214003026
ICACC-2021

Various Approaches of Machine Translation for Marathi to English Language

Nilesh Shirsath 1,3,∗, Aniruddha Velankar 2,∗∗, Ranjeet Patil 3,∗∗∗, and Dr. Shilpa Shinde 3,∗∗∗∗

1 Department of Computer Engineering, Ramrao Adik Institute of Technology, India

Abstract. Machine Translation (MT) is a generic term for computerised systems that generate translations from one natural language to another, with or without human intervention. Text may be used to examine knowledge, and turning that information into pictures helps people to communicate and acquire information. A lot of work appears to have been conducted on translating English to Hindi, Tamil, Bangla and other languages. The important parts of translation are to provide translated sentences with correct words and proper grammar. A comprehensive review of 10 primary publications used in the research is presented. Two separate approaches are proposed to translate basic Marathi phrases to English: one uses a rule-based approach and the other uses a neural machine translation approach. While designed primarily for the Marathi-English language pair, the design can be applied to other language pairs with a similar structure.

∗ e-mail: Nileshshirsath2389@gmail.com
∗∗ e-mail: aniruddhav25@gmail.com
∗∗∗ e-mail: patilranjeet3699@gmail.com
∗∗∗∗ e-mail: shilpa.shinde@rait.ac.in

1 Introduction

Machine Translation (MT) is a common name for computerized systems that generate translations from one natural language into another, with or without human assistance. It is the part of Natural Language Processing (NLP) in which text is translated from the source language to the target language while preserving the meaning of the phrase. Humans can use Machine Translation systems to help them render text and speech in another language, and the program can also run without any human intervention. MT is conventionally applied to translating large quantities of content, including terms that could not otherwise be interpreted. The performance of MT can differ considerably, and MT programs need "training" to improve the quality of the outcome in the relevant domain and language pair.

MT can be classified into different categories:
• Rule-based Systems: use a combination of language and grammar rules
• Statistical Systems: learn to translate by analyzing large amounts of data
• Neural Machine Translation (NMT): learns to translate through one large neural network (multiple processing devices modeled on the brain)

Different MT providers, such as Google translator and Yandex Translator, offer translation tools that raise ethical considerations for customers, such as:
• Confidentiality: There is no confidentiality in the content translated by online MT platforms such as Google and Microsoft translators. Translated data is stored by the owners of the platform and may later be reused.
• Notifying the Client about MT Use: Whether a translation company should notify customers about the use of MT for their projects is a point of debate in the industry. Many favour informing the customer of the use of MT, while others may not disclose it. If you have questions about MT use, be sure to ask your provider.

1.1 Objective

The main objective of the project is to implement a tool which translates Marathi sentences to English without changing the meaning of the sentence, using the rule-based and neural-based systems.

1.2 Motivation

With the steady progress in the field of technology, the Internet has also grown at a tremendous rate. With globalization, English has become the official language of the globe. There are approximately 71 million Marathi-speaking individuals, and Marathi literature includes various works. However, the Marathi language is comprehensible to a very small group of people, so a system is proposed which takes an input Marathi sentence and translates it into English, which is a more widely understood language.

2 RELATED WORK

A survey of the research done for data summarization and of the currently existing systems gives the following results.

A. Transmuter: An Approach to Rule-based English to Marathi Machine Translation. [1]

© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

The basic method used to implement this framework is Rule Based Machine Translation. The model is built on a generalised approach based on the categories/domains to which a word belongs. The properly spelled words of the target language are generated from basic traversals of the parse tree. The architecture is partly applied in the context of a Machine Translation method.

In order to achieve better quality translations, the number of rules formed for target language generation is high. The consistency of the translation produced by this method depends on the size of the grammar knowledge base. If the size exceeds a certain threshold, the consistency can decrease because of contradictory rules.

B. Script Translation System For Devanagari To English. [2]

The proposed scheme translates more than one Devanagari word to English using the rule-based approach. This is accomplished by identifying the different parts of speech in Marathi phrases, tokenizing the input, and looking up each word's English meaning in a bilingual dictionary to obtain a clear translation.

Rule-based machine translation involves generating a number of rules and handling their exceptions as well. Compared to dictionary-based methods that perform word-to-word translations, rule-based machine translation provides better translation quality. Given the number of rules to be used in the scheme, flawless translations cannot be produced for each and every sentence.

C. Part of Speech Tagger for Marathi Language. [3]

The rule-based part-of-speech tagger uses a set of handwritten rules to assign all potential tags to words. The system uses a morphological analyzer to identify the root word and compares it to the corpus to assign appropriate tags. Where an expression has more than one suffix, grammar rules are used to reduce ambiguity. Dictionaries are required in order to assign appropriate tags to each expression.

A basic set of meaningful rules aids in avoiding ambiguity. Because of the lack of a corpus for statistical analysis, POS tagging is difficult for the Marathi language. When there is a broad range of meaningful rules to prevent ambiguity, the rule-based POS tagger achieves greater accuracy. Additional meaningful rules need to be provided to improve the performance of the system.

D. Inflection Rules for English to Marathi Translations [4]

When it comes to getting the right translation, inflection is crucial. Inflection is the process of modifying a word by adding the proper suffix to it according to the structure of the sentence. The implementation of inflection for English to Marathi translation is presented in this paper.

The inflection of nouns, pronouns, verbs, and adjectives in a sentence is determined by the other words and their qualities. The rules for inflecting these parts of speech are presented in the paper.

E. Marathi to English Neural Machine Translation with Near perfect corpus and transformers [5]

To translate Marathi sentences into English, the system uses the Neural Machine Translation (NMT) method. It focuses on a transformer-based architecture. The transformer's results were comparable to those of Google Cloud API-V2, and in MAE and RMSE it also outperformed the Google API.

From the results and examples it is observed that the proposed transformer-based model was able to outperform Google Translation with a limited but almost correct parallel corpus. The system produced satisfactory results by making use of a word piece tokenizer, but in order to improve the performance a sentence piece tokenizer must be implemented and the results need to be compared.

F. Challenges in Rule based machine translation from Marathi to English [6]

In the field of machine translation, translation divergence is a difficult problem to solve. For proper understanding and identification of divergence issues in machine translation, a thorough investigation is needed. Using Rule Based Machine Translation is a time-consuming process: a set of rules must be created to accomplish the translations, and the development of a large number of rules requires a great deal of human effort.

The authors explain the different forms of divergence patterns in the Marathi and English language pair. In addition, these divergence patterns must be identified and classified. The consistency of this method is determined by the size of the grammatical knowledge base. The size and depth of the knowledge base grow as more exceptional cases are treated. When the size of the knowledge base grows, so does the precision, up to a certain point. If the size exceeds a certain threshold, the accuracy can suffer as a result of contradictory rules. As a result, a scheme relying on this method must strike a balance between precision and the number of exceptions it can accommodate.

G. A survey on LSTM memristive neural network architectures and application [7]

The first generation of machine translation methods used dictionary-based methods to do word-to-word translations. Its flaws prompted the development of the second generation, which used rule-based and transfer-based techniques.
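
The word-to-word behaviour of a first-generation dictionary-based method can be illustrated with a minimal sketch. The three-entry lexicon and the function name `word_to_word` are hypothetical and chosen only for illustration; they are not taken from any of the surveyed systems.

```python
# Hypothetical toy lexicon (illustration only, not from the surveyed papers).
LEXICON = {
    "मी": "I",
    "शाळेत": "to school",
    "जातो": "go",
}


def word_to_word(sentence: str) -> str:
    """Translate by looking up each whitespace-separated token in turn.

    Tokens missing from the lexicon are passed through unchanged.
    """
    return " ".join(LEXICON.get(token, token) for token in sentence.split())


if __name__ == "__main__":
    # Source word order is kept as-is, so the verb stays at the end.
    print(word_to_word("मी शाळेत जातो"))
```

Because each word is translated in place, the verb-final order of the Marathi sentence carries over into the English output. Fixing that requires rules about sentence structure, which is the kind of flaw that motivated the rule-based and transfer-based second generation.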
It has been found that rule-based machine translation requires the development of a large number of rules as well as the treatment of their exceptions. The approach is feasible only up to a point, but it offers higher translation accuracy. The emphasis of this paper is on rule-based Marathi to English translation.

The Marathi to English translation system will aid in automating the translation of documents and scripts, as well as in reducing manual translation work. In some sentence translations, there may be some ambiguity. The translation rules, on the other hand, will be framed in such a way that generic sentences or sentences from other domains will be translated.

H. A Baseline Neural Machine Translation System for Indian Languages [8]

The research provides a Neural Machine Translation system for Indian languages that is both simple and effective. It establishes a firm baseline for future research by demonstrating the viability of numerous language pairs.

Even when resources are scarce, this method, which uses cutting-edge machine learning techniques, produces competitive outcomes for many language pairs. The authors investigate the multilingual training scenario for Indian languages, employing a variety of tried-and-true strategies to get competitive results. These tasks entail calculating embeddings for later classification or sentiment analysis tasks. Indian languages will benefit from the ability to transfer learning from well-tested embeddings, such as those for English.

I. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches [9]

An encoder and a decoder are frequently used in neural machine translation models. From a variable-length input text, the encoder extracts a fixed-length representation, from which the decoder creates an accurate translation.

This paper analyses the properties of neural machine translation using two models: the RNN Encoder-Decoder and a newly designed gated recursive convolutional neural network. The authors show that neural machine translation performs rather well on short phrases with few unknown terms, but that its performance rapidly degrades as the sentence length grows. The paper also demonstrates that the suggested gated recursive convolutional network automatically learns the grammatical structure of a phrase.

J. Neural Machine Translation using Recurrent Neural Network [10]

The constructed system integrates Recurrent Neural Network models to achieve the maximum accuracies in 10 epochs. It was found that using many models at the same time resulted in a better model with improved overall accuracy for the system.

According to the findings of the study, bidirectional Recurrent Neural Networks with encoding algorithms have a higher accuracy than the other Recurrent Neural Network models. Single Recurrent Neural Network models, such as the basic RNN, only RNN with encoding, only bidirectional RNN, and RNN with encoder and decoder, have lower accuracy than RNN models with encoder and decoder.

K. BLEU: a Method for Automatic Evaluation of Machine Translation [11]

According to the paper, BLEU will speed up the MT R&D cycle by allowing researchers to quickly zero in on useful modelling concepts. A statistical investigation of BLEU's correlation with human judgement for translation into English from four distinct languages (Arabic, Chinese, French, and Spanish) representing three different language families supports this viewpoint.

The strength of BLEU is that it correlates strongly with human assessments, since it averages out individual sentence judgement errors over a test corpus rather than attempting to discover the exact human judgement for each sentence: quantity leads to quality. Finally, because MT can be thought of as the generation of natural language from a textual context, BLEU can be used to assess MT tasks.

L. Sequence to Sequence Learning with Neural Networks [12]

On a large-scale MT task, it was demonstrated that a large deep LSTM with a limited vocabulary and essentially no assumptions about problem structure can outperform a traditional SMT-based system with an unlimited vocabulary. Given the success of the simple LSTM technique on MT, it should work well on a variety of other sequence learning problems as long as there is adequate training data.

It was determined that finding a problem encoding with the highest number of short-term dependencies is critical, since it makes the learning problem considerably easier. The authors were unable to train a standard RNN on the non-reversed translation problem using this method, but they believed that a standard RNN should be easily trainable when the source sentences are reversed.

2.1 Limitations of Existing System

• As the Marathi language is less researched and less advanced on the grammar and translation front, the existing systems fail to incorporate all the rules necessary for translation.
• In the Machine Learning approach, only a few translation models have been implemented, and they do not provide satisfactory and correct translations in the testing phases due to the lack of data.
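
Whether a translation is satisfactory in the testing phase is commonly quantified with an automatic metric such as BLEU, surveyed in item K above. The following is a minimal sketch of the idea for a single sentence pair: clipped n-gram precisions combined by a geometric mean and scaled by a brevity penalty. It assumes one reference translation, pre-tokenized input, and no smoothing, whereas BLEU as published is computed over a whole test corpus. The function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference BLEU for one tokenized sentence pair."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    hyp = "i go to school".split()
    ref = "i go to school today".split()
    print(bleu(hyp, ref, max_n=2))
```

Sentence-level scores of this kind are noisy, which is why the metric is normally aggregated over many sentences.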

3 METHODOLOGY

3.1 Proposed Work

3.1.1 Rule Based Machine Translation:-

A system is proposed using rule-based machine translation. It follows a sequential procedure: it takes Marathi sentences as input and applies machine translation operations such as tokenization and tagging. On successful implementation, it tries to predict the most accurate translation of the given sentence.
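
A sequential procedure of this kind can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the three-entry lexicon, the tag names, and the single subject-object-verb to subject-verb-object reordering rule are hypothetical, and stemming and morphological generation are omitted.

```python
# Hypothetical toy lexicon: Marathi token -> (POS tag, English gloss).
LEXICON = {
    "मी": ("PRON", "I"),
    "आंबा": ("NOUN", "mango"),
    "खातो": ("VERB", "eat"),
}


def tokenize(sentence):
    """Split the input sentence into whitespace-separated tokens."""
    return sentence.split()


def tag(tokens):
    """Tag each token by dictionary lookup; unknown tokens get 'UNK'."""
    return [(tok, LEXICON.get(tok, ("UNK", tok))[0]) for tok in tokens]


def translate(tagged):
    """Replace each token with its English gloss, keeping the tag."""
    return [(LEXICON.get(tok, (pos, tok))[1], pos) for tok, pos in tagged]


def reorder(glossed):
    """Illustrative rule: move a sentence-final verb to second position."""
    if len(glossed) >= 3 and glossed[-1][1] == "VERB":
        return [glossed[0], glossed[-1]] + glossed[1:-1]
    return glossed


def rbmt(sentence):
    """Run the stages in sequence and join the resulting English words."""
    glossed = reorder(translate(tag(tokenize(sentence))))
    return " ".join(word for word, _ in glossed)


if __name__ == "__main__":
    print(rbmt("मी आंबा खातो"))
```

Each stage consumes the output of the previous one, so individual stages (for example the tagger or the reordering rules) can be replaced without touching the rest of the chain.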

3.1.2 Neural Machine Translation:-

An Encoder-Decoder model will be used, which analyzes a corpus of data, learns the patterns within it, and generates the translations.

3.2 Techniques

3.2.1 Rule Based Machine Translation:-

Rule-based translation mostly depends on different built-in linguistic rules and extensive bilingual dictionaries for each language pair. Translations are produced by applying a vast set of linguistic rules. Such automatic translation systems are based on linguistic knowledge about the source and target languages, essentially derived from dictionaries and grammars (unilingual, bilingual or multilingual) covering the key semantic, morphological and syntactic regularities of each language.

An RBMT method produces output sentences (in some target language) for input sentences (in some source language) based on morphological, syntactic, and semantic analysis of both the source language and the target language involved in a specific translation task.

3.2.2 Neural Machine Translation:-

In the NMT model, a single system can be trained directly on source and target text. Unlike other systems, NMT works cohesively to maximize its performance, and it also uses vector representations for words and internal state. NMT uses a bidirectional recurrent neural network, called the encoder, to process a source sentence into vectors for a second recurrent neural network, called the decoder, which predicts words in the target language. This process, while differing from phrase-based models in method, proves to be comparable in speed and accuracy.

3.3 Design of the System

3.3.1 Rule Based Machine Translation:-

Figure 1. System Design of Rule Based Machine Translation

1. Source Text: Takes Marathi sentences as input text.
2. Tokenization: The sentence is broken down into token words.
3. POS tagging: The part of speech of each token is determined.
4. Stemming: The root word is stemmed out of each token for translation.
5. Translation: The translation of Marathi words is found using a bilingual dictionary.
6. Morphological Generation: Grammatically correct words are generated according to the suffixes.
7. Sentence Reordering: The morphologically generated words are reordered according to the written grammar rules.
8. Target Text: The translated English sentence is obtained.

3.3.2 Neural Machine Translation:-

Figure 2. System Design of Neural Machine Translation

1. Input: The source sentence which is to be translated is converted into an integer-encoded form.
2. Embedding: In this layer, each integer-encoded word is mapped to a vector which acts as input for the next layer. There are various word embedding techniques which map (embed) a word into a fixed-length vector.
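
The Input and Embedding steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, id 0 is reserved for unknown words by assumption, and the embedding table is filled with random values, whereas in a trained model these vectors would be learned.

```python
import random


def build_vocab(sentences):
    """Assign each distinct token an integer id; id 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Convert a sentence into its integer-encoded form."""
    return [vocab.get(token, 0) for token in sentence.split()]


def make_embedding_table(vocab_size, dim, seed=0):
    """Create one fixed-length vector per id (random here, learned in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids, table):
    """Look up the vector for each integer id."""
    return [table[i] for i in ids]


if __name__ == "__main__":
    vocab = build_vocab(["मी शाळेत जातो"])
    ids = encode("मी शाळेत जातो", vocab)
    vectors = embed(ids, make_embedding_table(len(vocab), dim=4))
    print(ids, len(vectors), len(vectors[0]))
```

Every word maps to a vector of the same length regardless of the vocabulary size, which is what allows the following layer to take a fixed-size input per token.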