ITM Web of Conferences 40, 03026 (2021) https://doi.org/10.1051/itmconf/20214003026
ICACC-2021
Various Approaches of Machine Translation for Marathi to English Language
Nilesh Shirsath (1,3,∗), Aniruddha Velankar (2,∗∗), Ranjeet Patil (3,∗∗∗), and Dr. Shilpa Shinde (3,∗∗∗∗)
(1) Department of Computer Engineering, Ramrao Adik Institute of Technology, India
Abstract. Machine Translation (MT) is a generic term for computerised systems that generate translations from one natural language to another, with or without human intervention. Text may be used to examine knowledge, and turning that information into pictures helps people to communicate and acquire information. There seems to be a lot of work conducted on translating English to Hindi, Tamil, Bangla and other languages. The important parts of translation are to provide translated sentences with correct words and proper grammar. A comprehensive review of 10 primary publications used in the research is presented. Two separate approaches are proposed to translate basic Marathi phrases to English: one uses a rule-based approach and the other a neural machine translation approach. While designed primarily for the Marathi-English language pair, the design can be applied to other language pairs with a similar structure.
∗e-mail: Nileshshirsath2389@gmail.com
∗∗e-mail: aniruddhav25@gmail.com
∗∗∗e-mail: patilranjeet3699@gmail.com
∗∗∗∗e-mail: shilpa.shinde@rait.ac.in
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).
1 Introduction
Machine Translation (MT) is a common name for computerized systems that generate, with or without human assistance, translations from one natural language into another. It is a part of Natural Language Processing (NLP) in which text is translated from the source language to the target language while preserving the meaning of the phrase. People can use machine translation systems to render text and speech in another language. The program can run without any human intervention. MT is conventionally used to translate large volumes of material that readers could not otherwise interpret. The performance of MT can differ considerably; MT programs need "training" to improve the quality of the outcome in the relevant domain and language pair.
MT can be classified into different categories:
• Rule-based Systems: use a combination of language and grammar rules
• Statistical Systems: learn to translate by analyzing large amounts of data
• Neural Machine Translations (NMT): learn to translate through one large neural network (multiple processing devices modeled on the brain)
MT providers such as Google Translator and Yandex Translator offer translation tools, which raise ethical issues for customers such as:
• Confidentiality: There is no confidentiality in the content translated by online MT platforms such as Google and Microsoft translators. Translated data is stored by the owners of the platform and may later be reused.
• Notifying the Client about MT Use: Whether a translation company should notify customers about the use of MT for their projects is a point of debate in the industry. Many favour informing the customer of the use of MT, while others may not disclose it. If you have questions about MT use, be sure to ask your provider.
1.1 Objective
The main objective of the project is to implement a tool that translates Marathi sentences to English without changing the meaning of the sentence, using a rule-based system and a neural system.
1.2 Motivation
With steady progress in the field of technology, the Internet has also grown at a tremendous rate. With globalization, English has become the official language of the globe. There are approximately 71 million Marathi-speaking individuals and various works of Marathi literature. However, Marathi is comprehensible to only a very small group of people, so a system is proposed which takes an input Marathi sentence and translates it into English, a widely understood language.
2 RELATED WORK
A survey of the research done on data summarization and of the currently existing systems gives the following results.
A. Transmuter: An Approach to Rule-based English to Marathi Machine Translation. [1]
The basic method used to implement this framework is Rule Based Machine Translation. The model is built on a generalised approach based on the categories/domains to which a term belongs. The proper spelling of words in the target language was created based on basic traversals of the parse tree. The architecture is partly applied in the context of a Machine Translation method.
To achieve better-quality translations, a large number of rules is formed for target-language generation. The consistency of the translation produced by this method depends on the size of the grammar knowledge base. If the size exceeds a certain threshold, consistency can decrease because of contradictory rules.
B. Script Translation System For Devanagari To English. [2]
The proposed scheme will translate more than one Devanagari word to English using the rule-based approach. This is accomplished by understanding the different parts of speech in Marathi phrases, tokenizing, looking up each word's English meaning in a bilingual dictionary, and obtaining a clear translation.
Rule-based machine translation involves generating a number of rules and handling their exceptions as well. Compared to dictionary-based methods that perform word-to-word translation, rule-based machine translation provides better translation quality. Given the number of rules to be used in the scheme, flawless translations for each and every sentence will not be achieved.
C. Part of Speech Tagger for Marathi Language. [3]
The part-of-speech tagger is rule-based and uses a set of handwritten rules to assign all potential tags to words. The system uses a morphological analyzer to identify the root word and compares it to the corpus to assign appropriate tags. Where an expression has more than one suffix, grammar rules are used to reduce ambiguity. Dictionaries are required in order to assign appropriate tags to each expression.
A basic set of meaningful rules helps to avoid ambiguity. Because of the lack of a corpus for statistical analysis, POS tagging is difficult for the Marathi language. When there is a broad range of meaningful rules for disambiguation, the rule-based POS tagger achieves greater accuracy. Additional meaningful rules need to be provided to improve the performance of the system.
D. Inflection Rules for English to Marathi Translations [4]
Inflection is crucial to obtaining the right translation. Inflection is the process of changing the meaning of a word by adding the proper suffix to it according to the structure of the sentence. The implementation of inflection for English to Marathi translation is presented in this paper.
The inflection of nouns, pronouns, verbs, and adjectives in a sentence is determined by the other words and their qualities. The rules for inflecting these parts of speech are presented in the paper.
E. Marathi to English Neural Machine Translation with Near perfect corpus and transformers [5]
To translate Marathi sentences into English, the system uses the Neural Machine Translation (NMT) method. It focuses on a transformer-based architecture. All of the transformers' results were comparable to Google Cloud API-V2. In MAE and RMSE, it also outperformed the Google API.
From the results and examples it is observed that the proposed transformer-based model was able to outperform Google Translation with a limited but almost correct parallel corpus. The system produced satisfactory results using a word piece tokenizer, but to improve performance a sentence piece tokenizer must be implemented and the results compared.
F. Challenges in Rule based machine translation from Marathi to English [6]
In the field of machine translation, translation divergence is a difficult problem to solve. Proper understanding and identification of divergence issues in machine translation requires a thorough investigation. Rule Based Machine Translation is a time-consuming process. Developing a large number of rules requires a great deal of human effort. To accomplish these translations, a set of rules must be created.
The article explains the different forms of divergence patterns in the Marathi and English language pair. In addition, these divergence patterns must be identified and classified. The consistency of this method is determined by the size of the grammatical knowledge base. The size and depth of the knowledge base grow as more exceptional cases are handled. As the knowledge base grows, so does the precision, up to a certain point. If the size exceeds a certain threshold, accuracy can suffer as a result of contradictory rules. As a result, a system relying on this method must strike a balance between precision and the number of exceptions it can accommodate.
G. A survey on LSTM memristive neural network architectures and application [7]
The first generation of machine translation methods used dictionary-based methods to do word-to-word translations. Its flaws prompted the development of the second generation, which used rule-based and transfer-based techniques.
It has been found that rule-based machine translation requires the development of a large number of rules as well as the treatment of their exceptions. The system is feasible up to a point, but this approach would have higher translation accuracy. The emphasis of this paper is on rule-based Marathi to English translation.
The Marathi to English translation system will aid in automating the translation of documents and scripts, and in reducing manual translation work. In some sentence translations, there may be some ambiguity. The translation rules, on the other hand, will be framed in such a way that generic sentences or sentences from other domains can be translated.
H. A Baseline Neural Machine Translation System for Indian Languages [8]
The research provides a Neural Machine Translation system for Indian languages that is both simple and effective. It establishes a firm baseline for future research by demonstrating the viability of numerous language pairs.
Even when resources are scarce, this method, which uses cutting-edge machine learning techniques, produces competitive results for many language pairs. The authors investigate the multilingual teaching scenario for Indian languages, employing a variety of tried-and-true strategies to get competitive results. These tasks entail calculating embeddings for later classification or sentiment analysis tasks. Indian languages will benefit from the ability to transfer learning from well-tested embeddings, such as English.
I. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches [9]
An encoder and a decoder are frequently used in neural machine translation models. From a variable-length input text, the encoder extracts a fixed-length representation, from which the decoder creates an accurate translation.
This paper analyses the properties of neural machine translation using two models: the RNN Encoder–Decoder and a newly designed gated recursive convolutional neural network. The authors show that neural machine translation performs rather well on short phrases with few unknown terms, but that its performance rapidly degrades as the sentence length grows. The paper also demonstrates that the suggested gated recursive convolutional network automatically learns the grammatical structure of a phrase.
J. Neural Machine Translation using Recurrent Neural Network [10]
The constructed system integrates Recurrent Neural Network models to achieve the maximum accuracy in 10 epochs. It was found that using several models together resulted in a better model with improved overall accuracy for the system.
According to the findings of the studies, bidirectional Recurrent Neural Networks with encoding algorithms have a higher accuracy than the other Recurrent Neural Network models. Single Recurrent Neural Network models, such as the basic RNN, only RNN with encoding, only bidirectional RNN, and RNN with encoder and decoder, have lower accuracy than RNN models with encoder and decoder.
K. BLEU: a Method for Automatic Evaluation of Machine Translation [11]
According to the report, BLEU will speed up the MT R&D cycle by allowing researchers to quickly zero in on useful modelling concepts. A new statistical investigation of BLEU's association with human judgement for translation into English from four distinct languages (Arabic, Chinese, French, and Spanish) representing three different language families supports this viewpoint.
The strength of BLEU is that it has a strong correlation with human assessments, since it averages out individual sentence judgement errors over a test corpus rather than attempting to discover the exact human judgement for each sentence: quantity leads to quality. Finally, because MT can be thought of as the generation of natural language from a textual environment, BLEU might be used to assess MT tasks.
L. Sequence to Sequence Learning with Neural Networks [12]
On a large-scale MT challenge, it was demonstrated that a big deep LSTM with a limited vocabulary and essentially no assumptions about problem structure can outperform a traditional SMT-based system with an unlimited vocabulary. Given the success of the simple LSTM technique on MT, it should work well on a variety of other sequence learning problems as long as there is adequate training data.
It was determined that finding a problem encoding with the highest number of short-term dependencies is critical, since it makes the learning problem considerably easier. The authors were unable to train a standard RNN on the non-reversed translation problem using this method, but they believed that when the source sentences were reversed, a standard RNN should be easily trainable.
2.1 Limitations of Existing System
• As the Marathi language is less researched on the grammar and translation front, the existing systems fail to incorporate all the rules necessary for translation.
• In the machine learning approach, only a few translation models have been implemented, and these do not provide satisfactory and correct translations in the testing phase due to the lack of data.
3 METHODOLOGY
3.1 Proposed Work
3.1.1 Rule Based Machine Translation:-
A system is proposed using rule-based machine translation. It follows a sequential procedure: it takes input Marathi sentences and applies machine translation operations such as tokenization and tagging, which, once implemented successfully, try to predict the most accurate translation of the given sentence.
3.1.2 Neural Machine Translation:-
An encoder-decoder model will be used, which analyzes a corpus of data, learns the patterns in it, and generates the translations.
3.2 Techniques
3.2.1 Rule Based Machine Translation:-
Rule-based translation mostly depends on different built-in linguistic rules and millions of bilingual dictionaries for each pair. Translations are based on a vast set of linguistic rules. Such automatic translation systems are based on linguistic knowledge about the source and target languages, essentially derived from dictionaries and grammars (unilingual, bilingual or multilingual) covering the key semantic, morphological and syntactic regularities of each language.
An RBMT method produces output sentences (in some target language) for input sentences (in some source language) based on morphological, syntactic, and semantic study of both the source language and the target language involved in a specific translation task.
3.2.2 Neural Machine Translation:-
In an NMT model, a single system can be trained directly on source and target text. Unlike other systems, NMT works cohesively to maximize its performance, and it also uses vector representations for words and internal states. NMT uses a bidirectional recurrent neural network, called the encoder, to process a source sentence into vectors for a second recurrent neural network, called the decoder, which predicts words in the target language. This process, while differing from phrase-based models in method, proves to be comparable in speed and accuracy.
3.3 Design of the System
3.3.1 Rule Based Machine Translation:-
Figure 1. System Design of Rule Based Machine Translation
1. Source Text: Takes Marathi sentences as input text.
2. Tokenization: The sentence is broken down into token words.
3. POS tagging: The part of speech of each token is determined.
4. Stemming: The root word is stemmed out of each token for translation.
5. Translation: The translation of each Marathi word is found using a bilingual dictionary.
6. Morphological Generation: Grammatically correct words are generated according to the suffixes.
7. Sentence Reordering: The morphologically generated words are reordered according to the written grammar rules.
8. Target Text: The translated English sentence is obtained.
3.3.2 Neural Machine Translation:-
Figure 2. System Design of Neural Machine Translation
1. Input: The source sentence to be translated, converted into an integer-encoded form.
2. Embedding: In this layer each integer-encoded word is mapped to a vector, which acts as input for the next layer. There are various word embedding techniques which map (embed) a word into a fixed-length vector.
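Returning to the rule-based design of Section 3.3.1, its eight steps can be sketched on a toy scale. This is only an illustration under stated assumptions, not the system described in the paper: the six-entry lexicon, the two verb suffixes, the single agreement rule (add "-s" unless the subject is "I") and the single reordering rule (subject-object-verb to subject-verb-object) are all hypothetical, and articles are not generated.

```python
# Hypothetical toy lexicon: Marathi root -> (POS tag, English gloss).
LEXICON = {
    "मी": ("PRON", "I"),
    "तो": ("PRON", "he"),
    "आंबा": ("NOUN", "mango"),
    "पुस्तक": ("NOUN", "book"),
    "खा": ("VERB", "eat"),
    "वाच": ("VERB", "read"),
}
# Hypothetical present-tense verb suffixes removed during stemming.
VERB_SUFFIXES = ("तो", "ते")

def tokenize(sentence):
    """Step 2: split the sentence into tokens."""
    return sentence.split()

def stem(token):
    """Step 4: return (root, suffix); tokens already in the lexicon are roots."""
    if token in LEXICON:
        return token, ""
    for suffix in VERB_SUFFIXES:
        root = token[: -len(suffix)]
        if token.endswith(suffix) and root in LEXICON:
            return root, suffix
    return token, ""

def analyse(tokens):
    """Steps 3 to 5: stem each token, then look up its POS tag and English gloss."""
    words = []
    for token in tokens:
        root, suffix = stem(token)
        pos, gloss = LEXICON.get(root, ("UNK", root))
        words.append({"pos": pos, "gloss": gloss, "suffix": suffix})
    return words

def generate(words):
    """Step 6: add the English -s ending unless the subject is 'I'."""
    subject = next((w["gloss"] for w in words if w["pos"] == "PRON"), None)
    for w in words:
        if w["pos"] == "VERB" and w["suffix"] and subject not in (None, "I"):
            w["gloss"] += "s"
    return words

def reorder(words):
    """Step 7: move the verb directly after the first word (SOV to SVO)."""
    verbs = [w for w in words if w["pos"] == "VERB"]
    others = [w for w in words if w["pos"] != "VERB"]
    return others[:1] + verbs + others[1:]

def translate(sentence):
    """Steps 1 to 8 chained: Marathi source text in, English target text out."""
    words = reorder(generate(analyse(tokenize(sentence))))
    return " ".join(w["gloss"] for w in words)

print(translate("मी आंबा खातो"))
print(translate("तो पुस्तक वाचतो"))
```

Even at this scale the sketch shows why the knowledge base grows quickly: every new suffix, agreement pattern or word order needs its own rule or exception.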