INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 8, ISSUE 11, NOVEMBER 2019 ISSN 2277-8616
Hindi-English Neural Machine Translation Using
Attention Model
Charu Verma, Aarti Singh, Swagata Seal, Varsha Singh, Iti Mathur
Abstract: Translation is the task in which a system translates text from a source natural language to a target natural language so that the original message is retained in the target language. Deep neural networks are capable models that have achieved remarkable results on challenging learning tasks such as visual object recognition and speech recognition, and they work well whenever large training sets are available. This paper presents Hindi-to-English machine translation on a Hindi-English parallel corpus, in which a supervised learning algorithm is applied with an attention model: one recurrent neural network maps the input sequence to a vector of fixed dimensionality, and another recurrent neural network decodes the target sequence from that vector. We show that neural machine translation is a better way to translate data from the source language to the target language.
Index Terms: Machine Translation, Deep Learning, Neural Machine Translation, LSTM
—————————— ——————————
1 INTRODUCTION
To solve a particular problem, people need to discuss and share their ideas, but differences in language create a large gap in understanding. Machine translation provides access to information written in an unknown language, lowers barriers in communication and increases productivity. Translation can also be performed by humans, who provide perfect translations, so why is machine translation needed when it provides inferior translation quality for text with ambiguous words and sentences? Human translation is very expensive and translators are hard to find (they require knowledge of both the source and target languages), whereas machine translation is less expensive and is available at the click of a button on every device, such as laptops, mobiles and tablets.

Currently, systems can take text in one language as input and give text in another language as output. There are more than a hundred machine translation technology providers. For example, 'Google Translate' is a powerful translation service developed by Google that supports text and document conversion for more than a hundred languages; 'Yandex Translate' is a web service provided by Yandex that translates words, phrases, whole texts and entire websites (given only the URL) across ninety-five languages; and the 'IBM-Watson' translator translates documents from one language to another while preserving file formatting, with supported file types including MS Office, PDF, TXT, HTML, JSON, XML and OpenOffice. These technologies use deep learning to improve their accuracy and speed, and they provide a good interface so that users can use them easily.

In this paper, we present the experiments that we carried out for the training and evaluation of our Neural Machine Translation (NMT) system. The rest of the paper is structured as follows: Section 2 reviews the literature. Section 3 explains our proposed model. Section 4 shows the evaluation performed on our system and Section 5 concludes the paper.

Charu Verma is member technical staff at Next Generation Technologies Research Foundation, India. E-mail: vermacharu284@gmail.com
Aarti Singh is member technical staff at Next Generation Technologies Research Foundation, India. E-mail: say2aru19@gmail.com
Swagata Seal is member technical staff at Next Generation Technologies Research Foundation, India. E-mail: swagata.sita@gmail.com
Varsha Singh is member technical staff at Next Generation Technologies Research Foundation, India. E-mail: varshasingh773@gmail.com
Iti Mathur is an Associate Professor in the Department of Computer Science, Banasthali Vidyapith, India. E-mail: mathur_iti@rediffmail.com
IJSTR©2019
www.ijstr.org

2 LITERATURE REVIEW
Luong et al. [1] showed how attention-based techniques improve the quality of Neural Machine Translation (NMT) models. Cho et al. [2] elaborated on the different properties of the encoder-decoder model used in NMT systems. Wu et al. [3] explained the working of Google's NMT system and showed how the translation process is done from end to end. Sennrich et al. [4] showed how NMT system performance degrades when out-of-vocabulary words are found in the text. They also showed an approach for dealing with this kind of problem. Luong et al. [5] addressed the problem of dealing with rare words in text while performing experiments with an NMT system. Tu et al. [6] discussed the issue of model coverage in an NMT system. Sennrich et al. [7] showed how the system can be improved by using more monolingual data. Further, Sennrich et al. [8] showed the working of their NMT system.

Joshi et al. [9] developed a mechanism to write in Hindi using English. They used statistical machine learning to predict a word when some of the initial characters are typed. Using this, Joshi et al. [10] also developed an Example Based Machine Translation System. Joshi et al. [11] also evaluated the system developed. They also compared the performance of this system with other popularly available MT engines. Gupta et al. [12] developed a rule-based stemmer for Urdu. They developed several rules to implement this stemmer. They further used this stemmer in the evaluation of some English-Urdu MT systems [13]. Singh et al. [14] developed a POS tagger for Marathi using statistical machine learning. Bhalla et al. [15] developed a procedure for the transliteration of named entities from English to Punjabi. Joshi et al. [16] evaluated several open-domain MT engines. Gupta et al. [17] did the same for English-Urdu MT engines. Singh et al. [18] developed a POS tagger for Marathi using supervised learning. Joshi et al. [19] further developed a technique for using machine learning in evaluating MT engines. Tyagi et al. [20] [21] developed an approach for translating complex English sentences by first simplifying them and then translating them into Hindi. Yogi et al. [22] developed an approach to identify candidate translations which are good for post-editing. Gupta et al. [23] further extended their stemmer by adding derivational rules to the inflectional stemmer. Asopa et al. [24] developed a mechanism for chunking Hindi sentences using a rule-based approach. Gupta et al. [25] developed a rule-based lemmatizer for Urdu, which was an extension of their stemmer. Kumar et al. [26] developed several machine-learning-based classifiers for identifying the different senses of a word in Hindi. Joshi et al. [27] developed a mechanism to estimate the quality of English-Hindi MT engines. Chopra et al. [28] [29] developed a named entity recognition and tagging tool for Hindi using several machine learning approaches. Gupta et al. [30] developed a POS tagger for Urdu using a machine learning approach. Mathur et al. [31] developed an ontology matching evaluation tool which used the MT engine developed by Joshi et al. Chopra et al. [32] developed a mechanism for rewriting English sentences and then translating them into Hindi. This significantly improved the performance of their MT engine.

Joshi et al. [33] investigated some approaches to classifying documents and further suggested an approach for effective classification of text documents. Singh et al. [34] developed an approach to automatically generate transfer grammar rules. This approach significantly improved the development process of their transfer-based MT engine. Singh et al. [35] developed an approach for text processing of Hindi documents using deep neural networks. They further developed this approach to mine textual data from web documents [36]. Singh et al. [37] developed a translation memory tool which worked as a sub-system in their transfer-based MT system. This further improved the accuracy of their system. Gupta et al. [38] further showed how fuzzy logic can be used in developing NLP applications. Gupta et al. [39] used several NLP tools in preprocessing the tweets that they extracted from the web. They found that this approach improves the accuracy of their machine learning model which classifies the tweets. Gupta et al. [40] developed an approach which helped in the identification and classification of multiword expressions from Urdu documents. Nathani et al. [41] developed a rule-based inflectional stemmer for Sindhi written in Devanagari script. Asopa et al. [42] developed a shallow parser for Hindi using conditional random fields. Gupta et al. [43] showed the use of machine learning approaches in developing NLP applications. Gupta et al. [44] used fuzzy operations in analyzing sentiments of tweets on several topics. This approach showed very promising results over traditional approaches.

Sharma and Joshi [45] developed a rule-based word sense disambiguation approach for Hindi. It gave an accuracy of 73%. Katyayan and Joshi [46] studied various approaches for the correct identification of sarcastic phrases in English documents. Gupta and Joshi [47] showed how tweets can be classified using NLP techniques. They showed how negative sentences can be handled using NLP approaches. Shree et al. [48] showed the differences between the Hindi and English languages and the problems that current state-of-the-art MT systems face while translating text. Ahmed et al. [49] showed how an MT system can be developed by using an intermediate language which is related to both languages. They developed an Arabic-Hindi MT system using Urdu as the intermediate language. They further performed the same study using English and found that, given a large corpus, English, which is unrelated to Arabic and Hindi, can be used for developing an MT system [50]. Seal and Joshi [51] developed a rule-based inflectional stemmer for Assamese. This system showed very good results. Singh and Joshi [52] showed the development of POS taggers for Hindi using different Markov models. They concluded that the hidden Markov model-based tagger produced the best results among several Markov-based POS taggers. Pandey et al. [53] showed how NLP approaches can help in developing a better ranking model for web documents. They used particle swarm optimization and NLP approaches to improve the performance of their ranking model. Singh and Joshi [54] developed a rule-based approach for identifying anaphora in Hindi discourses. Sinha et al. [55] developed a sentiment analyzer for Facebook posts using the methods developed by Gupta et al. Sharma et al. [56] [57] used some of the Markov model-based approaches used by Singh et al. to develop their association classification model. Similar approaches were used by Goyal et al. [58] [59] for their models.

3 PROPOSED MODEL

3.1 Supervised Machine Learning
Supervised learning is a form of machine learning in which the data come in pairs of input and output. The input could be anything, such as sensor measurements, pictures, emails or messages, and the output may be a label, a real number or, in some cases, a vector or other structure (example labels: negative or positive, dog or cat, spam or not spam, right or wrong).

$\{(x_i, y_i)\}_{i=1}^{N}$

In this expression, each element $x_i$ among the $N$ examples is a feature vector, that is, a vector in which each dimension $j = 1, \ldots, D$ contains a value that describes the example in some way. That value is called a feature and is denoted $x^{(j)}$. $y_i$ is the label of the input $x_i$. For example, if $x_i$ represents a person, then the first feature $x^{(1)}$ could contain gender, the second feature $x^{(2)}$ weight in kg, the third feature $x^{(3)}$ height in cm, and so on; together $x^{(1)}, x^{(2)}, x^{(3)}, \ldots$ form the feature vector of $x_i$. This paper presents Hindi-to-English machine translation on a Hindi-English parallel corpus, in which a supervised learning algorithm is applied with an attention model: one recurrent neural network maps the input sequence to a vector of fixed dimensionality, and another recurrent neural network decodes the target sequence from that vector.

3.2 Preprocessing Step
Preprocessing of the data is necessary before training the network. Real-world data are generally incomplete, noisy and inconsistent, and the data need to be preprocessed to overcome these problems. First, the text is cleaned by removing extra spaces and other unnecessary symbols from the sentences. The network does not understand text, so the text must be converted into vectors. In sequence-to-sequence translation, every word in a sentence needs a unique identity, so each word in a language is represented as a one-hot vector: a giant vector that contains zeros everywhere except for a single one. In the given example, the sentence contains several words, each of which is represented by such a vector. Every sentence in the corpus also contains SOS, which represents the start of the sentence, and EOS, which represents the end of the sentence. An example of this is shown in Figure 1.
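The preprocessing described in Section 3.2 (cleaning, assigning each word a unique index, one-hot vectors, SOS and EOS markers) can be sketched as follows. This is a minimal illustration and not the authors' code: the cleaning rule (dropping ASCII punctuation and collapsing whitespace), the function names `clean`, `build_vocab`, `one_hot` and `encode`, and the English toy corpus are all assumptions made for the example.

```python
import string

SOS, EOS = "SOS", "EOS"  # start-of-sentence and end-of-sentence markers


def clean(sentence):
    """Drop ASCII punctuation and collapse repeated whitespace (assumed cleaning rule)."""
    no_symbols = sentence.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_symbols.split())


def build_vocab(sentences):
    """Give every distinct word a unique integer index; SOS and EOS get 0 and 1."""
    vocab = {SOS: 0, EOS: 1}
    for sentence in sentences:
        for word in clean(sentence).split():
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(index, size):
    """Return a vector of zeros with a single one at position `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


def encode(sentence, vocab):
    """Wrap the cleaned sentence in SOS/EOS and map each token to a one-hot vector."""
    tokens = [SOS] + clean(sentence).split() + [EOS]
    return [one_hot(vocab[token], len(vocab)) for token in tokens]


# Hypothetical toy corpus, used only to exercise the functions above.
corpus = ["the  cat sat.", "the dog sat!"]
vocab = build_vocab(corpus)
vectors = encode("the dog sat!", vocab)
```

Each vector has as many entries as the vocabulary has words, which is why one-hot vectors become very large for a real corpus; a practical system would usually feed the integer indices to an embedding layer instead of materialising the vectors.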
[Figure: the words of an example sentence labelled 01 to 08, with one word shown as the one-hot vector < 0 0 0 0 1 0 0 ... >]
Fig. 1: Assignment of Weights

3.3 Training of NMT using Attention Model
We developed a neural machine translation system that jointly learns to align and translate. Here, attention acts as a neural extension of the encoder-decoder model. The encoder-decoder model has several limitations, which attention resolves. A neural network works on vectors, so in the encoder-decoder approach it must compress all the important information of the source sentence into a single vector. This makes it difficult for the network to handle long sentences, especially those that are longer than the sentences in the training corpus. This is shown in Figure 1.

In the decoding phase, at every time step $t$, the model first takes as input the hidden state $h_t$ at the top layer of the stacking LSTM. A context vector $c_t$ is used to capture the relevant source-side information that helps to predict the current target word $y_t$; the subsequent steps are the same regardless of how $c_t$ is derived. Once the context vector $c_t$ has been derived, then given the target hidden state $h_t$ and the source-side context vector $c_t$, a simple concatenation layer combines the information from both vectors and produces an attentional hidden state as follows:

$$\tilde{h}_t = \tanh(W_c [c_t; h_t])$$

The attentional vector $\tilde{h}_t$ is then fed through a softmax layer, which produces the predictive distribution:

$$P(y_t \mid y_{<t}, x) = \mathrm{softmax}(W_s \tilde{h}_t)$$

4 EVALUATION

TABLE 1
Evaluation Results of BLEU at Document Level
Document   Baseline NMT   NMT with Attention Model
Doc1       0.375784       0.552926
Doc2       0.338189       0.507185
Doc3       0.363287       0.506252
Doc4       0.358607       0.533271
Doc5       0.361307       0.515314

Table 2 shows the results of the evaluation done by human annotators. Table 3 shows the correlation between these studies.

TABLE 2
Results of Human Evaluation at Document Level
Document   Baseline NMT   NMT with Attention Model
Doc1       0.501443       0.552926
Doc2       0.394018       0.507185
Doc3       0.333809       0.506252
Doc4       0.428654       0.533271
Doc5       0.456911       0.515314

TABLE 3
Pearson Correlation Between Human and BLEU Evaluation Metrics for all Engines
Engine                     Correlation Score (Human-BLEU)
Baseline NMT               0.467728
NMT with Attention Model   1.0

5 CONCLUSION
In this paper, we showed the development of Hindi-English MT using a neural approach. We tested the developed engines on 500 sentences, using both human and automatic evaluation. In the automatic evaluation, BLEU scores were higher for the attention-based model than for the baseline model.

REFERENCES
[1] Luong, M.T., Pham, H. and Manning, C.D., 2015. Effective approaches to attention-based neural machine translation.
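The decoding step of Section 3.3 (combine the context vector with the target hidden state through a concatenation layer and tanh, then apply a softmax over the vocabulary) can be sketched in plain Python as below. This is only an illustrative sketch under stated assumptions: the paper does not say how the context vector is derived, so the dot-product alignment score used here is an assumption (it is one of the options in Luong et al. [1]), and the function names, the weight matrices `W_c` and `W_s`, and all numeric values are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(h_t, source_states, W_c, W_s):
    """One decoding step: alignment weights, context vector, attentional state, word distribution."""
    # Assumed dot-product score between the target state and each source state.
    scores = [sum(a * b for a, b in zip(h_t, h_s)) for h_s in source_states]
    align = softmax(scores)
    # Context vector: alignment-weighted average of the source hidden states.
    c_t = [sum(a * h_s[k] for a, h_s in zip(align, source_states))
           for k in range(len(h_t))]
    # Attentional hidden state: tanh(W_c [c_t; h_t]), with [c_t; h_t] as list concatenation.
    h_tilde = [math.tanh(v) for v in matvec(W_c, c_t + h_t)]
    # Predictive distribution over the (toy) target vocabulary: softmax(W_s h_tilde).
    probs = softmax(matvec(W_s, h_tilde))
    return align, c_t, h_tilde, probs


# Hypothetical toy values: hidden size 2, three source positions, vocabulary of 3 words.
h_t = [0.5, -0.2]
source_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_c = [[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.2]]
W_s = [[0.3, -0.1], [0.0, 0.2], [-0.2, 0.1]]
align, c_t, h_tilde, probs = attention_step(h_t, source_states, W_c, W_s)
```

In a trained system `W_c` and `W_s` are learned parameters and the hidden states come from the encoder and decoder LSTMs; the fixed numbers here only make the shapes concrete.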