             INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 8, ISSUE 11, NOVEMBER 2019                                               ISSN 2277-8616 
            
                Hindi-English Neural Machine Translation Using 
                                                                  Attention Model 
                                                                                            
                                                Charu Verma, Aarti Singh, Swagata Seal, Varsha Singh, Iti Mathur 
            
Abstract: Translation is the process by which a system converts text from a source natural language into a target natural language so that the
original message is retained in the target language. Deep neural networks are powerful models that have achieved remarkable results on challenging
learning tasks such as visual object recognition and speech recognition, and they work well whenever large training sets are available. This paper
presents Hindi-to-English machine translation on a Hindi-English parallel corpus, in which a supervised learning algorithm is applied with an
attention model: one recurrent neural network maps the input sequence to a vector of fixed dimensionality, and another recurrent neural network
decodes the target sequence from that vector. We also show how neural machine translation is a better way to translate data from the source
language to the target language.
            
           Index Terms: Machine Translation, Deep Learning, Neural Machine Translation, LSTM   
                                                                ——————————                    —————————— 
                                                                                             
            
Charu Verma is a member of technical staff, Next Generation Technologies Research Foundation, India. E-mail: vermacharu284@gmail.com
Aarti Singh is a member of technical staff, Next Generation Technologies Research Foundation, India. E-mail: say2aru19@gmail.com
Swagata Seal is a member of technical staff, Next Generation Technologies Research Foundation, India. E-mail: swagata.sita@gmail.com
Varsha Singh is a member of technical staff, Next Generation Technologies Research Foundation, India. E-mail: varshasingh773@gmail.com
Iti Mathur is an Associate Professor in the Department of Computer Science, Banasthali Vidyapith, India. E-mail: mathur_iti@rediffmail.com

1 INTRODUCTION
To solve a particular problem, people need to discuss and share their ideas, but differences in language create a large gap in understanding.
Machine translation provides access to information written in an unknown language, lowers barriers to communication, and increases productivity.
Translation can also be performed by humans, who can provide very accurate translations, so why is machine translation needed when it gives
inferior results on text with ambiguous words and sentences? Human translation is expensive, and translators are hard to find because they need
knowledge of both the source and target languages, whereas machine translation is less expensive and is available at the click of a button on any
device, such as a laptop, mobile phone, or tablet.

Current systems take text in one language as input and produce text in another language as output. There are more than a hundred machine
translation technology providers. For example, 'Google Translate' is a translation service developed by Google that supports text and document
conversion for more than a hundred languages. 'Yandex Translate' is a web service provided by Yandex that translates words, phrases, whole texts,
and entire websites (given only the URL) across ninety-five languages. The 'IBM-Watson' translator translates documents from one language to
another while preserving file formatting; the supported file types include MS Office, PDF, TXT, HTML, JSON, XML, and OpenOffice. These
technologies use deep learning to improve their accuracy and speed, and they provide interfaces that are easy to use.

In this paper, we describe the experiments that we carried out for training and evaluating our Neural Machine Translation (NMT) system. The rest
of the paper is structured as follows: Section 2 reviews the literature. Section 3 explains our proposed model. Section 4 shows the evaluation
performed on our system, and Section 5 concludes the paper.

2 LITERATURE REVIEW
Luong et al. [1] showed how attention-based techniques improve the quality of Neural Machine Translation (NMT) models. Cho et al. [2] elaborated
the different properties of the encoder-decoder model used in NMT systems. Wu et al. [3] explained the working of Google's NMT system and showed
how the translation process is carried out end to end. Sennrich et al. [4] showed how NMT system performance degrades when out-of-vocabulary
words are found in the text, and they also presented an approach for dealing with this kind of problem. Luong et al. [5] addressed the problem of
dealing with rare words in text while performing experiments with an NMT system. Tu et al. [6] discussed the issue of model coverage in an NMT
system. Sennrich et al. [7] showed how the system can be improved by using more monolingual data. Further, Sennrich et al. [8] showed the working
of their NMT system. Joshi et al. [9] developed a mechanism to write in Hindi using English. They used statistical machine learning to predict a
word when some of the initial characters are typed. Using this, Joshi et al. [10] also developed an Example-Based Machine Translation system.
Joshi et al. [11] also evaluated the system developed and compared its performance with other popularly available MT engines. Gupta et al. [12]
developed a rule-based stemmer for Urdu. They developed several rules to implement this stemmer. They further used this stemmer in the evaluation
of some English-Urdu MT systems [13]. Singh et al. [14] developed a POS tagger for Marathi using statistical machine learning. Bhalla et al. [15]
developed a procedure for transliteration of named entities from English to Punjabi. Joshi et al. [16] evaluated several open-domain MT engines.
Gupta et al. [17] did the same for English-Urdu MT engines. Singh et al. [18] developed a POS tagger for Marathi using supervised learning. Joshi
et al. [19] further developed a technique for using machine learning in evaluating MT engines. Tyagi et al. [20] [21] developed an approach for
translating complex English sentences by first simplifying them and then translating them into Hindi. Yogi et al. [22] developed an approach to
identify candidate translations which are good for post-editing. Gupta et al. [23] further extended their stemmer by adding derivational rules to
the inflectional stemmer. Asopa et al. [24] developed a mechanism for chunking Hindi sentences using a rule-based approach. Gupta et al. [25]
developed a rule-based lemmatizer for Urdu which was an extension to their stemmer. Kumar et al. [26] developed several machine learning based
classifiers for identifying different senses of a word in Hindi. Joshi et al. [27] developed
                                                                                    IJSTR©2019 
                                                                                    www.ijstr.org 
           
a mechanism to estimate the quality of English-Hindi MT engines. Chopra et al. [28] [29] developed a named entity recognition and tagging tool for
Hindi using several machine learning approaches. Gupta et al. [30] developed a POS tagger for Urdu using a machine learning approach. Mathur et
al. [31] developed an ontology matching evaluation tool which used the MT engine developed by Joshi et al. Chopra et al. [32] developed a
mechanism for rewriting English sentences and then translating them into Hindi. This significantly improved the performance of their MT engine.
Joshi et al. [33] investigated some approaches to classifying documents and further suggested an approach for effective classification of text
documents. Singh et al. [34] developed an approach to automatically generate transfer grammar rules. This approach significantly improved the
development process of their transfer-based MT engine. Singh et al. [35] developed an approach for text processing of Hindi documents using deep
neural networks. They further developed this approach to mine textual data from web documents [36]. Singh et al. [37] developed a translation
memory tool which worked as a sub-system in their transfer-based MT system. This further improved the accuracy of their system. Gupta et al. [38]
further showed how fuzzy logic can be used in developing NLP applications. Gupta et al. [39] used several NLP tools in preprocessing the tweets
that they extracted from the web. They found that this approach improves the accuracy of their machine learning model which classifies the
tweets. Gupta et al. [40] developed an approach which helped in identification and classification of multiword expressions from Urdu documents.
Nathani et al. [41] developed a rule-based inflectional stemmer for Sindhi written in Devanagari script. Asopa et al. [42] developed a shallow
parser for Hindi using conditional random fields. Gupta et al. [43] showed the use of machine learning approaches in developing NLP applications.
Gupta et al. [44] used fuzzy operations in analyzing sentiments of tweets on several topics. This approach showed very promising results over
traditional approaches. Sharma and Joshi [45] developed a rule-based word sense disambiguation approach for Hindi. It gave an accuracy of 73%.
Katyayan and Joshi [46] studied various approaches for correct identification of sarcastic phrases in English documents. Gupta and Joshi [47]
showed how tweets can be classified using NLP techniques. They showed how negative sentences can be handled using NLP approaches. Shree et al.
[48] showed how Hindi and English differ and what problems current state-of-the-art MT systems face while translating text. Ahmed et al. [49]
showed how an MT system can be developed by using an intermediate language which is related to both the languages. They developed an Arabic-Hindi
MT system using Urdu as the intermediate language. They further performed the same study using English and found that, given a large corpus,
English, which is unrelated to Arabic and Hindi, can be used for developing an MT system [50]. Seal and Joshi [51] developed a rule-based
inflectional stemmer for Assamese. This system showed very good results. Singh and Joshi [52] showed the development of POS taggers for Hindi
using different Markov models. They concluded that the hidden Markov model-based tagger produced the best results among several Markov-based POS
taggers. Pandey et al. [53] showed how NLP approaches can help in developing a better ranking model for web documents. They used particle swarm
optimization and NLP approaches in improving the performance of their ranking model. Singh and Joshi [54] developed a rule-based approach for
identifying anaphora in Hindi discourses. Sinha et al. [55] developed a sentiment analyzer for Facebook posts using the methods developed by
Gupta et al. Sharma et al. [56] [57] used some of the Markov model-based approaches used by Singh et al. to develop their association
classification model. Similar approaches were used by Goyal et al. [58] [59] for their models.

3 PROPOSED MODEL

3.1 Supervised Machine Learning
Supervised learning is a form of machine learning in which the data come in pairs of input and output. The input could be anything, such as
sensor measurements, pictures, emails, or messages, and the output may be a label, a real number, or in some cases a vector or another structure
(for example: negative or positive, dog or cat, spam or not spam, right or wrong).

$\{(x_i, y_i)\}_{i=1}^{N}$

In this expression, each element $x_i$ among the N examples is a feature vector, that is, a vector in which each dimension j = 1, ..., D contains
a value that describes the example in some way. That value is called a feature and is denoted $x^{(j)}$. The element $y_i$ is the label of the
input $x_i$. For example, if an input $x_i$ represents a person, then the first feature $x^{(1)}$ contains the gender, the second feature
$x^{(2)}$ contains the weight in kg, the third feature $x^{(3)}$ contains the height in cm, and so on; together, $x^{(1)}$, $x^{(2)}$, $x^{(3)}$
form the feature vector of that input. This paper presents Hindi-to-English machine translation on a Hindi-English parallel corpus, in which a
supervised learning algorithm is applied with an attention model: one recurrent neural network maps the input sequence to a vector of fixed
dimensionality, and another recurrent neural network decodes the target sequence from that vector.

3.2 Preprocessing Step
Preprocessing of the data is necessary before training the network. Real-world data are generally incomplete, noisy, and inconsistent, and the
data need to be preprocessed to overcome these problems. First, we clean the text by removing extra spaces and other unnecessary symbols from the
sentences. The network does not understand text, so the text must be converted into vectors. In sequence-to-sequence translation, every word in a
sentence needs a unique identity, so we represent each word in a language as a one-hot vector: a large vector that contains zeros everywhere
except for a single one. In the given example, each word of the sentence is represented by such a vector. Every sentence in the corpus also
contains SOS, which represents the start of the sentence, and EOS, which represents the end of the sentence. An example of this is shown in
Figure 1.
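The preprocessing described in Section 3.2 can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names (`clean`, `tokenize`, `build_vocab`, `one_hot`, `encode`), the choice of which symbols to strip, and the two sample sentences are all assumptions made for illustration. It cleans a sentence, wraps it in SOS and EOS markers, assigns every word a unique index, and turns each word into a one-hot vector.

```python
import string

SOS, EOS = "SOS", "EOS"

# Assumption: "unnecessary symbols" means ASCII punctuation plus the Hindi danda.
PUNCTUATION = string.punctuation + "।"
_TO_SPACE = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))


def clean(sentence):
    """Replace punctuation with spaces and collapse repeated whitespace."""
    return " ".join(sentence.translate(_TO_SPACE).split())


def tokenize(sentence):
    """Split a cleaned sentence into words, framed by SOS and EOS."""
    return [SOS] + clean(sentence).split() + [EOS]


def build_vocab(sentences):
    """Give every distinct token a unique integer index."""
    vocab = {SOS: 0, EOS: 1}
    for sentence in sentences:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(index, size):
    """Return a vector of zeros with a single one at position `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


def encode(sentence, vocab):
    """Convert a sentence into a list of one-hot vectors, one per token."""
    return [one_hot(vocab[token], len(vocab)) for token in tokenize(sentence)]


# Hypothetical sample sentences, not taken from the paper's corpus.
hindi_tokens = tokenize("मैं घर जा रहा हूँ।")
english_corpus = ["I  am going home!", "You are going home."]
english_vocab = build_vocab(english_corpus)
encoded = encode(english_corpus[0], english_vocab)
```

One-hot vectors grow with the vocabulary size, so practical systems usually store only the integer index and let an embedding layer do the lookup; the explicit vectors here simply mirror the description in the text.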
                                     
[Figure: words labelled with the indices 00 to 08; one word shown as the vector < 0 0 0 0 1 0 0 ... >]
Fig. 1: Assignment of Weights

3.3 Training of NMT using Attention Model
We developed our system following Neural Machine Translation by Jointly Learning to Align and Translate. Here, attention reads as a neural
extension of the encoder-decoder model. The encoder-decoder model has several limitations, which are resolved by attention. A neural network works
on vectors, so in the encoder-decoder approach it compresses all the important information of the source sentence into a fixed-length vector. This
makes it difficult for the network to work with long sentences, mainly those which are longer than the sentences in the training corpus. This is
shown in Figure 1. In the decoding phase, at every time step t, we first take as input the hidden state $h_t$ at the top layer of the stacked
LSTM. To capture relevant source-side information for finding the current

TABLE 1
Evaluation Results of BLEU at Document Level

        Baseline NMT    NMT with Attention Model
Doc1    0.375784        0.552926
Doc2    0.338189        0.507185
Doc3    0.363287        0.506252
Doc4    0.358607        0.533271
Doc5    0.361307        0.515314

Table 2 shows the results of evaluation done by human annotators. Table 3 shows the correlation between these studies.

TABLE 2
Results of Human Evaluation at Document Level

        Baseline NMT    NMT with Attention Model
Doc1    0.501443        0.552926
Doc2    0.394018        0.507185
Doc3    0.333809        0.506252
Doc4    0.428654        0.533271
Doc5    0.456911        0.515314

TABLE 3
Pearson Correlation Between Human and BLEU Evaluation Metrics for all Engines

Engine                Correlation Score (Human-BLEU)
Baseline NMT          0.467728
NMT with Attention    1.0
                                    target  word  y content  vector  c is  used  and  share  the                                                                                                                                                                                                                                                                                                          Model 
                                                                                                    t                                                                                  t                                                                                                                                
                                    subsequence  steps..    when  model  know  how    the  context 
                                    vector ct is derived, then given the target hidden ht and the                                                                                                                                                                                                                      5  CONCULSION 
                                    source side context vector ct,  a simple concatenation layer                                                                                                                                                                                                                       In this paper, we showed the development of English-Hindi MT 
                                    which combine the information from both vectors and provide                                                                                                                                                                                                                        using  Neural  Approach.  We  tested  the  developed  Engines 
                                    a attention hidden state as follows:                                                                                                                                                                                                                                               through  500  sentences.  For  this,  we  did  both  human  and 
                                                                                                                                                                                                                                                                                                                       automatic evaluation. In the  automatic evaluation, we found 
                                                                                                                                  h = tanh(W [c h])                                                                                                                                                                    that  BLEU was producing better results for Attention Based 
                                                                                                                                        t                                          c         t;       t
                                                                                                                                                                                                                                                                                                                       Model which was an improvement over the Baseline Model.  
                                    then we feed attention vector ht with the help of softmax layer                                                                                                                                                                                                                     
                                    which provide productivity as:                                                                                                                                                                                                                                                     REFERENCES 
                                                                                                                                                                                                                                                                                                                       [1]  Luong, M.T., Pham, H. and Manning, C.D., 2015. Effective 
                                                                                                              P(y y , x) = softmax(W                                                                                          )                                                                                                          approaches to attention-based neural machine translation. 
                                                                                                                            t
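The attention step described in the text, where the context vector c_t and the target hidden state h_t are concatenated, passed through tanh to give the attentional hidden state, and then through a softmax layer, can be illustrated with a minimal sketch. It uses only the Python standard library. The function names (`matvec`, `softmax`, `attentional_step`), the toy dimensions, and all numeric values are assumptions chosen for illustration; they are not taken from the paper or its implementation.

```python
import math


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(s - m) for s in z]
    total = sum(exps)
    return [e / total for e in exps]


def attentional_step(c_t, h_t, W_c, W_s):
    """One decoding step:
    h_tilde = tanh(W_c [c_t; h_t]),  probs = softmax(W_s h_tilde).
    """
    concat = c_t + h_t  # list concatenation plays the role of [c_t; h_t]
    h_tilde = [math.tanh(a) for a in matvec(W_c, concat)]
    probs = softmax(matvec(W_s, h_tilde))
    return h_tilde, probs


# Toy, hand-picked values (hypothetical; for illustration only).
c_t = [0.5, -1.0]               # source-side context vector
h_t = [1.0, 0.0]                # target hidden state
W_c = [[0.1, 0.2, 0.3, 0.4],
       [0.0, -0.5, 0.5, 0.0]]   # shape 2 x 4
W_s = [[1.0, 0.0],
       [0.0, 1.0],
       [1.0, 1.0]]              # shape 3 x 2 (toy vocabulary of 3 words)

h_tilde, probs = attentional_step(c_t, h_t, W_c, W_s)
```

In a trained system, W_c and W_s would be learned parameters and the softmax would range over the full target vocabulary. Here `probs` is simply a probability distribution over three hypothetical target words.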