Source: aclanthology.org


Divergence Patterns in Machine Translation between Hindi and English

R. Mahesh K. Sinha, Indian Institute of Technology Kanpur, rmk@iitk.ac.in
Anil Thakur, Indian Institute of Technology Kanpur, anilt@iitk.ac.in

Abstract

The issue of translation divergence is an important research topic in the area of machine translation. An exhaustive study of the divergence issues in MT is necessary for their proper classification and resolution. In the literature on MT, scholars have examined the issue and have proposed ways for the classification and resolution of divergences (Dorr 1993, 1994). However, the topic still needs further exploration to identify different sources of translation divergence in different pairs of translation languages. In this paper, we discuss translation patterns between Hindi and English of different types of constructions with a view to identifying the potential topics of the translation divergences. We take Dorr’s (1993, 1994) classification of translation divergence as the base to examine the different topics of translation divergence in Hindi and English. The primary goal of the paper is to point out different types of translation divergences in Hindi and English MT that have not been discussed in the existing literature.

1 Introduction

The issue of translation divergence is a complex topic in machine translation (MT). Translation divergence can be defined in terms of language-to-language differences in the respective grammars. Thus a divergence occurs when a sentence in language L_1 translates into a sentence in L_2 in a very different form[1] (Dorr 1994: 12). The topic has been studied from different perspectives and a number of approaches have been proposed to handle such divergences. It is crucial for any MT system to identify the nature of translation divergences and resolve them so as to obtain a correct translation. Translation divergences occur at different levels and affect the quality of the translation according to the degree of complexity involved in a particular translation divergence. It has also been noted (Dorr, 1994) that certain types of translation divergences are universal in the sense that they exist across languages, whereas certain other types of translation divergences are specific to a pair of translation languages. Therefore, translation divergences need to be studied from both across-language and language-specific perspectives. In this paper, we examine the Hindi and English translation language pair largely from the perspective of identifying the language-specific divergences. Hindi and English differ in many respects and hence this translation language pair[2] presents a rich source for the study of translation divergence in MT. These languages also show significant differences from the point of view of socio-cultural perspectives that need to be properly examined. In this paper, we discuss different aspects of Hindi and English grammars that involve potential areas of translation divergence in Hindi and English MT. We discuss divergence issues in Hindi-English machine translation, and the same translation pair is then examined for reverse translation from English to Hindi so as to examine the nature of the divergence in each case.

[1] It should be noted that what constitutes a translation divergence is not dependent upon the translation strategy used for machine translation. This is contrary to the views expressed by one of the reviewers. In this paper we have taken this definition of divergence and presented structural differences both in forward and reverse directions irrespective of the MT strategy.

[2] One of the reviewers has correctly pointed out that languages other than Hindi which are equally distant from English can be assumed to exhibit similar or more such divergences as discussed in this paper. Natural languages are very complex and no research on translation divergence can be said to be exhaustive, particularly at this stage of research.

In the existing literature, the issue of translation divergence for Hindi and English MT has not been exhaustively examined. Gupta et al (2003) and Dave et al (2001) discuss some of the translation divergences pertaining to English-Hindi MT and Hindi-English MT. Dave et al (2001) discuss the issue within the UNL-based Interlingua approach, and only some of the obvious types of divergences have been discussed. These works do not explore further areas of divergence. Some of the obvious divergence types such as thematic divergence, dative divergence and movement divergence have not been discussed at all. Although the authors point out divergences resulting from the pro-drop phenomenon in Hindi and the occurrence of pleonastic subjects in English, they do not examine the issue in detail to capture the implications of
these language-specific features for other types of divergences. Also, some of the examples that have been discussed under head-swapping divergence, such as promotional and demotional divergences, need to be re-examined for their proper categorization. For instance, ‘on’ (as in “the play is on” => khel cal rahaa hE {play go PROG be.PR}) has been taken as an adverbial element in English which has a verbal realization in Hindi. However, if we recognize this use of ‘on’ (meaning ‘caalu’ in Hindi) as an adjectival element, the divergence no longer exists. The Hindi translation khel caalu hE {play on be.PR} for the English sentence “the play is on” is equally valid and commonly used. Gupta et al (2003) discuss only a few cases of divergence to present rules for unification of translation divergences in English-Hindi MT. Thus we notice that the existing works are far from exhaustive from the point of view of both classification and resolution of different translation divergences in the context of Hindi-English MT.

In section 2, we discuss different sources of translation divergences in Hindi and English MT. Section 3 presents a brief outline of the strategy used in dealing with these divergences in our MT system, followed by the concluding remarks.

2 Translation Divergence: Classification and Further Issues

Dorr (1993) categorizes translation divergences into two broad types: (A) Syntactic Divergences and (B) Lexical-semantic Divergences. They are further subcategorized as follows:

(A) Syntactic Divergence: i. Constituent order divergence, ii. Adjunction divergence, iii. Preposition-stranding divergence, iv. Movement divergence, v. Null subject divergence, vi. Dative divergence, and vii. Pleonastic divergence

(B) Lexical-semantic Divergence: i. Thematic divergence, ii. Promotional divergence, iii. Demotional divergence, iv. Structural divergence, v. Conflational divergence, vi. Categorial divergence, and vii. Lexical divergence

Dorr (1994) examines the structure of the lexical-semantic divergences and proposes an LCS-based approach for their resolution. This classification takes into account various sources of differences between a set of translation languages and captures a large set of translation divergences. The classification is based on the Government and Binding framework (Chomsky 1986, Jackendoff 1990) of linguistic theory, which assumes a deep structure to capture the surface structure variations. The deep structure functions as the universal structure, i.e. it is applicable across languages. Thus both the classification and the resolution of the translation divergences are largely discussed from the perspective of the universal grammar. The classification captures the major grammatical issues in translation divergence across languages. However, it also misses a number of points that pertain to a particular set of translation languages. The issue of divergence between a set of languages is associated with a number of factors ranging from linguistic to socio- and psycho-linguistic aspects of the languages involved. Although Dorr’s classification takes into account many of the major linguistic factors associated with translation divergence, there still remain a number of points related to both linguistic and extra-linguistic factors that may exist in different sets of translation languages. Furthermore, the parameters of the classification do not take into account subtle semantic factors to the extent they are relevant for the classification of translation divergences in various languages. Without going into a detailed discussion of the different classes and categories of translation divergences as proposed in Dorr (1993, 1994), we discuss English and Hindi translation examples that present new sources and topics of translation divergence in English-Hindi and Hindi-English MT.

2.1 Non-Configurational Nature of Hindi

English is a configurational language that follows a rigid word order pattern, as opposed to Hindi, which is relatively less rigid and exhibits free word order variation. This is one of the major sources of divergence between a pair of natural languages. In Dorr’s classification, word order related translation divergences have been discussed under syntactic divergence. Dave et al (2001) extend Dorr’s classification to the English-Hindi translation pair but do not discuss the implications of the word order facts at all. For instance, one of the implications of the word order related divergence can be noticed with respect to the interpretation of the question particle ‘kyaa’ (Sinha et al. 2005c) in Hindi. ‘kyaa’ can be used both as a marker of interrogative pronoun in content question sentences and as a question particle in yes-no question sentences. Besides certain other factors such as the category of the verb, it is the position of occurrence of ‘kyaa’ that indicates its interpretation one way or the other. The particle ‘kyaa’ in the sentence-initial and sentence-final positions is generally interpreted as a question particle rather than as an interrogative pronoun, as is evident from the examples shown in (1).

(1) a. aap kyaa paDh rahe hEN? {you what read PROG be.PR} => What are you reading?
    b. kyaa aap paDh rahe hEN? {QP you read PROG be.PR} => Are you reading?
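
The positional reading of ‘kyaa’ described in section 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method used in the authors' system: the function name classify_kyaa and the label strings are hypothetical, the input is assumed to be whitespace-tokenized transliterated Hindi, and the other factors the text mentions (such as the category of the verb) are ignored.

```python
def classify_kyaa(sentence: str) -> str:
    """Guess the role of 'kyaa' from its position alone (hypothetical helper)."""
    # Drop trailing punctuation and split on whitespace (assumed tokenization).
    tokens = sentence.strip().rstrip("?.!").split()
    if "kyaa" not in tokens:
        return "absent"
    position = tokens.index("kyaa")
    # Sentence-initial or sentence-final 'kyaa' is generally a question particle.
    if position == 0 or position == len(tokens) - 1:
        return "question_particle"
    # Elsewhere it is read as the interrogative pronoun ('what').
    return "interrogative_pronoun"


print(classify_kyaa("aap kyaa paDh rahe hEN?"))  # example (1a)
print(classify_kyaa("kyaa aap paDh rahe hEN?"))  # example (1b)
```

Position alone is only a first approximation. Because the text says the interpretation also depends on factors such as the verb category, a practical system would presumably combine this check with lexical information about the verb.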
The examples in (1)[3] show subtle implications with respect to the word order facts in Hindi.

[3] ACC: Accusative Case, AFF: Affirmative, CAUS: Causative, CONT: Continuative Aspect, CPP: Conjunctive Participial Particle, DAT: Dative Case, DIT: Ditransitive, ERG: Ergative Case, EW: Echo Word, FU: Future Tense, GER: Gerund, HAB: Habitual Aspect, IMP: Imperfective Aspect, IMPR: Imperative Mood, INT: Interrogative, OPT: Optative Mood, PASS: Passive Particle, PR: Present Tense, PST: Past Tense, QP: Question Particle, SUBJ: Subjunctive Mood, TRS: Transitive, VPRT: Verbal Participle.

Replicative and Echo Words

Hindi, like most of the other South Asian languages, exhibits the phenomenon of replication (Sinha et al. 2005d) of lexical items to express different grammatical functions. The English counterparts of these Hindi constructions do not resort to replicative structure. This distinction often results in a change in the syntactic category of the relevant elements. For instance, we notice that in Hindi, as in (2), the replication of the verb (in participial form) denotes an adverbial function of cause. The English counterpart of this function is realized by a gerundive prepositional phrase.

(2) vah bolate bolate thak gayaa. {he speak speak tired got} => He got tired of speaking.

In this example, the replicative element bolate bolate is an adverbial clause which is realized lexically in Hindi and is mapped in English structurally. The reverse translation for this example set does not involve divergence[4], as in (3).

(3) He got tired of speaking. => vah bolane se thak gayaa. {he speak of tired got}

[4] In case of multiple possible translations, if any one of the translations exhibits the same grammatical structure, it is considered as a case of no divergence.

Another typological feature exhibited by all the Indian languages is the occurrence of echo words, where a lexical word is partially replicated to denote a wide range of meanings with subtle semantic constraints. The examples in (4-5) are illustrative.

(4) caay vaay pii kar jaaiye. {tea EW drink CPP go} => Have some snacks before going.
(5) ise Thiik se jaaNc vaaNc lo. {this properly examine EW take} => Please examine it properly. <=> ise Thiik se jaaNc liijiye.

The echo words generally have no lexical status in the lexicon of the language. However, whenever an echo word is identical with a lexical word, it affects the interpretation of the preceding lexicon. In (4), the use of an echo word ‘vaay’ along with the main word ‘caay’ (tea) gives the sense of light refreshment. However, this is not a possible sense in which an echo word is used in (5). Here the main verb jaNcanaa ‘examine’ occurs with an echo word that has only an emphatic (or extension) function, but it cannot be exactly expressed in English. In the case of the reverse translation from English to Hindi no divergence is encountered.

2.2 Expressive Elements

Expressive words exist in all natural languages and pose difficulty in processing, particularly in mapping onto another language. The reason is that these words do not have an exact parallel in another language. Thus the word dhaRaam is only distantly mapped by ‘bump’ in English, as in (6).

(6) vah dhaRaam se girii. {she ‘dhaRaam’ with fell} => She fell with a ‘bump’.

The expressive words usually originate from the sound associated with the semantics of the action verb and can be adverbial or verbalized action-verbs such as ‘tap-tapaanaa’ (drip), ‘khat-khataanaa’ (knock) etc. One may argue this to be just a lexical gap, but it is not so. Some of these words can be handled in the lexicon, but since in many cases the mapping also involves structural changes, the issue involves a wider scope of interpretation.

2.3 Asymmetry in NP and Existential Clauses

The issue of divergence related to the difference in the determiner systems of English and Hindi NPs has not been examined in the existing literature on divergence. English has (in)definite articles that mark the (in)definiteness of the noun phrase overtly, whereas Hindi lacks an overt article system and different devices are used to realize the (in)definiteness of a noun phrase in Hindi. For instance, mapping onto the articles a-an/the in English is not lexically realizable from Hindi (e.g. laRakaa aayaa => The/*a boy came.). In this connection, another point of divergence between Hindi and English related to there- and it-sentences in English is worth examining. In English, there- and it-constructions are used to denote existential sentences (besides others). Hindi does not have a pleonastic subject construction, and the contrast between existential and non-existential (mostly definite) sentences is realized in several other ways, such as the movement of the noun phrase from its canonical position and the use of demonstrative elements. Let us look at the examples in (7-8).

(7) kamare meN saaNp hE. {room in snake be.PR} => There is a snake in the room.
(8) saaNp kamare meN hE. {snake room in be.PR} => The snake is in the room.

We notice that the bare noun phrase saaNp ‘snake’ in (7) and (8) is mapped by indefinite and definite noun phrases in English. However, the only difference between these two Hindi sentences is the respective positions of the subject NP and the (place) adverbial phrase. When we look at the
reverse translation of the same translation sentence, the nature of divergence is different.

(9) There is a snake in the room. => kamare meN ek saaNp hE. {room in a snake be.PR}

Hindi does not have a counterpart of the “there-construction”, and the Hindi grammar has to resort to a number of devices such as shifting of the relevant elements and deletion of ‘there’ to obtain the equivalent of the English sentence, as in (9).

2.4 Tense, Moods and Aspects (TAM)

Another important source of translation divergence in Hindi and English MT is associated with the difference in the manifestation of different tense, mood and aspectual properties of the verb in these languages. For instance, Hindi uses a certain type of passive construction that marks a kind of non-volition function. The English counterparts of such Hindi sentences are only partially able to express the exact meaning.

(10) raam se galatii ho gaii. {Ram by mistake be PASS} => Ram made a mistake. <=> raam-ne galatii kii.

The possible English counterpart of the Hindi sentence in (10) is far from the actual sense in which the Hindi impersonal passive has been used. The literal sense will be somewhat like: ‘a mistake got made by Ram unintentionally’. Thus the reverse translation for the same translation sentence from English to Hindi involves a far more complex procedure[5]. A somewhat similar dimension of divergence between Hindi and English is manifested with respect to the negative impersonal passive constructions in Hindi and the way they are realized in English.

(11) raam se calaa nahiiN jaataa. {Ram by walk not PASS} => Ram cannot walk. <=> raam cal nahiiN sakataa.

In this case, too, no translation divergence occurs in the case of the reverse translation and the source Hindi sentence cannot be obtained.

[5] One of the reviewers has argued that such a claim makes no sense as it can only be made in relation to a given system. The point we are making here is that it is not possible to derive a sentence to sentence translation whatever be the MT system. A translation can be only in the form of a number of sentences ‘explaining’ the situation.

In Hindi, some of the aspectual features of the verb are realized by verbal inflection, whereas English resorts to different non-inflectional ways, such as a phrasal verb, an adverbial element, or a prepositional phrase with a gerund as the head, to realize them. For instance, in (12-13), the aspectual property is identical in both the sentences and the difference is located only in tense. The habitual aspect is reflected by inflectional morphology on the verb in both the tenses. However, this habitual aspect in English is realized by the use of a phrasal verb in the case of the past tense (12) and by the use of an adverbial word ‘often’ in the case of the present (and future) tense (13). Thus the adverbial element in Hindi is optional whereas the one in English cannot be optional. In (14), we notice that the non-terminative aspect is realized by verbal morphology in Hindi whereas English uses a phrasal structure to realize this aspect.

(12) raam aayaa karataa thaa. {Ram come CONT be.PST} => Ram used to come.
(13) raam (aksar) aayaa karataa hE. {Ram often come CONT be.PR} => Ram *(often) comes.
(14) raam bolataa rahaa. {Ram speak CONT} => Ram kept on speaking.

In certain types of conditional clauses in Hindi, there is optionality between present and future/past tenses. But the English counterparts of these Hindi sentences always require the verb to occur in the present tense.

(15) yadi tum dillii jaate ho / jaoge to tum kaamyaab hoge. {if you Delhi go FU / PST then you successful be.FU} => If you go to Delhi you will be successful. <=> yadi tum dillii jaataa ho to tum kaamyaab hogaa. {if you Delhi go then you successful be.FU}

The reverse translation from English to Hindi will produce only the source Hindi sentence that has the verb in the present tense form and hence will not involve any translation divergence.

2.5 Role of Conjunctions and Particles

Another source of divergence between Hindi and English can be located in the use of different conjunctions and particles in Hindi. We take examples involving some of these particles in Hindi such as ki, na, and yaa. The translation divergence between Hindi and English related to ki is quite complex (Sinha and Thakur, 2005b). ki is mainly used as a sentence complementizer, but can also be used to indicate alternate conjunction in an affirmative sentence (16) and an interrogative sentence (17) in Hindi.

(16) siitaa mujhase milii na ki usase. {Sita me met not him} => Sita met me not him.
(17) raam paDhataa hE ki sotaa hE? {Ram read PROG be.PR or sleep PROG be.PR} => Does Ram study or sleep? <=> kyaa raam paDhataa hE yaa sotaa hE.

In another instance, yaa (‘or’) is a coordinate conjunction particle in Hindi that conjoins two clauses or phrases. However, it can denote a different function in Hindi depending on the punctuation mark used in the relevant sentence.