Divergence Patterns in Machine Translation between Hindi and English

R. Mahesh K. Sinha
Indian Institute of Technology Kanpur
rmk@iitk.ac.in

Anil Thakur
Indian Institute of Technology Kanpur
anilt@iitk.ac.in

Abstract

The issue of translation divergence is an important research topic in the area of machine translation. An exhaustive study of the divergence issues in MT is necessary for their proper classification and resolution. In the literature on MT, scholars have examined the issue and have proposed ways for their classification and resolution (Dorr 1993, 1994). However, the topic still needs further exploration to identify different sources of translation divergence in different language pairs. In this paper, we discuss translation patterns between Hindi and English for different types of constructions with a view to identifying the potential topics of translation divergence. We take Dorr's (1993, 1994) classification of translation divergence as the base to examine the different topics of translation divergence in Hindi and English. The primary goal of the paper is to point out different types of translation divergences in Hindi and English MT that have not been discussed in the existing literature.

1 Introduction

The issue of translation divergence is a complex topic in machine translation (MT). Translation divergence can be defined in terms of language-to-language differences in the respective grammars. Thus a divergence occurs when a sentence in language L1 translates into a sentence in L2 in a very different form (Dorr 1994: 12).[1] The topic has been studied from different perspectives and a number of approaches have been proposed to handle it. It is crucial for any MT system to identify the nature of translation divergences and resolve them so as to obtain a correct translation. Translation divergences occur at different levels and affect the quality of the translation according to the degree of complexity involved in a particular divergence. It has also been noted (Dorr, 1994) that certain types of translation divergences are universal in the sense that they exist across languages, whereas certain other types are specific to a particular language pair. Therefore, translation divergences need to be studied from both across-language and language-specific perspectives. In this paper, we examine the Hindi and English language pair largely from the perspective of identifying the language-specific divergences. Hindi and English differ in many respects[2] and hence this language pair presents a rich source for the study of translation divergence in MT. These languages also show significant differences from the point of view of socio-cultural perspectives that need to be properly examined. In this paper, we discuss different aspects of Hindi and English grammars that involve potential areas of translation divergence in Hindi and English MT. We discuss divergence issues in Hindi-English machine translation, and the same translation pair is then examined for reverse translation from English to Hindi so as to examine the nature of the divergence in each case.

[1] It should be noted that what constitutes a translation divergence is not dependent upon the translation strategy used for machine translation. This is contrary to the views expressed by one of the reviewers. In this paper we have taken this definition of divergence and presented structural differences both in forward and reverse directions irrespective of the MT strategy.

[2] One of the reviewers has correctly pointed out that languages other than Hindi which are equally distant from English can be assumed to exhibit similar or more such divergences as discussed in this paper. Natural languages are very complex and no research on translation divergence can be said to be exhaustive, particularly at this stage of research.

In the existing literature, the issue of translation divergence for Hindi and English MT has not been exhaustively examined. Gupta et al. (2003) and Dave et al. (2001) discuss some of the translation divergences pertaining to English-Hindi MT and Hindi-English MT. Dave et al. (2001) discuss the issue within the UNL-based Interlingua approach, and only some of the obvious types of divergences have been discussed. These works do not explore further areas of divergence. Some of the obvious divergence types such as thematic divergence, dative divergence and movement divergence have not been discussed at all. Although the authors point out divergences resulting from the pro-drop phenomenon in Hindi and the occurrence of pleonastic subjects in English, they do not examine the issue in detail to capture the implications of these language-specific features for other types of divergences. Also, some of the examples that have been discussed under head-swapping divergence, such as promotional and demotional divergences, need to be re-examined for their proper categorization. For instance, 'on' (as in "the play is on" => khel cal rahaa hE {play go PROG be.PR}) has been taken as an adverbial element in English which has a verbal realization in Hindi. However, if we recognize this use of 'on' (meaning 'caalu' in Hindi) as an adjectival element, the divergence no longer exists. The Hindi translation (khel caalu hE {play on be.PR}) for the English sentence ("the play is on.") is equally valid and is a commonly used sentence. Gupta et al. (2003) discuss only a few cases of divergence to present rules for unification of translation divergences in English-Hindi MT. Thus we notice that the existing works are far from exhaustive from the point of view of both classification and resolution of different translation divergences in the context of Hindi-English MT.

In section 2, we discuss different sources of translation divergences in Hindi and English MT. Section 3 presents a brief outline of the strategy used in dealing with these divergences in our MT system, followed by the concluding remarks.

2 Translation Divergence: Classification and Further Issues

Dorr (1993) categorizes translation divergences into two broad types: (A) Syntactic Divergences and (B) Lexical-semantic Divergences. They are further subcategorized as follows:

(A) Syntactic Divergence: i. Constituent order divergence, ii. Adjunction divergence, iii. Preposition-stranding divergence, iv. Movement divergence, v. Null subject divergence, vi. Dative divergence, and vii. Pleonastic divergence

(B) Lexical-semantic Divergence: i. Thematic divergence, ii. Promotional divergence, iii. Demotional divergence, iv. Structural divergence, v. Conflational divergence, vi. Categorial divergence, and vii. Lexical divergence

Dorr (1994) examines the structure of the lexical-semantic divergences and proposes an LCS-based approach for their resolution. This classification takes into account various sources of differences between a pair of translation languages and captures a large set of translation divergences. The classification is based on the Government and Binding framework (Chomsky 1986, Jackendoff 1990) of linguistic theory, which assumes a deep structure to capture the surface structure variations. The deep structure functions as the universal structure, i.e. it is applicable across languages. Thus both the classification and the resolution of the translation divergences are largely discussed from the perspective of universal grammar. The classification captures the major grammatical issues in translation divergence across languages. However, it also misses a number of points that pertain to a particular pair of translation languages. The issue of divergence between a pair of languages is associated with a number of factors ranging from linguistic to socio- and psycho-linguistic aspects of the languages involved. Although Dorr's classification takes into account many of the major linguistic factors associated with translation divergence, there still remain a number of points related to both linguistic and extra-linguistic factors that may exist in different language pairs. Furthermore, the parameters of the classification do not take into account subtle semantic factors to the extent they are relevant for the classification of translation divergences in various languages. Without going into a detailed discussion of the different classes and categories of translation divergences as proposed in Dorr (1993, 1994), we discuss English and Hindi translation examples that present new sources and topics of translation divergence in English-Hindi and Hindi-English MT.

2.1 Non-Configurational Nature of Hindi

English is a configurational language that follows a rigid word order pattern, as opposed to Hindi, which is relatively less rigid and exhibits free word order variation. This is one of the major sources of divergence between a pair of natural languages. In Dorr's classification, word order related translation divergences have been discussed under syntactic divergence. Dave et al. (2001) extend Dorr's classification to the English-Hindi translation pair but do not discuss the implications of the word order facts at all. For instance, one of the implications of the word order related divergence can be noticed with respect to the interpretation of the question particle 'kyaa' (Sinha et al. 2005c) in Hindi. 'kyaa' can be used both as an interrogative pronoun in content questions and as a question particle in yes-no questions. Besides certain other factors such as the category of the verb, it is the position of occurrence of 'kyaa' that indicates its interpretation one way or the other. 'kyaa' in the sentence-initial and sentence-final positions is generally interpreted as a question particle rather than as an interrogative pronoun, as is evident from the examples shown in (1).

(1) a. aap kyaa paDh rahe hEN? {you what read PROG be.PR} => What are you reading?
    b. kyaa aap paDh rahe hEN? {QP you read PROG be.PR} => Are you reading?

The examples in (1)[3] show subtle implications with respect to the word order facts in Hindi.

[3] ACC: Accusative Case, AFF: Affirmative, CAUS: Causative, CONT: Continuative Aspect, CPP: Conjunctive Participial Particle, DAT: Dative Case, DIT: Ditransitive, ERG: Ergative Case, EW: Echo Word, FU: Future Tense, GER: Gerund, HAB: Habitual Aspect, IMP: Imperfective Aspect, IMPR: Imperative Mood, INT: Interrogative, OPT: Optative Mood, PASS: Passive Particle, PR: Present Tense, PST: Past Tense, QP: Question Particle, SUBJ: Subjunctive Mood, TRS: Transitive, VPRT: Verbal Participle.

Replicative and Echo Words

Hindi, like most of the other South Asian languages, exhibits the phenomenon of replication (Sinha et al. 2005d) of lexical items to express different grammatical functions. The English counterparts of these Hindi constructions do not resort to a replicative structure. This distinction often results in a change in the syntactic category of the relevant elements. For instance, we notice that in Hindi, as in (2), the replication of the verb (in participial form) denotes an adverbial function of cause. The English counterpart of this function is realized by a gerundive prepositional phrase.

(2) vah bolate bolate thak gayaa. {he speak speak tired got} => He got tired of speaking.

In this example, the replicative element bolate bolate is an adverbial clause which is realized lexically in Hindi and is mapped structurally in English. The reverse translation for this example set does not involve divergence[4], as in (3).

(3) He got tired of speaking. => vah bolane se thak gayaa. {he speak of tired got}

[4] In case of multiple possible translations, if any one of the translations exhibits the same grammatical structure, then it is considered as a case of no divergence.

Another typological feature exhibited by all the Indian languages is the occurrence of echo words, where a lexical word is partially replicated to denote a wide range of meanings with subtle semantic constraints. The examples in (4-5) are illustrative.

(4) caay vaay pii kar jaaiye. {tea EW drink CPP go} => Have some snacks before going.
(5) ise Thiik se jaaNc vaaNc lo. {this properly examine EW take} => Please examine it properly. <=> ise Thiik se jaaNc liijiye.

The echo words generally have no lexical status in the lexicon of the language. However, whenever an echo word is identical with a lexical word, it affects the interpretation of the preceding lexical item. In (4), the use of the echo word 'vaay' along with the main word 'caay' (tea) gives the sense of light refreshment. However, this is not a possible sense in which an echo word is used in (5). Here the main verb jaNcanaa 'examine' occurs with an echo word that has only an emphatic (or extension) function, but it cannot be exactly expressed in English. In the case of the reverse translation from English to Hindi, no divergence is encountered.

2.2 Expressive Elements

Expressive words exist in all natural languages and pose difficulty in processing, particularly in mapping onto another language. The reason is that these words do not have an exact parallel in another language. Thus the word dhaRaam is only distantly mapped by 'bump' in English, as in (6).

(6) vah dhaRaam se girii. {she 'dhaRaam' with fell} => She fell with a 'bump'.

The expressive words usually originate from the sound associated with the semantics of the action verb and can be adverbial or verbalized action verbs such as 'tap-tapaanaa' (drip), 'khat-khataanaa' (knock) etc. One may argue that this is just a lexical gap, but it is not so. Some of these words can be handled in the lexicon, but since in many cases the mapping also involves structural changes, the issue involves a wider scope of interpretation.

2.3 Asymmetry in NP and Existential Clauses

The issue of divergence related to the difference in the determiner systems of English and Hindi NPs has not been examined in the existing literature on divergence. English has (in)definite articles that mark the (in)definiteness of the noun phrase overtly, whereas Hindi lacks an overt article system and different devices are used to realize the (in)definiteness of a noun phrase in Hindi. For instance, mapping onto the articles a-an/the in English is not lexically realizable from Hindi (e.g. laRakaa aayaa => The/*a boy came.). In this connection, another point of divergence between Hindi and English related to there- and it-sentences in English is worth examining. In English, there- and it-constructions are used to denote existential sentences (besides others). Hindi does not have a pleonastic subject construction, and the contrast between existential and non-existential (mostly definite) sentences is realized in several other ways such as the movement of the noun phrase from its canonical position and the use of demonstrative elements. Let us look at the examples in (7-8).

(7) kamare meN saaNp hE. {room in snake be.PR} => There is a snake in the room.
(8) saaNp kamare meN hE. {snake room in be.PR} => The snake is in the room.

We notice that the bare noun phrase saaNp 'snake' in (7) and (8) is mapped by indefinite and definite noun phrases, respectively, in English. However, the only difference between these two Hindi sentences is the respective positions of the subject NP and the (place) adverbial phrase. When we look at the reverse translation of the same sentence, the nature of divergence is different.

(9) There is a snake in the room. => kamare meN ek saaNp hE. {room in a snake be.PR}

Hindi does not have a counterpart of the "there-construction", and the Hindi grammar has to resort to a number of devices such as shifting of the relevant elements and deletion of 'there' to obtain the equivalent of the English sentence, as in (9).

2.4 Tense, Moods and Aspects (TAM)

Another important source of translation divergence in Hindi and English MT is associated with the difference in the manifestation of different tense, mood and aspectual properties of the verb in these languages. For instance, Hindi uses a certain type of passive construction that marks a kind of non-volitional function. The English counterparts of such Hindi sentences are only partially able to express the exact meaning.

(10) raam se galatii ho gaii. {Ram by mistake be PASS} => Ram made a mistake. <=> raam-ne galatii kii.

The possible English counterpart of the Hindi sentence in (10) is far from the actual sense in which the Hindi impersonal passive has been used. The literal sense will be somewhat like: 'a mistake got made by Ram unintentionally'. Thus the reverse translation for the same sentence from English to Hindi involves a far more complex procedure[5]. A somewhat similar dimension of divergence between Hindi and English is manifested with respect to the negative impersonal passive constructions in Hindi and the way they are realized in English.

(11) raam se calaa nahiiN jaataa. {Ram by walk not PASS} => Ram cannot walk. <=> raam cal nahiiN sakataa.

[5] One of the reviewers has argued that such a claim makes no sense as it can only be made in relation to a given system. The point we are making here is that it is not possible to derive a sentence-to-sentence translation, whatever the MT system. A translation can only be in the form of a number of sentences 'explaining' the situation.

In this case, too, no translation divergence occurs in the case of the reverse translation and the source Hindi sentence cannot be obtained.

In Hindi, some of the aspectual features of the verb are realized by verbal inflection, whereas English resorts to different non-inflectional means such as a phrasal verb, an adverbial element or a prepositional phrase with a gerund as the head to realize them. For instance, in (12-13), the aspectual property is identical in both the sentences and the difference is located only in tense. The habitual aspect is reflected by inflectional morphology on the verb in both the tenses. However, this habitual aspect in English is realized by the use of a phrasal verb in the case of the past tense (12) and by the use of the adverbial word 'often' in the case of the present (and future) tense (13). Thus the adverbial element in Hindi is optional whereas the one in English cannot be optional. In (14), we notice that the non-terminative aspect is realized by verbal morphology in Hindi whereas English uses a phrasal structure to realize this aspect.

(12) raam aayaa karataa thaa. {Ram come CONT be.PST} => Ram used to come.
(13) raam (aksar) aayaa karataa hE. {Ram often come CONT be.PR} => Ram *(often) comes.
(14) raam bolataa rahaa. {Ram speak CONT} => Ram kept on speaking.

In certain types of conditional clauses in Hindi, there is optionality between present and future/past tenses. But the English counterparts of these Hindi sentences always require the verb to occur in the present tense.

(15) yadi tum dillii jaate ho / jaoge to tum kaamyaab hoge. {if you Delhi go FU / PST then you successful be.FU} => If you go to Delhi you will be successful. <=> yadi tum dillii jaataa ho to tum kaamyaab hogaa. {if you Delhi go then you successful be.FU}

The reverse translation from English to Hindi will produce only the source Hindi sentence that has the verb in the present tense form and hence will not involve any translation divergence.

2.5 Role of Conjunctions and Particles

Another source of divergence between Hindi and English can be located in the use of different conjunctions and particles in Hindi. We take examples involving some of these particles in Hindi such as ki, na, and yaa. The translation divergence between Hindi and English related to ki is quite complex (Sinha and Thakur, 2005b). ki is mainly used as a sentence complementizer, but can also be used to indicate alternate conjunction in an affirmative sentence (16) and an interrogative sentence (18) in Hindi.

(16) siitaa mujhase milii na ki usase. {Sita me met not him} => Sita met me not him.
(17) raam paDhataa hE ki sotaa hE? {Ram read PROG be.PR or sleep PROG be.PR} => Does Ram study or sleep? <=> kyaa raam paDhataa hE yaa sotaa hE.

In another instance, yaa ('or') is a coordinate conjunction particle in Hindi that conjoins two clauses or phrases. However, it can denote a different function in Hindi depending on the punctuation mark used in the relevant sentence.