Divergence Patterns in Machine Translation between Hindi and English
R. Mahesh K. Sinha, Indian Institute of Technology Kanpur, rmk@iitk.ac.in
Anil Thakur, Indian Institute of Technology Kanpur, anilt@iitk.ac.in
Abstract
The issue of translation divergence is an important research topic in machine translation (MT). An exhaustive study of divergence issues in MT is necessary for their proper classification and resolution. Scholars in the MT literature have examined the issue and proposed ways to classify and resolve divergences (Dorr 1993, 1994). However, the topic still needs further exploration to identify the different sources of translation divergence in different pairs of translation languages. In this paper, we discuss translation patterns between Hindi and English for different types of constructions, with a view to identifying potential topics of translation divergence. We take Dorr’s (1993, 1994) classification of translation divergence as the base for examining the different topics of translation divergence in Hindi and English. The primary goal of the paper is to point out types of translation divergence in Hindi and English MT that have not been discussed in the existing literature.
1 Introduction
The issue of translation divergence is a complex topic in machine translation (MT). Translation divergence can be defined in terms of language-to-language differences in the respective grammars. Thus a divergence occurs when a sentence in language L₁ translates into a sentence in L₂ in a very different form¹ (Dorr 1994: 12). The topic has been studied from different perspectives, and a number of approaches have been proposed to handle divergences. It is crucial for any MT system to identify the nature of translation divergences and resolve them in order to obtain a correct translation. Translation divergences occur at different levels and affect the quality of the translation according to the degree of complexity involved in a particular divergence. It has also been noted (Dorr 1994) that certain types of translation divergence are universal in the sense that they exist across languages, whereas certain other types are specific to a pair of translation languages. Therefore, translation divergences need to be studied from both cross-language and language-specific perspectives. In this paper, we examine the Hindi and English translation pair largely from the perspective of identifying language-specific divergences. Hindi and English differ in many respects², and hence this language pair presents a rich source for the study of translation divergence in MT. These languages also show significant differences from the socio-cultural point of view that need to be properly examined. In this paper, we discuss different aspects of Hindi and English grammar that involve potential areas of translation divergence in Hindi and English MT. We discuss divergence issues in Hindi-English machine translation, and the same translation pair is then examined for reverse translation from English to Hindi so as to examine the nature of the divergence in each case.
¹ It should be noted that what constitutes a translation divergence is not dependent upon the translation strategy used for machine translation. This is contrary to the views expressed by one of the reviewers. In this paper we have taken this definition of divergence and presented structural differences in both forward and reverse directions irrespective of the MT strategy.
² One of the reviewers has correctly pointed out that languages other than Hindi which are equally distant from English can be assumed to exhibit similar or more such divergences as discussed in this paper. Natural languages are very complex and no research on translation divergence can be said to be exhaustive, particularly at this stage of research.
In the existing literature, the issue of translation divergence for Hindi and English MT has not been exhaustively examined. Gupta et al. (2003) and Dave et al. (2001) discuss some of the translation divergences pertaining to English-Hindi MT and Hindi-English MT. Dave et al. (2001) discuss the issue within the UNL-based interlingua approach, and only some of the obvious types of divergence are covered. These works do not explore further areas of divergence. Some of the obvious divergence types, such as thematic divergence, dative divergence and movement divergence, have not been discussed at all. Although the authors point out divergences resulting from the pro-drop phenomenon in Hindi and the occurrence of pleonastic subjects in English, they do not examine the issue in detail to capture the implications of
these language-specific features for other types of divergences. Also, some of the examples that have been discussed under head-swapping divergence, such as promotional and demotional divergences, need to be re-examined for their proper categorization. For instance, ‘on’ (as in “the play is on” => khel cal rahaa hE {play go PROG be.PR}) has been taken as an adverbial element in English which has a verbal realization in Hindi. However, if we recognize this use of ‘on’ (meaning ‘caalu’ in Hindi) as an adjectival element, the divergence no longer exists. The Hindi translation khel caalu hE {play on be.PR} for the English sentence “the play is on” is equally valid and commonly used. Gupta et al. (2003) discuss only a few cases of divergence to present rules for unification of translation divergences in English-Hindi MT. Thus we notice that the existing works are far from exhaustive from the point of view of both classification and resolution of different translation divergences in the context of Hindi-English MT.
In Section 2, we discuss different sources of translation divergence in Hindi and English MT. Section 3 presents a brief outline of the strategy used in dealing with these divergences in our MT system, followed by the concluding remarks.
2 Translation Divergence: Classification and Further Issues
Dorr (1993) categorizes translation divergences into two broad types: (A) Syntactic Divergences and (B) Lexical-semantic Divergences. They are further subcategorized as follows:
(A) Syntactic Divergence: i. Constituent order divergence, ii. Adjunction divergence, iii. Preposition-stranding divergence, iv. Movement divergence, v. Null subject divergence, vi. Dative divergence, and vii. Pleonastic divergence
(B) Lexical-semantic Divergence: i. Thematic divergence, ii. Promotional divergence, iii. Demotional divergence, iv. Structural divergence, v. Conflational divergence, vi. Categorial divergence, and vii. Lexical divergence
Dorr (1994) examines the structure of the lexical-semantic divergences and proposes an LCS-based approach for their resolution. This classification takes into account various sources of differences between a pair of translation languages and captures a large set of translation divergences. The classification is based on the Government and Binding framework (Chomsky 1986, Jackendoff 1990) of linguistic theory, which assumes a deep structure to capture the surface structure variations. The deep structure functions as the universal structure, i.e. one applicable across languages. Thus both the classification and the resolution of translation divergences are largely discussed from the perspective of universal grammar. The classification captures the major grammatical issues in translation divergence across languages. However, it also misses a number of points that pertain to a particular set of translation languages. The issue of divergence between a set of languages is associated with a number of factors ranging from linguistic to socio- and psycho-linguistic aspects of the languages involved. Although Dorr’s classification takes into account many of the major linguistic factors associated with translation divergence, there still remain a number of points related to both linguistic and extra-linguistic factors that may exist in different sets of translation languages. Furthermore, the parameters of the classification do not take into account subtle semantic factors to the extent that they are relevant for the classification of translation divergences in various languages. Without going into a detailed discussion of the different classes and categories of translation divergence proposed in Dorr (1993, 1994), we discuss English and Hindi translation examples that present new sources and topics of translation divergence in English-Hindi and Hindi-English MT.
2.1 Non-Configurational Nature of Hindi
English is a configurational language that follows a rigid word order pattern, as opposed to Hindi, which is relatively less rigid and exhibits free word order variation. This is one of the major sources of divergence between a pair of natural languages. In Dorr’s classification, word order related translation divergences are discussed under syntactic divergence. Dave et al. (2001) extend Dorr’s classification to the English-Hindi translation pair but do not discuss the implications of the word order facts at all. For instance, one implication of word order related divergence can be seen in the interpretation of the question particle ‘kyaa’ (Sinha et al. 2005c) in Hindi. ‘kyaa’ can be used both as an interrogative pronoun in content questions and as a question particle in yes-no questions. Besides certain other factors such as the category of the verb, it is the position of ‘kyaa’ that indicates its interpretation one way or the other. ‘kyaa’ in the sentence-initial and sentence-final positions is generally interpreted as a question particle rather than as an interrogative pronoun, as is evident from the examples in (1).
(1) a. aap kyaa paDh rahe hEN? {you what read PROG be.PR} => What are you reading?
b. kyaa aap paDh rahe hEN? {QP you read PROG be.PR} => Are you reading?
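The positional reading of ‘kyaa’ illustrated in (1) can be sketched as a small rule. This is only an illustrative sketch, not the procedure used in any MT system discussed here: the function names `tokenize` and `classify_kyaa` and the label strings are hypothetical, the input is assumed to be whitespace-separated romanized Hindi, and the other factors mentioned in the text (such as the category of the verb) are deliberately ignored.

```python
def tokenize(sentence):
    """Split a romanized Hindi sentence on whitespace, dropping final punctuation."""
    return sentence.rstrip("?.! ").split()


def classify_kyaa(tokens):
    """Guess the reading of 'kyaa' from its position alone (hypothetical helper).

    Sentence-initial or sentence-final 'kyaa' is read as a yes-no question
    particle; any other position is read as the interrogative pronoun 'what'.
    Returns None when the sentence contains no 'kyaa'.
    """
    if "kyaa" not in tokens:
        return None
    position = tokens.index("kyaa")  # first occurrence only
    if position == 0 or position == len(tokens) - 1:
        return "question_particle"
    return "interrogative_pronoun"


if __name__ == "__main__":
    for sentence in ("aap kyaa paDh rahe hEN?", "kyaa aap paDh rahe hEN?"):
        print(sentence, "->", classify_kyaa(tokenize(sentence)))
```

Under these assumptions, (1a) is labelled as a content question and (1b) as a yes-no question. A realistic system would combine this positional cue with the verb-related factors the text mentions.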
The examples in (1)³ show subtle implications with respect to the word order facts in Hindi.
Replicative and Echo Words
Hindi, like most other South Asian languages, exhibits the phenomenon of replication (Sinha et al. 2005d) of lexical items to express different grammatical functions. The English counterparts of these Hindi constructions do not resort to a replicative structure. This distinction often results in a change in the syntactic category of the relevant elements. For instance, in Hindi, as in (2), the replication of the verb (in participial form) denotes an adverbial function of cause. The English counterpart of this function is realized by a gerundive prepositional phrase.
(2) vah bolate bolate thak gayaa. {he speak speak tired got} => He got tired of speaking.
In this example, the replicative element bolate bolate is an adverbial clause which is realized lexically in Hindi and structurally in English. The reverse translation for this example does not involve divergence⁴, as in (3).
(3) He got tired of speaking. => vah bolane se thak gayaa. {he speak of tired got}
Another typological feature exhibited by all the Indian languages is the occurrence of echo words, where a lexical word is partially replicated to denote a wide range of meanings with subtle semantic constraints. The examples in (4-5) are illustrative.
(4) caay vaay pii kar jaaiye. {tea EW drink CPP go} => Have some snacks before going.
(5) ise Thiik se jaaNc vaaNc lo. {this properly examine EW take} => Please examine it properly. <=> ise Thiik se jaaNc liijiye.
Echo words generally have no lexical status in the lexicon of the language. However, whenever an echo word is identical with a lexical word, it affects the interpretation of the preceding lexical item. In (4), the use of the echo word ‘vaay’ along with the main word ‘caay’ (tea) gives the sense of light refreshment. However, this is not the sense in which the echo word is used in (5). Here the main verb jaNcanaa ‘examine’ occurs with an echo word that has only an emphatic (or extension) function, which cannot be exactly expressed in English. In the case of the reverse translation from English to Hindi no divergence is encountered.
³ ACC: Accusative Case, AFF: Affirmative, CAUS: Causative, CONT: Continuative Aspect, CPP: Conjunctive Participial Particle, DAT: Dative Case, DIT: Ditransitive, ERG: Ergative Case, EW: Echo Word, FU: Future Tense, GER: Gerund, HAB: Habitual Aspect, IMP: Imperfective Aspect, IMPR: Imperative Mood, INT: Interrogative, OPT: Optative Mood, PASS: Passive Particle, PR: Present Tense, PST: Past Tense, QP: Question Particle, SUBJ: Subjunctive Mood, TRS: Transitive, VPRT: Verbal Participle.
⁴ In case of multiple possible translations, if any one of the translations exhibits the same grammatical structure, it is considered a case of no divergence.
2.2 Expressive Elements
Expressive words exist in all natural languages and pose difficulty in processing, particularly in mapping onto another language. The reason is that these words do not have an exact parallel in another language. Thus the word dhaRaam is only distantly mapped by ‘bump’ in English, as in (6).
(6) vah dhaRaam se girii. {she ‘dhaRaam’ with fell} => She fell with a ‘bump’.
Expressive words usually originate from the sound associated with the semantics of the action verb and can be adverbial or verbalized action verbs such as ‘tap-tapaanaa’ (drip), ‘khat-khataanaa’ (knock), etc. One may argue that this is just a lexical gap, but it is not so. Some of these words can be handled in the lexicon, but since in many cases the mapping also involves structural changes, the issue has a wider scope of interpretation.
2.3 Asymmetry in NP and Existential Clauses
The issue of divergence related to the difference in the determiner systems of English and Hindi NPs has not been examined in the existing literature on divergence. English has (in)definite articles that mark the (in)definiteness of the noun phrase overtly, whereas Hindi lacks an overt article system and uses different devices to realize the (in)definiteness of a noun phrase. For instance, the mapping onto the English articles a/an/the cannot be realized lexically from Hindi (e.g. laRakaa aayaa => The/*a boy came.). In this connection, another point of divergence between Hindi and English, related to there- and it-sentences in English, is worth examining. In English, there- and it-constructions are used to denote existential sentences (besides others). Hindi does not have a pleonastic subject construction, and the contrast between existential and non-existential (mostly definite) sentences is realized in several other ways, such as the movement of the noun phrase from its canonical position and the use of demonstrative elements. Let us look at the examples in (7-8).
(7) kamare meN saaNp hE. {room in snake be.PR} => There is a snake in the room.
(8) saaNp kamare meN hE. {snake room in be.PR} => The snake is in the room.
We notice that the bare noun phrase saaNp ‘snake’ in (7) and (8) is mapped by indefinite and definite noun phrases in English, respectively. However, the only difference between these two Hindi sentences is the respective positions of the subject NP and the (place) adverbial phrase. When we look at the
reverse translation of the same sentence, the nature of the divergence is different.
(9) There is a snake in the room. => kamare meN ek saaNp hE. {room in a snake be.PR}
Hindi does not have a counterpart of the “there-construction”, and Hindi grammar has to resort to a number of devices, such as shifting of the relevant elements and deletion of ‘there’, to obtain the equivalent of the English sentence, as in (9).
2.4 Tense, Moods and Aspects (TAM)
Another important source of translation divergence in Hindi and English MT is associated with the difference in the manifestation of the tense, mood and aspectual properties of the verb in these languages. For instance, Hindi uses a certain type of passive construction that marks a kind of non-volitional function. The English counterparts of such Hindi sentences are only partially able to express the exact meaning.
(10) raam se galatii ho gaii. {Ram by mistake be PASS} => Ram made a mistake. <=> raam-ne galatii kii.
The possible English counterpart of the Hindi sentence in (10) is far from the actual sense in which the Hindi impersonal passive has been used. The literal sense is somewhat like ‘a mistake got made by Ram unintentionally’. Thus the reverse translation of the same sentence from English to Hindi involves a far more complex procedure⁵. A somewhat similar dimension of divergence between Hindi and English is manifested in the negative impersonal passive constructions in Hindi and the way they are realized in English.
(11) raam se calaa nahiiN jaataa. {Ram by walk not PASS} => Ram cannot walk. <=> raam cal nahiiN sakataa.
In this case, too, no translation divergence occurs in the case of the reverse translation and the source Hindi sentence cannot be obtained.
In Hindi, some of the aspectual features of the verb are realized by verbal inflection, whereas English resorts to different non-inflectional means, such as a phrasal verb, an adverbial element, or a prepositional phrase with a gerund as the head, to realize them. For instance, in (12-13), the aspectual property is identical in both sentences and the difference is located only in tense. The habitual aspect is reflected by inflectional morphology on the verb in both tenses. However, this habitual aspect in English is realized by the use of a phrasal verb in the case of the past tense (12) and by the use of the adverbial word ‘often’ in the case of the present (and future) tense (13). Thus the adverbial element in Hindi is optional whereas the one in English cannot be omitted. In (14), we notice that the non-terminative aspect is realized by verbal morphology in Hindi whereas English uses a phrasal structure to realize this aspect.
(12) raam aayaa karataa thaa. {Ram come CONT be.PST} => Ram used to come.
(13) raam (aksar) aayaa karataa hE. {Ram often come CONT be.PR} => Ram *(often) comes.
(14) raam bolataa rahaa. {Ram speak CONT} => Ram kept on speaking.
In certain types of conditional clauses in Hindi, there is optionality between present and future/past tenses. But the English counterparts of these Hindi sentences always require the verb to occur in the present tense.
(15) yadi tum dillii jaate ho / jaoge to tum kaamyaab hoge. {if you Delhi go FU / PST then you successful be.FU} => If you go to Delhi you will be successful. <=> yadi tum dillii jaataa ho to tum kaamyaab hogaa. {if you Delhi go then you successful be.FU}
The reverse translation from English to Hindi will produce only the source Hindi sentence that has the verb in the present tense form and hence will not involve any translation divergence.
⁵ One of the reviewers has argued that such a claim makes no sense as it can only be made in relation to a given system. The point we are making here is that it is not possible to derive a sentence-to-sentence translation whatever the MT system. A translation can only be in the form of a number of sentences ‘explaining’ the situation.
2.5 Role of Conjunctions and Particles
Another source of divergence between Hindi and English lies in the use of different conjunctions and particles in Hindi. We take examples involving some of these particles, such as ki, na, and yaa. The translation divergence between Hindi and English related to ki is quite complex (Sinha and Thakur 2005b). ki is mainly used as a sentence complementizer, but can also be used to indicate alternative conjunction in an affirmative sentence (16) and an interrogative sentence (18) in Hindi.
(16) siitaa mujhase milii na ki usase. {Sita me met not him} => Sita met me not him.
(17) raam paDhataa hE ki sotaa hE? {Ram read PROG be.PR or sleep PROG be.PR} => Does Ram study or sleep? <=> kyaa raam paDhataa hE yaa sotaa hE.
In another instance, yaa (‘or’) is a coordinating conjunction in Hindi that conjoins two clauses or phrases. However, it can denote a different function in Hindi depending on the punctuation mark used in the relevant sentence.