A Metagrammar for Vietnamese LTAG

Lê Hồng Phương, LORIA/INRIA Lorraine, Nancy, France, lehong@loria.fr
Nguyễn Thị Minh Huyền, Hanoi University of Science, Hanoi, Vietnam, huyenntm@vnu.edu.vn
Azim Roussanaly, LORIA/INRIA Lorraine, Nancy, France, azim@loria.fr

Proceedings of The Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms, Tübingen, Germany. June 6-8, 2008.

Abstract

We present in this paper an initial investigation into the use of a metagrammar for explicitly sharing abstract grammatical specifications for the Vietnamese language. We first introduce the essential syntactic mechanisms of the Vietnamese language. We then show that the basic subcategorization frames of Vietnamese can be compactly represented by classes using the XMG formalism (eXtensible MetaGrammar). Finally, we report on the implementation of the first metagrammar producing verbal elementary trees that recognize basic Vietnamese sentences.

1 Introduction

Metagrammars (MG) have recently emerged as a means to develop wide-coverage LTAGs for well-studied languages like English, French and Italian (Candito, 1999; Kinyon and Rambow, 2003). MGs help avoid redundancy and reduce the effort of grammar development by making use of common properties of LTAG elementary trees.

We present in this paper an initial investigation into the use of a metagrammar for explicitly sharing abstract grammatical specifications for the Vietnamese language. We use the eXtensible MetaGrammar (XMG) tool, which was developed by Crabbé (Crabbé, 2005; Parmentier and L. Roux, 2005), to compile a TAG for Vietnamese. The resulting grammar is called vnMG and is freely available online at http://www.loria.fr/~lehong/tools/vnMG.php.

Only in recent years have Vietnamese researchers begun to be involved in the domain of natural language processing in general and in the task of parsing Vietnamese in particular. No work on formalizing Vietnamese grammar was reported before (Nguyen et al., 2004). In (Lê et al., 2006), basic declarative structures and complement clauses of Vietnamese sentences were modeled using about thirty elementary trees, representing as many subcategorization frames. We show in this paper that these basic subcategorization frames can be compactly represented by classes in the XMG formalism.

We first introduce the essential syntactic mechanisms of the Vietnamese language. We then show that the basic subcategorization frames of Vietnamese can be compactly represented by classes using the XMG formalism. We then report on the implementation of the first metagrammar producing verbal elementary trees that recognize basic Vietnamese sentences, before concluding.

2 Vietnamese Subcategorizations

As in other isolating languages, the most important source of syntactic information in Vietnamese is word order. The basic word order is Subject – Verb – Object. A verb is always placed after the subject in both predicative and question forms. In a noun phrase, the main noun precedes the adjectives and the genitive follows the governing noun. The other syntactic means are function words, reduplication, and, in the case of spoken language, prosody (Nguyễn et al., 2006).

From the point of view of functional grammar, the syntactic structure of Vietnamese is topic-oriented. Vietnamese belongs to the topic-prominent languages as described by (Li and Thompson, 1976). In those languages, topics are coded in the surface structure and they tend to control co-referentiality. The topic-oriented “double subject” construction is a basic sentence type. For example, “Cậu ấy khoẻ mạnh, là sinh viên y khoa / He strong, be student medicine” means “He is strong, he is a medical student”. In Vietnamese, passive voice and cleft subject sentences are rare or non-existent.

In general, Vietnamese predicates may be classified into three types depending on whether a copula is needed to connect them with their subjects in the declarative and negative forms (Nguyễn and Nguyễn, 2004). Complex predicates can be constructed as coordinated predicative structures built from these basic types of predicates. We briefly present these three types of Vietnamese predicates in the following subsections.

2.1 First Type Predicates

First type predicates connect directly to their subjects, without the need of a copula, in both the declarative and the negative forms. For example:

• Declarative form: Tôi đọc sách. / I am reading books.

• Negative form: Tôi không đọc sách. / I am not reading books.

These predicates are realized by verbal phrases or adjectival phrases. The fact that an adjective can be a predicate is a specific feature of Vietnamese compared with Western languages. In English or French, for instance, only verbal phrases can be predicates; adjectives in these languages always signify properties of subjects and always follow the verb “to be” in English or “être” in French.

2.2 Second Type Predicates

Second type predicates are connected to their subjects by the copula “là” in the declarative form and by the copulas “không là”, “không phải” or “không phải là” in the negative form. Predicates of this type are rather rich. They can be:

• Nouns or noun phrases: Tôi là sinh viên. / I am a student.

• Verbs, adjectives, verbal phrases or adjectival phrases: Van xin là yếu đuối. / Begging is feeble. Học cũng là làm việc. / To study is to work.

2.3 Third Type Predicates

Third type predicates connect directly to their subjects in the declarative form; in the negative form, however, they are connected to their subjects by a copula. Predicates of this type are usually:

• A clause: Nó vẫn tên là Quýt. / His name is still Quýt.

• A composition of a numeral and a noun: Lê này mười ngàn đồng. / This pear costs ten thousand dongs.

• A composition of a preposition and a noun: Lúa này của chị Hoa. / This is the rice of Ms. Hoa.

• An expression: Thằng ấy đầu bò đầu bướu lắm. / That guy is very stubborn.

2.4 Subcategorizations

In the first LTAG for Vietnamese, presented in (Lê et al., 2006), each subcategorization is represented by the same structure of elementary trees associated with the predicate under consideration. We consider the subject to be subcategorized in the same way as the other arguments. Verbs thus anchor elementary trees composed of a node for the subject and one or more nodes for each of their essential complements.

We follow the de facto standard in TAG, in which each subcategorization is represented by a family of elementary trees. The families of verbal elementary trees are defined in Table 1.

Subcategorization                               Family            Example
Intransitive                                    N0 V              ngủ / sleep
With a nominal complement                       N0 V N1           đọc / to read
With a clausal complement                       N0 V S1           tin / to believe
With modal complement                           N0 V0 V1          mong / to wish
Ditransitive                                    N0 V N1 N2        cho / to give
Ditransitive with a preposition                 N0 V N1 O N2      vay / to borrow
Ditransitive with a verbal complement           N0 V0 N1 V1       lãnh đạo / to lead
Ditransitive with an adjectival complement      N0 V N1 A         làm / to make
Movement verbs with a nominal complement        N0 V0 V1 N1       ra / to go out
Movement verbs with an adjectival complement    N0 V0 A V1        trở nên / to become
Movement ditransitive                           N0 V0 N1 V1 N2    chuyển / to transfer

Table 1: Subcategorizations of Vietnamese verbs

We present in the next section a metagrammar that generates this set of elementary trees.

3 A Metagrammar for Verbal Trees

The subcategorizations of elementary trees describe only “canonical” constructions of predicative elements, without taking into account relative or question structures. For the purpose of this investigation, we restrict ourselves at this first stage to developing only the verb spines and argument realizations shown in the subcategorizations presented in the previous section.

We have developed an XMG metagrammar that consists of 11 classes (or tree fragments). The metagrammar is currently able to produce the set of elementary trees described in Table 1, including the intransitive, transitive and ditransitive families with and/or without optional complements. As an illustration, the declarative transitive structure in Figure 1 can be defined by combining a canonical subject fragment with an active verb fragment and a canonical object fragment:

    S(N↓ PredP(V))  +  S(V)  +  S(PredP(V N↓))

This combination is conveniently expressed by a statement in the XMG language:

    TransitiveVerb = Subject ∧ ActiveVerb ∧ Object

    S(N0↓ [tôi]  PredP(V⋄ [đọc]  N1↓ [sách]))

Figure 1: Declarative transitive structure αn0Vn1

4 Conclusion and Future Work

This paper presents an initial investigation into the use of the XMG formalism for developing a first metagrammar that produces an LTAG for Vietnamese recognizing basic verbal constructions. We have shown that the essential subcategorization frames of Vietnamese predicates can be effectively encoded by means of XMG classes while retaining basic properties of the realized verbal trees. This confirms that various syntactic phenomena of Vietnamese can be covered in a Vietnamese MG.

The first evaluation of the MG for Vietnamese is promising, but the lexical coverage has to be improved further. Moreover, the grammar coverage needs to be revised by refining the constraints that rule out ungrammatical syntactic constructions. Although there are not many tree fragments in the current metagrammar, we find that the current MG over-generates some undesired structures. The MG will also be extended to deal with constructions not yet covered, such as adjectival and noun phrase constructions. We also intend to generate a test suite to document the grammars and perform realistic evaluations.

There is existing work on the development of metagrammars for less frequently studied languages such as Korean and Yiddish and on their relation to a German grammar (Kinyon et al., 2006). The authors showed that cross-linguistic generalizations, for example the verb-second phenomenon, can be incorporated into a multilingual MG. We think that a comparison of the Vietnamese MG with this work would be useful. In particular, a study of the relative position of verbs and arguments in Vietnamese, related to this work, would be beneficial.

References

Marie-Hélène Candito. 1999. Représentation modulaire et paramétrable de grammaires électroniques lexicalisées : application au français et à l’italien. Doctoral Dissertation, Université Paris 7.

Benoit Crabbé. 2005. Représentation informatique de grammaires fortement lexicalisées. Doctoral Dissertation, Université Nancy 2.

Nguyễn Thị Minh Huyền, Laurent Romary, Mathias Rossignol and Vũ Xuân Lương. 2006. A Lexicon for Vietnamese Language Processing. Language Resources and Evaluation, Vol. 40, No. 3–4.

Alexandra Kinyon and Owen Rambow. 2003. Using the MetaGrammar to generate cross-language and cross-framework annotated test-suites. In Proc. LINC-EACL, Budapest.

Alexandra Kinyon and Carlos A. Prolo. 2002. A Classification of Grammar Development Strategies. Proceedings of the Workshop on Grammar Engineering, Taipei, Taiwan.

Alexandra Kinyon, Owen Rambow, Tatjana Scheffler, SinWon Yoon and Aravind K. Joshi. 2006. The Metagrammar Goes Multilingual: A Cross-Linguistic Look at the V2-Phenomenon. Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms, Sydney, Australia.

Lê Hồng Phương, Nguyễn Thị Minh Huyền, Laurent Romary and Azim Roussanaly. 2006. A Lexicalized Tree-Adjoining Grammar for Vietnamese. Proceedings of LREC 2006, Genoa, Italy.

Thanh Bon Nguyen, Thi Minh Huyen Nguyen, Laurent Romary and Xuan Luong Vu. 2004. Developing Tools and Building Linguistic Resources for Vietnamese Morpho-Syntactic Processing. Proceedings of LREC 2004, Lisbon, Portugal.

Charles N. Li and Sandra A. Thompson. 1976. Subject and topic: a new typology of language. In Charles N. Li (ed.), Subject and Topic. London/New York: Academic Press, pp. 457-489.

Yannick Parmentier and Joseph L. Roux. 2005. XMG: a Multi-formalism Metagrammar Framework. Proceedings of the Tenth ESSLLI Student Session.

Nguyễn Minh Thuyết and Nguyễn Văn Hiệp. 2004. Thành phần câu tiếng Việt. NXB Giáo dục, Hà Nội, Vietnam.