Source: www.lrec-conf.org


Challenges and Solutions for Consistent Annotation of Vietnamese Treebank

Quy T. Nguyen¹&², Yusuke Miyao¹&², Ha T.T. Le³, Ngan L.T. Nguyen⁴
¹The Graduate University for Advanced Studies (SOKENDAI), Japan
²National Institute of Informatics, Japan
³University of Social Sciences and Humanities, Vietnam
⁴University of Information Technology, Vietnam
quynt@nii.ac.jp, yusuke@nii.ac.jp, trucha.ussh@gmail.com, ngannlt@uit.edu.vn

Abstract
Treebanks are important resources for research in natural language processing, speech recognition,
theoretical linguistics, etc. To strengthen the automatic processing of the Vietnamese language, a
Vietnamese treebank has been built. However, the quality of this treebank is not satisfactory and is
a possible source for the low performance of Vietnamese language processing. We have been building a
new treebank for Vietnamese with about 40,000 sentences annotated with three layers: word
segmentation, part-of-speech tagging, and bracketing. In this paper, we describe several challenges
of the Vietnamese language and how we solve them in developing annotation guidelines. We also present
our methods to improve the quality of the annotation guidelines and ensure annotation accuracy and
consistency. Experimental results show that inter-annotator agreement ratios and accuracy are higher
than 90%, which is satisfactory.
Keywords: Vietnamese Treebank, Consistent Annotation, Challenges and Solutions

1. Introduction
Treebanks, corpora annotated with syntactic structures, are important resources for researchers in
natural language processing (NLP). Treebanks provide important syntactic information in order to
improve the quality of NLP tools. To strengthen the automatic processing of the Vietnamese language,
Nguyen et al. (2009) have built a Vietnamese treebank, named VLSP treebank, containing 10,000
sentences. However, the quality of the VLSP treebank, including the quality of the annotation scheme,
the annotation guidelines, and the annotation process, is not satisfactory and is a possible source
for the low performance of Vietnamese language processing (Nguyen et al., 2012; Nguyen et al., 2013).
We have been building a new Vietnamese treebank with 3,000 texts (about 40,000 sentences) covering 14
topics collected from a Vietnamese online newspaper, Thanhnien news¹. Our treebank is annotated with
three layers: word segmentation (WS), part-of-speech (POS) tagging, and bracketing, as shown in
Figure 1². We have found that ensuring the annotation consistency and accuracy is one of the most
important considerations in the annotation of a treebank. This requires clear and complete annotation
guidelines. The guidelines contain the annotation scheme, consistent principles to annotate
linguistic phenomena, and sufficient examples. These documents are not only used to train annotators
but are also valuable sources serving the uses of the treebank.
We prepared three sets of guidelines for the Vietnamese treebank: WS guidelines, POS tagging
guidelines, and bracketing guidelines. In this paper, Section 2 describes the general characteristics
of the Vietnamese language in comparison with other languages (e.g., English and Chinese) to indicate
that building a high-quality Vietnamese treebank is a challenging problem. We also present our
methodology to tackle the challenges in this section. We then discuss difficulties in WS, POS
tagging, and bracketing, and how we solve them in developing the annotation guidelines in Sections 3,
4, and 5, respectively. Finally, in Section 6, we describe our annotation process, how we revise the
guidelines during the annotation process, and methods to ensure the annotation consistency and
accuracy.
This study is not only beneficial for the development of computational processing technologies for
Vietnamese, a language spoken by over 90 million people, but also for similar languages such as Thai,
Lao, and so on. This study also promotes the computational linguistic studies on how to transfer
methods developed for a popular language, like English, to a language that has not yet been
intensively studied.

¹ http://thanhnien.vn
² Underscore "_" is used to link syllables of Vietnamese multi-syllable words. Translation for the
Vietnamese word is given as a gloss below or in parentheses after the word. If the Vietnamese word
does not have a translatable meaning, the gloss is blank. Translation for a Vietnamese sentence is
given in curly brackets below the original text.

Treeing a Vietnamese sentence
Original sentence:
    Nam kể về tai nạn hôm qua.
    {Nam tells about yesterday's accident.}
1. Word segmentation:
    Nam    kể         về       tai_nạn     hôm_qua      .
           to tell    about    accident    yesterday
2. POS tagging:
    Nam/Nr kể/Vv về/Cs tai_nạn/Nn hôm_qua/Nt ./PU
3. Bracketing:
    (S
      (NP-SBJ (Nr-H Nam))
      (VP (Vv-H kể)
          (PP-DOB (Cs-H về)
                  (NP (Nn-H tai_nạn)
                      (NP-TMP (Nt-H hôm_qua)))))
      (PU .))
Figure 1: An example to illustrate the process of treeing a Vietnamese sentence.
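
The relation between the three layers in Figure 1 can be sketched in code: the bracketed form already contains the POS-tagged words, and the words contain the segmentation. The following is a minimal Python sketch under stated assumptions, not the tooling used for the treebank. The function names parse_brackets and tagged_words are hypothetical, and stripping the "-H" head marker to obtain the bare POS tag is an assumption about how the labels are composed.

```python
# Bracketing layer of the sentence in Figure 1.
FIGURE_1 = """
(S
  (NP-SBJ (Nr-H Nam))
  (VP (Vv-H kể)
      (PP-DOB (Cs-H về)
              (NP (Nn-H tai_nạn)
                  (NP-TMP (Nt-H hôm_qua)))))
  (PU .))
"""


def parse_brackets(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return node()


def tagged_words(tree):
    """Collect (word, POS) pairs from the leaves, left to right."""
    label, children = tree
    if all(isinstance(child, str) for child in children):
        # Preterminal node: drop the head marker, e.g. "Nr-H" -> "Nr" (assumed convention).
        return [("_".join(children), label.split("-")[0])]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


pairs = tagged_words(parse_brackets(FIGURE_1))
words = [word for word, _ in pairs]                          # word segmentation layer
pos_layer = " ".join(f"{word}/{tag}" for word, tag in pairs)  # POS tagging layer
syllables = " ".join(word.replace("_", " ") for word in words)
print(pos_layer)
```

Because each lower layer is derivable from the bracketing, a check of this kind could be used to confirm that the three layers of an annotated sentence stay mutually consistent.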

Meaning: The construction unit is too slow.
Glosses: Đơn_vị {unit}, thi_công {to construct}, thì {to be}, quá {too}, chậm_chạp {slow}
a) (S (NP-SBJ (Nn-H Đơn_vị) (Vv thi_công))
      (ADJP-PRD (R quá) (Aa-H chậm_chạp))
      (PU .))
b) (S (NP-SBJ (Nn-H Đơn_vị) (Vv thi_công))
      (Cp thì)
      (ADJP-PRD (R quá) (Aa-H chậm_chạp))
      (PU .))
c) (S (SPL (NP (Nn-H Đơn_vị) (Vv thi_công)))
      (Cp thì)
      (SPL (ADJP (R quá) (Aa-H chậm_chạp)))
      (PU .))
Figure 2: Examples showing ambiguity of annotating a sentence in Vietnamese.

2. Characteristics of the Vietnamese language and methodology for guideline preparation
Unlike Western languages, in which blank spaces denote word delimiters, in Vietnamese, blank spaces
play the roles of not only word delimiters but also syllable delimiters (Diep, 2005; SCSSV, 1983),
which causes difficulties in defining words. In addition, unlike English and Japanese, Vietnamese is
not an inflectional language for which morphological forms can provide useful clues for word
segmentation and POS tagging. While similar problems also occur with Chinese (Xia et al., 2000),
annotating Vietnamese words may be more difficult, because the modern Vietnamese writing system is
based on Latin characters, which represent the pronunciation but not the meaning of words, resulting
in many homonyms.
Difficulties in Vietnamese occur not only in determining words as mentioned above but also in
bracketing phrases. One of the reasons is that there are many expressions having the same POS
sequence but different phrase types in Vietnamese. Other difficulties are caused by the fact that
word order in Vietnamese is very flexible.
Moreover, there is little consensus in the community about how to define words, phrases and
grammatical structures. Though people agree that Vietnamese is a subject-verb-object (SVO) language,
Figure 2a shows a sentence in Vietnamese in which the head word of the predicate is not a verb. For
sentences that do not have a main verb, we can use the conjunction thì to link the subject and the
predicate as shown in Figure 2b. However, when the conjunction thì is used, linguists disagree about
how to bracket this sentence. Diep (2005) considered this sentence as a single sentence (Figure 2b),
where the conjunction thì is used to link the subject and the predicate. SCSSV (1983), in contrast,
considered this sentence as a subordinate compound sentence (Figure 2c) because they said that the
conjunction thì is used to link two clauses of a subordinate compound sentence.
We prepared the guidelines for the Vietnamese treebank in three sets: word segmentation guidelines,
POS tagging guidelines, and bracketing guidelines. The problems were tackled on the basis of the
following approaches:
   • We refer to Vietnamese grammar books (SCSSV, 1983; Diep, 2005) and discuss with our
     collaborators, who are Vietnamese linguistics experts, to solve the ambiguities and
     difficulties.
   • We study the guidelines of Chinese Penn Treebank (Xia, 2000b; Xia, 2000a; Xue et al., 2000),
     English Penn Treebank (Santorini, 1990; Bies et al., 1995), and VLSP treebank (Nguyen et al.,
     2010b; Nguyen et al., 2010a; Nguyen et al., 2010c) and adapt them to our guidelines if possible.
   • During the annotation process, annotators³ are requested to discuss with us the constructions
     that they cannot annotate or find ambiguous. These constructions are important clues to revise
     the guidelines.
   • We conduct nine rounds of measurement of inter-annotator agreement and accuracy, for which two
     annotators annotate the same data. The inconsistencies and annotation errors found in each round
     are important clues to improve annotation guidelines and to train annotators again.
Details of applying these approaches during the process of building the Vietnamese treebank are
explained in the following sections.

³ Our treebank is annotated by two annotators who are graduate linguistics students.

3. Word segmentation guidelines
3.1. Challenges of word segmentation
Words are the most basic units of a treebank (Sciullo and Williams, 1987), and defining words is the
first step in the annotation process (Xia, 2000b; Xia, 2000a; Sornlertlamvanich et al., 1999). For
languages like English, defining words is almost trivial, because the blank spaces denote word
delimiters. However, it is a difficult problem in Vietnamese even for a native speaker. Although most
linguists agree that the Vietnamese language has two types of words, single-syllable words (single
words) and multi-syllable words (compound words), distinguishing between single and multi-syllable
words involves much ambiguity.
The ambiguities of Vietnamese WS occur for the following reasons. First, in Vietnamese, blank spaces
play the roles of not only word delimiters but also syllable delimiters. Second, there are no
morphological marks to act as important clues to identify words. Third, the Vietnamese writing system
is based on Latin characters, which represent the pronunciation but not the meaning of words.
Expressions that have the same surface form but different word segmentation appear frequently in
Vietnamese. Rows 1 and 2 in Table 1, for instance, show two different segmentation
types of the expression quần áo. Fourth, there is little consistency in segmenting the expressions.
For example, some linguists consider the expression cá (fish) rô (anabas) {anabas} as a compound word
but bệnh (illness) sởi (measles) {measles} as two words (Hoang, 1998; Diep, 2005). However, these
expressions have a similar construction: the combination of a categorization noun⁴ and a specific
noun.

No.   Expression (A B)      Gloss                Meaning               WS
 1    quần áo               trousers shirt       clothes               a word
 2    quần áo               trousers shirt       trousers and shirt    2 words
 3    ăn nói                eat speak            to speak              a word
 4    tìm kiếm              find find            to find               a word
 5    nồi đồng              pot copper           copper pot            2 words
 6    nồi bằng đồng         pot by copper        copper pot            3 words
 7    đen đúa               black (none)         black                 a word
 8    cá heo                fish pig             dolphin               a word
 9    cá lia_thia           fish betta fish      betta fish            2 words
10    nghiên_cứu viên       research -er         researcher            2 words
11    nhà nghiên_cứu        -er research         researcher            2 words
Table 1: Examples to illustrate the principles of word segmentation.

3.2. Policy for annotation of word segmentation
As mentioned above, our purpose for word segmentation is to build a treebank for Vietnamese.
Therefore, we consider a word as the smallest syntactic unit having a complete meaning and preventing
syntactic rules from analyzing word structure (Sciullo and Williams, 1987). On the basis of this word
definition, we propose the following rules to solve the difficulties in Vietnamese word segmentation:
   • If A and B⁵ have different meanings and the meaning of the combination form (A_B) is different
     from the split form (A B), we select the form that has a meaning more appropriate for the
     context. Examples 1 and 2 in Table 1 show an expression having two different meanings because of
     different word segmentation.
   • If A and B have different meanings and A_B has the same meaning as A or B, the combination form
     is selected. The example is given in row 3 of Table 1.
   • If A and B have the same meaning, the combination form is selected (example 4 in Table 1).
   • If another syllable can be inserted between A and B, we select the split form (examples 5 and 6
     in Table 1).
   • If A is a word and B is not (or vice versa), we select the combination form. Example 7 in Table
     1 shows that if đúa is considered as a single word, its meaning is undefined. Therefore, it is
     considered as part of a multi-syllable word.
   • For the expression of a categorization noun (A) and a specific noun (B), if B indicates
     something different from what the expression indicates, A_B is considered as a compound word. In
     contrast, if B has a similar meaning to A B, A and B are considered as two words (examples 8 and
     9 in Table 1).
   • An expression of one or more Sino-Vietnamese syllables and an original Vietnamese word, in which
     the Sino-Vietnamese syllables are the elements used to create the new words, is not considered
     as a word (example 10 in Table 1).
   • Special classifier nouns are considered as single words (example 11 in Table 1).
It should be noted that these rules do not necessarily conform to the rules used by linguists. For
example, Diep (2005) considers the Sino-Vietnamese syllable viên (-er) in example 10 in Table 1 as a
component of the compound word and considers the special classifier noun nhà (-er) as a single word.
We, on the other hand, consider both viên (-er) and nhà (-er) as single words because we found that
they both have the same grammatical function, that is, forming new words. However, in our guidelines,
the word types for which there is little consensus between linguists on how to segment them are
annotated with additional information so that such words can be automatically converted according to
the need.

4. Part-of-speech tagging guidelines
4.1. Challenges of POS tagging
Tagging POS for Vietnamese words is not a trivial problem because they are not marked with
morphological features, such as tense, number, gender, etc. While the same problem also appears with
Chinese, Vietnamese may be more difficult, because the Vietnamese writing system is based on Latin
characters, which represent the pronunciation, but not the meaning of words.
Words that have the same surface form and pronunciation but different meanings and grammar functions
occur frequently in the text. For example, we can understand the word mới in accordance with two
meanings shown in rows 1 and 2 of Table 2. If we consider mới as an adjective modifying the preceding
word, the noun nghiên_cứu (research), it means new; the word mới means recently or just if we
consider it as an adjunct modifying the following word, the verb thực_hiện (to conduct).
Determining POS of the words having the same surface form may be more ambiguous because a verb or an
adjective can appear in the position of a noun as in the case of báo cáo in rows 3 and 4 of Table 2.
Solely referring to the sentence, we do not have any clue to determine if báo cáo belongs to the verb
class or noun class. Báo cáo means defend if it is considered as a verb (row 3) and thesis if it is
considered as a noun (row 4).
Ambiguity of the POS tagging is also caused by the omission of words, which happens frequently in
Vietnamese. For example, if a verb or an adjective plays the same roles as a noun, it is actually
preceded by a special classifier noun

⁴ Categorization nouns indicate general entities, such as cá (fish) and cây (tree).
⁵ Without loss of generality, we assume the expression we want to segment is A B, where A and B can
be syllables or words.
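
Returning to the word segmentation rules of Section 3.2, the rules can be read as a decision procedure over judgments that an annotator makes about an expression "A B". The Python sketch below is illustrative only. The class Judgments, its field names, and the function segment are hypothetical, and the order in which the rules are tested is an assumption, since the guidelines as described here do not state a priority among them.

```python
from dataclasses import dataclass


@dataclass
class Judgments:
    """Annotator judgments about an expression "A B" (hypothetical field names)."""
    sino_vietnamese_formative: bool = False  # a part is a Sino-Vietnamese word-forming syllable
    special_classifier: bool = False         # a part is a special classifier noun
    categorization_noun: bool = False        # A is a categorization noun, B a specific noun
    b_differs_from_whole: bool = False       # B alone denotes something other than "A B"
    one_part_not_a_word: bool = False        # A or B has no meaning as a standalone word
    insertable: bool = False                 # another syllable can be inserted between A and B
    same_meaning: bool = False               # A and B have the same meaning
    whole_equals_part: bool = False          # A_B has the same meaning as A or as B


def segment(j: Judgments) -> str:
    """Return "combined" (A_B), "split" (A B), or "by_context".

    The test order below is an assumption made for this sketch.
    """
    if j.sino_vietnamese_formative or j.special_classifier:
        return "split"
    if j.categorization_noun:
        return "combined" if j.b_differs_from_whole else "split"
    if j.one_part_not_a_word:
        return "combined"
    if j.insertable:
        return "split"
    if j.same_meaning or j.whole_equals_part:
        return "combined"
    # A and B differ in meaning and A_B differs from A B: the context decides.
    return "by_context"


# Judgments keyed by row number of Table 1.
EXAMPLES = {
    1: Judgments(),                                                      # quần áo
    3: Judgments(whole_equals_part=True),                                # ăn nói
    4: Judgments(same_meaning=True),                                     # tìm kiếm
    5: Judgments(insertable=True),                                       # nồi đồng
    7: Judgments(one_part_not_a_word=True),                              # đen đúa
    8: Judgments(categorization_noun=True, b_differs_from_whole=True),   # cá heo
    9: Judgments(categorization_noun=True),                              # cá lia_thia
    10: Judgments(sino_vietnamese_formative=True),                       # nghiên_cứu viên
    11: Judgments(special_classifier=True),                              # nhà nghiên_cứu
}
decisions = {row: segment(j) for row, j in EXAMPLES.items()}
print(decisions)
```

The judgments themselves remain linguistic decisions made by a human annotator; the sketch only makes explicit how a fixed set of judgments maps to one segmentation, which is the property that consistency between annotators depends on.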

No.   POS tag   Meaning of tag
 1    SV        Sino-Vietnamese syllable
 2    Nc        Classifier noun
 3    Ncs       Special classifier noun
 4    Nu        Unit noun
 5    Nun       Administrative unit noun
 6    Nw        Quantifier indicating the whole
 7    Num       Number
 8    Nq        Other quantifier
 9    Nr        Proper noun
10    Nt        Noun of time
11    Nn        Other noun
12    Ve        Exitting verb

No.   POS tag   Meaning of tag
17    NA        Noun-adjective
18    Vcp       Comparative verb
19    Vv        Other verb
20    An        Ordinal number
21    Aa        Other adjective
22    Pd        Demonstrative pronoun
23    Pp        Other pronoun
24    R         Adjunct
25    Cs        Preposition or conjunction introducing a clause
26    Cp        Other conjunction
27    ON        Onomatopoeia
28    ID        Idioms
29    E         Exclamation word

No.   Word in context                                    Word                      POS
 1    Một nghiên cứu mới thực hiện tại Nhật.             mới (new)                 Adjective
      {A new research conducted in Japan.}
 2    Một nghiên cứu mới thực hiện tại Nhật.             mới (just)                Adjunct
      {A research has just been conducted in Japan.}
 3    Báo cáo tốt nghiệp của cô ấy rất tốt.              báo cáo {defense}         Verb
      {Her final defense is very good.}
 4    Báo cáo tốt nghiệp của cô ấy rất tốt.              báo cáo {thesis}          Noun
      {Her thesis is very good.}
 5    Việc báo cáo tốt nghiệp của cô ấy rất tốt.         việc báo cáo {defense}    Verb
      {Her final defense is very good.}
 6    Cuốn báo cáo tốt nghiệp của cô ấy rất tốt.         cuốn báo cáo {thesis}     Noun
      {Her thesis is very good.}
 7    Bạn sẽ đẹp nhất đêm nay.                           sẽ (will)                 Adjunct
      {You will be the most beautiful girl tonight.}
      Tôi sẽ đi Nhật vào tối nay.
                      8                                                  sẽ               Adjunct            13    Vc      Copula "là" verb           30     M      Modifier word
                            {I will go to Japan tonight.}                  will
                                                                                                             14    D       Directional verb           31    FW      Foreign word
                   Table 2: Examples illustrating the challenges of POS tag-                                 15    VA      Verb-adjective             32     X      Unidentified word
                                                                                                             16    VN      Verb-noun                  33     PU     Punctuation
                   ging.                                                                                           Table 3: POS tag set designed for our treebank.
                                                                 6
                   (as the case of báo cáo in rows 5 of Table 2). Otherwise,
                   a noun is preceded by a classifier noun7 (the noun báo cáo                              tag P to annotate all pronouns. However, the pronouns used
                   in row 6 of Table 2 follows the classifier noun cuốn). How-                             to express space or time (demonstrative pronouns) such as
                   ever, such useful nouns are usually omitted in Vietnamese                               này        and đó          can be modifiers of the head nouns in
                   sentences which causes ambiguity of tagging words.                                           this           that
                                                                                                           noun phrases. Personal pronouns, in contrast, always play
                   Some linguists (SCSSV, 1983; Diep, 2005) have claimed                                   the roles of the head words of noun phrases.
                   that POS can be recognized by referring to the adjuncts                                 Therefore, in this work, we created a new POS tag set
                   modifying the words. For example, adjuncts indicating de-                               for Vietnamese. Our criteria to classify the words are also
                   gree and tenses modify adjectives and verbs, respectively.                              based on the combination abilities and the syntactic func-
                   However, this method does not necessarily work suffi-                                   tions of the words, like those of the VLSP treebank. How-
                   ciently with real texts. In practice, many verbs and adjec-                             ever, we referred to the linguistics literature, carefully ana-
                   tives in Vietnamese can be modified by the same adjunct.                                lyzed the roles of words and discussed with our linguistics
                   For example, the adjunct indicating tense, sẽwill shown in                              colleagues to create a new POS tag set for Vietnamese with
                   Table 2 can modify both the adjective đẹpbeautiful (row 7)                              33 tags which are shown in Table 3. Using our POS tags,
                   and the verb đi             (row 8).
                                       to go                                                               wecanrecognizetheroleofawordinaphraseorsentence.
                   Because of the above characteristics of Vietnamese, it is                               For example, the demonstrative pronouns modifying head
                   difficult not only to define the POS tag set but also to tag                            words of noun phases are annotated with the Pd label, and
                   each word in context. In addition, there is still little con-                           personal pronouns that are head words of noun phrases are
                   sensus between linguists as to methodology for classifying                              annotated with the Pp label.
                   words in Vietnamese. For instance, both Diep (2005) and
                   SCSSV (1983) classified the words based on their mean-                                  4.3.     Policy for annotation of part-of-speech
                   ings, their combination ability, and their syntactic func-                              In our POS tagging guidelines, the words are tagged on the
                   tions. However, Diep (2005) considered the words express-                               basis of the following criteria:
                   ing the whole, such as cả                 , tất_cả       , toàn_bộ        , etc.
                                                         all            all              all
                   as pronouns, while SCSSV (1983), in contrast, considered                                   • Combination ability of the word. For example,
                   them as nouns, and Hoang (1998) considered cả as a pro-                                       khó_khăn can be understood as difficulty or difficult.
                   nounandtất_cả as a noun in all contexts.                                                      However, if it is a noun, it cannot combine with the
                                                                                                                 adjunct rất          . If it is an adjective, it cannot combine
                   4.2.     Building part-of-speech tag set                                                                      very
                   In previous work, Nguyen et al. (2009) classified the words                                   with the quantifier những−s/−es.
                   onthebasisoftheircombinationabilityandsyntacticfunc-                                       • Syntactic function of the word. For example, if the
                   tion. They created a POS tag set for Vietnamese includ-                                       quantifier indicating the whole modifies a noun, it will
                   ing a total of 17 tags (except the tags for unknown words                                     beannotatedwithanNwtag.Thequantifierindicating
                   and the punctuation). However, this tag set cannot cover                                      the whole will be annotated with a Pp tag if it is head
                   all the combination abilities as well as the syntactic func-                                  wordofanounphrase.
                   tions of the Vietnamese words. For example, they used the
                       6Việc is a special classifier noun that is understood as -ion,                         • Meaningofthewordinthesentence.Forexample,the
                                                                                                                 combination ability of the verb đi                      and the adjec-
                   -ment, -ing, -ity, -ness, or so on when it comes before verbs or                                                                             to go
                   adjectives. An expression of the special classifier noun việc and a                           tive đẹpbeautiful mentionedaboveisthesame,theyare
                   verb or adjective is understood as a noun in English. For example,                            modified by the adjunct sẽ. They also have the same
                   học_tập means to learn, so to express learning, we can say việc                               syntactic function which is head word of predicates.
                   học_tập.                                                                                      However, their meanings are different: the adjective
                       7Classifier nouns indicate two types of things, animate things                            expresses the quality, and the verb expresses the ac-
                   and inanimate things.                                                                         tion.
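The first two criteria can be read as simple contextual rules. The Python sketch below is illustrative only and is not part of the annotation guidelines. The function names, the (word, coarse class) token format, and the use of adjacency as a stand-in for "combines with" and "modifies" are all assumptions made for this example. It relies only on words and tags mentioned in the text (khó_khăn, rất, những, cả, tất_cả, toàn_bộ, Nw, Pp).

```python
from typing import List, Optional, Tuple

# (word, coarse class) pairs; the coarse class is a hypothetical pre-annotation.
Token = Tuple[str, str]

# Words expressing the whole, as listed in Section 4.1.
WHOLE_WORDS = {"cả", "tất_cả", "toàn_bộ"}


def classify_kho_khan(prev_word: Optional[str]) -> Optional[str]:
    """Combination ability: pick the class of khó_khăn from the preceding word."""
    if prev_word == "rất":
        # The adjunct "very" cannot combine with the noun reading.
        return "adjective"
    if prev_word == "những":
        # The plural quantifier cannot combine with the adjective reading.
        return "noun"
    # No usable cue in the context: the decision is left to the annotator.
    return None


def tag_whole_quantifier(tokens: List[Token], i: int) -> str:
    """Syntactic function: Nw if the word modifies a noun, Pp if it heads the phrase."""
    word, _ = tokens[i]
    if word not in WHOLE_WORDS:
        raise ValueError(f"{word!r} is not a quantifier indicating the whole")
    # Assumption: "modifies a noun" is approximated by "is immediately followed by a noun".
    followed_by_noun = i + 1 < len(tokens) and tokens[i + 1][1] == "noun"
    return "Nw" if followed_by_noun else "Pp"


if __name__ == "__main__":
    # Toy inputs for illustration, not taken from the treebank.
    print(classify_kho_khan("rất"))
    print(tag_whole_quantifier([("tất_cả", "other"), ("sinh_viên", "noun")], 0))
    print(tag_whole_quantifier([("tất_cả", "other"), ("đi", "verb")], 0))
```

The third criterion, meaning, is deliberately absent from the sketch: đi and đẹp share both the adjunct sẽ and the predicate-head function, so surface rules of this kind cannot separate them.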