                                       Zurich Open Repository and
                                       Archive
                                       University of Zurich
                                       University Library
                                       Strickhofstrasse 39
                                       CH-8057 Zurich
                                       www.zora.uzh.ch
        Year: 2003
        German prepositions and their kin. A survey with respect to the resolution
                      of PP attachment ambiguities
                            Volk, Martin
        Abstract: This paper surveys German prepositions and their relatives: contracted prepositions,
        pronominal adverbs, and reciprocal pronouns. We elaborate on corpus frequencies for these and on
        their properties with respect to PP attachment. We show that prepositions and contracted
        prepositions can be handled together. They show an overall attachment tendency towards the noun.
        But pronominal adverbs and reciprocal pronouns show an overall attachment tendency towards the
        verb and therefore must be treated separately.
        Posted at the Zurich Open Repository and Archive, University of Zurich
        ZORA URL: https://doi.org/10.5167/uzh-20340
        Conference or Workshop Item
        Originally published at:
        Volk, Martin (2003). German prepositions and their kin. A survey with respect to the resolution of PP
        attachment ambiguities. In: Workshop on The Linguistic Dimensions of Prepositions and their Use in
        Computational Linguistics Formalisms and Applications, Toulouse, 2003.
                    German prepositions and their kin. A survey with respect to the
                                           resolution of PP attachment ambiguities
                                                                      Martin Volk
                                                                Stockholm University
                                                            Department of Linguistics
                                                                SE-10691 Stockholm
                                                                     volk@ling.su.se
Abstract

This paper surveys German prepositions and their relatives: contracted
prepositions, pronominal adverbs, and reciprocal pronouns. We elaborate on
corpus frequencies for these and on their properties with respect to PP
attachment. We show that prepositions and contracted prepositions can be
handled together. They show an overall attachment tendency towards the noun.
But pronominal adverbs and reciprocal pronouns show an overall attachment
tendency towards the verb and therefore must be treated separately.¹

Keywords: Corpus linguistics, ambiguity resolution, unsupervised learning

1 Introduction

Any computer system for natural language processing has to struggle with the
problem of ambiguities. If the system is meant to extract precise information
from a text, these ambiguities must be resolved. One of the most frequent
ambiguities arises from the attachment of prepositional phrases (PPs). A PP
that follows a noun (in English or German) can be attached to the noun or to
the verb. We did an in-depth study on unsupervised statistical methods to
resolve such ambiguities in German sentences based on cooccurrence values
derived from a shallow parsed corpus (see [Volk, 2001] and [Volk, 2002]).

Corpus processing consisted of proper name recognition and classification,
Part-of-Speech tagging, lemmatization, phrase chunking, and clause boundary
detection. We used a corpus of more than 5 million words from the
Computer-Zeitung (CZ), a weekly computer science newspaper. In addition to
this training corpus, we prepared a 3000 sentence corpus with manually
annotated syntax trees. From this treebank we extracted over 4000 test cases
with ambiguously positioned PPs for the evaluation of the disambiguation
method. We will call these test cases the ‘CZ test set’.

As a basis for this study we surveyed German prepositions and their relatives
and we checked for prepositions, contracted prepositions, pronominal adverbs
and reciprocal pronouns whether they can mutually benefit from each other with
respect to attachment tendencies.

2 German prepositions

Prepositions in German are a class of words relating linguistic elements to
each other with respect to a semantic dimension such as local, temporal,
causal or modal. They do not inflect and cannot function by themselves as a
sentence unit (cf. [Bußmann, 1990]). But, unlike other function words, a
German preposition governs the grammatical case of its argument (genitive,
dative or accusative). Frequent German prepositions are an, für, in, mit,
zwischen.

Prepositions are considered to be a closed word class. Nevertheless it is
difficult to determine the exact number of German prepositions.
[Schröder, 1990] speaks of “more than 200 prepositions”, but his “Lexikon
deutscher Präpositionen” lists only 110 of them. In this dictionary all
entries are marked with their case requirement and their semantic features.
For instance, ohne requires the accusative and is marked with the semantic
functions instrumental, modal, conditional and part-of.²

¹ This paper is based on my research at the University of Zurich in a project
  supported by the Swiss National Science Foundation under grant 12-54106.98.
² See also [Klaus, 1999] for a detailed comparison of the range of German
  prepositions as listed in a number of recent grammar books.
The lexical database CELEX [Baayen et al., 1995] contains 108 German
prepositions with frequency counts derived from corpora of the “Institut für
deutsche Sprache”. This results in the arbitrary inclusion of nördlich,
nordöstlich, südlich while östlich and westlich are missing.

Searching through 5.5 million tokens of our tagged computer magazine corpus we
found around 540,000 preposition tokens corresponding to 99 preposition
types.³ These counts do not include contracted prepositions. A list of the 66
most frequent German prepositions with frequencies from our corpus can be
found in appendix A.

An early frequency count for German by [Meier, 1964] lists 18 prepositions
among the 100 most frequent word forms. 17 out of these 18 prepositions are
also in our top-20 list. Only gegen is missing which is on rank 23 in our
corpus. This means that the usage of the most frequent prepositions is stable
over corpora and time.

All frequent prepositions in German have some homograph serving as

  • separable verb prefix (e.g. ab, auf, mit, zu),
  • clause conjunction (e.g. bis, um)⁴,
  • adverb (e.g. auf, für, über) in often idiomatic expressions (e.g. auf und
    davon, über und über),
  • infinitive marker (zu),
  • proper name component (von), or
  • predicative adjective (e.g. an, auf, aus, in, zu as in Die Maschine ist
    an/aus. Die Tür ist auf/zu.).

The most frequent homographic functions are separable verb prefix and
conjunction. Fortunately, these functions are clearly marked by their position
within the clause. A clause conjunction usually occurs at the beginning of a
clause, and a separated verb prefix mostly occurs at the end of a clause
(rechte Satzklammer). A part-of-speech tagger can therefore disambiguate these
cases.⁵

Typical (i.e. frequent) prepositions are monomorphemic words (e.g. an, auf,
für, in, mit, über, von, zwischen). Many of the less frequent prepositions are
derived or complex. They have turned into prepositions over time and still
show traces of their origin. They are derived from other parts-of-speech such
as

  • nouns (e.g. angesichts, zwecks),
  • adjectives (e.g. fern, unweit),
  • participle forms of verbs (e.g. entsprechend, während; ungeachtet), or
  • lexicalized prepositional phrases (e.g. anhand, aufgrund, zugunsten).

German prepositions typically do not allow compounding. It is generally not
possible to form a new preposition by a concatenation of prepositions. The two
exceptions are gegenüber and mitsamt. Other concatenated prepositions have led
to adverbs like inzwischen, mitunter, zwischendurch.

[Helbig and Buscha, 1998] call the monomorphemic prepositions primary
prepositions and the derived prepositions secondary prepositions. This
distinction is based on the fact that only primary prepositions form
prepositional objects, pronominal adverbs (cf. section 2.2) and prepositional
reciprocal pronouns (cf. section 2.3).

In addition, this distinction corresponds to different case requirements. The
primary prepositions govern accusative (durch, für, gegen, ohne, um) or dative
(aus, bei, mit, nach, von, zu) or both (an, auf, hinter, in, neben, über,
unter, vor, zwischen). Most of the secondary prepositions govern genitive
(angesichts, bezüglich, dank). Some prepositions (most notably während) are in
the process of changing from genitive to dative. Some prepositions do not show
overt case requirements (je, pro, per; cf. [Schaeder, 1998]) and are used with
determiner-less noun phrases.

Some prepositions show other idiosyncrasies. The preposition bis often takes
another preposition (in, um, zu as in 4) or combines with the particle hin
plus a preposition (as in 5). The preposition zwischen is special in that it
requires a plural argument (as in 6), often realized as a coordination of NPs
(as in 7).

(4) Portables mit 486er-Prozessor werden bis zu 20 Prozent billiger.
(5) ... und berücksichtigt auch Daten und Datentypen bis hin zu Arrays oder
    den Records im VAX-Fortran.
(6) Die Verbindungstopologie zwischen den Prozessoren läßt sich als
    dreidimensionaler Torus darstellen.
(7) Durch Microsoft Access müssen sich die Anwender nicht mehr länger zwischen
    Bedienerfreundlichkeit und Leistung entscheiden.

³ These figures are based on automatically assigned part-of-speech tags. If
  the tagger systematically mistagged a preposition, the counting procedure
  does not find it. In the course of the project we realized that this
  happened to the prepositions a, via and voller as used in the following
  example sentences (all examples in this paper are from the
  Computer-Zeitung, Konradin-Verlag, 1993-1997).
  (1) Derselbe Service in der Regionalzone (bis zu 50 Kilometern) kostet 23
      Pfennig a 60 Sekunden.
  (2) Master und Host kommunizieren via IPX.
  (3) Windows steckt voller eigener Fehler.
⁴ [Jaworska, 1999] (p. 306) argues that “clause-introducing preposition-like
  elements are indeed prepositions”.
⁵ Note the high degree of ambiguity for zu which can be a preposition zu ihm,
  a separated verb prefix sie sieht ihm zu, the infinitive marker ihn zu
  sehen, a predicative adjective das Fenster ist zu, an adjectival or adverb
  marker zu gross, zu sehr, or the ordinal number marker sie kommen zu zweit.

Results for PP attachment

We explored various possibilities to extract PP disambiguation information
from the automatically annotated CZ corpus. We first used it to gather
frequency data on the cooccurrence of pairs: nouns + prepositions and verbs +
prepositions.

The cooccurrence value is the ratio of the bigram frequency count
freq(word, preposition) divided by the unigram frequency freq(word). For our
purposes word can be the verb V or the reference noun N1. The ratio describes
the percentage of the cooccurrence of word + preposition against all
occurrences of word. It is thus a straightforward association measure for a
word pair. The cooccurrence value can be seen as the attachment probability of
the preposition based on maximum likelihood estimates. We write:

    cooc(W, P) = freq(W, P) / freq(W)

with W ∈ {V, N1}. The cooccurrence values for verb V and noun N1 correspond to
the probability estimates in [Ratnaparkhi, 1998] except that Ratnaparkhi
includes a back-off to the uniform distribution for the zero denominator case.
We added special precautions for this case in our disambiguation algorithm.
The cooccurrence values are also very similar to the probability estimates in
[Hindle and Rooth, 1993].

We started by computing the cooccurrence values over word forms for nouns,
prepositions, and verbs based on their part-of-speech tags. In order to
compute the pair frequencies freq(N1, P), we search the training corpus for
all token pairs in which a noun is immediately followed by a preposition. The
treatment of verb + preposition cooccurrences is different from the treatment
of N+P pairs since verb and preposition are seldom adjacent to each other in a
German sentence. On the contrary, they can be far apart from each other, the
only restriction being that they cooccur within the same clause. We use the
clause boundary information in our training corpus to enforce this
restriction. For computing the cooccurrence values we accept only verbs and
nouns with an occurrence frequency of more than 10.

With the N+P and V+P cooccurrence values for word forms we did a first
evaluation over the CZ test set with the following simple disambiguation
algorithm.

    if ( cooc(N1,P) && cooc(V,P) ) then
        if ( cooc(N1,P) >= cooc(V,P) ) then
            noun attachment
        else
            verb attachment

We found that we can only decide 57% of the test cases with an accuracy of
71.4% (93.9% correct noun attachments and 55.0% correct verb attachments).
This shows a striking imbalance between the noun attachment accuracy and the
verb attachment accuracy. This imbalance was countered with a noun factor
which was automatically derived from the corpus based on the overall
attachment tendency of prepositions towards nouns in comparison to their
tendency towards verbs (cf. [Volk, 2002]). This move leads to an improvement
of the overall attachment accuracy to 81.3%. We then went on to lemmatize all
word forms which also included mapping contracted prepositions to their
corresponding bare forms.
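
The cooccurrence value and the simple decision rule described in this section
can be sketched in Python as follows. This is only an illustrative sketch
under stated assumptions, not the implementation used in the study: the
function names and the toy counts are invented for the example, and the way
noun_factor enters the comparison (as a multiplier on the noun value) is an
assumption; the derivation of the noun factor is described in [Volk, 2002]
and is not reproduced here.

```python
def cooccurrence_values(pair_counts, word_counts, min_freq=10):
    """Return cooc(W, P) = freq(W, P) / freq(W) for every counted pair.

    Only words with an occurrence frequency of more than min_freq are kept,
    so the denominator can never be zero.
    """
    return {
        (word, prep): count / word_counts[word]
        for (word, prep), count in pair_counts.items()
        if word_counts.get(word, 0) > min_freq
    }


def attach(noun, verb, prep, noun_cooc, verb_cooc, noun_factor=1.0):
    """Decide the attachment of prep, or return None if undecidable.

    A decision is made only when both cooccurrence values are available
    and non-zero. noun_factor is a hypothetical multiplier (assumption).
    """
    n = noun_cooc.get((noun, prep), 0.0)
    v = verb_cooc.get((verb, prep), 0.0)
    if n and v:
        return "noun" if n * noun_factor >= v else "verb"
    return None


# Toy counts, invented for illustration only (not corpus figures).
noun_counts = {"Zugriff": 20, "Haus": 5}
noun_prep_counts = {("Zugriff", "auf"): 15, ("Haus", "auf"): 1}
verb_counts = {"erlauben": 40, "warten": 30}
verb_prep_counts = {("erlauben", "auf"): 4, ("warten", "auf"): 27}

noun_cooc = cooccurrence_values(noun_prep_counts, noun_counts)
verb_cooc = cooccurrence_values(verb_prep_counts, verb_counts)

print(attach("Zugriff", "erlauben", "auf", noun_cooc, verb_cooc))
print(attach("Zugriff", "warten", "auf", noun_cooc, verb_cooc))
print(attach("Haus", "warten", "auf", noun_cooc, verb_cooc))
```

In this toy setup the noun Haus falls below the frequency threshold, so no
cooccurrence value exists for it and the case is left undecided. This mirrors
the observation above that the simple algorithm can decide only part of the
test cases.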