                                         Onoma: A Linguistically Motivated
                                       Conjugation System for Spanish Verbs
                                                  Luz Rello¹⋆ and Eduardo Basterrechea²
                                                       1 NLP & Web Research Group
                                            Dept. of Information and Communication Technologies
                                                         Universitat Pompeu Fabra
                                                              Barcelona, Spain
                                                           2 Molino de Ideas s.a.
                                                            Nanclares de Oca, 1F
                                                               Madrid, Spain
Abstract. In this paper we introduce a new conjugating tool which generates
and analyses both existing verbs and verb neologisms in Spanish. This
application of finite state transducers is based on novel linguistically
motivated morphological rules describing the verbal paradigm. Given that
these transducers are simpler than the ones created in previous developments
and are easy to learn and remember, the method can also be employed as a
pedagogic tool in itself. A comparative evaluation of the tool against other
online conjugators demonstrates its efficacy.
                               1   Introduction
Although the literature about online Spanish conjugators is scarce, it does
reveal that some are fully memory based (DRAE)³ while others rely on finite
state morphological rules [17]⁴.
   To the best of our knowledge, most work on verbal morphology has not aimed
at creating an end-user tool such as a conjugator. However, both machine
learning and rule-based approaches have been applied to inflectional
morphology. While instance-based learning algorithms can induce efficient
morphological patterns from large training data [2,1,5,13], approaches using
finite state transducers [19,8,6] enable the implementation of robust
morphological analyzer-generators which are successful in handling
concatenation phenomena [4].
   The Onoma conjugator⁵ was implemented as a cascade of finite state
transducers that implements a decision tree. The use of finite state
transducers (FSTs) provides the possibility of generating verbal paradigms as
well as the reverse process: the analysis of inflectional verb forms [9].
Further, the use of a cascade structure facilitates the implementation of
ordered alternation rules [10,11].

⋆ While developing this work the first author's institution was Molino de Ideas s.a.
3 Conjugator from the Dictionary of the Royal Spanish Academy (DRAE). Available
  at: http://buscon.rae.es/draeI/
4 The conjugator developed by Grupo de Estructuras de Datos y Lingüística
  Computacional (GEDLC) at the University of Las Palmas de Gran Canaria, which is
  available at: www.gedlc.ulpgc.es/investigacion/scogeme02/flexver.htm
5 Developed and funded by Molino de Ideas. http://conjugador.onoma.es
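
As a rough illustration of the cascade idea, the sketch below chains ordered rewrite stages, each consuming the output of the previous one. It is not the Onoma implementation: regular-expression substitutions stand in for finite state transducers, and the `+` stem/ending boundary marker, the function names and the choice of rules are all assumptions made for this example.

```python
import re
from functools import reduce
from typing import Callable, List

# Hypothetical stages: each maps one intermediate string to the next.
# "+" marks the boundary between stem and ending (an assumed convention).

def diphthongize_e(form: str) -> str:
    """Rewrite the last stem vowel e as ie (the e closest to the boundary)."""
    return re.sub(r"e(?=[^aeiou]*\+)", "ie", form, count=1)

def z_to_c(form: str) -> str:
    """Spelling rule: stem-final z is written c before an ending in e."""
    return re.sub(r"z(?=\+[eé])", "c", form)

def c_to_qu(form: str) -> str:
    """Spelling rule: stem-final c is written qu before an ending in e."""
    return re.sub(r"c(?=\+[eé])", "qu", form)

def drop_boundary(form: str) -> str:
    """Final stage: remove the boundary marker to obtain the surface form."""
    return form.replace("+", "")

def cascade(form: str, stages: List[Callable[[str], str]]) -> str:
    """Apply the stages in the given order, feeding each result forward."""
    return reduce(lambda acc, stage: stage(acc), stages, form)

# empezar, present subjunctive 1st singular: empez+e -> empiez+e -> empiec+e
print(cascade("empez+e", [diphthongize_e, z_to_c, drop_boundary]))  # empiece
# tocar, simple past 1st singular: toc+é -> toqu+é
print(cascade("toc+é", [c_to_qu, drop_boundary]))  # toqué
```

A true transducer could also be run in the reverse direction for analysis; the one-way functions used here do not capture that property.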
   The remainder of the paper is structured as follows: the data and
methodology used in this study are explained in Section 2, while Section 3
describes Spanish verbal morphology. Section 4 discusses the architecture of
the system. A comparative evaluation of the system against other online
conjugators is presented in Section 5. Finally, in Section 6, conclusions are
drawn.
2   Data and Methodology
A database named the Molino Ideas Verb Conjugation Database (MIVC-DB) was
used for the modeling process. It contains 15,367 verbs (plus their
corresponding verbal paradigms), including all the verbs registered in the
Royal Spanish Academy Dictionary (11,060 verbs) [15], as well as verbs found
in the Spanish Wikipedia and in a collection of 3 million journalistic
articles from newspapers written in Spanish from America and Spain⁶.
   Our conjugator differs from other Spanish processors in two respects. The
first is its architecture: the GEDLC conjugator [17] relies on the
interaction of a segmentation program, three lists containing prefixes,
verbal endings and pronouns, and two modules, one for the verbal endings and
another for obtaining required external information. The second is the design
of the transducers. They are not based on concatenation rules as in [19],
where a specific ending is added to 62 conjugation classes, resulting in
almost 150 verb-stem final states. Instead, they are based on rules which
modify a hypothetical regular verb form. Such rules can be extended to the
conjugation and analysis of verb neologisms in Spanish.
   When designing the rules and patterns for each FST, the Spanish verbal
inflectional paradigm was analyzed in detail from a linguistic point of view.
This analysis led to a simpler description of the inflectional verb paradigm,
which can be fully expressed (except for six verbs, see Section 4) using just
nine patterns and a set of rules, as opposed to the approximately one hundred
and twenty conjugation models of other approaches [7,18]. Given that the FSTs
used in this system are easy to learn and remember, the description can be
employed as a pedagogic tool in its own right by students of Spanish as a
foreign language. It helps in learning the Spanish verb paradigm because
currently existing methods (e.g. [14,12]) do not provide guidance on whether
a verb is regular or irregular, whereas the system can identify the nature of
any possible verb by reference only to its infinitive form, following just
seven steps⁷ [16].
                                                                                               For the design of the algorithm, in order to validate the rules and patterns
                                                                                      extracted from the analysis of the MIVC-DB, an error-driven approach was
                                                                                      taken.
6 Newspapers with the major representation in our corpus are: El País, ABC, Marca,
  Público, El Universal, Clarín, El Mundo and El Norte de Castilla.
                                                                                        7 In some rare cases, external information which the system also provides is required,
                                                                                            see Section 4.
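
The paper does not detail how the error-driven validation was carried out. One plausible reading, sketched below under that assumption, is to run the current rule set over every verb in the database, compare the output with the stored paradigm, and collect the mismatches that point to a missing or faulty rule. The function names, the slot labels and the toy data are hypothetical and are not taken from the MIVC-DB.

```python
from typing import Callable, Dict, List, Optional, Tuple

Paradigm = Dict[str, str]  # slot label -> inflected form
Mismatch = Tuple[str, str, Optional[str], str]  # verb, slot, produced, expected

def find_errors(conjugate: Callable[[str], Paradigm],
                gold: Dict[str, Paradigm]) -> List[Mismatch]:
    """Return every form where the conjugator disagrees with the gold data."""
    errors: List[Mismatch] = []
    for infinitive, expected in gold.items():
        produced = conjugate(infinitive)
        for slot, form in expected.items():
            if produced.get(slot) != form:
                errors.append((infinitive, slot, produced.get(slot), form))
    return errors

def naive_regular(infinitive: str) -> Paradigm:
    """Toy conjugator that treats every -ar verb as regular (two slots only)."""
    stem = infinitive[:-2]
    return {"1s": stem + "o", "1p": stem + "amos"}

# Toy stand-in for the verb database.
toy_gold = {
    "ayudar": {"1s": "ayudo", "1p": "ayudamos"},
    "contar": {"1s": "cuento", "1p": "contamos"},
}

for error in find_errors(naive_regular, toy_gold):
    print(error)  # ('contar', '1s', 'conto', 'cuento')
```

Each reported mismatch signals a rule to add or correct before the check is run again.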
                                  3    Spanish Verb Morphology
In Spanish, inflected verb forms exist for the nineteen tense/mood
combinations shown in Table 1⁸.
Tense/mood                                     Examples, verb ayudar (to help)
present tense/indicative                       ayudo, 1st person singular
present tense/subjunctive                      ayude, 1st person singular
present tense/imperative                       ayuda, 2nd person singular
preterite imperfect tense/indicative           ayudaba, 1st person singular
preterite imperfect tense/subjunctive 1        ayudara, 1st person singular
preterite imperfect tense/subjunctive 2        ayudase, 1st person singular
preterite perfect composed tense/indicative    he ayudado, 1st person singular
preterite perfect composed tense/subjunctive   haya ayudado, 1st person singular
past perfect tense/indicative                  ayudé, 1st person singular
past perfect composed tense/indicative         hube ayudado, 1st person singular
preterite pluscuanperfect tense/indicative     había ayudado, 1st person singular
preterite pluscuanperfect tense/subjunctive 1  hubiera ayudado, 1st person singular
preterite pluscuanperfect tense/subjunctive 2  hubiese ayudado, 1st person singular
future tense/indicative                        ayudaré, 1st person singular
future tense/subjunctive                       ayudare, 1st person singular
future perfect tense/indicative                habré ayudado, 1st person singular
future perfect tense/subjunctive               hubiere ayudado, 1st person singular
conditional simple tense/indicative            ayudaría, 1st person singular
conditional perfect tense/indicative           habría ayudado, 1st person singular
Table 1. Inflected forms from the verbal paradigm.
   Except for the imperative, each tense possesses seven inflected forms
corresponding to grammatical person. Furthermore, there are two infinitives
and two gerunds (present and perfect), plus four forms of the participle,
which vary in number and gender. The potential therefore exists for up to 140
different forms per verb.
   A Spanish verb consists of its stem, tense-mood inflections and
person-number inflections. Most of the complexity resides in four factors:
 1. both kinds of inflection (tense-mood and person-number) can sometimes be
    realized by the same morphological segment;
 2. the stem can be realized by different variations, i.e. the same verb can
    have more than one stem;
 3. prefixes and suffixes can be added to the stem; and
 4. the verb can be irregular, which means that either the stem, the
    inflections or both differ from the hypothetical regular paradigm of
    conjugation.
8 Throughout the paper, the solidus will be used when denoting tense/mood
  combinations.
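
The three-part structure described above (stem, tense-mood inflection, person-number inflection) can be pictured with a small record type. This is only a sketch: the class name, the field names and the particular segmentation of each form are assumptions made for illustration, not the representation used in Onoma.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VerbForm:
    """One inflected form split into the three parts named in the text."""
    stem: str
    tense_mood: str      # tense-mood inflection ("" when none is overt)
    person_number: str   # person-number inflection ("" when none is overt)

    @property
    def surface(self) -> str:
        """Concatenate the three segments into the written form."""
        return self.stem + self.tense_mood + self.person_number

# Imperfect indicative of ayudar: the two inflections are separate segments.
imperfect_1pl = VerbForm(stem="ayud", tense_mood="ába", person_number="mos")
imperfect_1sg = VerbForm(stem="ayud", tense_mood="aba", person_number="")

# Present indicative, 1st singular: the single segment -o carries both kinds
# of information (factor 1), so placing it in one field is arbitrary.
present_1sg = VerbForm(stem="ayud", tense_mood="", person_number="o")

print(imperfect_1pl.surface)  # ayudábamos
print(imperfect_1sg.surface)  # ayudaba
print(present_1sg.surface)    # ayudo
```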
  Of the 15,367 verbs, 4,225 are irregular (27.5%). Moreover, 26.8% of the
verbal neologisms in Spanish are irregular [16]. These irregular neologisms
follow the inflectional patterns of established verbs. The group conflates
genuine paradigmatic irregularity with orthographic issues, such as the
grapheme realization of stem-final consonants, which are discussed in
Section 4.
  Most morphological processing systems are based on combining stems with
inflections [19,7,12]. By contrast, our verbal paradigm description is based
on patterns and transformational rules. Here, the term rule denotes an
alteration applied to the hypothetical regular form of an irregular verb to
generate the irregular form required by the appropriate irregular
conjugation. Such rules are applied to a pattern, which is the set of
inflected forms affected by the irregularity rules (see subsection 4.1) in
the conjugation paradigm of the particular verb.
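
To make the rule and pattern terminology concrete, the following sketch first builds the hypothetical regular present indicative of contar (to count) and then applies one rule (o becomes ue in the stem) only to the forms listed in a pattern. All names are hypothetical, the sketch covers a single tense of -ar verbs, and it uses six person-number slots for brevity; it is not the rule inventory of the paper.

```python
from typing import Callable, Dict, FrozenSet, Optional

# Regular present indicative endings for -ar verbs, keyed by person/number.
PRESENT_INDICATIVE_AR = {
    "1s": "o", "2s": "as", "3s": "a",
    "1p": "amos", "2p": "áis", "3p": "an",
}

def o_to_ue(stem: str) -> str:
    """Rule: rewrite the last o of the stem as ue (cont -> cuent)."""
    i = stem.rfind("o")
    return stem if i < 0 else stem[:i] + "ue" + stem[i + 1:]

def conjugate_present(infinitive: str,
                      rule: Optional[Callable[[str], str]] = None,
                      pattern: FrozenSet[str] = frozenset()) -> Dict[str, str]:
    """Build the regular forms, altering only the slots named in the pattern."""
    stem = infinitive[:-2]
    forms = {}
    for slot, ending in PRESENT_INDICATIVE_AR.items():
        use_rule = rule is not None and slot in pattern
        forms[slot] = (rule(stem) if use_rule else stem) + ending
    return forms

# Pattern: the slots of this tense affected by the irregularity.
AFFECTED_SLOTS = frozenset({"1s", "2s", "3s", "3p"})

print(conjugate_present("contar"))
# hypothetical regular forms: conto, contas, conta, contamos, contáis, contan
print(conjugate_present("contar", o_to_ue, AFFECTED_SLOTS))
# cuento, cuentas, cuenta, contamos, contáis, cuentan
```

Keeping the rule and the pattern as separate arguments mirrors the distinction drawn in the text: the same pattern can be paired with a different rule, and the regular paradigm is the special case with no rule at all.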
4   System Architecture
The system is composed of two modules, which employ finite state machines.
The first one (Classifier) is designed to recognize the verb form and extract
the information needed for its conjugation or analysis. This information is:
(1) the word from which the verb form derives (if there is one) and (2) some
formal information on the verb form, which is derived via seven finite state
automata (regular expressions) that detect whether the verb is regular or
irregular based on its ending [16] or, in some cases, on the word from which
the verb is derived. This module makes use of two additional purpose-built
submodules: one to detect the word from which the verb is derived and another
to identify the stress pattern of the verb. These two submodules are used to
detect the verb root and to provide information that will later be exploited
for its inflection or analysis. When the verb form is irregular, this
information will be used to select the irregularity rules and patterns to be
applied (see subsection 4.1).
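
The seven automata themselves are described in [16] and are not reproduced here. The sketch below only illustrates the general technique of classifying a verb from the ending of its infinitive with regular expressions. The three expressions and their labels are illustrative assumptions based on well-known Spanish ending classes, not the expressions used by the Classifier.

```python
import re
from typing import Optional

# Illustrative ending classes (assumed for this sketch, checked in order).
ENDING_RULES = [
    # tocar -> toqué, pagar -> pagué, cazar -> cacé
    (re.compile(r"[cgz]ar$"), "orthographic change before e"),
    # conducir -> conduzco, conduje
    (re.compile(r"ducir$"), "irregular -ducir class"),
    # construir -> construyo (verbs in -guir and -quir are excluded)
    (re.compile(r"[^gq]uir$"), "y-insertion"),
]

def classify_by_ending(infinitive: str) -> Optional[str]:
    """Return the label of the first matching ending rule, or None."""
    for pattern, label in ENDING_RULES:
        if pattern.search(infinitive):
            return label
    return None  # no ending-based rule matched; other evidence is needed

for verb in ["tocar", "traducir", "construir", "ayudar"]:
    print(verb, "->", classify_by_ending(verb))
```

Because only the infinitive ending is inspected, the same function also returns a label for a verb absent from any dictionary: a neologism such as viralizar is matched by the first expression. A result of None does not mean the verb is regular, since irregularities such as the vowel change in seguir cannot be read off the ending.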
  By means of the first module, the verbs are classified into two groups [3]:
(a) regular verbs and (b) irregular verbs. Once identified, irregular verbs
are further divided into: (b.1) the so-called Magnificent verbs, namely traer
(to bring), valer (to be worth), salir (to go out), tener (to have), venir
(to come), poner (to put), hacer (to do), decir (to say), poder (to be able
to), querer (to want), saber (to know), caber (to fit), andar (to walk), and
their derivations; (b.2) verbs which undergo diphthongization or a vowel
replacement in their root; (b.3) verbs which are affected by diacritic rules
of irregularity; (b.4) verbs which undergo orthographic changes in their
endings; (b.5) verb forms whose root ends in a vowel and which undergo
heterogeneous rules of irregularity; and finally (b.6) the irreducible verbs,
a set of six verbs whose conjugations are stored in memory: the auxiliary
verb haber (to have), the copulative verbs ser (to be) and estar (to be), and
the monosyllabic verbs ir (to go), dar (to give) and ver (to see). Apart from
the irreducible verbs, the rest of the verbal paradigm system is based
entirely on rules and patterns implemented in Module 2 (Modeling).
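
The split between verbs stored in memory and verbs handled by rules can be sketched as a simple dispatch. The two verb lists come from the text above. The prefix list used to recognise derivations is a small hypothetical sample, and the function stands in for the paper's derivation submodule, whose actual method is not described in this section.

```python
# Group (b.6): conjugations stored in memory (list taken from the text).
IRREDUCIBLE = {"haber", "ser", "estar", "ir", "dar", "ver"}

# Group (b.1): the Magnificent verbs (list taken from the text).
MAGNIFICENT = (
    "traer", "valer", "salir", "tener", "venir", "poner", "hacer",
    "decir", "poder", "querer", "saber", "caber", "andar",
)

# Hypothetical sample of prefixes; "" covers the base verb itself.
PREFIXES = {"", "con", "de", "des", "man", "pre", "re", "su"}

def coarse_group(infinitive: str) -> str:
    """Return 'b.6', 'b.1' or 'rule-based' for an infinitive."""
    if infinitive in IRREDUCIBLE:
        return "b.6"  # looked up in memory, no rules applied
    for base in MAGNIFICENT:
        if infinitive.endswith(base):
            prefix = infinitive[: len(infinitive) - len(base)]
            if prefix in PREFIXES:
                return "b.1"  # a Magnificent verb or one of its derivations
    return "rule-based"  # regular verbs and groups b.2 to b.5

for verb in ["ser", "tener", "mantener", "mandar", "ayudar"]:
    print(verb, "->", coarse_group(verb))
```

Checking the prefix rather than the bare suffix matters: mandar ends in andar but is not derived from it, so a plain suffix test would misclassify it.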
                Module 2 is composed of two conjugation modules. The first module (2.1
              Hypothetical verb form) conjugates –or analyses– the verb form as if it were