Automatic conjugation and identification of regular and irregular verb neologisms in Spanish

Luz Rello and Eduardo Basterrechea
Molino de Ideas s.a.
Nanclares de Oca, 1F
Madrid, 28022, Spain
{lrello, ebaste}@molinodeideas.es

Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity, pages 1–5, Los Angeles, California, June 2010. © 2010 Association for Computational Linguistics

Abstract

In this paper, a novel system for the automatic identification and conjugation of Spanish verb neologisms is presented. The paper describes a rule-based algorithm consisting of six steps which are taken to determine whether a new verb is regular or not, and to establish the rules that the verb should follow in its conjugation. The method was evaluated on 4,307 new verbs and its performance found to be satisfactory both for irregular and regular neologisms. The algorithm also contains extra rules to cater for verb neologisms in Spanish that do not exist as yet, but are inferred to be possible in light of existing cases of new verb creation in Spanish.

1 Introduction

This paper presents a new method consisting of a set of modules which are implemented as part of a free online conjugator called Onoma¹.

¹ Onoma can be accessed at http://conjugador.onoma.es

The novelty of this system lies in its ability to identify and conjugate existing verbs and potential new verbs in Spanish with a degree of coverage that other available conjugators do not fully achieve. Other existing systems do not cope well with the productively rich word formation processes that apply to Spanish verbs and lead to complexities in their inflectional forms that can present irregularities. The operation of these processes means that each Spanish verb can comprise 135 different forms, including compound verb forms.

Several researchers have developed tools and methods related to Spanish verbs. These include morphological processors (Tzoukermann and Liberman, 1990), (Santana et al., 1997), (Santana et al., 2002), semantic verb classification (Esteve Ferrer, 2004) or verb sense disambiguation (Lapata and Brew, 2004). Nevertheless, to our knowledge, ours is the first attempt to automatically identify, classify and conjugate new Spanish verbs.

Our method identifies new and existing Spanish verbs and categorises them into seven classes: one class for regular verbs and six classes of irregular verbs depending on the type of the irregularity rule whose operation produced it. This algorithm is implemented by means of six modules or transducers which process each new infinitive form and classify the neologism. Once the new infinitive is classified, it is conjugated by the system using a set of high accuracy conjugation rules according to its class.

One of the advantages of this procedure is that only very little information about the new infinitive form is required. The knowledge needed is exclusively of a formal kind. Extraction of this information relies on the implementation and use of two extra modules: one to detect Spanish syllables, and the other to split the verb into its root and morphological affixes.

In cases where the neologism is not an infinitive form, but a conjugated one, the system generates a hypothetical infinitive form that the user can corroborate as a legitimate infinitive.

Given that the transducers used in this system are easy to learn and remember, the method can be employed as a pedagogic tool itself by students of
Spanish as a foreign language. It helps in the learning of the Spanish verb system since currently existing methods (e.g. (Puebla, 1995), (Gomis, 1998), (Mateo, 2008)) do not provide guidance on the question of whether verbs are regular or irregular. This is due to the fact that our method can identify the nature of any possible verb by reference only to its infinitive form. The application of other kinds of knowledge about the verb to this task is currently being investigated to deal with those rare cases in which reference to the infinitive form is insufficient for making this classification.

This study first required an analysis of the existing verb paradigms used in dictionary construction (DRAE, 2001) followed by the detailed examination of new verbs' conjugations (Gomis, 1998), (Santana et al., 2002), (Mateo, 2008) compiled in a database created for that purpose. For the design of the algorithm, in order to validate the rules and patterns, an error-driven approach was taken.

The remainder of the paper is structured as follows: Section 2 presents a description of the corpora used. In Section 3, the different word formation processes that apply to Spanish verbs are described, while Section 4 is devoted to the detailed description of the rules used by the system to classify the neologisms, which are evaluated in Section 5. Finally, in Section 6 we draw the conclusions.

2 Data

Two databases were used for the modeling process. The first (named the DRAE Verb Conjugation Database (DRAEVC-DB)) is composed of all the paradigms of the verbs contained in the 22nd edition of the Dictionary of the Royal Spanish Academy (DRAE, 2001). This database contains 11,060 existing Spanish verbs and their respective conjugations. The second database (named the MolinoIdeas Verb Conjugation Database (MIVC-DB)), created for this purpose, contains 15,367 verbs. It includes all the verbs found in the DRAE database plus 4,307 conjugated Spanish verbs that are not registered in the Royal Spanish Academy Dictionary (DRAE, 2001), which are found in standard and colloquial Spanish and whose use is frequent on the web.

Corpus          Number of verbs
DRAE                     11,060
MolinoIdeas              15,367

Table 1: Corpora used.

The MIVC-DB contains completely conjugated verbs occurring in the Spanish Wikipedia and in a collection of 3 million journalistic articles from newspapers in Spanish from America and Spain².

² The newspapers with major representation in our corpus are: El País, ABC, Marca, Público, El Universal, Clarín, El Mundo and El Norte de Castilla.

Verbs which do not occur in the Dictionary of the Royal Spanish Academy (DRAE, 2001) are considered neologisms in this study. Thus 4,307 of the 15,367 verbs in the MIVC-DB are neologisms. The paradigms of the new verbs whose complete conjugation was not found in the sources were automatically computed and manually revised in order to ensure their accuracy. The result of this semi-automatic process is a database consisting only of attested Spanish verbs.

3 Creativity in Spanish verbs

The creation of new verbs in Spanish is especially productive due to the rich possibilities of the diverse morphological schema that are applied to create neologisms (Almela, 1999).

New Spanish verbs are derived by two means: either (1) morphological processes applied to existing words or (2) incorporating foreign verbs, such as digitalizar from "to digitalize".

Three morphological mechanisms can be distinguished: prefixation, suffixation and parasynthesis. Through prefixation a bound morpheme is attached to a previously existing verb. The most common prefixes used for new verbs found in our corpus are the following: a- (abastillar), des- (desagrupar), inter- (interactuar), pre- (prefabricar), re- (redecorar), sobre- (sobretasar), sub- (subvaluar) and super- (superdotar). On the other hand, the most frequent suffixes in Spanish new verbs are -ar (palar), -ear (panear), -ificar (cronificar) and -izar (superficializar). Finally, parasynthesis occurs when the suffixes are added in combination with a prefix (bound morpheme). Although parasynthesis is rare in other grammatical classes, it is quite relevant in the creation of new Spanish verbs (Serrano,
1999). The most common prefixes are a- or en- in conjunction with the suffixes -ar, -ear, -ecer and -izar (acuchillear, enmarronar, enlanguidecer, abandalizar).

In this paper, the term derivational base is used to denote the immediate constituent to which a morphological process is applied to form a verb. In order to obtain the derivational base, it is necessary to determine whether the last vowel of the base is stressed. When the vowel is unstressed, it is removed from the derivational base while a stressed vowel remains as part of the derivational base. If a consonant is the final letter of the derivational base it remains a part of it as well.

4 Classifying and conjugating new verbs

Broadly speaking, the algorithm is implemented by six transduction modules arranged in a switch structure. The operation of most of the transducers is simple, though Module 4 is implemented as a cascade of transduction modules in which inputs may potentially be further modified by subsequent modules (5 and 6).

The modules were implemented to determine the class of each neologism. Depending on the class to which each verb belongs, a set of rules and patterns will be applied to create its inflected forms. The proposed verb taxonomy generated by these transducers is original and was developed in conjunction with the method itself. The group of patterns and rules which affect each verb are detailed in previous work (Basterrechea and Rello, 2010). The modules described below are activated when they receive as input an existing or new infinitive verb form. When the infinitive form is not changed by one transducer, it is tested against the next one. If not adjusted by any transducer, then the new infinitive verb is assumed to have a regular conjugation.

Module 1: The first transducer checks whether the verb form is an auxiliary verb (haber), a copulative verb (ser or estar), a monosyllabic verb (ir, dar or ver), a Magnificent verb³, or a prefixed form whose derivational base matches one of these aforementioned types of verbs. If the form matches one of these cases, the verb is irregular and will undergo the rules and patterns of its own class (Basterrechea and Rello, 2010).

³ There are 14 so-called Magnificent verbs: traer, valer, salir, tener, venir, poner, hacer, decir, poder, querer, saber, caber, andar and -ducir (Basterrechea and Rello, 2010).

Module 2: If the infinitive or prefixed infinitive form finishes in -quirir (adquirir) or belongs to the list: dormir, errar, morir, oler, erguir or desosar, the form is recognized as an irregular verb and will be conjugated using the irregularity rules which operate on the root vowel, which can be either diphthongized or replaced by another vowel (adquiero from adquirir, duermo and durmió from dormir).

Module 3: The third transducer identifies whether the infinitive form root ends in a vowel. If the verb belongs to the second or third conjugation (-er and -ir endings) (leer, oír), it is an irregular verb, while if the verb belongs to the first conjugation (-ar ending) then it will only be irregular if its root ends with an -u or -i (criar, actuar). For the verbs assigned to the first conjugation, diacritic transduction rules are applied to their inflected forms (crío from criar, actúo from actuar); in the case of verbs assigned to the second and third conjugations, the alterations performed on their inflected forms are mainly additions or substitutions of letters (leyó from leer, oigo from oír).

There are some endings such as (-ier, -uer and -iir) which are not found in the MIVC-DB. In the hypothetical case where they are encountered, their conjugation would follow the rules detailed earlier. Rules facilitating the conjugation of potential but non-existing verbs are included in the algorithm.

Module 4: When an infinitive root form in the first conjugation ends in -c, -z, -g or -gu (secar, trazar, delegar) or, in the second and third conjugation, ends in -c, -g, -gu or -qu (conocer, corregir, seguir), that verb is affected by consonantal orthographic adjustments (irregularity rules) in order to preserve its pronunciation (sequé from secar, tracé from trazar, delegué from delegar, conozco from conocer, corrijo from corregir, sigo from seguir).

If the infinitive root form of the second and third conjugation ends in -ñ or -ll (tañer, engullir), the vowel i is removed from some endings of the paradigm following the pattern detailed in (Basterrechea and Rello, 2010).

Verbs undergoing transduction by Module 4 can undergo further modification by Modules 5 and 6. Any infinitive form which failed to meet the triggering conditions set by Modules 1-4 is also tested against 5 and 6.

Module 5: This module focuses on determining the vowel of the infinitive form root and the verb's derivational base. If the vowel is e or o in the first conjugation and the verb derivational base includes diphthongs ie or ue (helar, contar), or if the vowel is e in the infinitive forms belonging to the second and third conjugation (servir, herir), then the verb is irregular and it is modified by the irregularity rules which perform either a substitution of this vowel (sirvo from servir) or a diphthongization (hielo from helar, cuento from contar or hiero from herir).

Module 6: Finally, the existence of a diphthong in the infinitive root is examined (reunir, europeizar). If the infinitive matches the triggering condition for this transducer, its paradigm is considered irregular and the same irregularity rules from Module 3 (inserting a written accent in certain inflected forms) are applied (reúno from reunir, europeízo from europeizar).

Any verb form that fails to meet the triggering conditions set by any of these six transducers has regular conjugation.

It is assumed that these 6 modules cover the full range of both existing and potential verbs in Spanish. The modules' reliability was tested using the full paradigms of 15,367 verbs. As noted earlier, there are some irregularity rules in Module 3 which predict the irregularities of non-existing but possible neologisms in Spanish. Those rules, in conjunction with the rest of the modules, cover the recognition and conjugation of the potential new verbs.

5 Evaluation

The transducers have been evaluated over all the verbs from the DRAEVC-DB and the 4,307 new verbs from MIVC-DB.

If a new verb appears which is not similar to the ones contained in our corpus, the transduction rules in Module 3 for non-existing but potential verbs in Spanish would be activated, although no examples of that type have been encountered in the test data used here. As this system is part of the free online conjugator Onoma, it is constantly being evaluated on the basis of users' input.

Every time a new infinitive form absent from MIVC-DB is introduced by the user⁴, it is automatically added to the database. The system is constantly updated since it is revised every time a new irregularity is detected by the algorithm. The goal is to enable future adaptation of the algorithm to newly encountered phenomena within the language. So far, non-normative verbs, invented by the users, such as arreburbujear, insomniar, pizzicatear have also been conjugated by Onoma.

Verb neologism type      Verb neologism class      Number of neologisms
regular                  regular rules                            3,154
irregular                module 1 rules                              27
irregular                module 2 rules                               9
irregular                module 3 rules                              39
irregular                module 4 rules                             945
irregular                module 5 rules                              87
irregular                module 6 rules                              46
Total verb neologisms                                             4,307

Table 2: New verbs evaluation

Of all the new verbs in MIVC-DB, 3,154 were regular and 1,153 irregular (see Table 2). The majority of the irregular neologisms were conjugated by transducer 4.

6 Conclusions

Creativity is a property of human language and the processing of instances of linguistic creativity represents one of the most challenging problems in NLP. Creative processes such as word formation affect Spanish verbs to a large extent: more than 50% of the actual verbs identified in the data set used to build MIVC-DB do not appear in the largest Spanish dictionary. The processing of these neologisms poses the added difficulty of their rich inflectional morphology which can be also irregular. Therefore, the automatic and accurate recognition and generation of new verbal paradigms is a substantial advance in neologism processing in Spanish.

In future work we plan to create other algorithms to treat the rest of the open-class grammatical categories and to identify and generate inflections of new

⁴ Forms occurring due to typographical errors are not included.
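
The classification scheme of Section 4 (Modules 1 to 3 acting as a switch, Modules 4 to 6 as a cascade) can be illustrated with a minimal Python sketch. It is not the Onoma implementation: the function name classify_infinitive, the optional derivational_base argument, the purely string-based prefix stripping and the treatment of the u in gu/qu as orthographic are all assumptions made for illustration. The conditions are applied literally as worded in the text, without the syllable detector or the root splitter the paper relies on, so the sketch may over- or under-generate compared with the real system.

```python
import re
from typing import List, Optional

# Verb lists quoted from the text (Module 1, footnote 3, Module 2).
MODULE1_VERBS = {
    "haber", "ser", "estar", "ir", "dar", "ver",
    "traer", "valer", "salir", "tener", "venir", "poner", "hacer",
    "decir", "poder", "querer", "saber", "caber", "andar",
}
MODULE2_VERBS = {"dormir", "errar", "morir", "oler", "erguir", "desosar"}
# Assumption: only the prefixes listed in Section 3 are stripped.
PREFIXES = ("a", "des", "inter", "pre", "re", "sobre", "sub", "super")


def _in_list(infinitive: str, verbs: set) -> bool:
    """True if the infinitive, or the infinitive minus one prefix, is listed."""
    if infinitive in verbs:
        return True
    return any(infinitive.startswith(p) and infinitive[len(p):] in verbs
               for p in PREFIXES)


def classify_infinitive(infinitive: str,
                        derivational_base: Optional[str] = None) -> List[int]:
    """Return the numbers of the modules that fire; [] means 'regular'."""
    inf = infinitive.lower().replace("í", "i")  # oír -> oir
    root, ending = inf[:-2], inf[-2:]
    if ending not in ("ar", "er", "ir"):
        raise ValueError(f"not an infinitive: {infinitive!r}")
    # Assumption: the u of gu/qu is treated as orthographic and ignored when
    # inspecting root vowels (otherwise seguir would be caught by Module 3).
    sounded = re.sub(r"(?<=[gq])u(?=[ei]|$)", "", root)

    # Modules 1-3 behave like a switch: the first match decides the class.
    if _in_list(inf, MODULE1_VERBS) or inf.endswith("ducir"):
        return [1]
    if _in_list(inf, MODULE2_VERBS) or inf.endswith("quirir"):
        return [2]
    if ending == "ar":
        if sounded.endswith(("i", "u")):
            return [3]
    elif sounded.endswith(("a", "e", "i", "o", "u")):
        return [3]

    # Modules 4-6 form a cascade: each is tested whatever the others decide.
    fired = []
    if ending == "ar":
        m4_endings = ("c", "z", "g", "gu")
    else:
        m4_endings = ("c", "g", "gu", "qu", "ñ", "ll")
    if root.endswith(m4_endings):
        fired.append(4)

    vowels = re.findall(r"[aeiou]", sounded)
    last_vowel = vowels[-1] if vowels else ""
    if ending == "ar":
        base = derivational_base or ""
        if last_vowel in ("e", "o") and ("ie" in base or "ue" in base):
            fired.append(5)
    elif last_vowel == "e":
        fired.append(5)

    # Module 6: a weak vowel (i, u) adjacent to another vowel in the root.
    if re.search(r"[aeo][iu]|[iu][aeiou]", sounded):
        fired.append(6)
    return fired


if __name__ == "__main__":
    for verb in ("palar", "prever", "adquirir", "actuar",
                 "digitalizar", "europeizar", "reunir"):
        print(verb, classify_infinitive(verb))
```

With these assumptions, palar yields an empty list (regular), actuar yields [3], and europeizar yields [4, 6], reflecting the statement that verbs handled by Module 4 may be modified further by Modules 5 and 6. Returning a list rather than a single label is a design choice of the sketch; Table 2 assigns each neologism to exactly one class, and how the system resolves multiple triggers is not specified in the text.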