254x Filetype PDF File size 0.88 MB Source: aclanthology.org
Co-evolution of Language and of the Language Acquisition Device
Ted Briscoe
ejb¢cl, cam. ac. uk
Computer Laboratory
University of Cambridge
Pembroke Street
Cambridge CB2 3QG, UK
Abstract rapid increase in gramlnatical complexity accompa-
A new account of parameter setting dur- nying the transition from pidgin to creole languages.
ing grammatical acquisition is presented in Prom the perspective of the parameters framework,
terms of Generalized Categorial Grammar the Bioprogram Hypothesis claims that children are
embedded in a default inheritance hierar- endowed genetically with a UG which, by default,
chy, providing a natural partial ordering specifies the stereotypical core creole grammar, with
on the setting of parameters. Experiments right-branching syntax and subject-verb-object or-
show that several experimentally effective der, as in Saramaccan. Others working within the
learners can be defined in this framework. parameters framework have proposed unmarked, de-
Ew)lutionary simulations suggest that a fault parameters (e.g. Lightfoot, 1991), but the Bio-
lea.rner with default initial settings for pa- program Hypothesis can be interpreted as towards
rameters will emerge, provided that learn- one end of a continuum of proposals ranging from all
ing is memory limited and the environment parameters initially unset to all set to default values.
of linguistic adaptation contains an appro- 2 The Language Acquisition Device
priate language. A model of the Language Acquisition Device (LAD)
1 Theoretical Background incorporates a UG with associated parameters, a
parser, and an algorithm for updating initial param-
Grmnnmtical acquisition proceeds on the basis of a eter settings on parse failure during learning.
partial genotypic specifica.tion of (universal) grmn- 2.1 The Grammar (set)
mar (UG) complemented with a learning procedure
elmbling the child to complete this specification ap- Basic categorial grammar (CG) uses one rule of ap-
propriately. The parameter setting frainework of plication which combines a functor category (con-
Chomsky (1981) claims that learning involves fix- taining a slash) with an argument category to form
ing the wdues of a finite set of finite-valued param- a derived category (with one less slashed argument
eters to select a single fully-specified grammar from category). Grammatical constraints of order and
within the space defined by the genotypic specifi- agreement are captured by only allowing directed
cation of UG. Formal accounts of parameter set- application to adjacent matching categories. Gener-
ting have been developed for small fragments but alized Categorial Grammar (GCG) extends CG with
even in these search spaces contain local maxima further rule schemata) The rules of FA, BA, gen-
and subset-superset relations which may cause a eralized weak permutation (P) and backward and
learner to converge to an incorrect grammar (Clark, forward colnposition (I?C, BC) are given in Fig-
1992; Gibson and Wexler, 1994; Niyogi and Berwick, ure 1 (where X, Y and Z are category variables,
1995). The solution to these problems involves defin- [ is a vm'iable over slash and backslash, and ...
ing d(,fault, umnarked initial values for (some) pa- denotes zero or more further flmctor arguments).
rameters and/or ordering the setting of paraineters Once pernmtation is included, several semantically
during learning. l\¥ood (1993) is a general introduction to Categorial
Bickerton (1984) argues for the Bioprograin Hy- Grammar mid extensions to the basic theory. The most
pothesis a.s an explanation for universal similarities closely related theories to that presented here are those
between historically unrelated creoles, and for the of Steedman (e.g. 1988) and Hoffman (1995).
418
Forward Application:
X/Y Y ~ X A y [X(y)] (y) ::~ X(y)
Backward Application:
Y X\Y ~ X A y [X(y)] (y) =~ X(y)
Forward Composition:
X/Y Y/Z ~ X/Z y [X(y)] A z [Y(z)] =~ A z [X(Y(z))]
Backward Composition:
Y\Z X\Y ~ X\Z z [Y(z)] A y [X(y)] ~ A z [X(Y(z))]
(Generalized Weak) Permutation:
(XIY1)... IY, ~ (XIYn)IYI... A Yn..-,Yl [X(yl ...,y,.)] =V A Yl,Y .... [X(yl ...,Yn)]
Figure 1: GCG Rule Schemata
Kim loves Sandy this system can handle (long-distance) scrambling
NP (S\NP)/NP NP elegantly and generates mildly context-sensitive lan-
kim' A y,x [love'(x y)] sandy' guages (Joshi et al, 1991).
P The relationship between GCG as a theory of UG
(S/NP)\NP (GCUG) and as a the specification of a particu-
A x,y [love'(x y)] lar grammar is captured by embedding the theory
-BA in a default inheritance hierarchy. This is repre-
S/NP sented as a lattice of typed default feature structures
A y [love'(kim' y)] (TDFSs) representing subsumption and default in-
FA heritance relationships (Lascarides et al, 1996; Las-
S carides and Copestake, 1996). The lattice defines
love'(kim' sandy') intensionally the set of possible categories and rule
schemata via type declarations on nodes. For ex-
Figure 2: GCG Derivation for Kim loves Sandy ample, an intransitive verb might be treated as a
subtype of verb, inheriting subject directionality by
default from a type gendir (for general direction).
equivalent derivations for Kim loves Sandy become For English, gendir is default right but the node of
available, Figure 2 shows the non-conventional left- the (intransitive) functor category, where the direc-
branching one. Composition also allows alterna- tionality of subject arguments is specified, overrides
tive non-conventional semantically equivalent (left- this to left, reflecting the fact that English is pre-
branching) derivations. dominantly right-branching, though subjects appear
GCG as presented is inadequate as an account of to the left of the verb. A transitive verb would in-
UG or of any individual grammar. In particular, herit structure from the type for intransitive verbs
the definition of atomic categories needs extending and an extra NP argument with default directional-
to deal with featural variation (e.g. Bouma and van ity specified by gendir, and so forth. 2
Noord, 1994), and the rule schemata, especially com- For the purposes of the evolutionary simulation
position and weak permutation, must be restricted described in §3, GC(U)Gs are represented as a se-
in various parametric ways so that overgeneration quence of p-settings (where p denotes principles or
is prevented for specific languages. Nevertheless, parameters) based on a flat (ternary) sequential en-
GCG does represent a plausible kernel of UG; Hoff- coding of such default inheritance lattices. The in-
man (1995, 1996) explores the descriptive power of a 2Bouma and van Noord (1994) and others demon-
very similar system, in which generalized weak per- strate that CGs can be embedded in a constraint-based
mutation is not required because functor arguments representation. Briscoe (1997a,b) gives further details of
are interpreted as multisets. She demonstrates that the encoding of GCG in TDFSs.
419
NP N S gen-dir subj-dir applic
AT AT AT DR DL DT
NP gendir applic S N subj-dir
AT DR DT AT AT DL
"applic NP N gen-dir subj-dir S
DT AT AT DR DL AT
Figure 3: Sequential encodings of the grammar fragment
heritance hierarchy provides a partial ordering on postpositions in which specifiers and modifiers follow
parameters, which is exploited in the learning pro- heads. There are other languages in the SOV family
cedure. For example, the atomic categories, N, with less consistent left-branching syntax in which
NP and S are each represented by a parameter en- specifiers and/or modifiers precede phrasal heads,
coding the presence/absence or lack of specification some of which are attested. "German" is a more
(T/F/?) of the category in the (U)G. Since they will complex SOV language in which the parameter verb-
be unordered in the lattice their ordering in the se- second (v2) ensures that the surface order in main
quential coding is arbitrary. However, the ordering clauses is usually SVO. 4
of the directional types gendir and subjdir (with There are 20 p-settings which determine the rule
values L/R) is significant as the latter is a more spe- schemata available, the atomic category set, and so
cific type. The distinctions between absolute, de- forth. In all, this CGUG defines just under 300
fault or unset specifications also form part of the grammars. Not all of the resulting languages are
encoding (A/D/?). Figure 3 shows several equiva- (stringset) distinct and some are proper subsets of
lent and equally correct sequential encodings of the other languages. "English" without the rule of per-
fragment of the English type system outlined above. mutation results in a stringset-identical language,
A set of grammars based on typological distinc- but the grammar assigns different derivations to
tions defined by basic constituent order (e.g. Green- some strings, though the associated logical forms are
berg, 1966; Hawkins, 1994) was constructed as a identical. "English" without composition results in
(partial) GCUG with independently varying binary- a subset language. Some combinations of p-settings
valued parameters. The eight basic language fami- result in 'impossible' grammars (or UGs). Others
lies are defined in terms of the unmarked order of yield equivalent grammars, for example, different
verb (V), subject (S) and objects (0) in clauses. combinations of default settings (for types and their
Languages within families further specify the order subtypes) can define an identical category set.
of modifiers and specifiers in phrases, the order of ad- The grammars defined generate (usually infinite)
positions and further phrasal-level ordering param- stringsets of lexical syntactic categories. These
eters. Figure 4 list the language-specific ordering strings are sentence types since each is equivalent
parameters used to define the full set of grammars to a finite set of grammatical sentences formed by
in (partial) order of generality, and gives examples selecting a lexical instance of each lexicai category.
of settings based on familiar languages such as "En- Languages are represented as a finite subset of sen-
glish", "German" and "Japanese". 3 "English" de- tence types generated by the associated grammar.
fines an SVO language, with prepositions in which These represent a sample of degree-1 learning trig-
specifiers, complementizers and some modifiers pre- gers for the language (e.g. Lightfoot, 1991). Subset
cede heads of phrases. There are other grammars in languages are represented by 3-9 sentence types and
the SVO family in which all modifers follow heads, 'full' languages by 12 sentence types. The construc-
there are postpositions, and so forth. Not all combi- tions exemplified by each sentence type and their
nations of parameter settings correspond to attested length are equivalent across all the languages defined
languages and one entire language family (OVS) is by the grammar set, but the sequences of lexical cat-
unattested. "Japanese" is an SOV language with egories can differ. For example, two SOV language
3Throughout double quotes around language names renditions of The man who Bill likes gave Fred a
are used as convenient mnemonics for familiar combina-
tions of parameters. Since not all aspects of these actual 4Representation of the vl/v2 parameter(s) in terms
languages are represented in the grammars, conclusions of a type constraint determining allowable functor cate-
about actual languages must be made with care. gories is discussed in more detail in Briscoe (1997b).
420
gen vl n subj obj v2 mod spec relcl adpos compl
Engl R F R L R F R R R R R
Ger R F R L L T R R R R R
Jap L F L L L F L L L L ?
Figure 4: The Grammar Set - Ordering Parameters
present, one with premodifying and the other post-
modifying relative clauses, both with a relative pro-
noun at the right boundary of the relative clause, are
shown below with the differing category highlighted.
Bill likes who the-man a-present Fred gave 1. The Reduce Step: if the top 2 cells of the
NP8 (S\NP,)\NPo Rc\(S\NPo) NPs\Rc NPo2 stack are occupied,
NPol ((S\NPs)\NPo2)\NPol then try
The-man Bill likes who a-present Fred gave a) Application, if match, then apply and goto
NPs/Rc NPs (S\NPs)\NPo Rc\(S\NPo) NPo2 1), else b),
NPol ((S\NPs)\NPo2)\NPol b) Combination, if match then apply and goto
1), else c),
c) Permutation, if match then apply and goto
2.2 The Parser 1), else goto 2)
The parser is a deterministic, bounded-context 2. The Shift Step: if the first cell of the Input
stack-based shift-reduce algorithm. The parser op- Buffer is occupied,
erates on two data structures, an input buffer or then pop it and move it onto the Stack to-
queue, and a stack or push down store. The algo- gether with its associated lexical syntactic cat-
rithm for the parser working with a GCG which in- egory and goto 1),
cludes application, composition and permutation is else goto 3)
given in Figure 5. This algorithm finds the most left- 3. The Halt Step: if only the top cell of the Stack
branching derivation for a sentence type because Re- is occupied by a constituent of category S,
duce is ordered before Shift. The category sequences then return Success,
representing the sentence types in the data for the else return Fail
entire language set are designed to be unambiguous
relative to thi s 'greedy', deterministic algorithm, so The Match and Apply operation: if a binary
it will always assign the appropriate logical form to rule schema matches the categories of the top 2 cells
each sentence type. However, there are frequently al- of the Stack, then they are popped from the Stack
ternative less left-branching derivations of the same and the new category formed by applying the rule
logical form. schema is pushed onto the Stack.
The parser is augmented with an algorithm which The Permutation operation: each time step lc)
computes working memory load during an analy- is visited during the Reduce step, permutation is ap-
sis (e.g. Baddeley, 1992). Limitations of working plied to one of the categories in the top 2 cells of the
memory are modelled in the parser by associating a Stack until all possible permutations of the 2 cate-
cost with each stack cell occupied during each step gories have been tried using the binary rules. The
of a derivation, and recency and depth of process- number of possible permutation operations is finite
ing effects are modelled by resetting this cost each and bounded by the maximum number of arguments
time a reduction occurs: the working memory load of any functor category in the grammar.
(WML) algorithm is given in Figure 6. Figure 7 gives
the right-branching derivation for Kim loves Sandy, Figure 5: The Parsing Algorithm
found by the parser utilising a grammar without per-
mutation. The WML at each step is shown for this
derivation. The overall WML (16) is higher than for
the left-branching derivation (9).
The WML algorithm ranks sentence types, and
421
no reviews yet
Please Login to review.