Testing the Universal Grammar Hypothesis
1. Introduction
Perhaps the single most controversial claim in linguistic theory is that children learning their
native language face an induction problem, or in other words, that the available input
underspecifies the adult state. This induction problem is known by many names: the “Poverty of
the Stimulus” (e.g., Chomsky 1980a, Chomsky 1980b, Lightfoot 1989, Crain 1991), the “Logical
Problem of Language Acquisition” (e.g., Baker 1981, Hornstein & Lightfoot 1981), and “Plato’s
Problem” (e.g., Chomsky 1988, Dresher 2003). Regardless of the name, each refers to the
same claim: the data generally available to young children are compatible with multiple
hypotheses, or perhaps more correctly, data necessary to rule out the incorrect hypotheses are
either not available at all, or not available in sufficient quantity (Lightfoot 1982, Legate & Yang
2002, among many others).
The Universal Grammar (UG) hypothesis was introduced as a solution to this problem
(Chomsky 1957/1975, 1965). The logic of the UG hypothesis is straightforward: if the necessary
evidence for choosing the correct linguistic hypothesis is unavailable in the input, then children
must bring some internal bias to the language learning problem (Chomsky 1981, Hornstein &
Lightfoot 1981, Legate & Yang 2002, among many others). While the necessity of some kind of
bias is generally granted even by the most ardent critics of the UG hypothesis (e.g., Pullum &
Scholz 2002, Regier & Gahl 2004), the nature of the necessary biases is the subject of
considerable debate. First, there is the question of what cognitive objects the bias operates over.
A bias might operate over the representations the child considers as hypotheses (e.g., parameters
of linguistic variation (Chomsky 1981)), the data the child learns from (e.g., only unambiguous
data (Fodor 1998)), or the learning algorithm the child uses to alter belief in competing
hypotheses (e.g., trigger-based learning (Gibson & Wexler 1994, Niyogi & Berwick 1996)).
Second, there is the question of whether the necessary bias is specific to language learning (i.e.,
domain-specific) or applies generally to any kind of cognitive learning (domain-general). UG is
usually proposed as a collection of domain-specific biases ranging over both the representations
that children consider and the data that children learn from, but this is far from logically
necessary (e.g., Chomsky 1971, 1981, Kimball 1973, Baker 1978, Gordon 1986, Lightfoot 1991).
Recent debates about the UG hypothesis have tended to focus on two broad questions. The
first concerns the existence of the induction problem (e.g., Sampson 1989, 1999, Pullum &
Scholz 2002, MacWhinney 2004, Tomasello 2004), which is, of course, the motivation for the
UG hypothesis. Until recently, the claim that children’s input lacks sufficient evidence for
successful language learning has been based on the intuitions of linguists rather than on large-
scale empirical analyses of child-directed speech. However, without quantifiable evidence for
induction problems, there is no need for the UG hypothesis at all. In fact, Pullum and Scholz
(2002) have claimed just that: using data from the Wall Street Journal corpus (Linguistic Data
Consortium 1993) and the CHILDES database (MacWhinney 2000), they argue that there is no
evidence for an induction problem for several well-known linguistic phenomena in English such
as anaphoric one (Baker 1978) and yes-no questions involving complex subjects (Chomsky
1971). Even granting the existence of an induction problem for a given linguistic phenomenon,
the second broad question follows directly: what is the nature of the prior knowledge necessary to
solve that problem? More specifically, is the knowledge innate or derived from prior learning? Is
the knowledge domain-specific or domain-general? One could imagine any or all of the possible
combinations being applicable to various aspects of the linguistic system: some knowledge may
be innate and domain-general, some innate and domain-specific, some derived from domain-
general knowledge acquired previously, and some derived from domain-specific knowledge
acquired previously. With the proliferation of possible types of prior knowledge, it is not clear
that a single type will be sufficient to solve all of the induction problems in language learning. In
fact, Tomasello (2004) takes this one step further: he argues that the proliferation of specific
suggestions for that prior knowledge in the theoretical literature has rendered the UG hypothesis
untestable through standard scientific falsification. He contends that it will not be possible to
evaluate the UG hypothesis until it is broken down into specific hypotheses about biases with
respect to specific linguistic phenomena.
The project we propose here aims to address both of these questions directly and, in the
process, to lay out a concrete methodology for testing the UG hypothesis that is similar in spirit to
what both critics of the UG hypothesis (e.g., Pullum & Scholz 2002, Tomasello 2004) and its
supporters (e.g., Chomsky 1957/1975, Crain & Pietroski 2002) propose. Using techniques that
technological advances have only recently made practical, and combining aspects of theoretical,
experimental, and computational linguistics, we can now perform several quantitative tasks
relevant to evaluating the UG hypothesis with respect to the issues discussed above. We can
search reasonably large corpora of both adult-directed and child-directed speech for relevant
linguistic structures; we can precisely measure the adult knowledge state
children eventually attain using psycholinguistic techniques from experimental syntax; and we
can implement sophisticated probabilistic learning models (specifically Bayesian models) capable
of operating over the structured representations postulated by linguistic theory. With these
techniques in hand, we plan to investigate the existence of the induction problem by examining
both the realistic data used as input by children (available through resources such as CHILDES
(MacWhinney 2000)) and the knowledge state achieved by adults for complex linguistic
phenomena such as syntactic islands (e.g., the experiments in Sprouse 2007). We will then
implement Bayesian learning models to test whether unbiased learners can reach the adult
knowledge state given the data available. If unbiased learners cannot do this, then we can
conclude that the induction problem does indeed exist for that phenomenon and that children
require learning biases to succeed. We can then identify what kind of biases lead to acquisition
success by incorporating different types of learning biases into the models (as is done, for
example, for learning anaphoric one in Pearl & Lidz (submitted)). The biases implemented may
be domain-general in nature (e.g., Regier & Gahl 2004, Perfors, Tenenbaum, & Regier 2006,
Pearl & Lidz submitted) or domain-specific (Sakas & Fodor 2001, Pearl & Weinberg 2007, Pearl
2008, submitted, Pearl & Lidz submitted). Crucially, because the Bayesian modeling framework
allows us to accommodate biases of many kinds, from choosing the smallest hypothesis
consistent with the data (Tenenbaum & Griffiths 2001) to restricting the input to certain clauses
(Lightfoot 1991, Pearl & Weinberg 2007) to constraining the representations under consideration
via parameters (Chomsky 1981), we will be able to both reduce the UG hypothesis to smaller
specific hypotheses and evaluate the necessity of those hypotheses for successful learning, as
Tomasello (2004) advocates.
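The size-principle bias cited above (Tenenbaum & Griffiths 2001) can be illustrated with a toy Bayesian learner. The sketch below is not the project's model: it assumes two nested hypotheses of hypothetical sizes, an unbiased (uniform) prior, and data that are all consistent with both hypotheses. Under the size principle, each datum consistent with hypothesis h has likelihood 1/|h|, so the smaller hypothesis gains probability as consistent data accumulate, even though no single datum rules out the larger one. The function name `size_principle_posterior` is hypothetical.

```python
from typing import Dict


def size_principle_posterior(
    prior: Dict[str, float], sizes: Dict[str, int], n_data: int
) -> Dict[str, float]:
    """Posterior over hypotheses after n_data observations, each assumed to be
    consistent with every hypothesis in `prior`.

    Under the size principle, a datum consistent with hypothesis h has
    likelihood 1/|h|, so n independent data have likelihood (1/|h|)**n.
    """
    unnormalized = {h: prior[h] * (1.0 / sizes[h]) ** n_data for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical toy setting: "narrow" licenses a subset of what "broad" licenses.
    prior = {"narrow": 0.5, "broad": 0.5}  # unbiased (uniform) prior
    sizes = {"narrow": 10, "broad": 20}    # hypothetical hypothesis sizes
    for n in (0, 1, 5, 10):
        post = size_principle_posterior(prior, sizes, n)
        print(f"n={n}: P(narrow | data) = {post['narrow']:.3f}")
```

With these hypothetical numbers, the posterior on the narrow hypothesis rises from 0.5 with no data to about 0.67 after one datum and about 0.97 after five, because the likelihood ratio grows as 2^n. A learner whose likelihood ignored hypothesis size would simply stay at its prior, which is the sense in which the size principle functions as a learning bias.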
2. Accurate measures of the primary data
The first step of our investigation is to assess the input that is actually available to children
for various linguistic phenomena. Since the debate regarding the induction problem and the
necessity of UG hinges on the state of children’s input, claims about what occurs in that input
should rest on corpus evidence rather than on the intuitions of linguists (a point argued extensively
by Pullum & Scholz 2002, for instance). This is particularly true now that corpora of child-directed speech are freely
available, such as CHILDES (MacWhinney 2000). Notably, however, the corpora available are
rarely marked with all the information of interest to a linguist focused on complex syntactic and
semantic phenomena, which are primarily the locus of the induction problem debate (Crain &
Pietroski 2002, Legate & Yang 2002, Pullum & Scholz 2002, Lidz, Waxman, & Freedman 2003,
Reali & Christiansen 2004, Regier & Gahl 2004, Kam et al. 2005, Perfors, Tenenbaum, & Regier
2006, Foraker et al. 2007, Pearl & Lidz submitted, among many others). While some corpora may
contain morphological information or part-of-speech identification, most are simply transcripts of
child-directed speech. We propose to syntactically annotate several child-directed speech corpora in
the CHILDES database via a two-step process, using, for example, the features of Government and
Binding Theory (Chomsky 1981). The output of this process will be fully formed
hierarchical structures, so that formal analyses from theoretical linguistics can be easily adopted
as biases in the models we later build (see sections 4 and 5 for details). First, we will use a freely
available statistical phrase-structure parser (such as the Charniak parser¹) to generate a first-pass syntactic
analysis. Then, we will evaluate the resulting syntactic trees by hand (with the help of
undergraduate research assistants), correcting when necessary, to ensure the accuracy of the
structures generated. We intend to make the final parsed corpora available through CHILDES for
other language researchers to use.
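Because every first-pass parse will be checked by hand, the corrected trees can also measure how often the automatic parser was right. A minimal sketch, assuming the parser emits Penn Treebank-style bracketed phrase-structure trees with a labeled root (as the Charniak parser does; other parsers may differ), computes PARSEVAL-style labeled bracket precision and recall between a parser tree and its hand-corrected version. The function names are hypothetical, and the scoring omits conventions of standard tools such as evalb (e.g., punctuation handling).

```python
from collections import Counter
from typing import Tuple


def labeled_spans(tree: str) -> Counter:
    """Collect (label, start, end) constituents from a bracketed tree such as
    "(S (NP (PRP you)) (VP (VBD left)))", skipping preterminals (POS tags).
    Assumes a labeled root node and no parentheses inside words."""
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    spans: Counter = Counter()
    pos = 0   # index of the next token to read
    leaf = 0  # index of the next word in the sentence

    def read_node() -> None:
        nonlocal pos, leaf
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        start = leaf
        if tokens[pos] != "(":   # preterminal: (TAG word)
            pos += 1
            leaf += 1
        else:                     # phrasal node: read each child in turn
            while tokens[pos] == "(":
                read_node()
            spans[(label, start, leaf)] += 1
        assert tokens[pos] == ")", "expected a closing bracket"
        pos += 1

    read_node()
    return spans


def bracket_scores(parsed: str, corrected: str) -> Tuple[float, float, float]:
    """Labeled bracket precision, recall, and F1 of a first-pass parse
    against its hand-corrected version."""
    guess, gold = labeled_spans(parsed), labeled_spans(corrected)
    matched = sum((guess & gold).values())  # multiset intersection
    precision = matched / sum(guess.values())
    recall = matched / sum(gold.values())
    f1 = 0.0 if matched == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: the parser attaches the PP to the verb phrase,
    # while the hand-corrected tree attaches it inside the object NP.
    parsed = "(S (NP (PRP you)) (VP (VBD saw) (NP (DT the) (NN dog)) (PP (IN with) (NP (DT the) (NN ball)))))"
    corrected = "(S (NP (PRP you)) (VP (VBD saw) (NP (NP (DT the) (NN dog)) (PP (IN with) (NP (DT the) (NN ball))))))"
    print(bracket_scores(parsed, corrected))
```

In this hypothetical attachment error, every constituent the parser proposes also appears in the corrected tree (precision 1.0), but the parser misses the larger object NP, so recall is 6/7. Tracking such scores per corpus would show where hand correction is most needed.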
In addition, we propose to investigate corpora of adult conversational speech (such as those
available through TalkBank, http://www.talkbank.org) in order to compare adult-directed and
child-directed speech for various linguistic phenomena. Child-directed speech corpora are often
sparse compared to adult speech corpora, especially when syntactic annotation is required, and this
has led much corpus-based linguistic research to rely on adult-directed speech and text (e.g.,
Pullum & Scholz 2002). Yet it is commonly (and quite reasonably) argued that child-directed
speech may differ substantially from adult-directed speech (see, for example, the discussion in
Legate & Yang 2002). Since recent probabilistic learning models are sensitive to the relative
frequencies of different kinds of data (e.g., Foraker et al. 2007), it is only prudent to ask, for a
given linguistic phenomenon, whether these frequencies actually differ. It may
turn out for some linguistic phenomena that the relative frequencies do not vary much between
the speech directed at, say, three-year-olds and the speech directed at adults. This would then
suggest that adult speech corpora may indeed be a reasonable estimate of children’s input for
some phenomena, particularly complex syntactic and semantic interpretation phenomena that are
acquired later in development (e.g., negative polarity items like ‘any’, the interpretation of
connectives such as ‘or’, and binding theory phenomena, as discussed in Crain & Pietroski
(2002)). Given the abundance of adult-directed conversational speech, such a scenario would
provide a far richer source of data from which children’s input could be estimated. However,
should child-directed and adult-directed speech frequencies differ, it will be crucial to this project
to determine not only whether but also how they differ, so that we can correctly evaluate both our
own models and those potentially offered by others.
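One conventional way to ask whether a structure's relative frequency differs between a child-directed and an adult-directed corpus is a log-likelihood (G²) test on a 2 × 2 table of counts (structure present vs. absent in each corpus). The sketch below assumes the counts have already been extracted from the annotated corpora and that each corpus is measured in the same unit (e.g., utterances); the function names are hypothetical, and the test is offered as one reasonable choice rather than the project's specified analysis.

```python
import math


def g2_two_corpora(hits_a: int, total_a: int, hits_b: int, total_b: int) -> float:
    """Log-likelihood ratio statistic G^2 for a 2x2 table:
    rows = corpus A / corpus B, columns = structure present / absent.
    `total_*` is the number of units (e.g., utterances) in each corpus."""
    observed = [hits_a, total_a - hits_a, hits_b, total_b - hits_b]
    grand = total_a + total_b
    hits = hits_a + hits_b
    misses = grand - hits
    # Expected counts if both corpora shared one underlying rate.
    expected = [
        total_a * hits / grand,
        total_a * misses / grand,
        total_b * hits / grand,
        total_b * misses / grand,
    ]
    # Cells with zero observations contribute 0 (the limit of o * log(o / e)).
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def p_value_df1(g2: float) -> float:
    """Upper-tail chi-square probability with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2.0))


if __name__ == "__main__":
    # Hypothetical counts: 50 hits in 10,000 child-directed utterances
    # versus 150 hits in 20,000 adult-directed utterances.
    g2 = g2_two_corpora(50, 10_000, 150, 20_000)
    print(f"G2 = {g2:.2f}, p = {p_value_df1(g2):.4f}")
```

G² is compared against a chi-square distribution with one degree of freedom, so values above about 3.84 correspond to p < .05. The test treats utterances as independent, which ignores clustering by speaker or recording session; with real CHILDES data, a model that accounts for that clustering may be more appropriate.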
Like child-directed speech, much conversational adult-directed speech is not annotated
with syntactic information. The process we propose for generating annotated adult-directed
speech corpora is identical to the process for generating the annotated child-directed speech,
involving a first-pass annotation by a freely available parser and subsequent human evaluation of
the generated annotation. We intend to make the annotated corpora available to the research
community either through TalkBank (http://www.talkbank.org) or the Linguistic Data
Consortium (http://www.ldc.upenn.edu/), a common repository for electronic corpora.
3. Accurate measures of the adult state
The second step of our investigation is to assess the adult knowledge state children eventually
attain. It almost goes without saying that acceptability judgments form the primary measure of the
adult grammar in the field of theoretical syntax; therefore, acceptability judgments are the logical
choice for a quantifiable measure of the adult state. There are at least three reasons for the
predominance of acceptability judgments in the study of adult grammars. First, acceptability
judgments can be provided with little effort from the subject (Schutze 1996, Cowart 1997).
Second, these judgments are highly reliable across speakers of the same language (Cowart 1997,
Keller 2000, Sprouse 2007). Third, these judgments are a robust proxy for grammaticality
(Chomsky 1965, Schutze 1996, Cowart 1997, and many others). Paradoxically, the very
properties that have made acceptability judgments such a valuable data source for theoretical
syntacticians have also served to undermine general confidence in those data. First, because
judgments are available to any native speaker, linguists have tended to use their own judgments
rather than those of naïve consultants (Christiansen and Edelman 2003). Second, because
judgments are generally reliable across speakers, linguists have tended to use single data points
rather than samples (Bresnan 2007, Cowart 1997). Third, because judgment tasks are often
designed as a binary choice between grammatical and ungrammatical, until recently relatively little
research had been done on the gradience inherent in acceptability judgments or on the factors that
might cause or influence that gradience (Keller 2000, Sorace and Keller 2005).
¹ Available through Brown University (ftp://ftp.cs.brown.edu/pub/nlparser/).
In response to these concerns, several linguists have developed a set of formal methodologies,
which have collectively come to be known as experimental syntax, for collecting acceptability
judgments. While the details vary from experiment to experiment, experimental syntax
methodologies all have at least four components in common (Featherston 2007, Sprouse 2007).
First, judgments are collected from a sample of naïve consultants, usually at least 10 and ideally
more than 20, to ensure that judgments generalize to the broader population. Second, consultants
are presented with a variety of sentences for any given structure under investigation, to ensure that
the judgments generalize across lexical items. Third, consultants are presented with a formal
task, such as a Likert scale task or the magnitude estimation task (Stevens 1957, Bard et al.
1996), to help ensure that relative acceptability data are not lost to categorical responses. Fourth,
data are analyzed using standard behavioral statistics. For this project, we will use experimental
syntax techniques to measure the relative acceptability of structures in the adult grammar for
comparison to the relative frequencies of those structures in the child-directed speech corpora and
adult conversational speech corpora.
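The planned comparison between relative acceptability and relative frequency could take several forms. A minimal sketch, assuming one mean acceptability score and one corpus frequency per structure type, computes Spearman's rank correlation between the two; the function names and the example values are hypothetical. A rank-based measure is chosen here only because it needs no transformation of frequency counts, which may include zeros.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation (undefined if either sequence is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical values, one per structure type: mean acceptability
    # (e.g., z-scored judgments) and occurrences per million corpus words.
    acceptability = [0.9, 0.4, -0.2, -0.8]
    frequency = [120.0, 35.0, 0.0, 2.0]
    print(spearman_rho(acceptability, frequency))
```

A strong positive correlation would suggest that acceptability tracks input frequency for the structures tested; where the two diverge (as the distance effect discussed in this section may illustrate), factors other than frequency are implicated.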
Experimental syntax methodologies have advantages over previous informal collection
techniques too numerous to mention here (see Schutze 1996, Cowart 1997, Keller 2000,
Featherston 2007, and Sprouse 2007 for discussion). However, given the nature of this project (in
particular, its comparison between relative frequencies and acceptability judgments), two of
these advantages bear mention. First, experimental syntax has introduced rating tasks, such as
magnitude estimation (Stevens 1957), that provide a more precise measure of relative
acceptability than previous informal collection tasks. Most informal collection tasks used either
binary rating scales (such as yes/no) or limited, discrete rating scales (such as 5- or 7-point Likert
scales). All of these limited scales can lose information to categorization (Bard et al. 1996). In
contrast, magnitude estimation places no predefined restriction on the response scale: subjects
judge each sentence relative to a reference sentence (the modulus) and may use the entire positive
number line for their responses, thus eliminating the categorization problem. Bard et al. (1996)
demonstrated that, given such freedom, subjects routinely distinguish more than seven levels of
acceptability. Furthermore, Sprouse (submitted b) has demonstrated that subjects’ responses in
magnitude estimation tasks are highly robust across samples, even with minor variations to the
experimental design (such as changing the modulus sentence). Taken together, these facts
suggest that newer rating tasks such as magnitude
estimation will provide more detailed data regarding the adult grammar.
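Because magnitude estimation lets each participant choose their own number range, responses must be placed on a common scale before they can be compared across participants. A minimal sketch of one common preprocessing step, assumed here rather than specified by this proposal: divide each response by the value the participant assigned to the modulus sentence, take the logarithm, and then z-score within participant. The function names and the participant data are hypothetical.

```python
import math
import statistics
from typing import Dict


def log_modulus_ratio(responses: Dict[str, float], modulus_value: float) -> Dict[str, float]:
    """Re-express one participant's magnitude-estimation responses as
    log10(response / modulus_value); 0 means 'as acceptable as the modulus'."""
    return {item: math.log10(value / modulus_value) for item, value in responses.items()}


def z_transform(scores: Dict[str, float]) -> Dict[str, float]:
    """Standardize one participant's scores to mean 0 and population SD 1.
    Requires at least two distinct scores (otherwise the SD is 0)."""
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {item: (s - mean) / sd for item, s in scores.items()}


if __name__ == "__main__":
    # Two hypothetical participants who used very different number ranges.
    p1 = log_modulus_ratio({"short": 150, "long": 80, "filler": 40}, modulus_value=100)
    p2 = log_modulus_ratio({"short": 15, "long": 8, "filler": 4}, modulus_value=10)
    print(z_transform(p1))
    print(z_transform(p2))  # matches p1 up to floating-point rounding
```

Dividing by the modulus removes each participant's arbitrary choice of scale, and the log turns ratios into differences. The choice of log base does not matter once scores are z-scored, since changing the base only rescales them.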
Second, experimental syntax has also introduced the principles of factorial experimental
design, which has enabled the investigation of contributions from factors that are traditionally
outside the domain of syntactic theory, but that may still have an effect on both acceptability
judgments and (crucially) relative frequencies. For example, Sprouse (2008, submitted a)
demonstrates that the acceptability of wh-movement dependencies is affected by the distance of
the dependency (see also Frazier 1989, Phillips et al. 2005). Specifically, shorter wh-movement
dependencies (1) are significantly more acceptable than longer wh-movement dependencies (2),
even though syntactic theories predict both structures to be categorically grammatical.
(1) Jack hoped that you knew who the giant would chase.
(2) Jack knew who you hoped that the giant would chase.
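One way to see what factorial design adds is a 2 × 2 design that crosses the distance manipulation in (1) and (2) with a second factor. The sketch below assumes, purely for illustration, a structural factor contrasting non-island and island environments (cf. the syntactic islands mentioned in the introduction) and hypothetical mean ratings; it derives the two main effects and their interaction from the four cell means. The function name is hypothetical, and a real analysis would test these effects statistically (e.g., with a mixed-effects model) rather than read them off the means.

```python
from typing import Dict, Tuple

# Cells of a hypothetical 2 x 2 design: (distance, structure).
Cell = Tuple[str, str]


def two_by_two_effects(means: Dict[Cell, float]) -> Dict[str, float]:
    """Main effects and interaction from the four cell means of a 2 x 2 design
    crossing dependency distance (short/long) with a structural factor
    (non-island/island). Each effect is a difference in mean acceptability;
    positive values mean the first level is rated higher."""
    short_ni = means[("short", "non-island")]
    long_ni = means[("long", "non-island")]
    short_i = means[("short", "island")]
    long_i = means[("long", "island")]
    return {
        # Average cost of the longer dependency across both structures.
        "distance": ((short_ni - long_ni) + (short_i - long_i)) / 2,
        # Average cost of the island structure across both distances.
        "structure": ((short_ni - short_i) + (long_ni - long_i)) / 2,
        # Difference-in-differences: how much larger the distance cost is inside
        # the island structure than outside it (0 if the two costs simply add).
        "interaction": (short_i - long_i) - (short_ni - long_ni),
    }


if __name__ == "__main__":
    hypothetical_means = {
        ("short", "non-island"): 1.0,
        ("long", "non-island"): 0.6,
        ("short", "island"): 0.8,
        ("long", "island"): -0.6,
    }
    print(two_by_two_effects(hypothetical_means))
```

Because distance is crossed with the structural factor, the general cost of a longer dependency, as seen in (1) versus (2), can be separated from any additional cost specific to the island environment; a nonzero interaction is what would indicate such an additional cost.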