Syntax for the uncertain

(This entry is for the Chomsky skeptic: the type of long distance relationship prohibited among prepositional phrases provides strong evidence for a generativist view of grammar and a computational view of syntax in the brain.)

Anti-Chomskians have focused their attacks on productivity, claiming that novel syntactic structures are rare. Certainly formulaic utterances are rampant in speech and have justly received much attention recently. Diana Sidtis, who has published widely on formulaic utterances, adds to these schematic utterances — utterance patterns structurally fixed like formulae, but not fixed for content. The claim seems to be that if schemata and formulae dominate speech patterns, the generative element is marginal at best, a mere intuitive capacity largely unused.

Setting aside the question of why humans would have such an unused capacity, this argument ignores the essential duality of the Chomsky program. The goal is not just to generate all the sentences of natural language. It’s to generate all and only the sentences of natural language. It doesn’t just explain novelty and unbounded productivity. The really dramatic, interesting and compelling side of Chomsky’s work from the very outset was the other horn of the bull: discovering one mechanism that will both generate all the sentences yet won’t overgenerate. Generative syntax crucially explains why some extremely simple sentences are unprocessable, even when they contain the same structures as more complex and easy-to-process sentences.

Sometimes I think Chomsky and syntax have garnered so many vitriolic enemies because Chomsky’s original examples were not chosen for pedagogical perspicuousness and the computational origins of generative theory are not consistently taught. So here’s an attempt at pedagogical perspicuity which I hope will convert both agnostics and scoffers-in-good-faith.

Both long distance and local relations are possible for prepositional phrases

You walk into the lobby of the hotel. There are several people sitting at the bar and in the lounge, some in suits. You approach the front desk. The attendant tells you you received a call, using one of these sentences:

1. The guy at the end of the bar in the suit with the stripes on the chair with three legs called.
2. The guy at the end in the suit of the bar called.
3. The guy on the chair with three legs at the bar called.

Notice that sentence (1) is easy to understand even though it is long and complex. I’ve yet to encounter a class of undergrads who didn’t understand it instantly. Yet it contains no fewer than three pairs of prepositional phrases, each pair holding a local relation within the pair and a long distance relation with the subject of the sentence. So

the chair with three legs

is a noun phrase with a prepositional phrase [with three legs] related directly to [the chair]. It’s the chair that has three legs, not the guy.

On the other hand, the stripes are not on the chair; it’s the guy who is on the chair. So there is no relation in this sentence [the stripes on the chair] even though there is a relation [the chair with three legs].

So these prepositional phrases can relate over long distances to the subject, or they can hold a purely local relationship with the nearest noun phrase. Both long distance and local relations are possible for prepositional phrases.

Some long distance relationships are impossible

But now consider sentence (2). It is a simpler string of words: only three prepositional phrases — yet I have not met any English speaker who can process it to get [of the bar] to relate to [at the end] even though it’s semantically obvious and it’s the only semantic possibility. This sentence is not difficult to process; it is impossible! Even when you know what it’s intended to mean, you still can’t get it to mean that.

And yet, it contains the same prepositional phrases, some with local relationships and some with long distance relationships, in no way different from (1), except (2) is simpler and (1) is a great deal more complex. Why is the more complex sentence easy and the simple sentence strictly impossible?

Is it because a prepositional phrase cannot intervene between two related prepositional phrases? Sentence (3) shows this cannot be the reason.

Sentence (3) has the most complex relationships of all three sentences, and yet it too is relatively easy to process. Imagine there are two guys sitting on three-legged chairs, one chair at the bar and one in the lounge.

3. The guy on the chair with three legs at the bar left this for you.


The guy on [the chair [with three legs] [at the bar]]

where the chair is both at the bar and has three legs.

It’s not hard to understand, even though there is a prepositional phrase, [with three legs], intervening between [the chair] and [at the bar].

So prepositional phrases may intervene sometimes but not always. What’s the explanation?

What determines which are possible and which are impossible?

Computational theory early on gave us the answer. A machine that processes language word by word cannot exclude sentences like (2) while including sentences like (1) and (3). But a machine that processes phrases as well as words can. A finite automaton can produce any and all of the prepositional relationships above, including, unfortunately, (2), which is not possible for native English speakers. A push-down automaton, however, can produce (1) and (3) without any trouble, but is mechanically, physically, structurally, logically unable to produce (2).

The internal structure of a prepositional phrase can be processed by a machine, like a finite automaton, that reads one grammatical category at a time

prep + determiner + noun

in that order. Such a machine consists of a set of states including an initial state and at least one final state and a set of functions that take one state into another depending on input. The initial state here accepts a preposition which takes it into a new state accepting a determiner. Feeding the machine at this point a determiner will take the machine to a noun-accepting state. (When I have a chance, I’ll flesh this out a bit. Meanwhile, if you’re curious, any textbook on computer theory will have a good description of how finite automata work and the push-downs mentioned below.)
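Here is a minimal sketch of such a machine in Python, just to make the description concrete. The state numbers, the one-letter tags (P, D, N) and the function name are my own illustrative choices, not anything fixed by the theory.

```python
# A tiny finite automaton for a single prepositional phrase: prep + determiner + noun.
# States (hypothetical numbering): 0 = initial, 1 = after prep, 2 = after determiner,
# 3 = final (a complete prepositional phrase has been read).
TRANSITIONS = {
    (0, "P"): 1,  # the initial state accepts a preposition
    (1, "D"): 2,  # then a determiner
    (2, "N"): 3,  # then a noun, which lands in the final state
}
FINAL_STATES = {3}


def accepts(tags):
    """Return True if the sequence of category tags is exactly P, D, N."""
    state = 0
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition defined for this input: reject
            return False
    return state in FINAL_STATES


print(accepts(["P", "D", "N"]))  # "at the bar"
print(accepts(["P", "N"]))       # determiner missing
```

The machine only ever knows which state it is in. It keeps no record of what it has already read, which is the limitation that matters for the argument.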

To accommodate (1), such a machine could have a structure corresponding to a regular expression like

(P=prep, D=deter, N=noun, *=any number of times including zero)

and to get (2) and (3), it needs simply

where any relationships among the prepositional phrases are allowed.

Such a grammar will allow any number of pairs of locally related prepositional phrases along with unrelated intervening prepositional phrases. In other words, a machine that processes one word at a time can be constructed to process all three sentences: it overgenerates to produce (2) as well.
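The overgeneration can be seen in a rough Python sketch. The pattern `DN(PDN)*V` and the little lexicon are my own assumptions for illustration (including the choice to tag "three" as a determiner); they stand in for any word-by-word machine of this kind.

```python
import re

# Hypothetical mini-lexicon mapping each word to a one-letter category tag.
LEXICON = {
    "the": "D", "three": "D",  # assumption: the numeral is treated as a determiner
    "guy": "N", "end": "N", "bar": "N", "suit": "N",
    "stripes": "N", "chair": "N", "legs": "N",
    "at": "P", "of": "P", "in": "P", "with": "P", "on": "P",
    "called": "V",
}

# Subject (D N), any number of prepositional phrases (P D N), then a verb.
PATTERN = re.compile(r"DN(PDN)*V")


def tag(sentence):
    """Turn a sentence into a string of category tags, one letter per word."""
    words = sentence.lower().rstrip(".").split()
    return "".join(LEXICON[w] for w in words)


def accepts(sentence):
    """True if the whole tag string matches the regular pattern."""
    return PATTERN.fullmatch(tag(sentence)) is not None


S1 = "The guy at the end of the bar in the suit with the stripes on the chair with three legs called."
S2 = "The guy at the end in the suit of the bar called."
S3 = "The guy on the chair with three legs at the bar called."

for s in (S1, S2, S3):
    print(accepts(s), tag(s))
```

Under these assumptions (2) and (3) reduce to the same string of tags, so a machine that sees only one category at a time has no basis for accepting one and rejecting the other.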

But a push-down automaton, the kind of machine that accepts context free grammars, can’t be designed to produce (2) and needs no special complexity to accommodate the long distance and local relations of (1) and (3).

The simplest context free grammar that can be constructed to process (1) is:
(S=sentence, NP=noun phrase, VP=verb phrase, PrP=prepositional phrase)
NP=> D, NP
NP=> N
NP=> NP, PrP
PrP=> Pr, NP
VP=> VP, PrP
VP=> V

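This simplest grammar, exactly as it is, will also generate (3), but no context free grammar can be constructed to generate (2) with its intended reading, in which [of the bar] relates to [the end] across the intervening [in the suit]. (This is all much easier to see with trees, but trees are tough to draw on a blog.)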
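In place of trees, here is a rough Python sketch that enumerates every structure the noun phrase rules can assign to a head noun followed by a string of prepositional phrases. It is a simplification under my own assumptions: nouns are numbered left to right (0 is the subject noun, noun k sits inside prepositional phrase k), and each structure is recorded as a map from a prepositional phrase to the noun it attaches to. The function name and this encoding are hypothetical.

```python
def attachments(i, j):
    """All structures for a noun phrase headed by noun i that contains
    prepositional phrases i+1..j, using only NP => NP PrP and PrP => Pr NP.
    Each structure is a dict {prepositional phrase number: noun it attaches to}."""
    if i == j:
        return [{}]  # a bare noun phrase: nothing to attach
    results = []
    for m in range(i, j):
        # NP(i..j) => NP(i..m) PrP(m+1..j): phrase m+1 attaches to head noun i
        for left in attachments(i, m):
            for right in attachments(m + 1, j):
                results.append({**left, m + 1: i, **right})
    return results


# Intended readings, with nouns numbered left to right.
# (1) guy0 end1 bar2 suit3 stripes4 chair5 legs6
READING_1 = {1: 0, 2: 1, 3: 0, 4: 3, 5: 0, 6: 5}
# (2) guy0 end1 suit2 bar3: [of the bar] is meant to attach to [the end]
READING_2 = {1: 0, 2: 0, 3: 1}
# (3) guy0 chair1 legs2 bar3: [at the bar] attaches to [the chair]
READING_3 = {1: 0, 2: 1, 3: 1}

print(READING_1 in attachments(0, 6))
print(READING_2 in attachments(0, 3))
print(READING_3 in attachments(0, 3))
```

The readings of (1) and (3) turn up among the structures the rules produce; the intended reading of (2) does not, because attaching [of the bar] to [the end] would require [in the suit] to sit inside the noun phrase headed by [the end] while relating to the guy outside it.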

This is very powerful evidence that the brain has a context free grammar represented in it — not necessarily in a specific place, possibly only in a process distributed through a variety of locations in the brain — but represented somehow.

I haven’t touched here on examples that show that a context free grammar cannot handle all the phenomena of language or on examples that suggest that elements can be moved around by the brain. English speakers have more powerful machinery between their ears capable of taking this fundamental push down structure and playing with it, within some limits. Figuring out the limits is the stuff of current linguistic theory. I am interested here only in presenting sentences that demonstrate that the brains of English speakers must have a pushdown structure that prevents the generation of sentences like (2) which are strictly impossible for native English speakers to process. This demonstration is just for the agnostics and scoffers: How else can you explain why (2) is impossible?


  1. kyle Says:

    im sry but i couldnt make the grammar u specified work; i needed an additional function to connect the sentence:

    NP=> N, NP

    maybe im doing something wrong

  2. rob Says:

    many, many thanks for the comment — I left the rules a mess, but have corrected them in the blog now. they should work smoothly now.

  3. wyatt Says:

    I’m no linguist, but it seems to me that not all prepositions are equal.

    The problem with sentence 2 is that the preposition is “of”. It seems that “of” always refers to the word preceding it (what you call a ‘local relation’). All the other prepositions (at, in, with, on) do not need to refer to the preceding word; thus they can have a ‘local relation’ and a ‘long distance relation’. For “with”, substitute “with an awesome tattoo” to see this in sentence one.

    In other languages of is covered by the genitive case, and always refers to the preceding word. Other cases need not refer to the preceding word.

    Another observation: sentence one has 3 pairs of prep phrases, as you mentioned. The pairs make it more intelligible. Take one out, and it is less so, at times:

    The guy at the end in the suit with the stripes on the chair with three legs called.

    The guy at the end of the bar in the suit on the chair with three legs called.

    The guy at the end of the bar in the suit with the stripes on the chair called.

    Take 2 out, it gets more confusing:

    The guy at the end of the bar in the suit on the chair called.

    This makes me think the pairing makes the sentence more parsable.
    But i don’t know what a push-down automaton is…


  4. rob Says:

    Good questions. To test whether “of” is more dependent on proximity, just replace another prep.
    a. The guy on the barstool in the suit with three legs.
    b. The guy in the suit on the chair with the stripes.
    Many prepositions are replaced with cases in other languages: instrumental, dative. But none of that will explain the simple distance relationship between
    c. The woman with the bird sang
    where “the bird sang” is as proximate as can be, but it’s not the meaning of the sentence.
    The most interesting case is
    d. The man on the chair with three legs at the bar
    where it’s the “chair at the bar” which also has three legs. In other words, distance is irrelevant. Or to take an “of” case
    e. …the sensation in my fingers of cold.
    f. …the pages that she read of my book
    g. …pick out the pages in the envelope of my book.
    Regarding the pairing, or the degrading of non pairs, all of these should be placed in a communicative context — sentences are not spoken in a vacuum, but an exchange of information. So if there are three guys at the bar, two of them in suits and only one on a chair,
    h. The guy at the end of the bar in the suit on the chair
    sounds right, and that’s the sensible context in which someone might say it. But no matter how you draw up a context, you won’t be able to make sense of
    i. The guy in the suit on the chair with the stripes…
    as “suit with the stripes” just as
    j. The guy at the end in the suit of the bar
    k. The guy in the suit in the chair with the tie
    where it’s “the suit with the tie” not “the guy with the tie.”

    A pushdown automaton is a machine that has the effect of memory. It accepts categories but requires that they must be filled before the sequence is completed. So you don’t give it the first word of a sentence followed by the second word etc until you get to the last word and then accept the whole sentence as long as the sequence follows the order of acceptable parts of speech; instead you give it the category “Sentence” which requires “a noun phrase” plus “a verb phrase” before the sentence is accepted by the automaton. And “a noun phrase” requires at least a plural noun or a name, or an article plus a common noun among many other possibilities; and “a verb phrase” requires at least a verb before “a verb phrase” is ‘discharged’ as it were. So the machine waits until all the categories have been fulfilled by their parts all the way down to individual words and no more categories.

    That kind of push-down is like a cafeteria pile of trays: the categories are the first set down, but they don’t get discharged until everything else is piled up on them and removed one by one. “Sentence” is the first word it accepts, but it is discharged last, after all the words are filled that discharge all the categories in them and all the categories are filled in “Sentence.”
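The tray pile can be sketched as a toy Python program. Everything in it is an assumption made for illustration: the tiny rule set, the one-letter tags, the "PPs" helper category for "any further prepositional phrases", and the greedy choice to hang each prepositional phrase on the nearest noun phrase. It is not a full parser, only a picture of categories being opened first and discharged last.

```python
# Hypothetical toy rules: each category lists the parts that must be filled.
RULES = {
    "S": ["NP", "VP"],
    "NP": ["D", "N", "PPs"],  # "PPs" = zero or more prepositional phrases
    "PP": ["P", "NP"],
    "VP": ["V"],
}


def recognize(tags):
    """Stack-based recognizer. Returns (accepted, trace of open/close events)."""
    stack = ["S"]  # the first tray set down
    pos = 0
    trace = []
    while stack:
        top = stack.pop()
        if isinstance(top, tuple):        # a ("close", category) marker
            trace.append("close " + top[1])
        elif top == "PPs":
            if pos < len(tags) and tags[pos] == "P":
                stack.extend(["PPs", "PP"])  # handle one PP, then look for more
        elif top in RULES:
            trace.append("open " + top)
            stack.append(("close", top))     # discharged only after its parts
            stack.extend(reversed(RULES[top]))
        else:                             # a word-level category: must match input
            if pos < len(tags) and tags[pos] == top:
                pos += 1
            else:
                return False, trace
    return pos == len(tags), trace


ok, trace = recognize(["D", "N", "P", "D", "N", "V"])  # "the woman with the bird sang"
print(ok)
print(trace)
```

In the trace for an accepted sentence, "open S" is the first event and "close S" is the last, with every noun phrase and prepositional phrase opened and closed in between.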

    You asked. 🙂

  5. rob Says:

    Small correction in the last reply. In the last sentence:

    “Sentence” is the first word it accepts, but it is discharged last…

    should be changed to

    “Sentence” is the first category it accepts, but it is discharged last….

