Another Argument Against Wh-Trace*

Ivan A. Sag
Stanford University

1 Introduction

Recent work has called the existence of wh-traces into question. I summarize these results in the remainder of this section, updating the argumentation of Sag and Fodor (1994). In Section 2, I provide an independent argument, based on the interaction of coordination and extraction facts in English, that the theory of grammar is better off without wh-traces.

1.1 Processing Complexity

Pickering and Barry (1991) argue on psycholinguistic grounds for a theory of extraction in which an extracted element is associated with its semantic role as soon as the head licensing this role is encountered. Such a theory would account for the difference in processing effort required to parse the examples below. Often, the extraction site and the head that licenses it are adjacent (as in (1)). This is not the case for (2), where a very long constituent separates the verb gave from the extraction site.

(1)The policeman saw the boy that the crowd at the party accused __ of the crime.
(2)That's the prize which we gave [every student capable of answering every single tricky question on the details of the new and extremely complicated theory about the causes of political instability in small nations with a history of military rulers] __ .

Surprisingly, examples like (2) are relatively easy to process, unlike examples in which a constituent follows an extremely long preceding constituent, as in (3).

(3)We gave [every student capable of answering every single tricky question on the details of the new and extremely complicated theory about the causes of political instability in small nations with a history of military rulers] [a prize].

If interpretation of extracted elements correlates with the position of a trace at the extraction site, this contrast remains unexplained.

But if termination of the long-distance dependency takes place when the head selecting for the extracted element (rather than an empty category) is encountered, then this should not be surprising. Further evidence for this hypothesis comes from cases such as (4) and (5) below.

(4)John found the saucer [which Mary put the cup [which I poured the tea into __ ] on __ ].
(5)John found the saucer [on which Mary put the cup [into which I poured the tea __ ] __ ].

The acceptability of center-embedding is normally only marginal, but the situation improves dramatically if the extracted elements can in fact be associated in a non-nested fashion. This is mirrored by the contrast between (4), which is hard to process, and (5), which is not. Thus the contrast between these examples can be explained if it is assumed that the verb terminates the processing of the extraction dependency, rather than a trace occurring in a position located several words after the verb. (5) is then predicted to be easy to process because it involves no processing of a double extraction dependency: the first extraction dependency is fully terminated by the verb put, i.e. before the second dependency even begins. Processing facts therefore provide an argument in favor of eliminating wh-trace from the theory of grammar. (It should be noted that there is more recent work that tries to reconcile such facts with the trace-based analysis by making specific (head-driven) assumptions about the structures posited by the human sentence processor. See, for example, Gibson and Hickok (1993) and Gorrell (1993). The fact remains, however, that the predictions of any such approach are at best just those that follow immediately from the traceless analysis of extraction dependencies.)
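
The contrast between (4) and (5) can be made concrete with a small Python sketch that counts how many filler-gap dependencies are open at once under each assumption. This is only an illustration, not a model of the human sentence processor: the helper names nth_index and max_open are hypothetical, word positions stand in for processing time, and I assume that in (4) the element selecting each wh-phrase is the stranded preposition, while in (5) it is the verb.

```python
def nth_index(words, word, n):
    """Return the index of the n-th (1-based) occurrence of `word`."""
    hits = [i for i, w in enumerate(words) if w == word]
    return hits[n - 1]


def max_open(dependencies):
    """Largest number of dependencies open at any single word position.

    Each dependency is a (start, end) pair; it counts as open from the
    filler at `start` through the position `end` where it is terminated.
    """
    last = max(end for _, end in dependencies)
    return max(
        sum(start <= p <= end for start, end in dependencies)
        for p in range(last + 1)
    )


s4 = "John found the saucer which Mary put the cup which I poured the tea into on".split()
s5 = "John found the saucer on which Mary put the cup into which I poured the tea".split()

# (4): each "which" is selected by a stranded preposition (assumption).
head_4 = [(nth_index(s4, "which", 1), nth_index(s4, "on", 1)),
          (nth_index(s4, "which", 2), nth_index(s4, "into", 1))]
# Trace-based: each gap sits one position after its preposition.
trace_4 = [(start, end + 1) for start, end in head_4]

# (5): each pied-piped PP is selected by a verb (put, poured).
head_5 = [(nth_index(s5, "on", 1), nth_index(s5, "put", 1)),
          (nth_index(s5, "into", 1), nth_index(s5, "poured", 1))]
# Trace-based: both gaps sit after the final word of the sentence.
trace_5 = [(start, len(s5)) for start, _ in head_5]

print(max_open(head_4), max_open(trace_4))  # nested on either assumption
print(max_open(head_5), max_open(trace_5))  # differs between assumptions
```

On these assumptions, (4) has two dependencies open at once whichever element terminates them, whereas (5) has two open at once only if termination waits for a trace; if the selecting head terminates the dependency, (5) never has more than one open.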

1.2 Wanna Contraction

The most intensely debated argument concerning the status of traces is wanna-contraction. It has been proposed that a simple account of the relevant facts (6) follows from the assumption that a phonological contraction rule is prevented from applying because of the presence of an intervening wh-trace:

(6)a.Does Kim_i want PRO_i to (wanna) go to the movies?
 b.Who_j does Kim_i want PRO_i to (wanna) go to the movies with __j?
 c.Who_j does Kim_i want __j to (*wanna) go to the movies?

However, the plausibility of this account is seriously undermined by the fact that other forms of contraction are not subject to the same restriction:

(7)Who_j does Kim_i think __j is (think's) beneath contempt?

Attempts to reconcile this observation with the proposed constraint on wanna-contraction either involve unmotivated rule ordering, or accounts of the contraction data that are highly implausible on morphosyntactic grounds. An additional problem, suggesting that there might not be a rule at all, is the arbitrariness of the set of verbs to which it applies: gonna, hafta, *intenna (intend to), *lufta (love to), *meanna (mean to). Indeed, as Pullum (1997) shows in detail, the optimal analysis of this entire class of verbs involves no rule of 'Wanna Contraction'; wanna forms are morphologically derived. The relevant morpholexical rule applies to seven verbs that select for a single infinitival complement; hence transitive want (which selects for two complements) has no such form. Pullum's analysis explains all phenomena previously discussed in the literature, as well as further data that serve to distinguish his proposal from others that have been advanced. Moreover, this descriptively superior analysis of wanna and related forms makes no use of traces.

1.3 Auxiliary Contraction

Another putative argument for the existence of wh-trace is based on auxiliary contraction, which has been claimed to be impossible if the auxiliary precedes a trace or a site where VP-Ellipsis has applied.

(8)a.The butcher is laughing and the baker is (*baker's), too. [VP-ellipsis]
 b.How tall do you think she is (*'s) __ ? [wh-Extraction]

However, Selkirk (1984) and others have argued that the failure of auxiliary cliticization in these cases has a purely phonological explanation in terms of the following condition:

(9)The Accent Condition: Lack of accent is a necessary condition on auxiliary reduction.

Only unaccented forms undergo vowel reduction and then cliticization; the principles that determine accent placement interact to make examples like (8) impossible. On Selkirk's view, there is a general phonological constraint on destressing which blocks it from applying to a form that is followed by a silent demibeat (which naturally occurs at the end of a metrical grid, e.g. at a clause boundary). In a more recent optimality-theoretic account of these and related data, Selkirk (1995) argues that phrase-final function words are themselves prosodic words heading a phonological phrase. Hence stranded auxiliaries must be stressed and cannot be contracted.

Pullum and Zwicky (1997) accept Selkirk's Accent Condition, but argue that there are specific constructions that require accent and hence cannot undergo auxiliary contraction. Hence, they argue, the facts cannot be derived entirely in terms of principles as general as the principles of metrical structure offered by Selkirk. For my purposes here, it is of little consequence how this minor difference between the two analyses is resolved. Both Selkirk's analysis and that of Pullum and Zwicky rest on independent observations concerning English phonology, are more general than—and arguably superior to—earlier accounts, and make no appeal to the presence of wh-trace. Hence, as Pullum and Zwicky conclude, "we have removed a significant case of purported evidence for traces."

1.4 Floated Quantifiers and Adverbs

Floated quantifiers may not appear directly before an extraction site:

(10)a. They (all) were (all) completely satisfied.
 b. How satisfied do you think they all were __ ?
 c.*How satisfied do you think they were all __ ?

While certain earlier accounts of these examples have made use of traces, the alternative account suggested by Sag and Fodor does not. It is based on the analysis of floated quantifiers developed by Dowty and Brodie (1984), in which such quantifiers, like certain adverbial modifiers, are base-generated as VP-adjoined or AP-adjoined modifiers:

(11)a.VP:[all [went to the store]]
 b.AP:[all [very satisfied]]

Dowty and Brodie's proposal addresses the semantic relation of the adjoined quantifier and the subject NP in some detail. By eliminating the transformation of Quantifier Floating, their account also provides a basis for treating the known discrepancies between what can occur in NP-initial and in 'floated' position. Finally, the Dowty-Brodie analysis also appears to provide an immediate explanation for the problematic cases above, given a traceless account of extraction. That is, on a traceless extraction analysis, there is no way to generate a sentence like (10c), as there is no empty constituent for the quantifier (or adverb) to adjoin to. No further constraints (such as the trace-based one proposed in Sag 1980) are required. Floated quantifiers and adverbials thus provide a second argument against the existence of wh-trace.

2 A New Argument

It was Ross (1967) who codified the following, now-standard set of data under the rubric of the Coordinate Structure Constraint (CSC) and its 'Across-the-Board' (ATB) exceptions.

(12)a.*Which dignitaries do you think [[Sandy photographed the castle] and [Chris visited __ ]]?
 b.*Which dignitaries do you think [[Sandy photographed __ ] and [Chris visited the castle]]?
 c. Which dignitaries do you think [[Sandy photographed __ ] and [Chris visited __ ]]?
(13)a. Which of her books did you read both [[a review of __ ] and [a reply to __ ]]?
 b.*Which of her books did you find both [[a review of Gould] and [a reply to __ ]]?
 c.*Which of her books did you find both [[a reply to __ ] and [a review of Gould's new book]]?
(14)a.*Which of her books did you find both [[a review of __ ] and [ __ ]]?
 b.*Which of her books did you find [[ __ ] and [a review of __ ]]?
 c.*Which rock legend would it be ridiculous to compare [[ __ ] and [ __ ]]?
(cf. Which rock legend would it be ridiculous to compare __ with himself?)

The syntactic literature has seen various attempts to derive Ross's constraint, which he formulated as in (15) (with subconstraints here as named by Grosu (1973)).

(15)Coordinate Structure Constraint (Ross 1967):
 In a coordinate structure,
 a.no conjunct may be moved, (Conjunct Constraint)
 b.nor may any element contained in a conjunct be moved out of that conjunct. (Element Constraint)

However, as is often noted, transformational theories have failed to provide satisfactory accounts of the CSC. The following citations are typical:

(16)a.Napoli (1993: 401): "Notice that while Subjacency accounts for the CNPC, the SSC, the Subject Condition, and the wh-islands, it cannot account for the ungrammaticality of movement out of coordinate structures..."
 b.Napoli (1993: 409): "It is quite probable that a parallelism requirement on coordination is responsible for the unacceptability of extraction from CSs."
 c.Roberts (1997: 188): "The CSC has to a large extent resisted satisfactory theoretical treatment in the principles-and-parameters framework."

The Element Constraint has been the subject of considerable controversy, much of which is centered on whether apparent counterexamples are indeed coordinate structures. I will not enter this controversy here (for recent opposing views, see Postal 1998 and Kehler 1998). Rather, I will argue that the reason why the Conjunct Constraint (CC) has resisted explanation within transformational grammar, broadly construed, is simply that researchers have assumed the existence of wh-trace. The Conjunct Constraint follows as a direct consequence of an extraction theory that countenances no wh-traces.

One of the few explicit proposals for explaining the CC in terms of wh-trace is due to Goodall (1987), whose novel treatment of coordinate structures in terms of the union of reduced phrase markers need not concern us here in detail. Suffice it to say that Goodall's coordination theory predicts Ross's across-the-board effect, but leaves the deviance of examples like those in (14) unexplained.

To explain the deviance of these examples, and all CC violations, Goodall appeals to Principle C of the binding theory. He argues that each trace functioning as a conjunct c-commands all sister conjuncts, and hence cannot be coindexed with any R-expression within any other conjunct. But since each sister conjunct must contain a coindexed wh-trace (a kind of R-expression) in order to be locally well-formed in the extraction context, the CC follows, Goodall claims, from the interaction of his coordination theory and the binding theory.

There are problems with this explanation of the CC. First, as various people have pointed out, there are counterexamples to (all extant formulations of) Principle C. The following are representative. (For discussion of examples like these, see Bolinger 1979, McCray 1980, and Bresnan to appear.)

(17)a.He_i did what John_i always does...
 b.She_i was told that if she_i wanted to get anywhere in this dog-eat-dog world, Mary_i was going to have to start stepping on some people.
 c.The teacher warned him_i that in order to succeed Walter_i was going to have to work a lot harder from now on.
 d.It was rather indelicately pointed out to him_i that Walter_i would never become a successful accountant.
 e.If you try to tell him_i that the reason why John's_i dog was taken away from him was rabies, he'll get very upset.
 f.I've never been able to talk to him_i about the examples John_i claimed would refute my theory.
 g.I've never been able to explain to her_i that Betsy_i's gophers destroyed my lawn each spring.

This has led many to abandon the idea of Principle C as a grammatical constraint, in favor of a more pragmatically oriented approach that would explain the deviance of putative Principle C violations in discourse-based terms.

However the dilemma raised by the acceptability of examples like (17) may be resolved, it is plain that in examples where the conjunct is a pronoun (instead of a wh-trace), coindexing with an R-expression in another conjunct should not be grammatically excluded:

(18)a.We invited [[Betsy's_i mother] and [her_i]] to the ceremony.
 b.A disagreement arose between [[Clinton's_i bodyguard] and [him_i]] over White House security.
 c.A disagreement arose between [[each candidate's_i campaign manager] and [him_i]] over the protocols for the debate.
 d.The winners of the four awards were: [[Jones_i], [Tanaka], [Yoo], and [Jones_i]].
 (Gazdar et al. 1982)

The clear grammaticality of examples of this kind, as well as the malleability of the acceptability of many examples thought to be governed by Principle C, stands in marked contrast to the stark ungrammaticality of CC violations. Goodall's attempt to explain the CC in terms of binding theory leaves this contrast entirely unexplained and hence is inadequate. Moreover, Goodall's is the only serious attempt in the last two decades to explain the CC in transformational terms.

Though there is no extant transformational explanation of the CC that I am aware of, the CC follows immediately within a traceless account of extraction. The relevant reasoning is this:

(19)a.A wh-gap is simply a position where an element selected by a head (whether complement or adjunct) fails to be realized (rather than a position where a phonetically unrealized constituent is syntactically realized).
 b.The elements that are coordinated, i.e. the conjuncts of a coordinate structure, must be syntactic constituents (or perhaps sequences thereof).
 c.Conjunctions are not heads; rather, coordinate structures instantiate an independent construction type.
 d.Therefore, wh-gaps, which are not constituents, can never be conjuncts.

This result follows without stipulation, as far as I can see, in any traceless analysis of extraction phenomena, as long as conjunctions are not treated as argument-selecting heads. (The suggestion that conjunctions are heads, which reappears from time to time in the literature (see, e.g., Munn (1992)), is not particularly intuitive, as the category of the coordinate phrase (as reflected by its outward distribution) is determined by that of the conjuncts, rather than the conjunction. Hence the conjunction-as-head analysis would require an unprecedented chameleon-like categorial behavior on the part of the putative head. There are further objections to be made against this kind of treatment that I will not elaborate here; I adopt instead the traditional view that coordinate structures instantiate a sui generis construction type, as argued by Borsley (1994).) Thus, well-known facts having to do with the interaction of coordination and extraction provide a further piece of evidence against the existence of wh-trace.
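
The reasoning in (19) can be illustrated with a toy encoding in Python. This is a minimal sketch under my own assumptions, not a fragment of any existing grammar formalism: the names Constituent, head_phrase and coordinate are hypothetical, gaps are recorded only as a feature on the phrase whose head selects them, and the Element Constraint is deliberately left unmodeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constituent:
    cat: str            # syntactic category, e.g. "NP" or "VP"
    words: tuple = ()   # words the constituent spans
    gaps: tuple = ()    # categories of selected but unrealized dependents


def head_phrase(cat, head_word, selected, realized):
    """Build a headed phrase. Selected dependents that are not realized
    leave no daughter behind; they are only recorded in `gaps` (19a)."""
    missing = list(selected)
    for dep in realized:
        missing.remove(dep.cat)   # ValueError if the head does not select it
    words = (head_word,) + tuple(w for dep in realized for w in dep.words)
    inherited = tuple(g for dep in realized for g in dep.gaps)
    return Constituent(cat, words, tuple(missing) + inherited)


def coordinate(conjunction, conjuncts):
    """Coordinate constituents (19b). The conjunction selects nothing (19c),
    so it can never introduce a gap of its own."""
    if len(conjuncts) < 2:
        raise ValueError("coordination needs at least two conjuncts")
    if not all(isinstance(c, Constituent) for c in conjuncts):
        raise TypeError("every conjunct must be a constituent")
    words = ()
    for c in conjuncts[:-1]:
        words += c.words
    words += (conjunction,) + conjuncts[-1].words
    # Gaps are pooled without checking across-the-board identity; the
    # Element Constraint is deliberately not modeled here.
    gaps = tuple(dict.fromkeys(g for c in conjuncts for g in c.gaps))
    return Constituent(conjuncts[0].cat, words, gaps)


# Traceless: "compare __" is a VP with an unrealized NP and no NP daughter.
vp = head_phrase("VP", "compare", selected=["NP"], realized=[])

# (13a): both conjuncts are real constituents, each missing an NP.
review = Constituent("NP", ("a", "review", "of"), ("NP",))
reply = Constituent("NP", ("a", "reply", "to"), ("NP",))
atb = coordinate("and", [review, reply])

# Trace-based encoding (hypothetical): an empty NP is itself a constituent,
# so nothing in `coordinate` stops it from being a conjunct, as in (14c).
trace = Constituent("NP", (), ("NP",))
overgenerated = coordinate("and", [trace, trace])
```

In the traceless encoding there is simply no object corresponding to a gap that could be passed to coordinate, so strings like (14c) are never built. Once an empty constituent is admitted, as in the last two lines, the same coordination routine accepts it, and a further stipulation would be needed to exclude it.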

3 Conclusion

Jorge's reaction when he heard some of these arguments presented at an LSA Meeting was simply: 'OK. I never believed in those trace things, anyway.' If my argumentation here is correct, however, the consequences for the Weltanschauung of most mainstream syntacticians are far more severe. In particular, these conclusions are inconsistent with all versions of Government and Binding Theory, Principles and Parameters Theory, and Minimalism (that I am familiar with). Rather, they suggest that traceless analyses of extraction, like those developed independently in transformationless theories of grammar, are on the right track. See, in particular, Ades and Steedman (1982), Gazdar et al. (1984), Kaplan and Zaenen (1989), Pollard and Sag (1994, chap. 9), Steedman (1996), and Bouma et al. (in press).


