The modern computational approach to linguistics and language technology can be traced back to the work of Noam Chomsky in the 1950s. Although trained as a linguist, Chomsky brought a computational perspective to the study of language. The theory he introduced, which he called generative grammar, held that the grammar of a human language is a system that generates the sentences of the language through a computational process. Chomsky's first job at MIT was part of an early project on machine translation in the '50s. This probably seemed like a natural fit, since Chomsky combined computational interests with his linguistics background. But Chomsky had not the slightest interest in machine translation; he was after something completely different. What he wanted to do was redefine the very subject matter of linguistics. Centuries earlier, Descartes had considered the possibility that a machine could duplicate the abilities of a human being, and he rejected it: because humans can use language in all its infinite variability, no machine could duplicate this ability, since any machine has only a fixed repertoire of actions that it can perform. What Descartes was really saying is that human language is a miracle that would be forever beyond our ability to understand. It is this miracle that Chomsky sought to explain. He focused squarely on the infinite nature of human language, and he wanted to explain Descartes's universal instrument by appealing to Turing's universal instrument, the computer. Chomsky begins with one of the most basic facts of our language ability: that we can recognize the sentences of our own language. The English language can be thought of simply as a very long list of sentences. But of course, there are lots of sentences that do not go on the list. "Ellen greeted Abe with pleasure" goes on the list, but not "pleasure Abe with greeted Ellen". Chomsky argued that there is a computational process underlying this ability.
This process can generate all and only the sentences of English. Chomsky called this a generative grammar, and with it he had transformed the subject matter of linguistics, which now concerned the specific computational processes that could capture the human ability with language. In the 1950s, this was completely uncharted territory. If English was generated by some sort of Turing machine, Chomsky needed to map out the space of possibilities, and he proceeded to define a hierarchy of different types of Turing machines. The simplest type he called a finite-state machine. Like any computer, it can read input and, based on that input, change its state; in effect, it remembers the input. Now, Turing had envisioned his original machine as containing an infinitely long paper tape, so there was no particular limit on the things it could remember. A finite-state machine, by contrast, comes with a predetermined limit on the number of states, in other words, a limit on the number of things it can remember. The finite-state machine has turned out to be a very attractive model of computing: because it has a fixed limit on what it must keep track of, it is easy to work with and very efficient. You can imagine that computer engineers were not fond of the infinitely long paper tape of the Turing machine. Chomsky's insight was to ask whether the restrictions of a finite-state machine are compatible with some fundamental properties of language. Think about sentences with "either ... or": "Either the sun shines tomorrow, or we cancel the picnic." Chomsky makes the simple observation that you need to remember the occurrence of "either" until you encounter the "or"; otherwise the sentence makes no sense. The "or" is dependent on the preceding "either". We can write this as "either S or S", where each S can be replaced by any sentence. There are other such dependencies, such as "if ... then", as in "If it rains, then we cancel the picnic", so we can have sentences of the form "if S then S". Here is where things get interesting.
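The kind of machine described above can be sketched in a few lines of code. The toy recognizer below is our own illustration, not Chomsky's formalism: it uses just two states to track a single, un-nested "either ... or" dependency, and its entire memory is which of those states it is in.

```python
# A minimal finite-state machine: finitely many states, and each input
# word deterministically moves the machine to a new state.
# This toy machine checks one "either ... or" dependency: after reading
# "either" it must remember that fact until it sees the matching "or".

def accepts(tokens):
    state = "START"             # the machine's entire memory is this one state
    for tok in tokens:
        if state == "START" and tok == "either":
            state = "AWAIT_OR"  # remember the pending "either"
        elif state == "AWAIT_OR" and tok == "or":
            state = "START"     # dependency resolved
    return state == "START"     # accept only if no "either" is left hanging

print(accepts("either the sun shines tomorrow or we cancel the picnic".split()))  # True
print(accepts("either the sun shines tomorrow we cancel the picnic".split()))     # False
```

With only two states, the machine can remember exactly one pending "either", which is why the nesting argument in the next paragraph causes trouble for it.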
Since each S stands for any sentence whatever, it can itself contain a dependency, so we can have "If either it rains or it's windy, then we cancel the picnic." The either-or dependency is embedded in the if-then dependency. Of course, there are many such dependencies in English, and there are also many cases where an S contains within itself another S. By this reasoning, Chomsky argues, there is no way to place a limit on such dependencies. Chomsky's first book, published in 1957, asks the question: is English a finite-state language? By posing a mathematical question about language, Chomsky had initiated an intellectual revolution, not just about language but about the study of human mental abilities in general. He gives a mathematical proof that the answer is no: English is too complicated to be described by any finite-state machine. A different kind of computation was required, he argued. Chomsky was showing for the first time that the basic questions of language could be addressed in computational, mathematical ways. The resulting excitement is difficult to describe. In the early 1960s, the study of language began to attract new kinds of students. Suddenly, there were concrete, well-defined questions to ask about language. How exactly do you turn an active sentence into a passive? What is the precise relation between a question and its answer? These are some of the most basic, ordinary questions about language, but posing them in a computational context meant they had to be given completely explicit answers. It rapidly became clear that traditional grammars were hopelessly vague and incomplete. Take the relation of a question and an answer: "Susan saw Ted" could be an answer to "Who did Susan see?". The word "who" specifies a hole or gap which should be filled in by the answer. In this case, the gap is filled in by "Ted", the object of the verb "see".
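The nesting argument can be made concrete with a sketch. Nested "either ... or" dependencies pattern like matched brackets: checking that every "or" answers a pending "either" requires an unbounded counter, and a machine with a fixed number of states can count only up to some fixed bound, so sufficiently deep nesting will always defeat it. The generator and checker below are our own illustration of this point, not Chomsky's proof.

```python
# Build sentences with "either ... or" nested to arbitrary depth, then
# check them with a counter. The counter can grow without bound, which is
# exactly the kind of memory a finite-state machine does not have.

def nested_sentence(n):
    """n occurrences of 'either', a core clause, then n matching 'or' clauses."""
    if n == 0:
        return "it rains"
    return f"either {nested_sentence(n - 1)} or it snows"

def balanced(tokens):
    depth = 0                  # unbounded memory: not available to a finite-state machine
    for tok in tokens:
        if tok == "either":
            depth += 1
        elif tok == "or":
            if depth == 0:
                return False   # an "or" with no pending "either"
            depth -= 1
    return depth == 0

print(nested_sentence(2))                  # either either it rains or it snows or it snows
print(balanced(nested_sentence(5).split()))  # True
```

A machine with k states cannot distinguish depth k from depth k+1, so for any fixed machine some deeply nested sentence is misjudged; this is the shape of the argument that English is not a finite-state language.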
So to understand the question "Who did Susan see?", you need to understand that there is a connection or dependency between the WH word, "who", and the object position of the verb "see". There are several types of WH words: who, why, when, and so on, and the type of WH word specifies the type of answer that is being sought. For "Who did Susan see?", the answer word is a noun that names something that could be the object of "see". For "When did Susan see Ted?", the answer needs to describe a point in time, as in "Susan saw Ted Saturday". Remember, the goal was to describe this explicitly enough that it could be carried out by a computer, and this seemed quite straightforward. For example, to construct the question "Who did Susan see?" from "Susan saw Ted", we replace "Ted" with "who" and then move "who" to the beginning. Same thing with "When did Susan see Ted?" and "Susan saw Ted Saturday": replace "Saturday" with "when" and move it to the beginning. We have an algorithm for turning a statement into a question and back again: a word in the statement is replaced with the right type of question word, and that word is moved to the beginning, leaving a gap. Another way of saying this is that there is a dependency between a question word and a gap. You might wonder: are there any limitations on these dependencies between question words and gaps? A simple, obvious question about one of the most basic and important aspects of English, but it appears never to have been considered in the thousands of years in which people had been observing and obsessing over language. One obvious limitation might be that the gap can't be too far away from the question word, but it was rapidly discovered that the dependency could be stretched seemingly indefinitely, and because of observations like this, the dependency came to be called a long-distance dependency.
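The question-formation algorithm just described can be written down literally: replace the answer word with a WH word of the right type and move it to the front, leaving a gap. The tiny lexicon, the "did"-insertion, and the crude "saw"-to-"see" adjustment below are our own simplifications for the two example sentences, not a real account of English syntax.

```python
# Toy lexicon mapping an answer word to the WH word of the matching type.
WH_TYPE = {"Ted": "who", "Saturday": "when"}

def make_question(statement, answer_word):
    words = statement.split()
    gap = words.index(answer_word)
    del words[gap]                            # remove the answer word, leaving a gap
    wh = WH_TYPE[answer_word]                 # pick the WH word of the right type
    # Naive do-support for these examples: "Susan saw ..." -> "did Susan see ..."
    rest = " ".join(words).replace("saw", "see")
    return f"{wh.capitalize()} did {rest}?"

print(make_question("Susan saw Ted", "Ted"))                # Who did Susan see?
print(make_question("Susan saw Ted Saturday", "Saturday"))  # When did Susan see Ted?
```

Even this crude sketch makes the dependency explicit: the WH word at the front and the gap it left behind are two ends of a single relation, which is exactly the relation whose limits the next paragraphs explore.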
You can have the gap as far away as you like. But John Ross, one of Chomsky's early colleagues, wanted to look further, and he noticed some puzzling cases that did not seem to work. For example, you can say "Tom asked when Susan saw Ted". Let's try to form a question by replacing "Ted" with "who" and moving it to the front, giving "Who did Tom ask when Susan saw?". Something goes wrong here. Ross had discovered a limitation on these dependencies. He went on to catalog many cases where the gap occurs in a position that does not allow a WH word to depend on it. Ross called these positions islands, since they were positions where the gap is somehow stuck and cannot connect with the WH word. According to our algorithm, the WH word moves from the gap position to the beginning of the sentence. Sometimes it brings other words along with it: from the sentence "Susan saw the tall, blond man", we can form the question "Which tall, blond man did Susan see?". The words "tall, blond man" move along with the question word "which". Ross named this process pied-piping, after the fairy tale of the Pied Piper. These early researchers in generative linguistics simply had to make up names for many of the phenomena they were observing. These are straightforward observations about some of the most basic features of language, but somehow no one had noticed them before.