The questions of how language evolved, what language is, and what language is for are of central concern to linguistics and evolutionary biology. In the latter half of the twentieth century, considerable breakthroughs were made in both fields, aided by theories of general computation and discoveries in archaeology.

Investigation of these three questions has been hindered by common misconceptions about the evolution of language: that language is primarily a form of communication, that language is “learned”, that the capacity for language “evolved” out of more primitive forms of communication, and that language evolved from singing as a form of sexual selection (often referred to as Darwin’s “musical protolanguage”, one of a number of miscalculations Darwin made about the mechanics of evolution).

Let us deal with these three questions in turn.


Any account of the origin of language must come to terms with what has evolved. Modern linguistics treats the basic property of language as an innate biological faculty with three distinct yet connected components:

  1. A combinatorial procedure of the mind known as Merge, along with word-like atomic elements; roughly the “CPU” of human language syntax.

  2. The sensorimotor interface that forms part of language’s system for externalisation, including vocal learning and production.

  3. The conceptual-intentional interface, otherwise known as “thought”.

The basic engine that drives human language syntax (1) appears to be far simpler than originally thought. The entire structure of human language can be accounted for by a single operation, namely Merge. This operation takes any two syntactic elements and combines them into a new, larger hierarchically structured expression. In its simplest terms, Merge is just set formation. Given a syntactic object X (either a word-like atom or something that is itself a product of Merge) and another syntactic object Y, Merge forms a new object as the set {X, Y}. For example, if we combine the atom read with books, we form the verb phrase read books. The new syntactic object read books can then be combined with another atom, for example, John to form a new set {John, read books} and so on.
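The set-formation character of Merge can be sketched in a few lines of Python. This is an illustrative sketch only, not anything from the book; `frozenset` is used because Python sets must hold hashable members, and, conveniently, frozensets are unordered, which mirrors the point that Merge yields hierarchical structure rather than linear order:

```python
def merge(x, y):
    """Merge: take two syntactic objects X and Y and form the set {X, Y}.

    X and Y may be word-like atoms (strings) or products of earlier
    applications of Merge (frozensets)."""
    return frozenset([x, y])

vp = merge("read", "books")   # {read, books}
s = merge("John", vp)         # {John, {read, books}} -- a set containing a set

# The result is hierarchical: the verb phrase is nested inside the sentence.
assert vp in s and "John" in s
```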

Hierarchical merge operations such as these can be represented in a syntactic tree. In the sentence “the man took the book”, “the” and “man” are merged to form a Noun Phrase (NP), which, at a higher level, is merged with a Verb Phrase (VP) “took the book”, itself constituted by the Verb “took” and the Noun Phrase “the book”.

The most elementary property of the language capacity is that it enables us to construct and interpret a discrete infinity of hierarchically structured expressions: discrete because there are five-word sentences and six-word sentences, but no five-and-a-half-word sentences; infinite because there is no longest sentence. It is possible to conceive of sentences with arbitrarily many syntactic layers embedded within one another through the process of “recursion”, for example “John said that Mary said that…” and so on. Language is therefore based on a recursive generative procedure that takes elementary word-like elements from a lexicon and applies repeatedly to yield structured expressions without bound.
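The unboundedness is easy to demonstrate: one recursive rule, re-applied, yields an expression of any desired depth. A minimal Python sketch (the speakers and base clause are invented for the example):

```python
def embed(depth):
    """Re-apply a single rule: CLAUSE -> SPEAKER "said that" CLAUSE.

    There is no longest output -- only a practical limit on depth."""
    if depth == 0:
        return "it is raining"
    speaker = "John" if depth % 2 == 0 else "Mary"
    return f"{speaker} said that {embed(depth - 1)}"

print(embed(2))
# John said that Mary said that it is raining
```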

Within our tripartite model for language, these syntactic structures, when externalised, are translated into units of form known as morphemes, and finally into units of sound known as phonemes (or gestural units in the case of sign language), enabling the possibility of communication. The most commonly used modality for externalisation of language is sound, but it appears any sensory modality can be used for input or output: sound, sign or touch. We can therefore deduce that vocal learning, which also evolved independently in songbirds (e.g. the zebra finch) and in hummingbirds, is not a necessary component of the language faculty.

The conceptual-intentional interface (3) is the least well understood of the three components of language. We know that, at this other interface of internal language, units of syntax from (1) are mapped onto conceptual elements, sometimes known as “sememes” (an analog of phonemes as units of phonology and morphemes as units of morphology).

Conceptual structures are found in other primates, for example actor-action-goal schemata, categorization, possibly the singular-plural distinction, and others. These were presumably recruited for language, but the conceptual resources of humans are far richer. Crucially, even the simplest words and concepts of human language and thought lack the relation to mind-independent entities that appears characteristic of animal communication, a distinction which ultimately caused the failure of all attempts to teach primates human language. In animal communication systems, signs have a one-to-one relation between brain processes and an aspect of the environment to which these processes adapt the animal’s behaviour. For human language and thought, it seems, there is no reference relation in the sense of Frege and Quine.

What we understand to be a river, a person, a tree, water and so on consistently turns out to be a creation of our rich internal perspectives. Take for example the entity referred to by the term river. Suppose that the flow of this river is reversed: it is still the same river. Suppose that what is flowing becomes 95 percent arsenic because of discharges from an upstream plant: it is still the same river. The same is true of other quite radical changes in the physical object. On the other hand, with very slight changes it will no longer be a river at all: if its sides are lined with fixed barriers and it is used for oil tankers, it is a canal, not a river. Exploring the matter further, we discover that what counts as a river depends on mental acts and constructions, rather than on a direct relation to the physical world. Chomsky even posits that, just as Merge is innate, so too, at least substantially, are the meanings of words. At peaks of language acquisition, children learn words at the rate of one per hour, including their rich semantic complexities as in the example of river, a feat that can only be explained by innate knowledge.


We have seen that human language, generated by a single computational procedure, provides us with a rich internal world that mediates our experience of external events. In addition to constituting thought, language evidently assists with other cognitive functions such as organising thoughts and planning courses of action; logical sequencing, for example, is expressed in terms such as “and”, “if” and “then”. Furthermore, since it is possible to generate internal language without externalising it, and possible to externalise language without communicating, we can deduce that language’s purpose is primarily thought, secondarily externalisation, and only tertiarily communication.

Theoretical evidence from the internal architecture of language supports this hypothesis. Language favours hierarchical order over linear order, and therefore computational efficiency over communication, in every instance.

In the sentence “instinctively birds that fly swim”, the adverb “instinctively” modifies the verb “swim” rather than the verb “fly”, even though “fly” is closer to “instinctively” in terms of linear distance, because “swim” is just one hierarchical level “down” whereas “fly” is two levels “down”.
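The contrast between structural and linear distance can be made concrete with a small Python sketch. The flat bracketing of the relative clause below is a deliberate simplification, assumed purely for illustration:

```python
def depth_of(tree, target, depth=0):
    """Return the hierarchical depth of a word in a nested-list tree,
    or None if the word is absent."""
    if tree == target:
        return depth
    if isinstance(tree, list):
        for child in tree:
            found = depth_of(child, target, depth + 1)
            if found is not None:
                return found
    return None

# The clause that "instinctively" attaches to, with the relative
# clause "birds that fly" collapsed into one bracket:
clause = [["birds", "that", "fly"], "swim"]

print(depth_of(clause, "swim"))  # 1: one level down -- this is what the adverb modifies
print(depth_of(clause, "fly"))   # 2: two levels down

# Linear order tells the opposite story: "fly" is the adjacent verb.
words = "instinctively birds that fly swim".split()
print(words.index("fly") < words.index("swim"))  # True
```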

Linear order would be easier to process, since the two elements would be adjacent, but linear order would not, as demonstrated above, satisfy independent principles of computational efficiency, which the computational procedure Merge does. This theory is often referred to as the Strong Minimalist Thesis, since Merge, as simple set formation, requires minimal computation. The Minimalist Program sets out an “ideal solution” to the architecture of language and seeks to show how data from all the world’s languages approximate this model.


Evidence from archaeological records of unambiguously symbolic artifacts, such as shell ornaments and the use of pigment in the geometric engravings found in Blombos Cave, indicates that the basic property of language must have sprung up some time between roughly 200,000 years ago, when anatomically modern human beings first appeared in Africa, and their subsequent exodus out of Africa around 60,000 years ago. It is thought that the development of Merge occurred together with, or was in fact the instigator of, a broader expansion of cognitive faculties commonly known as “The Great Leap Forward”. A narrower estimate of 120,000 to 80,000 years ago is likely but not beyond doubt. Whether we take the broader or the narrower estimate of Merge’s genesis, it still constitutes the blink of an eye in evolutionary terms (the eye and visual system, by contrast, took some 550 million years to evolve), and it therefore poses a significant challenge to the dogma of evolutionary theory, apparently violating Darwin’s core principle natura non facit saltum (“nature does not make leaps”).

The only way to account for such a sudden expansion in Homo sapiens’ cognitive faculties is the emergence of an extremely simple genetic development, one which satisfies the requirements of minimal computation of the basic property, namely Merge. In this way, a simple change to the human genotype (the section of DNA that encodes a trait) can yield a complex and rich change in the human phenotype (the observable characteristics of an organism). That no analogue of human language is found in any other species, not to mention a homologue, supports this hypothesis: the totality of lineages in genetic history which did not lead to human language demonstrates the rarity of the basic property.

Let us outline a possible scenario for the emergence of language. A random genetic mutation causes a slight re-wiring of the brain in a single Homo sapiens, yielding the computational procedure Merge. This individual is not able to use Merge to communicate with others, since only he possesses the genotype, nor are external forms of the language yet in existence for the internal language to be mapped onto the sensorimotor system. Nevertheless, Merge likely presents some selective advantages through an expansion of other cognitive tools (perhaps logical thought and planning), causing the genotype to be passed on through generations. Only once a significant number of members of his tribe actually possess the phenotype is externalisation of the basic property possible. Yet solving the problem of externalisation is no easy feat, and if we ask ourselves why there are so many languages, the reason might be that the problem of externalisation can be solved in many different and independent ways. Presumably, then, the basic property remained an internal language for many generations before externalisation, and subsequently communication, became possible.

Darwin’s original vision of evolution conceived only of adaptive selection on individuals: a highly deterministic vision of infinitesimal gradualism, which ignored the stochastic effects associated with small populations. Developments in mathematical modelling since the publication of Darwin’s Origin of Species in 1859 have demonstrated that adaptive evolutionary change is sometimes indeed very slow and gradual, but that at other times even large-scale behavioural changes can be breathtakingly rapid, as is the case with the emergence of language.

This post is an abridged synthesis of Robert C. Berwick and Noam Chomsky’s book “Why Only Us: Language and Evolution” (2016).

Written by Roland Witherow, Director of Witherow Brooke
