Linguistic Darwinism, compliments of Big Data

With the shift from the printed to the digitized word, languages can finally be analyzed in ways not possible before.

Over 5,000,000 books have been scanned, digitized and plugged into the Internet maelstrom, and unstructured data analysis techniques have evolved to the point where they can yield insights that some might have intuitively anticipated but could never quite prove. It took Big Data and Google's Culturomics project to make the breakthrough, and the results are in: it's a linguistic jungle out there, and the "survival of the fittest" principle governs the life and death of words.

In the article "Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death", published in the current issue of Science magazine, a team of authors examines these principles, and Christopher Shea of the WSJ popularizes the results in his

It turns out that the English language has over a million words and continues to grow at a rate of ~8,500 words per year (the 2002 Webster's Third New International Dictionary has 348,000). The sporting career of a new word lasts about 30 to 50 years, after which the word either disappears into the quicksands of the archives or enters the permanent lexicon. The process was undoubtedly sped up by the advent of the Internet, and the proliferation of spellcheckers could have made it more rigorous.

The pattern is virtually identical across the three analyzed languages (English, Spanish and Hebrew).


Forest behind trees: Story of Enterprise Architecture

I was listening to the Story of Human Language audio course the other day. Dr. John McWhorter was explaining how European languages came up with the idea of gender for inanimate objects. The example he used was silverware in German, with the spoon being “he” (der Löffel), the fork being “she” (die Gabel), and the knife being of neuter gender (das Messer). The current theory maintains that this is the result of gradual changes, small steps taken one at a time that led to the situation as we see it now. Each of the steps made perfect sense to the people at the time. Yet the notion of gender in a language, let alone the attribution of a specific gender to an object, appears manifestly arbitrary to non-native speakers.

It occurred to me that this could be a perfect metaphor for ad-hoc Enterprise Architecture without roadmaps: a series of decisions that each seemed like a good idea at the time, leading to a sorry state of chaos because there was no lifeline stretching from the "as-is" state into the future state.

The second point made by the professor was that the complexity of a language is inversely related to the advancement of a society. Despite the popular notion that a more evolved society would have a more complex language, in fact the opposite is true. A language used in a fast-paced society loses many of the accoutrements considered necessary in less advanced societies (e.g. compare the etiquette of the French royal court of Louis XIV with that of modern France).

Applied to Enterprise Architecture, this would imply that in organizations with evolved EA programs the IT systems landscape will be less, not more, complex, and more efficient at the same time.