all bits considered data to information to knowledge

18 Mar 2012

Linguistic Darwinism, compliments of Big Data

With the shift from the printed to the digitized word, languages can finally be analyzed in ways that were not possible before.

Over 5,000,000 books have been scanned, digitized and plugged into the Internet maelstrom, and unstructured data analysis techniques have evolved to the point where they can yield insights that some might have anticipated intuitively but could never quite prove. It took Big Data and Google's Culturomics project to make the breakthrough, and the results are in: it's a linguistic jungle out there, and the "survival of the fittest" principle governs the life and death of words.

A team of authors examines these principles in the article "Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death", published in the current issue of Science magazine, and Christopher Shea of the WSJ popularizes the results in his

It turns out that the English language has over a million words and continues to grow at a rate of ~8,500 words per year (the 2002 Webster's Third New International Dictionary has 348,000). The sporting career of a new word lasts about 30 to 50 years, after which the word either disappears into the quicksands of the archives or enters the permanent lexicon. The process was undoubtedly sped up by the advent of the Internet, and the proliferation of spellcheckers could have made it more rigorous.

The pattern is virtually identical across the three analyzed languages (English, Spanish and Hebrew).

25 Aug 2010

QED: a glimpse of the future for Java

I blogged about the uncertainty facing Java (and Sun's software stack) now that it is owned by Oracle. It appears that my fears are about to come true... :(

Jeffrey S. Hammond, senior analyst at Forrester Research, said he worries that Oracle's lawsuit will not only dampen Android's market momentum, but slow overall adoption of Java in mobile environments and elsewhere.

26 Jan 2010

Tunnel vision(s)

At the inaugural meeting of the New York Technology Council Thursday night, Google Vice President of Research Alfred Spector and Microsoft architect evangelist Bill Zack debated their views on how data will be stored and shared in the future.

Google leads the shift to the "web as platform" paradigm, while Microsoft has a grip on the desktop and, to a significant degree, on the server market. Not surprisingly, each sees the world through its own rose-tinted glasses: Google wants everything to be in the cloud ("network computing", anyone?), and Microsoft puts forward its "three screens" strategy, balancing its cash cows (Windows and MS Office) with a bet on cloud computing, the new Azure platform. Google has no legacy ties and was in the cloud business from day one, though recent developments such as Android and Chrome OS indicate that it might be bridging the gap from the opposite direction...

If the only tool one has is a hammer, suddenly every problem starts looking like a nail.

11 Jan 2010

Power tends to corrupt…

Researcher exposes Google spyware connections

"According to Ben Edelman, an assistant professor at the Harvard Business School and a staunch anti-spyware advocate, Google is charging advertisers for what he described as “conversion-inflation” traffic from the WhenU spyware program."

This might be a wake-up call for some of the anti-Microsoft crowd happily marching under the Google liberation banner.

John Emerich Edward Dalberg Acton once famously remarked: "Power tends to corrupt, and absolute power corrupts absolutely. Great men are almost always bad men." To paraphrase: "great companies are almost always bad companies". This was true of Microsoft, Oracle, Sun and IBM, all of which used and abused their power in their respective heydays. Google will be (is?) no exception. As the Bedouin saying has it: "trust in Allah, but tie your camel first".