all bits considered data to information to knowledge

18 Feb 2011

Elementary, my dear Watson!

A new era was officially introduced on February 14, 2011, when an IBM Watson computer took on a “uniquely human” activity - playing Jeopardy. The machine was named after IBM founder Thomas J. Watson (in case anyone was wondering why it was not named after Sherlock Holmes's companion), and it represents the next giant step towards something that was dubbed “artificial intelligence” in 1956 and has been almost exclusively the domain of science fiction ever since.

For a long time it has been understood that simply possessing information does not equal the ability to answer questions, let alone intelligent ones. A search engine, even the most advanced one, relies on keywords to search for information; it is up to humans to come up with a clever string of keywords, and it is ultimately a human task to decide whether the information returned constitutes an answer to the question. Watson takes it a step further - it has to figure out the question, deduce the context, and come up with the statistically most probable answer. This is very different from the Deep Blue computer, which beat chess grandmaster Garry Kasparov in 1997. The game of chess can be reduced to a set of well-defined mathematical problems in combinatorics, a very large set to be sure, but ultimately susceptible to the number-crunching power of the computer - no ambiguity, no contextual variations. The IBM Watson had to deal with the uncertainty of human language; it had to interpret metaphors and understand nuances.
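To make the contrast between keyword search and question answering a bit more concrete, here is a toy sketch in Python. It is emphatically not how Watson works (those details are not public); the passages, the function names and the crude word-overlap scoring are all made up for illustration. The first function simply returns matching text and leaves the thinking to a human; the second ranks candidate answers and attaches a rough confidence to each.

```python
from collections import Counter

# Toy corpus; every passage is invented for this illustration.
PASSAGES = [
    "Deep Blue beat Garry Kasparov at chess in 1997",
    "Garry Kasparov is a chess grandmaster",
    "Watson played Jeopardy in 2011",
    "Watson was named after Thomas J. Watson of IBM",
]


def tokens(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,?!").lower() for w in text.split()]


def keyword_search(keywords):
    """Search-engine style: return every passage containing all keywords.

    Deciding whether any passage actually answers a question is left
    to the human reading the results.
    """
    wanted = {k.lower() for k in keywords}
    return [p for p in PASSAGES if wanted <= set(tokens(p))]


def rank_answers(question, candidates):
    """Question-answering style (hypothetical scoring, not Watson's).

    Each candidate earns points for every question word found in a
    passage that mentions the candidate; scores are then normalized
    into rough confidences and sorted, best first.
    """
    q_words = set(tokens(question))
    scores = Counter()
    for cand in candidates:
        cand_words = set(tokens(cand))
        for passage in PASSAGES:
            words = set(tokens(passage))
            if cand_words <= words:
                scores[cand] += len(q_words & words)
    total = sum(scores.values()) or 1  # avoid dividing by zero
    ranked = [(c, scores[c] / total) for c in candidates]
    return sorted(ranked, key=lambda pair: -pair[1])


if __name__ == "__main__":
    print(keyword_search(["Kasparov", "chess"]))
    print(rank_answers("Which computer beat Kasparov at chess?",
                       ["Deep Blue", "Watson"]))
```

Even in this toy form the difference shows: the keyword search hands back raw passages, while the ranking step commits to a single best guess with a number attached, and it can still be wrong.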

The tables have turned again - instead of humans learning the machine’s language to query for answers, it is the machine that has learned to understand questions posed with all the ambiguity of human language. With clever algorithms the computer was able to “understand” a natural language query and come up with a correct answer - most of the time, that is.

Does Watson use SQL to come up with the answer? The details of the implementation are a closely guarded secret, at least for now. Given the limitations imposed by the Jeopardy rules, the narrowly focused purpose and the relatively modest computing power (around 2,000 CPUs, even though “connected in a very special way” according to Dr. Christopher Welty, a member of the IBM artificial intelligence group; a far cry from the 750,000 cores of the IBM Mira supercomputer being built for DOE’s Argonne National Laboratory), it most probably did not use a relational database to store data but rather relied on proprietary data structures and algorithms to search and retrieve the information. Eventually, these advances will make it into mainstream database technology, and the way we transform data into information into knowledge will change, again. The future is near.

Update: IBM will incorporate Nuance CLU speech-recognition applications into the Watson supercomputer to provide information that assists doctors as they make diagnoses.