all bits considered
data to information to knowledge


Diverging realities

Our ability to filter information according to personal preferences and tastes has never been greater, even as the variety of available information explodes. Therein lies a problem: rarely, if ever, do we choose to be exposed to something that we do not want to hear, see or read, and as a result we are surrounded by “yes-information” (by analogy with the Merriam-Webster definition of “yes-man”).

Some of these filtering criteria are based on our own conscious choices, others on insights into our subconscious preferences gleaned from our social media trail… This ultimately leads to the stratification of society into “interest groups”, each fed a different information diet, each assembling its own version of reality from whatever information gets through the filters.

This “freedom of association” is not a new phenomenon, to be sure; it’s the emerging totality of it that might be transmogrifying one's freedom into a self-imposed exile.

P.S. A recent article in The Wall Street Journal by Evgeny Morozov highlights another danger of so-called "smart gadgets": surrendering our ability to make mistakes. Smart gadgets give a new twist to social engineering, as "a number of thinkers in Silicon Valley see these technologies as a way not just to give consumers new products that they want but to push them to behave better." Of course, this implies that "better" is a well-understood, universal concept; yet it has been proven time and again that the road to hell is paved with good intentions...


Just-in-Time vs. Just-in-Case Information

The term Just-in-Time information has been around for quite a while. It has been used in various contexts: ad-hoc BI dashboards, personal and corporate time management techniques, career/life organizing principles and so on. I am fairly sure that JIT information will enter more domains in the future as both data and information become increasingly liberated.

I have blogged about the evolution of our information consumption patterns, prompted by watching how my son interacts with the outside world: smartphone, laptop, Facebook, Twitter, Google+, YouTube - an endless stream of seemingly superfluous data in constantly changing contexts... It seems like a perfect recipe for chaos. Yet somehow he and his peers manage to stay on track, graduate from school, and go about their lives. I maintain that ubiquitous, easily accessible data will be the main driver of the next evolutionary cycle, and, as Yogi Berra used to say, the future is not what it used to be...

This leads me to an observation about a more predictable and controlled environment, that of databases, relational and otherwise. I can't help but notice uncanny parallels between the evolution of electronic data storage and retrieval systems and that of paper-based ones (yes, a fancy name for the ordinary "book"). The database, or more specifically the data warehouse, came into existence when data was scarce and data access was slow, expensive and unreliable. Most of the data stored in a given data warehouse stays dormant for years, rarely if ever accessed; I would dub this model "Just-in-Case" information.

As these challenges are addressed, a shift is underway from centralized data warehouses to federated and then to ad-hoc models. The Raw Data movement has already started: in many scenarios there is no need to hoard data, all one needs to know is where to find the relevant data (one can imagine ever higher levels of hierarchy: directories of directories of directories... meta-meta-meta+n data).

The proliferation of NoSQL databases and the concept of "Big Data" are all part of this shift towards "Just-in-Time" information, data freed from the shackles of schemas and structure... A piece of advice - cast thy data upon the waters: for thou shalt find it after many days (adapted from Ecclesiastes 11:1 🙂)
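
To make the "all one needs to know is where to find relevant data" idea a little more concrete, here is a minimal Python sketch of a just-in-time lookup. It assumes a catalog that is simply nested dictionaries (directories of directories) whose leaves are locator strings; the class name `JustInTimeCatalog`, the locator format and the `fetch` callback are all hypothetical stand-ins for a real metadata service and data source. Nothing is loaded up front, which is the opposite of the just-in-case warehouse that copies everything in advance.

```python
from typing import Any, Callable, Dict


class JustInTimeCatalog:
    """Hypothetical catalog: resolve a path through nested directories, fetch on demand.

    `root` is a tree of plain dicts (directories) whose leaves are locator
    strings; `fetch` is a caller-supplied loader that turns a locator into data.
    """

    def __init__(self, root: Dict[str, Any], fetch: Callable[[str], Any]) -> None:
        self.root = root
        self.fetch = fetch
        self.fetch_count = 0  # how many times we actually went to a source

    def locate(self, path: str) -> str:
        """Walk 'dir/subdir/.../dataset' down to a leaf locator string."""
        node: Any = self.root
        for part in path.split("/"):
            if not isinstance(node, dict) or part not in node:
                raise KeyError(f"no catalog entry for {path!r}")
            node = node[part]
        if isinstance(node, dict):
            raise KeyError(f"{path!r} is a directory, not a dataset")
        return node

    def get(self, path: str) -> Any:
        """Fetch the data only at the moment it is requested."""
        locator = self.locate(path)
        self.fetch_count += 1
        return self.fetch(locator)


# Hypothetical catalog and an in-memory stand-in for remote sources.
catalog = {
    "sales": {"2011": {"q2": "warehouse://sales/2011/q2"}},
    "weather": {"1934": {"may": "archive://weather/1934/05"}},
}
sources = {"warehouse://sales/2011/q2": [120, 95, 143]}

jit = JustInTimeCatalog(catalog, fetch=sources.get)
print(jit.get("sales/2011/q2"))  # [120, 95, 143]
print(jit.fetch_count)           # 1: the weather data was never touched
```

Adding another level of "directories of directories" is just another nested dictionary; the lookup code does not change.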



A glimpse of future: Just-In-Time Information

The ever-shrinking attention span of the younger generation gets quite a bit of attention (pun intended) from researchers and educators (e.g. "How Social Media Is Ruining Our Lives": over the course of the last ten years, the average attention span has reportedly dropped from 12 minutes to a staggeringly short 5 minutes).

Yet I wonder. Maybe we do not need a long attention span in an era of informational deluge pouring through smartphones, tablets and laptops? The pervasive nature of the Internet is changing the way we collect and process information. No longer do we need to own information; we only need to know where and how to find it, and how to connect it with other bits we've already found.

Memorization was the staple of rote learning for centuries: people traveled to read a copy of a book in a particular library or to listen to a particular lecture. Movable type, and later audio/video recording, changed this: books, records and movies became more readily available, whether borrowed from a library or purchased from a store. As time passed, books became ever more affordable, but they were still self-contained: the information in a book, magazine or movie was distilled and structured to provide all the components needed. With the advent of the Internet and electronic media this began to change: it became possible to transform raw data into information just in time. The premium is no longer on ownership but on the speed of finding and processing data, the ability to evaluate and integrate it on the fly, and - what's the word - critical thinking.


Data Math

The motto of this site is "Data to Information to Knowledge". I'd like to think that I came up with it, though I haven't dug around to prove it 🙂

Regardless of the origin, it does capture an important relationship:

Data + Context = Information
Information + Analysis = Knowledge

How does context transform data into information? Consider the following example.

Take today's date: May 11, 2011.

Written as 5112011 it is just a number. Perhaps the annual compensation of a company's CEO? The number of atoms in roughly 8.4887 × 10^-18 moles of a substance? If we interpret it as, say, a date, then it might be the anniversary of the "day in 1934, (when) a massive storm sends millions of tons of topsoil flying from across the parched Great Plains region of the United States as far east as New York, Boston and Atlanta.", according to the

The context gives the data (a number, in my case) its meaning, and transforms it into Information.
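
The same point can be shown in a few lines of code. In this minimal Python sketch the digits 5112011 never change; only the interpretation function, the context, does. The helper names are hypothetical, the date context assumes a US-style month/day/year reading with no separators, and the mole conversion uses the exact SI value of the Avogadro constant (6.02214076 × 10^23 per mole).

```python
from datetime import date

# Exact SI value of the Avogadro constant (particles per mole) since 2019.
AVOGADRO = 6.02214076e23

# The bare number from the example: on its own, just data.
RAW = 5112011


def as_date(value: int) -> date:
    """Context 1: read the digits as M(M)DDYYYY, a US-style date without separators."""
    digits = str(value)
    month, day, year = int(digits[:-6]), int(digits[-6:-4]), int(digits[-4:])
    return date(year, month, day)


def as_moles(value: int) -> float:
    """Context 2: treat the number as a count of atoms and convert it to moles."""
    return value / AVOGADRO


def as_dollars(value: int) -> str:
    """Context 3: treat the number as a dollar amount, e.g. a CEO's annual pay."""
    return f"${value:,}"


print(as_date(RAW).isoformat())  # 2011-05-11
print(f"{as_moles(RAW):.4e}")    # 8.4887e-18
print(as_dollars(RAW))           # $5,112,011
```

None of the three functions adds anything to the number itself; each one only supplies the frame in which the number means something.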

Analysis takes information a step further, to Knowledge. If I am dealing with the CEO's compensation, I could use it in charting my investment strategy: is it time to go long on the company, or to short the stock?

If I am a chemist, I might think of working with more manageable quantities of a particular compound; if I am a farmer, I might ponder the lessons of the 1934 storm for sustainable agriculture. This is knowledge, a human (possibly uniquely human) trait.

While they carry vaguely theoretical overtones, these formulas have very practical implications for data architecture and systems engineering, providing a foundation for designing and building Enterprise Data Models, Enterprise Dashboards and ETL architectures.
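
As a rough illustration of how the relationship might map onto an ETL flow, here is a minimal Python sketch. The raw rows, the `Reading` record and the "20% below average" rule are all invented for the example: the transform step supplies context (field names and types), turning data into information, and the analysis step, the kind of rule a dashboard might encode, turns information into something actionable. It is a sketch under those assumptions, not a prescription for any particular ETL tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

# Data: raw, context-free rows as they might arrive from a source system.
# The values are invented for illustration.
RAW_ROWS = ["2011-05-09,1200", "2011-05-10,1350", "2011-05-11,900"]


@dataclass
class Reading:
    """Information: the same values plus context (field names, types)."""
    day: str
    units_sold: int


def transform(rows: List[str]) -> List[Reading]:
    """Data + Context = Information: apply a schema to each raw row."""
    readings = []
    for row in rows:
        day, units = row.split(",")
        readings.append(Reading(day=day, units_sold=int(units)))
    return readings


def analyze(readings: List[Reading], tolerance: float = 0.2) -> List[str]:
    """Information + Analysis = Knowledge: flag days well below the average.

    The default 20% tolerance is an arbitrary, hypothetical business rule.
    """
    average = mean(r.units_sold for r in readings)
    return [r.day for r in readings if r.units_sold < average * (1 - tolerance)]


info = transform(RAW_ROWS)
print(analyze(info))  # ['2011-05-11']
```

Keeping the two steps separate mirrors the usual split between the transformation layer (where the schema lives) and the reporting layer (where business rules live), so either can change without rewriting the other.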

And last but not least, keep in mind this quote by Albert Einstein:

Imagination is more important than knowledge


Entropy of content

The information out there is becoming ever more fragmented and ever less coherent. Arguably, this could be a sign of a “do-it-yourself” democratization of information itself: no longer will the high priests of information shape the data to feed the unwashed masses (it might not be an accident that no new “sacred texts” have come into existence since the beginning of the last century…). Rather, the masses themselves are now free to mix’n’match bits and pieces of data in any way they please. The positive aspect of this democratization is that access to information has improved dramatically; the flip side of reducing information to its basic “elementary particles” is that its quality, the coherence of the data, has gone down just as dramatically.

Consider the following analogy: all music out there can be written with the same seven notes, and their arrangement could produce a Mozart symphony, the Jingle Bells tune or, depending on the composer’s abilities, just noise. The Internet brought about a new twist: you no longer have to buy your favorite band’s album, you can buy just a single song; you can buy a book by the chapter, or choose a fragment of a famous work of art for a poster. The rules of the package deal, where the author alone decided the structure of his or her masterpiece (the sequence of chapters or songs, the arrangement of elements, their colors and shades), no longer apply; the implicit “not labeled for individual sale” label has been removed forever.

With so much data sloshing around the world 24/7, why do people still pay for information, buying books and paintings? The answer is the same as it was millennia ago: the ability to tell a good story, to transform raw ingredients into a delightful dish, is a talent that many lack and are willing to pay for.