all bits considered data to information to knowledge


Big Data vs. Lots of Data

A short presentation intriguingly titled "Top Five Questions to Answer Before Starting on Big Data" caught my attention. There is a lot of noise around the "Big Data" phenomenon, already proclaimed to be The Next Big Thing. Quite a few folks disagreed, including Stephen Few of Perceptual Edge, who published a paper titled "Big Data, Big Ruse" (pdf).

Don't get me wrong - I do believe that Big Data IS a big thing, and that its introduction will bring about the proverbial paradigm shift (another arguably overused term of the last decade). Yet many people talking about Big Data have only a vague idea of what it is, and many believe that it is equal to "Lots of Data" which underwent a qualitative transformation a la Karl Marx ("Merely quantitative differences, beyond a certain point, pass into qualitative changes." - Karl Marx, Das Kapital, Vol. 1).

Sorry to contradict some aficionados of dialectical materialism, but... it ain't so. Which is exactly the point of slide #3 in the aforementioned deck.

The current incarnation of Big Data is mostly about machine-generated data. There might be lots of nuances and exceptions to this assertion, but humans simply cannot match a machine's ability to generate data 24/7. True, lots of this data is generated in response to human activity (e.g. clickstreams), but even then it is enhanced with machine-generated information (e.g. date/time stamps, geocoding etc.); a single tweet could generate additional kilobytes of contextual data which can enhance the semantic value of the tweet itself - to the business, not the tweeter, of course! Say, was it tweeted from a mobile device or a laptop? Which operating system? What browser/application? What time of day/night? Geographical location? Time elapsed between the first syllable and the last? Language used? And so on and so on.
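To make the tweet example concrete, here is a minimal sketch in Python of how a short human-written message gets wrapped in machine-generated context. The field names, the `enrich` and `payload_sizes` helpers and the sample values are all my own assumptions for illustration; they do not reflect any real service's schema.

```python
import json
from datetime import datetime, timezone


def enrich(text, *, device, os_name, client, lat, lon, lang, compose_seconds):
    """Wrap a human-written message in machine-generated context.

    Every field under "context" is a hypothetical example of data a
    platform could attach automatically; none is typed by the user.
    """
    return {
        "text": text,  # the only part a human actually wrote
        "context": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "device": device,
            "os": os_name,
            "client": client,
            "geo": {"lat": lat, "lon": lon},
            "language": lang,
            "compose_seconds": compose_seconds,
        },
    }


def payload_sizes(event):
    """Return (human_bytes, machine_bytes) for an enriched event."""
    human = len(event["text"].encode("utf-8"))
    machine = len(json.dumps(event["context"]).encode("utf-8"))
    return human, machine


# Made-up sample values, purely for illustration.
event = enrich(
    "Lunch!",
    device="mobile",
    os_name="Android",
    client="app",
    lat=45.52,
    lon=-122.68,
    lang="en",
    compose_seconds=4.2,
)
human, machine = payload_sizes(event)
print(f"human: {human} bytes, machine: {machine} bytes")
```

Even in this toy version the machine-generated part dwarfs the message itself, and a real platform would presumably attach far more than seven fields.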

This is what Big Data is all about. And this is why the question on slide #3 - "Do you have Big Data problem or just Lots of Data problem?" - comes right after "What do you need to know?" on slide #2.

9 out of 10 times, people talking about Big Data are referring to the data locked in their enterprise databases, documents and web pages; some of it might even include metadata. But the machine-generated component - the proverbial 800-pound gorilla in the room - flies under the radar. The enterprise data - the domain of BI - is but the tip of the iceberg that is Big Data.



Prescription for Healthy Code

The following is a PDF version of the presentation I gave in October 2009 at an event organized by the Software Association of Oregon. It outlines general principles of creating a software quality culture for the development team, and lists specific examples of tools and processes available:

Prescription for Healthy Code

Here is the absolute minimum without which any software development effort becomes amateurish:

  1. Thou shalt not develop without version control
  2. Thou shalt not develop without an issue tracking system
  3. Thou shalt perform code and design reviews
  4. Thou shalt use patterns and frameworks

These apply to professional software development regardless of methodology, technology and acquired tastes. The following are highly recommended as well (in no particular order):

  • Unit testing
  • Coding standards
  • Continuous integration
  • Automated testing (functional, integration etc)
  • Developer documentation compiler
  • Coverage analysis
  • Refactoring tools/frameworks
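To illustrate the first item on the recommended list, unit testing, here is a minimal sketch using Python's standard `unittest` module. The `word_count` function and the `run_tests` helper are hypothetical, invented only to give the tests something to exercise; the same idea carries over to NUnit, JUnit and similar frameworks.

```python
import unittest


def word_count(text):
    """Count whitespace-separated words (hypothetical function under test)."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)

    def test_extra_whitespace_is_ignored(self):
        self.assertEqual(word_count("  version   control  "), 2)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Tests like these are cheap to write, and once they run on every commit they also feed two other items on the list: continuous integration and coverage analysis.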


Introduction into Test-Driven Development
TDD in C# with NUnit
Best practices for test-driven development [examples in Java]