all bits considered data to information to knowledge

9Feb/180

The Tao of the Software Architect

Ever since Tao of Physics by Fritjof Capra came out more than 30 years ago the awareness of unified

http://www.ibm.com/developerworks/rational/library/4032.html

20Mar/140

Just say NO to data moochers!

The other day I stopped by a Great Clips salon to get a haircut. I was greeted with "Hi! What's your phone number?" Then the following dialog ensued:

- Well, that's a bit personal, don't you think?

- I need it to enter into the computer! - the lady looked a bit pensive.

- I could give you my name. It's Alex  - I didn't want to cause any trouble for her, just wanted a haircut.

- Is this the name you're usually using here? I can't find you in the computer! - she sounded annoyed.

That was it for me. I muttered my thanks, and left the premises with a firm intention to boycott Great Clips from now on.

I got my haircut from a friendly neighborhood salon down the road - no questions asked.

Everybody tracks everybody nowadays. Loyalty cards, online cookies, and single sign-on apps seem to be proliferating at the speed of electric current. And I get it - Facebook collects information in exchange for providing me with a valuable service, Safeway collects my information in exchange for giving me a discount, etc. All this is spelled out upfront, with a clear understanding of what the transaction brings to both parties. But why would Great Clips expect me to share my personal information with them for free?

On the other hand, I wonder whether there is such a thing as "data addiction", and if so, what are the health implications for a company that gets into the habit? After all, a mix of data and predictive models can be, well, unpredictable.

Gaining insight from data is great... if the data, the assumptions, and the predictive models are all correct. And this is a big IF.

9Sep/130

Data Scientists or… Psychohistorians?

Before Big Data, social Data Science/Data Mining, and Machine Learning there was … Psychohistory!

The concept was introduced in 1951 by Isaac Asimov in his monumental Sci-Fi trilogy “The Foundation”, and it closely parallels this “new” phenomenon of statistical modeling of social interactions.

Proof? The definition from Encyclopedia Galactica quoted at the beginning of the 4th Chapter of The Foundation Trilogy:

“Gaal Dornick, using non-mathematical concepts, has defined psychohistory to be that branch of mathematics which deals with reaction of human conglomerates to fixed social and economic stimuli …

… Implicit in all these definitions is the assumption that the human conglomerate being dealt with is sufficiently large for valid statistical treatment. The necessary size of such conglomerate may be determined by Seldon’s First Theorem which… A further necessary assumption is that the human conglomerate be itself unaware of psychohistoric analysis in order for its reactions to be truly random…

The basis of all valid psychohistory lies in the development of the Seldon Functions which exhibit properties congruent to these of such social and economic forces as …”

 

Asimov correctly points out the boundary conditions of this statistical analysis – for it to work, the society must be unaware that the analysis is taking place and/or how it works, as this awareness would skew the distribution curve. After all, if people stop clicking on these links and like-me buttons and stop sharing their information (or worse – start feeding in garbage data), all these sophisticated models would go haywire.
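To put a number on the "garbage data" scenario, here is a minimal sketch. All figures are made up for illustration: a hypothetical sample of 100 honest impressions with 20 clicks, then 50 extra impressions from users who deliberately click on everything. The function name `estimate_rate` is likewise just an illustrative helper, not anything from a real analytics library.

```python
def estimate_rate(clicks):
    """Estimate the click-through rate as the fraction of 1s in the sample."""
    return sum(clicks) / len(clicks)


# Hypothetical honest sample: 20 clicks out of 100 impressions.
honest = [1] * 20 + [0] * 80

# Hypothetical contamination: 50 impressions from users who click on everything.
garbage = [1] * 50

honest_rate = estimate_rate(honest)
skewed_rate = estimate_rate(honest + garbage)

print(f"honest estimate: {honest_rate:.2f}")
print(f"skewed estimate: {skewed_rate:.2f}")
```

With these made-up numbers the estimate more than doubles, even though nothing about the honest population has changed. A model that cannot tell the two groups apart would simply report the skewed figure.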

To continue the analogy, the "Mule" character represents the "Black Swan" event that invalidates the entire premise based on a normal distribution.

12Jul/130

Supersede vs Supercede: a humble proposition

Merriam-Webster authoritatively informs us that "supersede" is the only correct spelling, and that "Supercede has occurred as a spelling variant of supersede since the 17th century, and it is common in current published writing. It continues, however, to be widely regarded as an error." Fair enough.

I merely propose adopting the word "superCede" with a new semantic load...

"Cedere" means "to yield to, give way for", which could lead to

SUPERCEDE meaning "to give more than asked for"

e.g.

"I supercede the power!"     "I most gladly supercede my responsibility" 🙂

20Jun/130

Finding your natural habitat @work: It’s what you do in your free time that counts

When it comes to computer programming there are two broad categories of companies - those that produce IT products, and those that consume IT products (with countless variations in between).

The skill set for each is somewhat similar, yet there's enough difference to apply different criteria when interviewing candidates:

  • A company that lives and breathes technology will be looking for candidates with similar traits. 
  • A company whose primary business is anything but - building materials, food, clothes - will be looking for a person with a primary interest in business complemented by IT savvy.

So, a question - what do you do with your free time? - helps to clarify your "ideal environment", your natural working habitat.

Additional questions might be: what are your primary sources of information? What site do you open first thing in the morning? Is it The Wall Street Journal or Slashdot?

8Apr/130

New Meaning of “Investing in Your Health”: an idea for Health Insurance Exchange

As states race to implement the Health Insurance Exchanges mandated under the Affordable Care Act, I wonder whether they have chosen the wrong model - that of an overseer, an information provider and a mediator... Maybe we could have borrowed a paradigm from the stock exchange?
Some of the major hurdles facing Health Insurance Exchange implementation include:
  • inherent complexity of the endeavor
  • insufficient experience on the states' part in operating exchanges (as opposed to the financial industry)
  • the need to attract a sufficient number of participants to become efficient and self-sustaining
It is a common pattern in software engineering to deal with complexity by introducing an abstraction layer, and the financial industry did just that with Exchange-Traded Funds and Mutual Funds, which ease the complexity of picking individual stocks. I believe that a Health Insurance Exchange might have much more in common with an exchange than with insurance, and that the very same concepts are applicable here.
Imagine health insurance pools structured like mutual funds/ETFs according to some predefined criteria and designed to cater to a certain category of consumer (again, by analogy with industry sector funds). These are then sold as units of insurance to consumers through the exchange.
The role of the Exchange operators would be that of mutual fund managers:
  • design portfolios of insurance plans and sell units of insurance to consumers (after proper validation and categorization)
  • handle fund-to-fund exchanges (when the customer's situation changes)
  • process refunds and assess charges
  • provide apples-to-apples comparison
  • etc.
The insurer and the insured would be decoupled: the former would roll out insurance plans, and the latter would buy as much or as little insurance as they need, while the State's Health Insurance Exchange would provide platforms for "health insurance pool" comparison and rating, and handle financial transfers (including state/fed subsidy portions).
In a commercial twist to the idea: since the funds have an expiration date, the insurers might even pay dividends to the unit holders based upon the unused portion of the plan (insert actuarial voodoo here :).
There might even be a secondary market where investors could buy funds from customers (the original unit buyers) if they can reasonably expect a positive return due to under-utilization of the plan.
7Mar/130

The fine line between “Big Data” and “Big Brother”

There was never a lack of desire to collect as much data as possible on the part of businesses or governments; it was capabilities that always got in the way. With the advent of "Big Data" technology the barrier has just been lowered.

Monitoring employees' interactions in minute detail to analyze patterns and get ideas for productivity improvements is not illegal per se... but it takes us one step further towards the proverbial slippery slope.

The recent article in The Wall Street Journal by Rachel Emma Silverman highlights the indisputable advantages but somehow glosses over the potential dangers:

As Big Data becomes a fixture of office life, companies are turning to tracking devices to gather real-time information on how teams of employees work and interact. Sensors, worn on lanyards or placed on office furniture, record how often staffers get up from their desks, consult other teams and hold meetings.

Businesses say the data offer otherwise hard-to-glean insights about how workers do their jobs, and are using the information to make changes large and small, ranging from the timing of coffee breaks to how work groups are composed, to spur collaboration and productivity.

 

[06.17.2013] Here's a blog post by Michael Walker addressing the very same issues, with the benefit of hindsight after the revelations about the PRISM surveillance program: http://www.datasciencecentral.com/profiles/blogs/privacy-vs-security-and-data-science

22Feb/130

Diverging realities

The ability to filter information based upon one’s personal preferences and tastes has never been greater, even as the variety of available information explodes. Therein lies a problem - rarely if ever do we choose to be exposed to something that we do not want to hear, view or read, and as a result we are surrounded with “yes-information” (by analogy with the “yes-man” definition from Merriam-Webster).

Some of these filtering criteria are based on our own conscious choices, some - on insights into our subconscious preferences gathered from a social media trail… This ultimately leads to stratification of the society into “interest groups”, each being fed a different information diet, each assembling its own version of reality from the information that gets through the filters.

This “freedom of association” is not a new phenomenon, to be sure; it’s the emerging totality of it that might be transmogrifying one's freedom into a self-imposed exile.

P.S. A recent article in The Wall Street Journal by Evgeny Morozov highlights other dangers of so-called "smart gadgets" - surrendering our ability to make mistakes. Smart gadgets are giving a new twist to social engineering attempts, as "a number of thinkers in Silicon Valley see these technologies as a way not just to give consumers new products that they want but to push them to behave better." Of course, this implies that "better" is a well-understood universal concept; yet it has been proven - time and again - that the road to hell is paved with good intentions...

18Feb/130

Fighting back with data!

A highly publicized fight between NY Times journalist John Broder and Tesla Motors CEO Elon Musk is destined to become a case study in how data became not only a "corporate asset" but a weapon of choice in protecting the firm.

After test-driving a Tesla Model S, the NY Times journalist John Broder published a review, "Stalled Out on Tesla’s Electric Highway", claiming that the car's poor performance in cold weather led to it being towed before it reached its destination. Instead of issuing the usual "confidence in quality of our product" statement and promises to "thoroughly review the incident", the Tesla Motors CEO fought back with data, first calling Broder's review a fake on Twitter, and then supporting his claim with extensive data in the form of graphs and tables in the article "A Most Peculiar Test Drive", posted on the corporate blog on February 13, 2013.

It turns out that Tesla Motors had established a comprehensive data collection and analysis program: distance, time spent at charging stations, the speed and geographical location of every vehicle used by the media at any given moment - tons and tons of machine-generated data - were recorded and stored in Tesla's corporate database. The data clearly shows that the car never "ran out of battery", and renders several other claims made by John Broder dubious at best.

P.S. Mr. Broder has since posted a response, but his "recollections", "memorized facts" and hand-written notes look terribly suspicious and woefully inadequate against the cold hard numbers presented by Elon Musk.

16Feb/130

An excellent bit of advice from the trenches: Startup DNA

Yevgeniy Brikman, a staff engineer at LinkedIn who had "front row seats at very successful start-ups" (LinkedIn, TripAdvisor), shares his observations and insights in a few (well, 106) slides here (http://www.slideshare.net/brikis98/startup-dna).

A very interesting insight into minimizing the "trial and error" cycle with dynamic languages (#35) and development methodologies/frameworks (#37): shortening the feedback loop allows for earlier maturity. Or, as he puts it, quoting Jeff Atwood (of StackOverflow fame), "Speed of iteration beats quality of iteration". Of course, without discipline and the support of well-defined processes and frameworks, speed alone could be a runaway train wreck 🙂

The second observation that struck a chord with me is on slide #50: "If you cannot measure it, you cannot fix it". [NB: Of course, this has a long history (and an even longer attribution list) of being said at different times in various contexts by Lord Kelvin, Bill Hewlett, Tom Peters and Peter Drucker.] The advice "Measure Everything" should not be taken too far, though:

"Not everything that counts can be counted, and not everything that can be counted counts." Albert Einstein

but in the context of the presentation Yevgeniy's advice ought to be taken to heart: collect server metrics, database metrics, client side metrics, profile metrics, activity metrics, bug metrics, build metrics, test metrics, etc.!

And the last (but not least) observation on sharing is arguably the most important one (slides #93-102).

"The best way to learn is to teach" -

Frank Oppenheimer