all bits considered data to information to knowledge

16May/13

Big Data vs. Lots of Data

A short presentation intriguingly titled "Top Five Questions to Answer Before Starting on Big Data" caught my attention. There is a lot of noise around the "Big Data" phenomenon, already proclaimed to be The Next Big Thing. Quite a few folks disagreed, including Stephen Few of Perceptual Edge, who published a paper titled "Big Data, Big Ruse" (pdf).

Don't get me wrong: I do believe that Big Data IS a big thing, and that its introduction will bring about a proverbial paradigm shift (another arguably overused term of the last decade). Yet many people talking about Big Data have a rather vague idea of what it is, and many believe that it is equal to "Lots of Data" that has undergone a qualitative transformation à la Karl Marx ("Merely quantitative differences, beyond a certain point, pass into qualitative changes." --Karl Marx, Das Kapital, Vol. 1.)

Sorry to contradict some aficionados of dialectical materialism, but... it ain't so. Which is exactly the point of slide #3 in the aforementioned deck.

The current incarnation of Big Data is mostly about machine-generated data. There might be lots of nuances and exceptions to this statement, but humans simply cannot match machines' ability to generate data 24/7. True, lots of this data is generated in response to human activity (e.g. clickstreams), but even then it is enhanced with machine-generated information (e.g. date/time stamps, geocoding, etc.); a single tweet could generate additional kilobytes of contextual data that can enhance the semantic value of the tweet itself (to the business, not the tweeter, of course!)... Say, was it tweeted from a mobile device or a laptop? Which operating system? What browser/application? What time of day/night? What geographical location? How much time elapsed between the first syllable and the last? What language was used? And so on and so on.

This is what Big Data is all about. And this is why the question on slide #3, "Do you have Big Data problem or just Lots of Data problem?", comes right after "What do you need to know?" on slide #2.

Nine times out of ten, people talking about Big Data are referring to the data locked in their enterprise databases, documents and web pages; some of it might even include metadata. But the machine-generated component, the proverbial 800-pound gorilla in the room, flies under the radar. The enterprise data, the domain of BI, is but the tip of the Big Data iceberg.

 

23Jan/13

OBIEE – what’s in a name?

The unwieldy acronym OBIEE stands for Oracle Business Intelligence Enterprise Edition.

The offering is a loosely coupled assembly of a dozen-plus components (eight, by some other counts), both acquired and homegrown. Its beginnings go back 12 years to the nQuire product, which first became Siebel Analytics, only to be reborn as OBIEE after Oracle's acquisitions of Siebel in 2005 and then Hyperion in 2007. The story does not end here, as Oracle continues its acquisition spree with the recent (2012) purchase of Endeca for its e-commerce search and analytics capabilities.

The current intermediate result is a solid contender among enterprise BI platforms, firmly placed in the top-right of Gartner's Magic Quadrant along with MicroStrategy, Microsoft, IBM, SAP and SAS.

Oracle's page for Oracle Business Intelligence Enterprise Edition 11g summarizes the suite's functionality in the following terms (a direct quote, with the claims about “cost reduction” and “ease of implementation” set aside for now):

• Provides a common infrastructure for producing and delivering enterprise reports, scorecards, dashboards, ad-hoc analysis, and OLAP analysis
• Includes rich visualization, interactive dashboards, a vast range of animated charting options, OLAP-style interactions and innovative search, and actionable collaboration capabilities to increase user adoption

And, by and large, it does deliver on these promises.

One of the important features for the enterprise is integration with Microsoft Office (Word, Excel and PowerPoint). What Oracle has dubbed “Spatial Intelligence via Map Based Visualization” represents a decent integration of mapping capabilities (not quite ESRI ArcGIS, but a nice bundled option nevertheless, and no third-party components!).

Another thing to consider is the tighter integration with Oracle's ERP/CRM ecosystems (no surprise here, as every vendor sooner or later tries to be everything for everybody); for organizations with a significant Oracle presence, this would be an important selling point.

Having been redesigned with SOA principles in mind, OBIEE lends itself nicely to integration into SOA-compliant infrastructure. Most organizations choose Oracle Fusion Middleware for the task because it is more coherent with OBIEE and the rest of Oracle's stack, but it is by no means a requirement: OBIEE can run with any SOA infrastructure, including open-source ones.

For mobile BI, OBIEE offers Oracle Business Intelligence Mobile (for OBIEE 11g), currently available only for Apple devices (iPad and iPhone) as a download from Apple's iTunes App Store. Most OBIEE features available in the corporate environment are supported on mobile devices, including geospatial data integration.

NB: Predictive modeling and data mining are not part of OBIEE per se (it cannot even access the data mining functions built into Oracle's dialect of SQL!), but they could be surfaced through it. The Oracle Advanced Analytics platform is Oracle's offering in this market.

OBIEE ranks second from the bottom in ease of implementation (SAS holds the current record for difficulty); coupled with a relative dearth of expertise on the market and below-average customer support, this should be factored into any evaluation of OBIEE for adoption in the enterprise.

One interesting twist in the OBIEE story is Oracle's introduction of the Exalytics In-Memory Machine in 2011: an appliance that integrates OBIEE with other components such as Oracle Essbase and the Oracle TimesTen in-memory database. The appliance trend resurrects the idea of a self-contained system in the new context of an interconnected world, and Oracle fully embraces it with an array of products such as Exadata, Exalogic and now Exalytics. By virtue of coming fully integrated and preconfigured, it supposedly addresses the difficulties of installation and integration (at a price); it is designed to be a turnkey solution for the enterprise, but its full impact (and the validity of the claim) remains to be seen.

So, to sum it up:

Pro:

It is a solid enterprise-class BI platform with all the standard features of a robust BI offering: reports, scorecards, dashboards (interactive and otherwise), OLAP capabilities, mobile apps, integration with Microsoft Office and an SOA-compliant architecture. It also includes pre-defined analytics applications for horizontal business processes (e.g. finance, procurement, sales) as well as additional vertical analytical models for industries (to help establish a common data model).

Contra:

It is evolving through acquisitions and their integration, which affects the coherence and completeness of its vision; it has no integrated predictive modeling and data mining capabilities; it ranks rather low on ease of deployment and use as well as on quality of support; and the talent pool is rather shallow (and therefore expensive). With all this factored in, the TCO could potentially be higher than that of comparable offerings from other vendors.

 

29Nov/12

Wisdom, the final frontier!

Rob Addy, bloggin' for Gartner:

Analytical prowess will be the battleground for service providers in 2013 and beyond. Are you ready to take the statistical fight to your competitors or will you be on the back foot when the time to run the numbers comes?  Ascending the knowledge pyramid from noise and misinformation to achieve wisdom is not easy.

How exactly knowledge gets transformed into wisdom is beyond me... qualitative leaps occur at every step of the climb up the pyramid: a rock might be a very large grain of sand, but a planet is much more than a giant rock.

15Jul/12

Retina tracking BI: consent is not required

Business Intelligence is quickly moving past analyzing our conscious responses... it is after what you really think, not what you report thinking. Forget polls and questionnaires: the best data is collected from subjects who are not even aware of the process. Take the heat map generated by Unilever analyzing shoppers' eye movements... or compare designs produced in-house with those vetted by consumers subconsciously preferring one shape over another...

No consent required: the very fact that you stepped into the store (or visited a website) implies that the retailer is free to track your every movement, or ambush you with colors, music or odors designed to induce specific behavior. I could easily imagine a system tracking reactions to images of politicians, say, Romney and Obama; I bet it would be a much better indication of voting patterns than old-fashioned door-to-door pollsters, and it would also open a giant can of worms on invasion of privacy issues...

Where do we draw the line on legitimate use of data?

29Jun/12

I know what you read last summer… And I know what you’re reading now

With the proliferation of electronic reading devices, we surrender many personal liberties we have long taken for granted: it is now possible to find out not only what book you bought and when, but also whether you have read it, for how long, on which days of the week, at what time, and what drew your attention... As the Wall Street Journal's article puts it, "Your E-Book Is Reading You".

Convenience comes with many strings attached, though. What would an electronic equivalent of Bradbury's Fahrenheit 451 look like? The entire messy business of replacing hard copies of newspapers and books, detailed in Orwell's 1984, has gone away, replaced by infinitely malleable bits and bytes. Nobody misses developing film, but what about the times when a photographic negative was considered irrefutable proof?

Personal reading experience becomes raw material for data analysis, and I, for one, am rather uneasy with this brave new world. It adds yet another piece to the puzzle of constructing your personality on social networks, where people and organizations with BI savvy are mining your personal experiences in hopes of selling you ever more stuff (e.g. "How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did"), or for other, not always as benign, reasons.

 

3Jun/12

Ethical limits of Business Intelligence

Intelligence of all kinds, such as business intelligence or social intelligence, can be gleaned from the mounds of data accumulated from our daily interactions with the outside world. It can then be used to manipulate our behavior to the benefit of the data collector/analyst.

Here is, for example, how IKEA and Costco utilize information "to turn browsers into buyers, and making buyers to spend more": a new store floor layout, a combination of sound, light and olfactory stimuli to put us in "buying mode", targeted advertising, mass customization based on data collected from purchasing history, Facebook, LinkedIn, Google+... For example:

"In research yet to be published, a University of Alberta team has proven that what we smell and hear affects what we buy: When a sample group smelled the relaxing scent of lavender, 77% wanted a soothing iced tea, but when the same group smelled the arousing aroma of grapefruit, 70% reached for an energy drink. When the researchers played Mozart’s Sonata in D Major at a slow tempo, 71% wanted iced tea, but when the piano piece was sped up, 71% wanted an energy drink — an exact reversal."

Where does "legitimate use" stop and "Brave New World"/"1984" take over?

Where is the limit beyond which these "insights into consumer behavior" become an invasion of privacy?

 

27Dec/11

Getting started with Oracle BI: a virtual experience – Part IV

(continued from Part III)

Step 6

Once VirtualBox informs you that the process has completed successfully, the imported VM will show up in the left pane with a “Powered Off” label, as shown in Figure 1.

You’re almost ready to start the machine; there is one more task to complete before you can launch your OBIEE sample: make sure that hardware virtualization support (VT-x on Intel platforms, AMD-V on AMD-based machines) is enabled. The setting is in your system’s BIOS, normally under the Security menu (how to access the BIOS settings varies from one computer to another).
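If your host happens to run Linux rather than Windows, a quick way to see whether the CPU advertises VT-x or AMD-V is to look for the `vmx` (Intel) or `svm` (AMD) flag in `/proc/cpuinfo`. The sketch below assumes a Linux host; it only reports CPU support, and depending on the kernel a flag switched off in the BIOS may or may not still be listed, so the BIOS screen remains the authoritative place to check.

```python
"""Rough check for hardware virtualization flags on a Linux host (a sketch).

Assumption: the host runs Linux and lists CPU flags in /proc/cpuinfo.
'vmx' means Intel VT-x, 'svm' means AMD-V. A present flag shows CPU support;
it does not by itself prove the BIOS setting is switched on.
"""
from pathlib import Path


def virtualization_flags(cpuinfo_text: str) -> set[str]:
    """Return the subset of {'vmx', 'svm'} found on any 'flags' line."""
    found: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


def describe(flags: set[str]) -> str:
    """Human-readable summary of the flags found."""
    if "vmx" in flags:
        return "Intel VT-x supported"
    if "svm" in flags:
        return "AMD-V supported"
    return "no VT-x/AMD-V flag found (unsupported, or possibly disabled in the BIOS)"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(describe(virtualization_flags(cpuinfo.read_text())))
    else:
        print("/proc/cpuinfo not found; this sketch only covers Linux hosts")
```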

Step 7

After you’ve enabled the VT-x (or AMD-V) setting, save it, exit the BIOS and let the machine boot up. Start VirtualBox and select the imported VM image.

Before you start the image, make sure that the VM’s settings are within the recommended range. Click the Settings button, and then the System menu option, as shown in Figure 2.

Make sure that Base Memory is in the green area of the ruler (click and drag the slider marker to adjust the setting). While it may be tempting to increase the memory allocated to the VM in the hope of speeding things up, allocating too much might crash the system; keep in mind that your Windows host and the VM’s Linux + Oracle applications will be competing for the same RAM.

Click OK to exit the screen.
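For readers who prefer the command line, the same Base Memory setting can be applied with `VBoxManage modifyvm <vm> --memory <MB>` while the VM is powered off. The sketch below pairs that command with a conservative sizing rule; the "no more than half the host RAM, rounded down to 512 MB" rule is an assumption of this example, not VirtualBox's own formula for the green area of the slider.

```python
"""Pick a conservative Base Memory size for the SampleApp VM (a sketch).

Assumptions (not from the VirtualBox docs): leave at least half of the host's
RAM to the host OS and round the VM allocation down to a multiple of 512 MB.
The VM name "SampleApp_V107" is the one shown in the VirtualBox list.
"""
import shlex


def suggested_vm_memory_mb(host_ram_mb: int, fraction: float = 0.5, step_mb: int = 512) -> int:
    """Largest multiple of step_mb that does not exceed fraction * host RAM."""
    if host_ram_mb <= 0:
        raise ValueError("host_ram_mb must be positive")
    allocation = (int(host_ram_mb * fraction) // step_mb) * step_mb
    if allocation < step_mb:
        raise ValueError("host has too little RAM for this heuristic")
    return allocation


def modifyvm_command(vm_name: str, memory_mb: int) -> list[str]:
    """Command-line equivalent of Settings > System > Base Memory.

    VBoxManage modifyvm only works while the VM is powered off.
    """
    return ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb)]


if __name__ == "__main__":
    mem = suggested_vm_memory_mb(8192)  # e.g. a host with 8 GB of RAM
    print(shlex.join(modifyvm_command("SampleApp_V107", mem)))
```

For an 8 GB host this suggests 4096 MB; nudge the value down if the VirtualBox slider shows it outside the green area on your machine.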

Step 8.

With the SampleApp_V107 entry selected (see Figure 1), click the Start button on the toolbar (alternatively, select the Start option from the right-click pop-up menu). VirtualBox will load the VM image containing the Oracle Enterprise Linux system, which might take some time (20 minutes on my machine), as shown in Figures 3 and 4.

Along the way, you might see several pop-up messages informing you that you have “Auto capture keyboard turned on” or that “Host OS does not support mouse pointer integration” (Figures 5 and 6). Click OK each time; you may also check the “Do not show this message again” box at the bottom of each pop-up message.

Finally, the VM will be loaded and you will be prompted to enter your user name to log onto the system (Figure 3). The user name is “oracle” and the password is “oracle” (both lower case; press Enter after typing in each). NB: You can also select a different default language for the system by clicking the Languages option at the bottom of the screen (see Figure 7).

Upon login, the (GNOME) desktop should look similar to the one shown in Figure 8.

(continued in Part V of Getting started with Oracle BI: a virtual experience)

 

 

 

 

21Dec/11

Getting started with Oracle BI: a virtual experience – Part III

(continued from Part II)

Step 5

You are ready to assemble the VMDK files into a working virtual machine. Make sure that the VMDK and OVF files are all in the same directory (Figure 1).
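Since the import fails if the descriptor cannot find its disks, it can help to check this before starting the wizard. The sketch below assumes an OVF 1.x descriptor (namespace `http://schemas.dmtf.org/ovf/envelope/1`) that lists its disk files as `<File ovf:href="..."/>` entries under `<References>`, and reports any referenced file missing from the .ovf file's own directory.

```python
"""Check that every disk file referenced by an .ovf descriptor sits next to it.

Assumes an OVF 1.x descriptor (namespace http://schemas.dmtf.org/ovf/envelope/1)
in which disk files appear as <File ovf:href="..."/> under <References>.
"""
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"
HREF = f"{{{OVF_NS}}}href"  # namespaced attribute name, i.e. ovf:href


def referenced_files(root: ET.Element) -> list[str]:
    """File names listed as <File ovf:href="..."> anywhere in the descriptor."""
    return [f.get(HREF) for f in root.iter(f"{{{OVF_NS}}}File") if f.get(HREF)]


def missing_files(ovf_path: Path) -> list[str]:
    """Referenced files that are not present in the .ovf file's own directory."""
    root = ET.parse(ovf_path).getroot()
    return [name for name in referenced_files(root)
            if not (ovf_path.parent / name).is_file()]


if __name__ == "__main__":
    ovf = Path(sys.argv[1] if len(sys.argv) > 1 else "Sampleapp_v107_GA.ovf")
    if not ovf.is_file():
        print(f"{ovf} not found")
    else:
        missing = missing_files(ovf)
        print("all referenced disks present" if not missing else f"missing: {missing}")
```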

 

Start the VirtualBox application (Figure 1; disregard the already imported appliance). The virtual appliance will be created in the default directory, so be sure to point it to a directory with enough free space to accommodate the files, logs, etc. (the running appliance will grow the VMDK files by roughly 30%; you need all the free space on the hard drive you can get!)
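The ~30% growth figure above turns into a simple estimate: required space ≈ total size of the downloaded VMDK files × 1.3. The sketch below applies that estimate using the observation from this post (not a VirtualBox specification) and compares it with the free space reported by `shutil.disk_usage` for the intended machine folder; the folder paths are placeholders you would replace with your own.

```python
"""Estimate whether the target folder has room for the imported appliance.

Assumption taken from this post: the running appliance grows the VMDK files
by roughly 30%. Sizes are read from the downloaded *.vmdk files.
"""
import shutil
from pathlib import Path

GROWTH_PCT = 30  # ~30% growth of the VMDK files, as observed in this post


def required_bytes(vmdk_sizes: list[int], growth_pct: int = GROWTH_PCT) -> int:
    """Total VMDK size plus the expected growth, using integer arithmetic."""
    return sum(vmdk_sizes) * (100 + growth_pct) // 100


def has_room(machine_folder: Path, download_dir: Path) -> bool:
    """Compare free space in machine_folder with the estimate for download_dir/*.vmdk."""
    sizes = [p.stat().st_size for p in download_dir.glob("*.vmdk")]
    needed = required_bytes(sizes)
    free = shutil.disk_usage(machine_folder).free
    print(f"need ~{needed / 2**30:.1f} GiB, {free / 2**30:.1f} GiB free in {machine_folder}")
    return free >= needed


if __name__ == "__main__":
    # Placeholder layout: run from the download directory, targeting the same drive.
    here = Path.cwd()
    print("enough space" if has_room(here, here) else "not enough space")
```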

 

By default, on Windows machines, the directory is located on the C:\ drive; to change it, go to File > Preferences…. The Default Machine Folder setting is on the [General] tab: select the [Other…] choice from the drop-down box, as shown in Figure 2.

From the File menu, select the [Import Appliance…] option. The “Appliance Import Wizard” screen will appear. Click the [Choose…] button to navigate to the directory where the [Sampleapp_v107_GA.ovf] file is located, and select it.

The next screen presents a summary of the virtual appliance settings, including the location of the .VMDK files (they have to be in the same directory as the .OVF file).

Click the [Import] button to start the process, which can take up to several hours depending on the computer’s characteristics.
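The same two GUI steps (changing the Default Machine Folder and importing the appliance) have command-line counterparts in `VBoxManage setproperty machinefolder <path>` and `VBoxManage import <file.ovf>`. The sketch below only builds and prints those commands; it assumes `VBoxManage` is on the PATH, uses a hypothetical `D:\VMs` folder, and note that `setproperty machinefolder` changes the global default for all future VMs, not just this import.

```python
"""Command-line equivalents of the import steps above (a sketch).

Assumptions: VBoxManage is on PATH; the machine folder below is a
hypothetical location with enough free space.
"""
import shutil
import subprocess
from pathlib import Path


def import_commands(ovf_file: Path, machine_folder: Path) -> list[list[str]]:
    """Set the global Default Machine Folder, then import the appliance."""
    return [
        ["VBoxManage", "setproperty", "machinefolder", str(machine_folder)],
        ["VBoxManage", "import", str(ovf_file)],
    ]


def run(commands: list[list[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    if shutil.which("VBoxManage") is None:
        raise FileNotFoundError("VBoxManage not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = import_commands(Path("Sampleapp_v107_GA.ovf"), Path(r"D:\VMs"))
    for cmd in cmds:
        print(" ".join(cmd))  # review first; call run(cmds) to execute
```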

(continued in Part IV of Getting started with Oracle BI: a virtual experience)

 

21Dec/11

Getting started with Oracle BI: a virtual experience – Part II

(continued from Part I)

Step 3.

Download the OBIEE 11.1.1.5 Sample Application (V107) from Oracle’s site. The download includes a deployment guide; the VirtualBox VB Image Key (the deployment descriptor needed to convert the VMDK disk image files into a working virtual machine); and the VMDK files themselves. The download page on the Oracle site is shown in Figure 1.

The VMDK files are hosted on an FTP server; you can use an FTP client of your choice (default port 21, user “robic1”, password “1pertg9edq”) or your browser’s FTP capabilities. I went with the latter option; Figure 2 shows the directory structure of the FTP site.

The number of archives available for download was a bit puzzling, and, for some reason, in my experience the downloaded archives were invariably corrupted upon assembly; downloading the plain VMDK files from the Unzipped_Version directory worked for me.

(NB: verifying the checksums, listed in the [checksum.md5] file shown in Figure 2, would provide reasonable assurance that the files were not corrupted in transit; for instance, you could use the free FastSum utility for this.)
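As an alternative to a GUI utility, the check can be scripted with Python's standard `hashlib`. The sketch below assumes `checksum.md5` uses the common md5sum layout (`<hex digest>  <file name>`, with an optional `*` before binary-mode names); the actual file on the FTP site may be laid out differently, so glance at it first. Files are hashed in 1 MiB chunks so multi-gigabyte VMDKs are never loaded into memory at once.

```python
"""Verify downloaded files against an md5sum-style checksum list (a sketch).

Assumption: checksum.md5 uses the "<hex digest>  <file name>" layout produced
by md5sum; the real file on the FTP site may differ.
"""
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict[str, str]:
    """Map file name -> expected lowercase digest."""
    expected: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary-mode entries with a leading '*'
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(checksum_file: Path) -> dict[str, bool]:
    """True for each listed file that exists next to checksum_file and matches."""
    base = checksum_file.parent
    return {
        name: (base / name).is_file() and md5_of(base / name) == digest
        for name, digest in parse_checksums(checksum_file.read_text()).items()
    }


if __name__ == "__main__":
    listing = Path("checksum.md5")
    if listing.is_file():
        for name, ok in verify(listing).items():
            print(f"{name}: {'OK' if ok else 'MISMATCH or missing'}")
    else:
        print("checksum.md5 not found in the current directory")
```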

As part of the download, click the VB Image Key (.ovf) link shown in Figure 1; the OVF key file and all four .VMDK files must be in the same directory.

The downloads take approximately 25GB of space. Make sure that you have plenty of space for the download and installation.

(continued in Part III of Getting started with Oracle BI: a virtual experience)

27Jul/11

Who mines the miners?

Organizations like to keep their cards close to the chest. For a long time, BI/analytics was an all in-house affair: tools, skills and, especially, data. The shift towards distributed computing models such as SaaS and PaaS changes everything.

The data needed for analysis might not be owned by the company; it might live, virtually, anywhere: the public domain, subscription services, social networks such as Facebook, geographical data from Google Maps or Microsoft's Bing Maps. This is the secret ingredient for the analysis, and, like every true secret, it hides in plain sight.

SAP has announced that its flagship BI analytics suite, Business Objects 4.1, will have even tighter integration with the Google Maps API, going beyond location services…

One can’t help but wonder what data Google gets to keep for its own analytic endeavors as it tracks each call to its services. Could it be that corporate secrets are leaking out through usage patterns?