How clean do you want your data?
Yet another presentation by an RDBMS vendor leaves me scratching my head... Master Data Management is the buzzword du jour, and everybody has "just what you need"; naturally, I am sitting through all the presentations trying not to choke on chaff.
"Clean, consistent, and accurate data" - how many semantic overlaps do we have here? Does "clean" imply "consistent"? Or maybe just "accurate"? If so, how do you measure cleanliness? Degrees of consistency?
The Wikipedia article on "Data Cleansing" makes a distinction between "data cleaning" and "data validation" (which it should) but for all the wrong reasons - "validation almost invariably means data is rejected from the system at entry and is performed at entry time, rather than on batches of data." This would be news to the thousands of customers of SAS and Informatica struggling to validate (and re-validate) their legacy systems chock-full of dirty data. Rather, the distinction should be made on context - the Wikipedia article surreptitiously switches the context from database to software development; no matter how much validation the gatekeepers might apply, the sad fact is that once inside, the data can (and often does) become invalid.
Unless there are measurable indicators quantifying these attributes of the data, the lofty goals of clean, accurate and consistent data are just wishful thinking. A project manager on a Data Quality project must get all the stakeholders on the same page regarding what exactly each means by "clean data", and establish measures to gauge progress and acceptance criteria.
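To make "measurable indicators" a bit more tangible, here is a minimal Python sketch of three commonly used data quality measures: completeness, validity and uniqueness. The sample records, the field names, the e-mail pattern and the 0.7 thresholds are all made up for illustration; a real project would agree on its own rules with the stakeholders.

```python
import re
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
CUSTOMERS = [
    {"id": 1, "email": "ann@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 3, "email": "bob(at)example.com", "country": "usa"},
    {"id": 3, "email": "carol@example.com", "country": "CA"},
]

# Deliberately simple rules, assumed for this sketch.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
VALID_COUNTRIES = {"US", "CA"}


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def validity(rows, field, is_valid):
    """Share of rows whose field value passes the given rule."""
    return sum(1 for r in rows if is_valid(r.get(field))) / len(rows)


def uniqueness(rows, field):
    """Share of rows whose field value occurs exactly once."""
    counts = Counter(r.get(field) for r in rows)
    return sum(1 for r in rows if counts[r.get(field)] == 1) / len(rows)


def quality_report(rows):
    """Compute every indicator as a number between 0 and 1."""
    return {
        "email_completeness": completeness(rows, "email"),
        "email_validity": validity(
            rows, "email", lambda v: bool(v) and bool(EMAIL_RE.match(v))
        ),
        "country_validity": validity(
            rows, "country", lambda v: v in VALID_COUNTRIES
        ),
        "id_uniqueness": uniqueness(rows, "id"),
    }


def failed_thresholds(report, thresholds):
    """Names of indicators that fall below their agreed acceptance level."""
    return sorted(k for k, v in report.items() if v < thresholds[k])
```

Calling quality_report(CUSTOMERS) yields one number per indicator, and failed_thresholds() turns agreed acceptance levels into a simple pass/fail list, which is the kind of thing stakeholders can actually argue about.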
Quality Attributes: either binary or quantifiable
An architectural document that reads like promotional material always makes me wonder... What exactly does the author mean by "easy to maintain"? Does adjusting a mere 20+ configuration files on several machines in a cluster, and umpteen start-up parameters for dozens of processes, qualify? How about going through several tabs with dozens of conflicting options on every instance? "Easy" is in the eye of the beholder; architects must do better than that.
In 1983, Tom DeMarco, in his seminal work Controlling Software Projects: Management, Measurement and Estimation, famously remarked: "You can’t control what you can’t measure."
This rhymes with a similar sentiment expressed by Lord Kelvin almost a hundred years earlier, in the somewhat more convoluted form characteristic of the times:
"...I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be."
- Baron William Thomson Kelvin, from 'Electrical Units of Measurement', a lecture delivered at the Institution of Civil Engineers, London (3 May 1883), Popular Lectures and Addresses (1889), Vol. 1, 73. Quoted in American Association for the Advancement of Science, Science (Jan-Jun 1892), 19, 127.
This correlates strongly with the system quality attributes used to control the system architecture development process. Any quality attribute to which the architecture must conform has to be either measurable (numbers!) or binary (yes/no) in nature; relative terms just will not cut it (easier? more flexible? robust-ier?). If you state that the system is "adaptable", it means that it is designed to accommodate some anticipated changes; it does not mean that it can swallow any change that comes after the system was designed. The value of such an attribute can only be relative, as it provides no way to quantify it. On the other hand, if "responsiveness" is the desired quality attribute, then it has to be measurable (unless you are talking about Leibniz’s monads, that is); response time in milliseconds at a given number of concurrent users might be one example...
Therefore, unquantifiable adjectives and/or adverbs used in quality attributes must be kept to an absolute minimum, and be qualified (e.g. "adaptable within the current technological paradigm").
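As an illustration of a measurable "responsiveness" attribute, here is a small Python sketch that turns a list of response-time samples into a yes/no answer. The nearest-rank percentile, the 95th percentile target and the 500 ms limit are my own assumptions for the example, not figures from any particular architecture document.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_responsiveness(samples_ms, pct=95, limit_ms=500):
    """Binary verdict: is the chosen percentile within the agreed limit?"""
    return percentile(samples_ms, pct) <= limit_ms


# Hypothetical response times in milliseconds, measured at some fixed load.
SAMPLES_MS = [120, 180, 210, 250, 300, 320, 410, 450, 480, 900]
```

With these made-up samples the 95th percentile is the single 900 ms outlier, so the attribute fails at the 500 ms limit, while the 90th percentile (480 ms) would pass. Either way the answer is a number or a yes/no, not an adjective.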
The Single Version of Truth by Fiat
The one version of truth is an elusive goal of many corporate data initiatives. Getting people from all over the enterprise to agree on a single definition of anything - be it attributes of Customer or of a Product - can be a daunting task.
Jeanne Ross, Director of MIT’s Center for Information Systems Research, has a simple solution, which she presented last month at the TechTomorrow conference - just declare it.
The Single Version of Truth by fiat can be a proverbial line in the sand, or, better yet, Archimedes’ fulcrum… but it certainly ends paralysis. Once the truth is declared, the project can move forward with a “good enough” set of data; instead of debating the merits of this or that data set and bemoaning its imperfections, the focus shifts towards “How can we make it better?”
The ultimate goal of the single version of truth is a business one, not an IT one, and it should be treated as such. The “good enough” criteria should be set by the business, and any further cleansing and conformance processes should also be viewed in terms of business value.
Getting started with Oracle BI: a virtual experience – Part VI
Step 10
You are ready to access the OBIEE web interface. Launch the Firefox browser either from the desktop icon or from the top toolbar; its home page is set to the Oracle Business Intelligence login page (http://localhost:7001/analytics/saw.dll?bieehome&startPage=1)
Note that several bookmarks are set up (see Fig. 1):
- OBIEE (http://localhost:7001/analytics)
- WLS Console (http://localhost:7001/console)
- Enterprise Manager (EM) (http://localhost:7001/em)
- BI Composer (http://localhost:7001/analytics/bicomposer/faces/answersWizard.jspx)
NB: I was unable to launch BI Composer, either due to configuration problems or because Essbase services were not running
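If you would rather check which of these bookmarks respond before clicking around, something along these lines might help. It is only a sketch in Python using the standard library: the host, port and paths come from the bookmark list above, while the function names and the 5-second timeout are my own choices.

```python
import http.client
import urllib.error
import urllib.request

BASE = "http://localhost:7001"  # host and port taken from the bookmarks above

CONSOLE_PATHS = {
    "OBIEE": "/analytics",
    "WLS Console": "/console",
    "Enterprise Manager": "/em",
    "BI Composer": "/analytics/bicomposer/faces/answersWizard.jspx",
}


def console_urls(base=BASE):
    """Build the full URL for each bookmarked console."""
    return {name: base.rstrip("/") + path for name, path in CONSOLE_PATHS.items()}


def responds(url, timeout=5.0):
    """True if anything answers at url, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server is up but returned an error status such as 401 or 404
    except (OSError, http.client.HTTPException, ValueError):
        return False  # nothing listening, timeout, or a malformed URL
```

Inside the VM you could then loop over console_urls().items() and print responds(url) for each entry. Note that a True only tells you that the server answered, not that the application behind it is healthy.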
Log into OBIEE (the credentials are case sensitive)
User ID: Prodney
Password: Admin123
This will sign you in as “Paulo Rodney”; other user IDs and passwords are set up within the system. The document seems to imply that a BISAMPLE/BISAMPLE credentials pair is set up on the system, but it does not work.
OBIEE opens on the General Index page, which is packed with links to all the capabilities built into the application.
The following users are set up in the system:
Here is an example of a BI dashboard demo - one of the few supplied in this SampleApp_107 virtual machine
To log onto the WebLogic Server (WLS) console, click the WLS Console bookmark in the browser.
User Name: weblogic
Password: Admin123
Note: the same user ID and password are used to log onto the Enterprise Manager (EM) console
Here is an example of the WLS administrative console
You can access Oracle Enterprise Manager (EM) in the same way; with the same user ID and password.
User Name: weblogic
Password: Admin123
And here's a sample of the EM administrative console
That's all. Hope this will spare you a few moments of aggravation.
Comments and suggestions are always welcome.
Feel free to explore the applications on your own; you can also shut down the system by killing off the VM environment - or you could do it in an orderly fashion using the scripts provided by Oracle (see Part VII of the post)
Getting started with Oracle BI: a virtual experience – Part V
Step 9
Starting OBIEE application
Oracle warns about a potential need to adjust networking settings by entering the IP address assigned to the image (192.168.56.101) into the /etc/hosts file for the demo.us.oracle.com entry. This was not required for my specific configuration, and there is a good chance it won’t be required for yours either. Please refer to the Oracle VM VirtualBox Image SampleApp v107 Deployment Guide for more information.
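Should you need to check what the hosts file currently says, here is a small Python sketch that looks up a host name in hosts-file text. The function name is mine and the sample content in the usage note is invented; only the host name and the IP address come from the paragraph above.

```python
def hosts_mapping(hosts_text, hostname):
    """Return the IP address mapped to hostname in hosts-file text, or None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        ip, *names = line.split()  # first token is the address, the rest are names
        if hostname in names:
            return ip
    return None


# Invented sample content, just to show the expected shape of the file.
SAMPLE_HOSTS = """
127.0.0.1   localhost localhost.localdomain
# 10.0.0.5 demo.us.oracle.com
192.168.56.101  demo.us.oracle.com demo
"""
```

Inside the VM you would pass it the real file, e.g. hosts_mapping(open("/etc/hosts").read(), "demo.us.oracle.com"), and compare the result with the address assigned to your image.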
Oracle provides startup scripts to launch the application; they are in the eponymous folder on the desktop. Figure 1 presents the contents of the folder (a double-click on the folder icon will do the trick)
Start up OID services (Oracle Application Server) by double-clicking on the 1-startOID.sh file in the StartupScripts folder
Click the Run in Terminal button. A new terminal window pops up; you need to wait until the window disappears, plus an additional 5-10 seconds, before proceeding to the next startup script, which starts Oracle WebLogic Server
The procedure for starting up WebLogic Server is almost identical - double-click on the [2-startWLS.sh] file and click the Run in Terminal button. The terminal window explodes with informational and warning messages (see Figure 1) - be patient, it might last for quite a few minutes.
Proceed to the next step after you see, at the bottom of the terminal window, the message informing you that “Server started in RUNNING mode” (see Figure 2).
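Watching the terminal for that one message gets tedious, so here is a rough Python sketch of how the wait could be automated. It assumes the server output is available as an iterable of text lines (an open file or a pipe, for instance); how you capture that output is up to you, and the function name is my own.

```python
RUNNING_MARKER = "Server started in RUNNING mode"  # message quoted in the step above


def wait_for_marker(lines, marker=RUNNING_MARKER):
    """Consume lines until one contains marker.

    Returns the 1-based number of the matching line, or None if the
    input ends before the marker shows up.
    """
    for count, line in enumerate(lines, start=1):
        if marker in line:
            return count
    return None
```

Fed a live pipe, the loop simply blocks until the next line arrives, so the function returns only once the message appears or the stream ends.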
Important: do not close this terminal window; leave it running (you can minimize it)
The next step is to launch the [3-startBI.sh] script; follow the same steps as for the previous scripts. Wait until the terminal window for the BI processes disappears on its own, which might take a while (4 min in my case)
Oracle advises against starting Essbase services unless you can allocate 3GB to the VM alone; this pretty much precludes any 32-bit machine from running these services. This tutorial does not deploy [4-startESSB.sh]. Most samples, Oracle assures, will work without these services running.
(continued in Part VI of Getting started with Oracle BI: a virtual experience)
Getting started with Oracle BI: a virtual experience – Part IV
Step 6
Once the VirtualBox app informs you that the process completed successfully, the imported VM will show up in the left pane with a “Powered Off” label, as shown in Figure 1.
You’re almost ready to start the machine; there is one more task that needs to be completed before you can launch your OBIEE sample - make sure that hardware virtualization support (VT-x on Intel platforms, AMD-V on AMD-based machines) is enabled. The setting is in your system’s BIOS, normally under the Security menu (here is a link to an article explaining how to access BIOS settings on a computer)
Step 7
After you’ve enabled the VT-x setting, save it and exit the BIOS; allow the machine to boot up. Start up VirtualBox and select the imported VM image.
Before you start the image you need to make sure that your system’s settings are within the recommended range. Click the Settings button, and then the System menu option, as shown in Fig. 2
Make sure that Base Memory is in the green area of the ruler (click and drag the central marker to adjust the setting). While there might be a temptation to increase the memory allocated to the VM (with the idea that it might speed things up), allocating too much might crash the system; keep in mind that your Windows host and the VM's Linux plus Oracle applications will be competing for the same RAM.
Click OK to exit the screen
Step 8.
With the SampleApp_V107 entry selected (see Figure 1), click the Start button on the toolbar (alternatively, you may select the Start option from the right-click pop-up menu). VirtualBox will load the VM image containing the Oracle Enterprise Linux system, which might take some time (20 min on my machine), as shown in Figures 3 and 4
Along the way you might see several pop-up messages informing you that you have “Auto capture keyboard turned on” or that “Host OS does not support mouse pointer integration”, as shown in the pictures below - click OK each time; you may also check the “Do not show this message again” box at the bottom of each pop-up message (Figures 5 and 6)
Finally, the VM will be loaded and you will be prompted to enter your user name to log onto the system (Figure 3). The user name is “oracle” and the password is “oracle” (both lower case; press Enter after typing in each). NB: You could also select a different default language for the system by clicking the Languages option at the bottom of the screen (see Figure 7)
Upon login, the GNOME desktop will look similar to the one shown in Fig. 8
(continued in Part V of Getting started with Oracle BI: a virtual experience)
Getting started with Oracle BI: a virtual experience – Part III
Step 5
You are ready to assemble the VMDK files into a working virtual machine. Make sure that the VMDK and OVF files are all in the same directory (Figure 1)
Start up the VirtualBox application (Figure 1; disregard the already imported appliance). The virtual appliance will be created in the default directory - be sure to set up a directory that has enough free space to accommodate files, logs etc. (the virtual size of the running appliance will exceed the size of the VMDK files by ~30%; you need all the free space on the hard drive you can get!)
By default, on Windows machines, the directory will be located on the C:\ drive; to change it, go to the File > Preferences… option. The Default Machine Folder will be under the [General] tab - select the [Other…] choice from the drop-down box, as shown in Figure 2
From the File menu select the [Import Appliance…] option. The “Appliance Import Wizard” screen will appear. Click the [Choose…] button to navigate to the directory where the [Sampleapp_v107_GA.ovf] file is located, and select it.
The next screen will present a summary of the virtual appliance settings, including the location of the .VMDK files (they have to be in the same directory as the .OVF file).
Click the [Import] button to start the process, which can take up to several hours depending on the computer’s characteristics.
(continued in Part IV of Getting started with Oracle BI: a virtual experience)
Getting started with Oracle BI: a virtual experience – Part II
Step 3.
Download the OBIEE 11.1.1.5 Sample Application (V107) from Oracle’s site. The download includes the deployment guide, the VirtualBox VB Image Key (the deployment descriptor needed to convert the VMDK disk image files into a working virtual machine), and the VMDK files themselves; the downloads descriptor on the Oracle site is shown in Figure 1.
The VMDK files are hosted on an FTP server, and you can use an FTP client of your choice (default port 21, user “robic1”, password “1pertg9edq”) or your browser’s FTP capabilities. I went with the latter option; Figure 2 shows the directory structure of the FTP site:
The number of archives available for download was a bit puzzling, and for some reason, in my experience, the downloaded files were invariably corrupted upon assembly; downloading the plain VMDK files from the Unzipped_Version directory worked for me.
(NB: verifying the checksum - see the file [checksum.md5] in Fig. 2 - would provide reasonable assurance that the files were not corrupted or tampered with; for instance, you could use the free FastSum utility for this)
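If you do not have a checksum utility at hand, a few lines of Python will do the same job. The sketch below assumes that checksum.md5 uses the common md5sum layout (one "digest  file name" pair per line); I have not confirmed the exact format of Oracle's file, so treat the parser as a guess and adjust it to what you actually see in the file.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """MD5 hex digest of a file, read in chunks so large VMDKs fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Parse md5sum-style lines ('<hex digest>  <file name>') into a dict.

    The layout is an assumption; lines that do not split into two parts are skipped.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary mode with a leading '*' before the file name
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(directory, checksum_text):
    """Map each listed file to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, digest in parse_checksums(checksum_text).items():
        path = Path(directory) / name
        results[name] = md5_of(path) == digest if path.is_file() else None
    return results
```

Point verify() at the download directory and the text of checksum.md5; anything that comes back False or None is worth downloading again before you spend hours on the import.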
As part of the download, click the VB Image Key (.ovf) link shown in Figure 1; both the OVF key file and the four .VMDK files must be in the same directory.
The downloads take approximately 25GB of space. Make sure that you have plenty of space for the download and installation.
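A quick way to check the free space before starting a multi-hour download is Python's shutil.disk_usage. The sketch below is just that, a sketch: the helper names are mine, and the 25GB figure in the usage note is the download size mentioned above.

```python
import shutil

GB = 1024 ** 3  # bytes per gigabyte (binary)


def free_gb(path):
    """Free space, in gigabytes, on the drive that holds path."""
    return shutil.disk_usage(path).free / GB


def has_free_space(path, required_gb):
    """True if the drive that holds path has at least required_gb free."""
    return shutil.disk_usage(path).free >= required_gb * GB
```

For example, has_free_space(".", 25) tells you whether the current drive can hold the download alone; the installation needs considerably more, so check against the larger number you plan for.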
(continued in Part III of Getting started with Oracle BI: a virtual experience)
Getting started with Oracle BI: a virtual experience – Part I
Oracle Business Intelligence Enterprise Edition is a suite that integrates a number of different applications acquired by Oracle during its shopping spree of the last 10 years. At the heart of the system is Siebel Analytics, which Oracle acquired in 2005, and at the heart of Siebel Analytics is nQuire, which Siebel had gobbled up in 2002.
The Oracle analytics platform is facing stiff competition in the enterprise arena from entrenched rivals in a rapidly consolidating market, such as Cognos (acquired by IBM in 2008), Business Objects SA (acquired by SAP in 2007), MicroStrategy Inc. (the only remaining independent heavyweight) and Microsoft Business Intelligence Solutions (MS SQL Server, MS Office and Microsoft SharePoint Server). In Gartner’s Magic Quadrant for Business Intelligence Platforms all of these are in the Leaders quadrant (along with SAS, Information Builders and QlikTech).
As with every piece of enterprise software, users who want hands-on experience face a challenge - procuring, installing and configuring OBIEE components is a daunting experience. Realizing this, Oracle has provided prebuilt virtual environments that can be downloaded free of charge - a full-blown installation of OBIEE, complete with database server, application server and demo applications built on top of the stack.
This post (and a couple of follow-ups) describes my personal journey of installing OBIEE on a woefully under-powered 32-bit Windows XP Pro SP3, two-core 1.86GHz machine, which barely met the requirements set by Oracle:
- 4GB of RAM (the max on 32-bit systems)
- 75GB free space (double this number for better performance!)
- NTFS file system (non-negotiable on Windows machines!)
It took me several hours to download, unzip, install and configure the appliance - and I had to start from scratch a couple of times... Here is my step-by-step journey of downloading, installing and running the OBIEE 11.1.1.5 Sample Application (V107) virtual machine - hopefully it will help you along the way!
Step 1.
Get the Oracle VM VirtualBox application here (you might need to create an Oracle Technology Network account to download trial products); select the installation for your particular operating system - Windows, Mac OS X, Solaris or various flavors of Linux. The download is about 90MB in size.
Step 2.
Install Oracle VM VirtualBox. On my Windows XP 32-bit machine the process was rather straightforward - accept all defaults by clicking the Next button. The installation takes an additional ~120MB.
The application appears in the Programs menu as shown on Fig. 1
(continued in Part II of Getting started with Oracle BI: a virtual experience)