all bits considered data to information to knowledge

4May/12

Adapting the DbFit unit testing framework to a team-based environment

Introduction

There is a lot to be said for the virtue of simplicity. The unit testing frameworks out there can be daunting for the uninitiated, and many require jumping through the hoops of additional technologies to get the job done. The xUnit derivatives such as JUnit or MSTest (and even more DB-specific adaptations such as dbUnit) require familiarity with the respective technologies, in addition to your database structures, code and data. This is OK if you are using one of the ORM frameworks such as Hibernate, NHibernate, LINQ to SQL or Microsoft EF, as you are writing Java or C# code anyway… but what if your database developers are so immersed in the intricacies of SQL, Transact-SQL and PL/SQL that they do not have time to learn yet another language?
An interesting approach is offered by DbFit for FitNesse… a wiki-based unit testing framework. Yes, you’ve heard that right: wiki-based. A test is written in a human-readable markup language and is represented as a page in a wiki website (the framework installation comes with an embedded web server); upon test execution the wiki markup is passed to an engine that interprets it and executes it against the code you want to test. The DbFit extension to the FitNesse framework is unique in that it does not require knowledge of Java or C#, and the wiki markup language’s conventions can be learned in just a couple of hours.
Getting started

So, what do you need to get started?

  1. Download the full distribution of the DbFit framework from SourceForge. It includes everything you need to run tests, so no additional components are required: the framework, the web server and the user manuals are all in there.
  2. Unzip the contents of the package into a directory on your machine (both Windows and Linux environments are supported).
  3. Run the startup script (startFitness.bat on Windows machines).

The web server is Java-based, so running the script opens a console window. Leave the window running and just minimize it (closing it would kill the web server process, and with it the entire application).

The entire interface is web-based, so fire up your favorite browser and navigate to http://localhost:8085/ (by default the web server starts on port 8085, so you need to include the port in the address).

NB: The package also includes a comprehensive manual that takes you from the first steps to mastery in just five chapters.
Now you are ready to run the batteries of tests supplied with the framework. They come in two flavors, Java and .Net; your choice will depend on personal preference and your environment (e.g. you are more likely to choose the Java version if you are running a Linux machine). Click the [.Net Acceptance tests] link.

The DbFit framework currently supports testing of Oracle, Microsoft SQL Server, IBM DB2 and MySQL, though there are some limitations as to what you can test in each environment (e.g. MySQL is currently not supported in the .Net flavor). Here I have SQL Server 2008 Express Edition installed on my local machine (the same machine I run DbFit/FitNesse on), so let’s select the [SQLServerTests] link.

As you can see, there are quite a few predefined tests that you can modify for your environment (and you can add your own tests as well; see the manual for instructions).
Clicking the [Suite] button in the right pane launches the battery of tests you see in the Contents pane. But you are not quite ready yet: you still need to set up the connection string to your database. It can be shared across the entire series of tests, or you can set up a separate connection for each test. Select a test you’d like to run (DateTests in my case) and click the (Edit) link.

Modify the connection string to point to a valid database. For example, “!|Connect|.\SQLEXPRESS|fit|fit|library|” means: connect to the local SQLEXPRESS instance (.\SQLEXPRESS) with user ID “fit” and password “fit”, using the database “library”.
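To make the pipe-delimited table syntax a bit more concrete, here is a small Python sketch (not part of DbFit) that splits a Connect row like the one above into its cells and maps them to the server, user, password and database arguments. `ConnectRow` and `parse_connect_row` are hypothetical names, and the sketch only handles the four-argument form shown here; DbFit itself may accept other forms of Connect.

```python
from typing import NamedTuple, Optional


class ConnectRow(NamedTuple):
    """Arguments of a four-cell DbFit Connect row (hypothetical helper type)."""
    server: str
    user: str
    password: str  # note: stored in plain text in the wiki page
    database: str


def parse_connect_row(line: str) -> Optional[ConnectRow]:
    """Return the Connect arguments, or None if the line is not a 4-argument Connect row."""
    text = line.strip()
    if text.startswith("!"):
        text = text[1:]  # drop the leading '!' table marker
    # Cells are separated by '|'; strip the outer pipes before splitting.
    cells = [cell.strip() for cell in text.strip("|").split("|")]
    if len(cells) != 5 or cells[0] != "Connect":
        return None
    return ConnectRow(*cells[1:])


if __name__ == "__main__":
    print(parse_connect_row(r"!|Connect|.\SQLEXPRESS|fit|fit|library|"))
```

A helper like this could also be run over wiki pages before check-in to spot plain-text credentials.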
The built-in tests are a very good place to start, and looking through the already written tests gives a good idea of the syntax and structure of the markup scripts. You run a test manually by clicking the [Test] button and observe the results on the very same page (you can edit the test by clicking the [Edit] button to the right; alternatively, you can import tests written in Microsoft Excel or Word formats). The results of the test are displayed on the same page once it refreshes.

Running DbFit in a team environment

The entire framework is conceived as a personal productivity tool for individual developers, yet with a bit of tinkering it can be introduced into a team environment.

Here’s the setup I came up with while trying to solve the problems of multiple users in a wiki environment. First, all script pages need to be checked into version control (e.g. Subversion). In addition:

  1. Each developer has full DbFit/Fitnesse install on his/her workstation to run tests locally
  2. If applicable, each developer can run the tests against a local and/or dedicated database; if not, some thought needs to be given to how to run tests in a shared database environment, especially around security, as the database connection is embedded in the wiki pages unencrypted
  3. Since the team works predominantly in Windows/.Net environments, we are using the TortoiseSVN client to synchronize folders manually (this could also be automated)
  4. If unit test history is required, a process to harvest the results files would have to be created (e.g. at the end of execution the results file is transferred to the version control repository, or is renamed and time-stamped to prevent it from being overwritten; the files can then be parsed and trend graphs created)

This will give your developers pretty good insight into the formal quality of their code. Keep in mind that unit tests cannot vouch for the validity of the business logic implemented in your code; after all, running the text of your novel through a spell-checker does not guarantee a bestseller.
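Item 4 of the setup (renaming and time-stamping result files so they are not overwritten) could be scripted along these lines. This is a Python sketch under assumptions: where FitNesse writes its result files depends on the version and configuration, so `result_file` and `archive_dir` are hypothetical parameters, and the timestamp format is an arbitrary choice.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_name(path: Path, now: datetime) -> str:
    """E.g. DateTests.html at 2012-05-04 09:30:00 -> DateTests_20120504_093000.html"""
    return f"{path.stem}_{now:%Y%m%d_%H%M%S}{path.suffix}"


def archive_result(result_file: Path, archive_dir: Path,
                   now: Optional[datetime] = None) -> Path:
    """Copy a results file into archive_dir under a time-stamped name,
    so the next test run cannot overwrite it."""
    now = now or datetime.now()
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / timestamped_name(result_file, now)
    shutil.copy2(result_file, target)  # copy2 also preserves the modification time
    return target
```

The archived copies could then be committed to Subversion or parsed later to build trend graphs.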
Final thoughts

I liked the DbFit/FitNesse wiki concept: it provides a simple, lightweight, intuitive framework for developing unit tests. It is also apparent that very little thought was given to logistics: integration with existing software development ecosystems, reporting and analysis.
The biggest missing piece for me is using DbFit with a continuous integration server such as Hudson/Jenkins. Since the entire framework is wiki-based, automated execution presents a bit of a challenge.
I can envision a Hudson job (possibly based on Maven/MSBuild packages) that checks the latest version of the scripts out of SVN, starts the FitNesse web server (or it might be running constantly), executes HTTP requests to launch the tests, and then reads the resulting pages (possibly converting them into JUnit format for display; there is also a FitNesse plugin for Jenkins). All this is doable but seems a bit flaky.
Also, this does not address traceability issues (linking test results to a particular piece of code or build/release, all the way to requirements, use cases, user stories and scenarios), trend analysis, etc. Perhaps future iterations of the framework will close the gaps with enterprise-level agile development, including better affinity to continuous integration.
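The JUnit conversion step of such a Hudson job might look like the following Python sketch. It assumes the per-page right/wrong/exception counts (the kind FitNesse reports for each test page) have already been extracted from the result pages; fetching and parsing those pages is left out because it depends on the FitNesse version. `PageResult` and `to_junit_xml` are hypothetical names, and the mapping (any exception makes the page an error, otherwise any wrong cell makes it a failure) is my own assumption.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class PageResult(NamedTuple):
    """Counts for one FitNesse test page (hypothetical container)."""
    page: str
    right: int
    wrong: int
    exceptions: int


def to_junit_xml(suite_name: str, results: List[PageResult]) -> str:
    """Render page results as a single JUnit-style <testsuite> element."""
    errors = sum(1 for r in results if r.exceptions)
    failures = sum(1 for r in results if r.wrong and not r.exceptions)
    suite = ET.Element("testsuite", name=suite_name, tests=str(len(results)),
                       failures=str(failures), errors=str(errors))
    for r in results:
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=r.page)
        if r.exceptions:  # assumption: any exception marks the page as an error
            ET.SubElement(case, "error", message=f"{r.exceptions} exception(s)")
        elif r.wrong:     # any wrong cell marks the page as a failure
            ET.SubElement(case, "failure", message=f"{r.wrong} wrong, {r.right} right")
    return ET.tostring(suite, encoding="unicode")
```

Writing the returned string to a file in the job's workspace would let a JUnit report publisher pick it up.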

24Apr/12

Degrees of simplicity

Finally, there is a simple answer for all those tasked with organizing chaos! And the answer is... drum roll... 42!

Well, not really. The actual answer is C = F^3.11

where C is the complexity in Standard Complexity Units (SCUs) and F is the number of business functions implemented within the system. In his whitepaper, Mathematics of IT Simplification, Roger Sessions (of ObjectWatch) lays the foundation for quantifying complexity (along with U.S. Patent 7,756,735 for a mathematically based methodology for minimizing the complexity of large IT systems and enterprise architectures).

This is a huge step forward: "you cannot control what you cannot measure," as Tom DeMarco noted in his book "Controlling Software Projects".
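A quick calculation shows how steeply C = F^3.11 grows. Python is used here only as a calculator: adding 25% more business functions roughly doubles complexity, and doubling the number of functions multiplies it by about 8.6. If I read Sessions' paper correctly, the exponent itself comes from Robert Glass's observation that a 25% increase in functionality doubles complexity; solving 1.25^x = 2 for x gives log 2 / log 1.25, about 3.11.

```python
import math

EXPONENT = 3.11  # C = F^3.11, with C in Standard Complexity Units (SCUs)


def complexity(functions: float) -> float:
    """Complexity in SCUs of a system implementing the given number of business functions."""
    return functions ** EXPONENT


# 25% more functions: 1.25 ** 3.11, roughly 2.0 (complexity doubles)
growth_25_percent = complexity(125) / complexity(100)
# Twice the functions: 2 ** 3.11, roughly 8.6 times the complexity
growth_double = complexity(200) / complexity(100)
# Solving 1.25 ** x == 2 for x recovers the exponent, about 3.106
implied_exponent = math.log(2) / math.log(1.25)

print(f"{growth_25_percent:.2f} {growth_double:.2f} {implied_exponent:.3f}")
```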

18Mar/12

Linguistic Darwinism, compliments of Big Data

With the shift from the printed to the digitized word, languages can finally be analyzed in ways not possible before.

Over 5,000,000 books have been scanned, digitized and plugged into the Internet maelstrom, and unstructured data analysis techniques have evolved to the point where they can yield insights some might have intuitively anticipated but could never quite prove. It took Big Data and Google's Culturomics project to make the breakthrough, and the results are in: it's a linguistic jungle out there, and the "survival of the fittest" principle governs the life and death of words.

A team of authors in the article "Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death" published in the current issue of Science magazine examines these principles, and Christopher Shea of WSJ popularizes the results in his

It turns out that the English language has over a million words and continues to grow at a rate of about 8,500 words per year (by comparison, the 2002 Webster's Third New International Dictionary has 348,000 words). The trial period of a new word is about 30 to 50 years, after which it either disappears into the quicksand of the archives or enters the permanent lexicon. The process has undoubtedly sped up with the advent of the Internet, and the proliferation of spellcheckers may have made it more rigorous.

The pattern is virtually identical across the three analyzed languages (English, Spanish and Hebrew).

15Mar/12

Analogy trap: Codermetrics

I recently read the book Codermetrics by Jonathan Alexander and was left with mixed feelings about its proposition: to introduce sports-style analytics to optimize the performance of software development teams…

The similarities seem uncanny: developing software is a team endeavor, with people in one or more roles working towards a shared goal (a release). Yet in my opinion the differences outweigh the similarities, often to the point of making the analogy superficial and altogether flawed. It is one thing to score 10 goals on the field, and quite another to produce 10 good software-intensive systems (emphasis on “good”). The sports metaphor ignores the quality aspect almost entirely: a goal is a goal is a goal... and one good system design might be the diametric opposite of an equally good system design. While it could be argued that “winning software” is also quantifiable, the approach is much more nuanced and tends to change over long periods of time. In addition, applying metrics to a software development team immediately affects the dynamics of the team and might introduce more problems than it is supposed to solve. The temporal dimension is also all but left out: months of team training followed by a couple of hours of team effort is hardly analogous to the solitary task of learning and honing one's skills followed by months or even years of a development marathon...

12Mar/12

Apple’s Freudian Slip?

An Apple ad in InformationWeek magazine (03/12/2012) reads:

Apple is looking for qualified individuals for following 40/hr/wk positions.

To apply, mail your resume to 1 Infinite Loop 84-GM, attn: LJ, Cupertino, CA 95014...

Given that the ad appears in an IT trade magazine and is placed by a company heavily immersed in IT, one hopes the pun won't be lost on either the employer or its prospective employees. :)

NB: I'll give Apple the benefit of the doubt and will not go down the slippery slope of H-1B reasoning.

1Mar/12

Data Virtualization vs. Data Federation

Data Virtualization takes the idea of Data Federation one step further: both abstract data sources away from the users, but virtualization adds logic in an effort to present a coherent data structure to the clients. This ties directly into the Master Data Management domain: no longer will you access columns and rows; instead you will access logical data entities, which behind the scenes could be composite constructs. For example, the attributes of a virtualized "Customer" might come from a dozen data sources defined as part of the "golden record".

In my opinion, this implies that any Data Virtualization effort must rest on a firm foundation of Master Data Management, while Data Federation can treat this step as optional.
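To illustrate the "golden record" idea, here is a minimal Python sketch of a virtualized Customer entity assembled on request from several sources. The source names, attributes and the survivorship rule (each attribute comes from the highest-priority source that holds a non-empty value) are all hypothetical; real Data Virtualization and MDM products define such rules in their own ways.

```python
from typing import Dict, List, Optional

# customer_id -> attribute -> value, for one source system (hypothetical layout)
Source = Dict[str, Dict[str, str]]


def virtual_customer(customer_id: str,
                     sources: Dict[str, Source],
                     priority: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Assemble a virtual Customer: each attribute comes from the highest-priority
    source that holds a non-empty value for this customer (assumed survivorship rule)."""
    record: Dict[str, Optional[str]] = {}
    for attribute, ranked_sources in priority.items():
        record[attribute] = next(
            (sources[name][customer_id][attribute]
             for name in ranked_sources
             if customer_id in sources.get(name, {})
             and sources[name][customer_id].get(attribute)),
            None,  # no source has a value for this attribute
        )
    return record
```

For example, with a CRM record that lacks an e-mail address and a billing record that has one, the virtual Customer takes the name from CRM and the e-mail from billing; the client never sees which source supplied which attribute.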

18Feb/12

Lots of little brothers… all watching you

Predictive analytics at its best... and worst. Charles Duhigg's article "How Companies Learn Your Secrets", published in The New York Times, opens a big can of worms. The truth is that we are getting better and better at predictive analysis, aided by ever more powerful computers and software and better mathematical models... and we are getting closer to the point where our secrets do not even have to be stolen, as they can be inferred from the mountains of tiny clues we leave behind as we go about our daily lives.

The key facilitator here is the unique identifiers we acquire with our credit cards, loyalty cards and other numbers that can be used to track our activities. This has its uses, such as preventing fraud, preparing for an eventual disaster and so on. But there is a more insidious side to predictive analytics: instead of one Big Brother watching, we have hundreds of little ones actively engaged in collecting and trading our personally identifiable information, something we are only too happy to give away for a few pennies in discounts on overpriced merchandise. So goes our privacy: not with a bang but with a whimper.

13Feb/12

HIPAA compliance in the cloud: cover your bases

Are there any HIPAA certified hosting providers out there? The short answer is - No.

Since there is no certifying body for HIPAA, the best you can do is to make sure that your hosting provider has done its homework: an independent HIPAA audit, employee training, etc.

In short, the provider has covered both its own behind and yours against potential litigation in case of a data breach.

Yes, the chances that your data is safeguarded are higher; after all, there is a document stating that all 54 HIPAA citations and 136 audited components have been examined by a Certified HIPAA Practitioner...

Five Questions to Ask Your HIPAA Hosting Provider

26Jan/12

How clean do you want your data?

Yet another presentation by an RDBMS vendor leaves me scratching my head... Master Data Management is the buzzword du jour, and everybody has "just what you need"; naturally, I sit through all the presentations trying not to choke on the chaff.

"Clean, consistent, and accurate data": how many semantic overlaps do we have here? Does "clean" imply "consistent"? Or maybe just "accurate"? If so, how do you measure cleanliness? Degrees of consistency?

The Wikipedia article on "Data Cleansing" makes a distinction between "data cleaning" and "data validation" (which it should), but for all the wrong reasons: "validation almost invariably means data is rejected from the system at entry and is performed at entry time, rather than on batches of data." This would be news to the thousands of SAS and Informatica customers struggling to validate (and re-validate) their legacy systems chock-full of dirty data. Rather, the distinction should be based on context. The Wikipedia article surreptitiously switches the context from databases to software development; no matter how much validation the gatekeepers might apply, the sad fact is that once inside, data can (and often does) become invalid.

Unless there are measurable indicators quantifying these attributes of the data, the lofty goals of clean, accurate and consistent data are just wishful thinking. A project manager on a Data Quality project must get all the stakeholders on the same page regarding what exactly each of them means by "clean data", and establish measures to gauge progress and define acceptance criteria.
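As an illustration of what "measurable indicators" might look like, here is a Python sketch of three simple metrics a project could agree on: completeness (share of non-blank values), validity (share of values that pass a rule) and consistency (share of shared keys on which two systems agree). The metric definitions, the ZIP-code rule and the choice to return 0.0 for empty input are illustrative assumptions, not an industry standard; the point is that each adjective gets a number.

```python
import re
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def _filled(value: Optional[str]) -> bool:
    """True for a non-blank string."""
    return isinstance(value, str) and value.strip() != ""


def completeness(rows: List[Row], column: str) -> float:
    """Share of rows with a non-blank value in `column` (0.0 for no rows)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if _filled(r.get(column))) / len(rows)


def validity(rows: List[Row], column: str, rule: Callable[[str], bool]) -> float:
    """Share of non-blank values in `column` that satisfy `rule`."""
    values = [r[column] for r in rows if _filled(r.get(column))]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)


def consistency(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Share of keys present in both systems whose values agree."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(1 for k in shared if a[k] == b[k]) / len(shared)


def is_us_zip(value: str) -> bool:
    """Example validity rule (assumption): a five-digit US ZIP code."""
    return re.fullmatch(r"\d{5}", value) is not None
```

Acceptance criteria can then be phrased as thresholds on these numbers (e.g. "ZIP completeness of at least 98%"; the figure is only an example), which gives stakeholders something concrete to agree on.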

6Jan/12

Walking the dog on Facebook

For over a year now I have been observing the puzzling behavior of friends and relatives who spend a considerable amount of their free time socializing on Facebook: posting bits of news and pictures, responding to their friends' posts, liking and "unliking" messages, tagging photographs, etc...

The epiphany came while I was walking my schnauzer through the neighborhood lawns and bushes: the resemblance was uncanny. Scout the grounds, check messages left by other dogs, mark the spot (I can only speculate whether this activity means creating a new message of his own or merely responding to somebody else's message)...

Could it be that obsessive socializing is an inherent trait of all sentient beings?

Full disclosure: I also have a Facebook account... don't use it much though. :)