all bits considered data to information to knowledge


Documenting Open Source Software with Doxygen

By now, open source software has found its way into enterprise development, and whether it can be used is no longer a subject for discussion. It can. It is being used by major corporations, and entirely new business models have been created around supporting open source (often also free) software.

The prime examples, such as Apache, JBoss, PostgreSQL, MySQL, Drupal, Subversion, Pentaho - to name but a few - count their deployments in the hundreds of thousands. And then there are lesser-known projects, hosted at sites dedicated to open source such as Apache Foundation, SourceForge and Codeplex Foundation, which provide components that could be used in your own development (checking licensing terms is highly recommended!).

The good news is that these projects could be used to solve your particular problems; the bad news is that, because of limited developer resources, these projects might have inadequate documentation or, in some cases, none at all. Here's where the "open" nature of the software is at its best. You can do it yourself.

My current favourite tool to document source code is Doxygen. The tool was developed by Dimitri van Heesch, and released under the GNU General Public License. It compiles superb documentation for C++, C, Java, Objective-C, Python, IDL (CORBA and Microsoft flavors), Fortran, VHDL, PHP, C#, and to some extent D. Here are but two examples of the documentation I've generated from the open source code:

iTextSharp library (a port of the hugely popular iText open source Java library for PDF generation, written entirely in C# for the .NET platform) and SharpSSH (a Secure Shell library for .NET, created by Tamir Gal and released under a BSD-style license).

Doxygen generated documentation for iTextSharp 5.0.2

Doxygen generated documentation for SharpSSH
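For readers who have not seen Doxygen markup, here is a minimal sketch of what a documented function might look like in Python, one of the languages listed above. The module name geometry.py and the function rectangle_area are hypothetical, made up purely for illustration; the `##` comment blocks and the @brief, @param and @return commands are standard Doxygen conventions.

```python
## @file geometry.py
#  @brief Hypothetical module used only to illustrate Doxygen comment blocks.


## @brief Computes the area of a rectangle.
#
#  A "##" comment placed directly before a definition is picked up by
#  Doxygen as the documentation block for that definition.
#
#  @param width  Width of the rectangle (non-negative number).
#  @param height Height of the rectangle (non-negative number).
#  @return The area, computed as width * height.
def rectangle_area(width, height):
    # Reject negative dimensions; this check is illustrative, not required by Doxygen.
    if width < 0 or height < 0:
        raise ValueError("dimensions must be non-negative")
    return width * height
```

Running `doxygen -g` creates a template configuration file (Doxyfile), and `doxygen Doxyfile` then generates the documentation. When documenting somebody else's code that carries few or no such comments, setting EXTRACT_ALL = YES in the Doxyfile is the usual way to have every entity included anyway.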


Time to Move on: James Gosling leaves Oracle

As of April 2, 2010 the "Father of Java" is no longer with Oracle. This follows the departures of Monty Widenius (2009) and Ken Jacobs (2010). Oracle might have acquired the body but the soul is gone...

A cute slide presentation from eWeek: The Life and Times of Java and James Gosling


FUD for thought

"To be uncertain is to be uncomfortable, but to be certain is to be ridiculous." Chinese Proverb

The European Commission today (January 21, 2010) cleared Oracle's agreement to acquire Sun Microsystems. What does it mean for the development community, specifically for the future of Sun's crown jewels: MySQL, OpenOffice, GlassFish EE server, NetBeans...? Oracle had almost a year to figure things out.

NetBeans is especially vulnerable given that Oracle has the competing JDeveloper (and the BEA Java dev tool); maybe it will be released as an open source project to the community? Rolled into JDeveloper? Discontinued?

Why would Oracle need GlassFish when it already has BEA and Oracle AS? Cannibalization is very likely.

MySQL? Anybody's guess, but I bet that it will be supported and development will continue; maybe it will undergo Oracle-ization (for example, replacing MySQL procedural extensions - just introduced in version 5.0 - with robust, mature PL/SQL). Will it still be free? Given the $1 bln Sun had spent acquiring it, and the $7+ bln Oracle spent acquiring Sun, it seems plausible to assume that Oracle would try to squeeze some dough out of it. Its own flagship database sales were stung by the ascending SQL Server and IBM. I see PostgreSQL as a winner, the only enterprise-capable, truly open source RDBMS on the market.

Java. Once positioned as a spear at Microsoft's heart; not anymore - the landscape has changed, notably with Google becoming a major player, and Microsoft wisely playing its cards by releasing C# as an open standard. Yet, I do not see Oracle donating Java to the open source community; most likely we'll see variations of Sun's controlled "Community Development Process". Oracle made a significant investment in Java, supporting it inside its products, and even creating its own IDE... but what is going to happen to the infant JavaFX? The RIA market is getting saturated - Flash/Flex, Silverlight, AJAX (and Ajax support frameworks such as GWT)... Apache Pivot looks darn promising. Will Oracle have enough resources to spread around?

Solaris. Sun's very own implementation of the Unix operating system, arguably the best out there, AIX and HP-UX market penetration notwithstanding. For a long time Oracle and Solaris were inseparable; if an Oracle DBA did not run his database on Solaris he was deemed somewhat less competent. Then Linux came of age, and Oracle made a huge bet on it (remember "Linux makes Oracle Unbreakable!", or was it the other way around?). Now they OWN the platform that their flagship database was designed for. Will they ditch Linux? Unlikely. Linux is on the upswing, it is robust, reliable and has enterprise-level support. Will Oracle push Solaris? Not exactly their domain of expertise, and the market for operating systems is not as lucrative as it used to be. Then there is the issue of Sun's proprietary hardware - hugely overpriced, increasingly obsolete... Sun recognized that they cannot charge premium prices for hardware that is becoming a commodity, and released an x86 version of Solaris; it flopped (why x86 Solaris when I can run x86 Linux?). Apple seems to be able to create a perception of superiority of both software (Mac OS) and hardware (Apple), but I credit Steve Jobs for it (to support my suspicion, follow the ups and downs of Apple stock plotted against the timeline of Steve's health news; also, the reliability of Apple laptops lags that of Asus, Toshiba and Sony - yet there is an unshakeable perception that the Mac is light years ahead of the lowly PC... talk about selling sizzle!)

My bet is that Solaris will be retired over a period of time in favour of Linux... R.I.P.

NB: FUD - Fear, Uncertainty and Doubt


Look Ma, no SQL!

Is the Structured Query Language going the way of the dinosaurs?
First proposed back in the 1970s, relational database technologies have flourished, taking over the entire data processing domain (with the occasional non-relational data store hiding in the long shadows of [t]rusty mainframes). The days of glory may be over, and the reason could be... yes, you've guessed it - a paradigm shift.

Relational databases brought order into the chaotic world of unstructured data; for years the ultimate goal was to normalize data, organize it in some fashion, chop it into entities and attributes so it could be further sliced and diced to construct information... There was a price to pay, though - the need for a set-based language to manipulate the data, namely, Structured Query Language - SQL (with some procedural and multidimensional extensions thrown in...)

The Holy Grail was to get data to 5NF, and then create a litter of data warehouses - either dimensional or normalized - to analyze the data... Then again, maybe we could just leave the data the way it is, stop torturing it into the relational model - and gain speed and flexibility at the same time? That's what I call a paradigm shift!

Enter MapReduce: Simplified Data Processing on Large Clusters, another idea from Google (which also inspired Hadoop - an open source implementation of the idea).

Google is doing it, Adobe is doing it, Facebook is doing it, and hordes of other, relatively unknown, vendors are doing it (lots of tacky names - CouchDB, MongoDB, Dynomite, HadoopDB, Cassandra, Voldemort, Hypertable... 🙂)

IBM, Oracle and Microsoft have announced additional features for their flagship products: the M2 Data Analysis Platform based upon Hadoop, and Microsoft is extending its LINQ (which goes past relational data) to include similar features... Sybase has recently announced that it implements MapReduce in its Sybase IQ database.

To be fair, the data still undergo some pre-processing to be fully managed by these technologies, but to a much lesser degree. The technology is designed to abstract the intricacies of parallel processing, and to facilitate management of large distributed data sets; it aims to eliminate not the need for relational storage but the need for SQL to manipulate the data... the idea is to allow analytic processing of the data where it lives, without expensive ETL, and with a minimal performance hit. The line is blurring between ORM, DBMS, OODBMS and programming environment; between data and data processing.
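To make the idea concrete, here is a minimal single-process sketch of the map / shuffle / reduce pattern, using the classic word count. It is not how Google or Hadoop implement it: there is no cluster, no fault tolerance and no distributed file system, and all the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only shows the shape of the programming model that replaces a SQL GROUP BY.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over a list of raw text documents."""
    pairs = []
    # On a real cluster each document (or chunk) would be mapped on the node
    # where it is stored; here the loop simply runs in one process.
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["no sql", "No ETL no problem"]))
```

Note that the input is raw text: nothing was normalized, loaded into tables or queried with SQL, which is the point of the approach. The framework's job, omitted here, is to spread the map and reduce calls across many machines.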

With all that said, it might not be the time to ditch your trusty RDBMS (just yet... :) A team of researchers concluded that databases "were significantly faster and required less code to implement each task, but took longer to tune and load the data." Database clusters were between 3.1 and 6.5 times faster on a "variety of analytic tasks."