Documenting Open Source Software with Doxygen

July 28th, 2010

By now, open source software has found its way into enterprise development, and whether it can be used is no longer a subject for discussion. It can. It is being used by major corporations, and entirely new business models have been created around supporting open source (and often also free) software.

The prime examples, such as Apache, JBoss, PostgreSQL, MySQL, Drupal, Subversion, Pentaho – to name but a few – count their deployments in the hundreds of thousands. And then there are lesser-known projects, hosted at sites dedicated to open source such as Apache Foundation, SourceForge and Codeplex Foundation, which provide components that could be used in your own development (checking licensing terms is highly recommended!).

The good news is that these projects could be used to solve your particular problems; the bad news is that, because of limited developer resources, these projects might have inadequate documentation or, in some cases, none at all. Here’s where the “open” nature of the software is at its best. You can do it yourself.

My current favourite tool for documenting source code is Doxygen. The tool was developed by Dimitri van Heesch and released under the GNU General Public License. It compiles superb documentation for C++, C, Java, Objective-C, Python, IDL (Corba and Microsoft flavors), Fortran, VHDL, PHP, C#, and to some extent D. Here are but two examples of the documentation I’ve generated from open source code:

iTextSharp (a port of the hugely popular open source Java library iText for PDF generation, written entirely in C# for the .NET platform) and SharpSSH (a Secure Shell library for .NET, created by Tamir Gal and released under a BSD-style license).

Doxygen generated documentation for iTextSharp 5.0.2

Doxygen generated documentation for SharpSSH 1.1.1.13





Lost in translation: Language and Perception

July 26th, 2010

The idea of a language defining our perception was supposedly disproved by Noam Chomsky’s introduction of the “Universal Grammar”. And yet, this new study opens the very same can of worms again, along the lines of the Sapir-Whorf hypothesis.

If Russian speakers can see more shades of blue because they have more words describing it, and Japanese and Spanish speakers struggle to recall the agents of accidental events because of the way their respective languages work, maybe our ability to learn and understand semantically significant concepts is also influenced by the medium through which we absorb these concepts – a language, in this case?

Maybe there was something to “the golden key” – Latin and Greek, the common languages of European scholars – that kept a link to antiquity alive in the darkness of the Middle Ages? Maybe there is a language uniquely suited to learning some specific subject? Domain Specific Languages are relatively common in computer programming; maybe the concept could be applied back to “natural” languages and comprehension?

A word of caution to the tale…

There is a short story by Robert Sheckley – “The Language of Love” (1957, Notions: Unlimited); in it a young man sets out to learn the almost forgotten Language of Love, developed by the now extinct inhabitants of a distant planet. After mastering the language, he discovers the reason behind the extinction of that alien race – the Language of Love is so precise and complex that learning it (and then using it) becomes an endeavor unto itself, impeding communication with the uninitiated and leaving no time for anything else… ;)

Getting creative with Hudson CI Plugins

June 29th, 2010

I’ve been using the Hudson continuous integration server for some time now, and – by and large – I’m very happy with the tool. It enjoys popularity in the open source community, and because of this popularity one has a wide spectrum of high-quality plugins to extend Hudson’s functionality.
Sometimes it is possible to find an unintended use for a plugin (which might also be an indication that it should be cloned and made specific to the new use). Here’s one such example: I was looking for a way to scrub my source files for hard-coded values, and came up with a reasonably fast command line executable (C#) which recursively crawls directories and produces a verbose report pinpointing each occurrence of specific string tokens. I wanted to see the results surfaced through Hudson, and then, right before I started thinking about formatting HTML and hooking into Hudson’s extensibility model, I got a better idea.

I’ve been using the Task Scanner plugin for Hudson by Ulli Hafner for a while, and found it very helpful – stable and highly configurable; then it occurred to me that this plugin could be repurposed to look for hard-coded values.

 
It does not happen often, but my team has been burned a number of times by hard-coded database credentials, IP addresses and such. These issues usually manifest themselves when an application is being deployed in an environment different from the one the developers are using. For instance, a developer might have been using a local instance of an RDBMS for speed and convenience, and might have – again, for convenience – put a connection string into his code (“yes, I know about configuration files, but it is just this one time, and I will change it right back as soon as I am done”). Now your build is broken, and you might spend hours tracking down the problem.
One solution would be to instruct your Task Scanner plugin to look for any part of the following connection string – or to take it as a whole (pay attention to special characters in the token strings):

Data Source=localhost; Initial Catalog=myDataBase;Integrated Security=True;

The results of the code scan would not only summarize all occurrences of the specified string, but would also take you straight to the line of code in the specific module, display the trend in a clickable graph and provide an at-a-glance report view.
Cloning the plugin to change appearance, captions etc. would allow you to distinguish between the usages – whether you are looking for TODO tasks or for hard-coded values.
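
For readers who want to try the idea outside Hudson first, here is a minimal sketch of the kind of recursive token scan described at the start of this post. It is written in Python rather than C#, and it is not the original tool (which is not shown here); the function name scan_tree, the Hit record and the default file extensions are all assumptions made for illustration.

```python
import os
from typing import Iterator, NamedTuple, Sequence, Tuple


class Hit(NamedTuple):
    """One occurrence of a token in a source file (hypothetical record type)."""
    path: str
    line_no: int
    token: str
    text: str


def scan_tree(root: str,
              tokens: Sequence[str],
              extensions: Tuple[str, ...] = (".cs", ".config")) -> Iterator[Hit]:
    """Walk `root` recursively and yield a Hit for every token occurrence.

    Matching is a case-insensitive substring test; only files whose names
    end with one of `extensions` are read.
    """
    lowered = [token.lower() for token in tokens]
    suffixes = tuple(extensions)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_no, line in enumerate(handle, start=1):
                    haystack = line.lower()
                    for token, needle in zip(tokens, lowered):
                        if needle in haystack:
                            yield Hit(path, line_no, token, line.strip())
```

Calling scan_tree with a source directory and a token such as "Data Source=localhost" yields the file, the line number and the offending line for each match, which is roughly the information the Task Scanner plugin surfaces in its report.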

Keeping up with database changes

June 11th, 2010

Scenario: several developers are hard at work cranking out code. The application under development relies on an RDBMS back-end for persistent storage (in this particular case, the database is Microsoft SQL Server 2005, but the technique described applies to any RDBMS supporting DDL triggers). Developers are making changes to the client application code, creating/altering/dropping database objects (stored procedures, tables, views etc.) and, in the heat of the moment, forgetting to communicate the changes to their teammates, let alone the project manager…

Yes, I know – this is not how it is supposed to happen, and yet in the world out there, more often than not, it does happen… Here are some do-it-yourself ideas on how you could alleviate the pain and spare yourself some nasty surprises without buying more tools…

Enter DDL Triggers. This is a relatively new feature in Microsoft SQL Server (though Oracle has had them for ages), and, among many other things (rolling back changes, for instance), it could be used to solve the problem stated above.

A DDL (Data Definition Language) trigger in MS SQL Server can have one of two scopes – server or database. Table 1 at the end of this post lists all the events for which a DDL trigger could be created, grouped by scope. For the full syntax for creating a DDL trigger please see the vendor’s documentation; here I will only touch on the basics needed to illustrate a solution.

Here’s a database scope trigger we are going to use to monitor events:

CREATE TRIGGER [tr_DDL_ALERT] ON DATABASE -- trigger is created in the context of a given database
FOR CREATE_TABLE, DROP_TABLE, ALTER_TABLE -- which events to capture; see Table 1 for the full list
-- use DDL_DATABASE_LEVEL_EVENTS instead to capture all database-level events
AS
SET NOCOUNT ON
DECLARE @xmlEventData XML -- the generated event data is in XML format
SET @xmlEventData = EVENTDATA() -- get data from the EVENTDATA() function

Now, this trigger would not be of much use to anybody as it stands; you need to parse the information contained in the XML message passed into your trigger upon the event. You could parse it and send an email message, or you could save it into a database, or both.
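
To make the shape of that XML message concrete, here is a small Python sketch that parses a sample event and pulls out the same elements the trigger reads. The sample document is hand-written for illustration (a real EVENTDATA() message carries more elements than shown), and the names SAMPLE_EVENT, FIELDS and parse_event are assumptions, not part of SQL Server.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample of an EVENT_INSTANCE document (illustrative only).
SAMPLE_EVENT = """
<EVENT_INSTANCE>
  <EventType>CREATE_TABLE</EventType>
  <PostTime>2010-06-11T10:15:30.123</PostTime>
  <ServerName>DEVSERVER</ServerName>
  <UserName>dbo</UserName>
  <DatabaseName>myDataBase</DatabaseName>
  <ObjectName>tbCustomer</ObjectName>
  <ObjectType>TABLE</ObjectType>
  <TSQLCommand>
    <CommandText>CREATE TABLE dbo.tbCustomer (id INT)</CommandText>
  </TSQLCommand>
</EVENT_INSTANCE>
"""

# Log column name -> path of the element below EVENT_INSTANCE.
FIELDS = {
    "EventTime": "PostTime",
    "EventType": "EventType",
    "ServerName": "ServerName",
    "DatabaseName": "DatabaseName",
    "ObjectType": "ObjectType",
    "ObjectName": "ObjectName",
    "UserName": "UserName",
    "CommandText": "TSQLCommand/CommandText",
}


def parse_event(xml_text: str) -> dict:
    """Return one log row (column name -> text) extracted from an event document."""
    root = ET.fromstring(xml_text.strip())
    row = {column: (root.findtext(path) or "").strip()
           for column, path in FIELDS.items()}
    # Same cosmetic step as in the T-SQL: replace the ISO 'T' separator with a space.
    row["EventTime"] = row["EventTime"].replace("T", " ")
    return row
```

The dictionary returned by parse_event(SAMPLE_EVENT) has one entry per column of the log table used in the rest of this post.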

The following code saves it into a table [tbDDLEventLog] – which, of course, has to be created beforehand:

INSERT INTO dbo.tbDDLEventLog
(
EventTime
,EventType
,ServerName
,DatabaseName
,ObjectType
,ObjectName
,UserName
,CommandText
)
SELECT REPLACE(CONVERT(VARCHAR(50), @xmlEventData.query('data(/EVENT_INSTANCE/PostTime)')), 'T', ' ')
,CONVERT(VARCHAR(100), @xmlEventData.query('data(/EVENT_INSTANCE/EventType)'))
,CONVERT(VARCHAR(100), @xmlEventData.query('data(/EVENT_INSTANCE/ServerName)'))
,CONVERT(VARCHAR(100), @xmlEventData.query('data(/EVENT_INSTANCE/DatabaseName)'))
,CONVERT(VARCHAR(100), @xmlEventData.query('data(/EVENT_INSTANCE/ObjectType)'))
,CONVERT(VARCHAR(100), @xmlEventData.query('data(/EVENT_INSTANCE/ObjectName)'))
,CONVERT(VARCHAR(100), @xmlEventData.query('data(/EVENT_INSTANCE/UserName)'))
,CONVERT(VARCHAR(MAX), @xmlEventData.query('data(/EVENT_INSTANCE/TSQLCommand/CommandText)'))

The following sends out an email notification using a potentially obsolete extended stored procedure (assemble the message in the @body variable from the elements of the XML message, as shown in the example above):

EXEC master..xp_smtp_sendmail
@TO = 'me@somewhere.com'
,@from = 'someone@somewhere.com'
,@message = @body
,@subject = 'database was modified'
,@server = 'smtp.mydomain.com'

The long-term solution would be, of course, to configure SQL Server Database Mail.

In my next post I will describe how database triggers could be integrated with Hudson - an open source Continuous Integration (CI) server.

Table 1. List of the values to use with server and database scope DDL triggers

Server Scope
ALTER_AUTHORIZATION_SERVER
CREATE_DATABASE
ALTER_DATABASE
DROP_DATABASE
CREATE_ENDPOINT
DROP_ENDPOINT
CREATE_LOGIN
ALTER_LOGIN
DROP_LOGIN
GRANT_SERVER
DENY_SERVER
REVOKE_SERVER

Database Scope
CREATE_APPLICATION_ROLE
ALTER_APPLICATION_ROLE
DROP_APPLICATION_ROLE
CREATE_ASSEMBLY
ALTER_ASSEMBLY
DROP_ASSEMBLY
ALTER_AUTHORIZATION_DATABASE
CREATE_CERTIFICATE
ALTER_CERTIFICATE
DROP_CERTIFICATE
CREATE_CONTRACT
DROP_CONTRACT
GRANT_DATABASE
DENY_DATABASE
REVOKE_DATABASE
CREATE_EVENT_NOTIFICATION
DROP_EVENT_NOTIFICATION
CREATE_FUNCTION
ALTER_FUNCTION
DROP_FUNCTION
CREATE_INDEX
ALTER_INDEX
DROP_INDEX
CREATE_MESSAGE_TYPE
ALTER_MESSAGE_TYPE
DROP_MESSAGE_TYPE
CREATE_PARTITION_FUNCTION
ALTER_PARTITION_FUNCTION
DROP_PARTITION_FUNCTION
CREATE_PARTITION_SCHEME
ALTER_PARTITION_SCHEME
DROP_PARTITION_SCHEME
CREATE_PROCEDURE
ALTER_PROCEDURE
DROP_PROCEDURE
CREATE_QUEUE
ALTER_QUEUE
DROP_QUEUE
CREATE_REMOTE_SERVICE_BINDING
ALTER_REMOTE_SERVICE_BINDING
DROP_REMOTE_SERVICE_BINDING
CREATE_ROLE
ALTER_ROLE
DROP_ROLE
CREATE_ROUTE
ALTER_ROUTE
DROP_ROUTE
CREATE_SCHEMA
ALTER_SCHEMA
DROP_SCHEMA
CREATE_SERVICE
ALTER_SERVICE
DROP_SERVICE
CREATE_STATISTICS
DROP_STATISTICS
UPDATE_STATISTICS
CREATE_SYNONYM
DROP_SYNONYM
CREATE_TABLE
ALTER_TABLE
DROP_TABLE
CREATE_TRIGGER
ALTER_TRIGGER
DROP_TRIGGER
CREATE_TYPE
DROP_TYPE
CREATE_USER
ALTER_USER
DROP_USER
CREATE_VIEW
ALTER_VIEW
DROP_VIEW
CREATE_XML_SCHEMA_COLLECTION
ALTER_XML_SCHEMA_COLLECTION
DROP_XML_SCHEMA_COLLECTION

Continuous integration with SQLCMD and Hudson

June 3rd, 2010

If you are not doing continuous integration, you should; and if you are, then you ought to consider database installation an integral part of your build process.

Most CI servers out there would allow you to execute batch or shell commands, and virtually every RDBMS provides a command line utility (and creating one on your own – if needed – is rather trivial).

Installing a database as part of your build process and populating it with data could play a role in your unit testing strategy, and should definitely be considered an integral part of functional and regression testing procedures.

The following gives but one example of how to make an MS SQL Server database install a part of your build process, utilizing Microsoft’s command line utility SQLCMD and the open source continuous integration server Hudson. This could be applied to any other RDBMS package – MySQL, PostgreSQL, Oracle, DB2 or Sybase – with minor adjustments.

The command line utility can be downloaded separately, or installed as part of a SQL Server 200X installation. If your unit tests require database support, it might be a good idea to install the free SQL Server Express Edition, which could be started as part of the build process and shut down afterwards.

“The sqlcmd utility lets you enter Transact-SQL statements, system procedures, and script files at the command prompt, in Query Editor in SQLCMD mode, in a Windows script file or in an operating system (Cmd.exe) job step of a SQL Server Agent job. This utility uses OLE DB to execute Transact-SQL batches.”

This provides an opportunity to make the creation of a database and all dependent database objects a part of your continuous integration build process with Hudson, an open source continuous integration server, by executing scripts – either integrated with your build management utility (such as Maven, Ant or MSBuild, depending on your platform) or as plain batch or shell commands.

A very basic Windows batch command in Hudson installing database through SQLCMD might look like this:

sqlcmd -S<IP address>,[port] -U<user> -P<password> -dmaster -i%WORKSPACE%\exec.sql
  • -S indicates the IP address (and, optionally, the port) of the SQL Server instance to connect to
  • -U and -P – user ID and password, respectively (this example uses SQL Server Authentication)
  • -d specifies the default database to connect to; the [master] database is the one you would want if creating a database is part of your build process
  • -i specifies the script file to execute; %WORKSPACE% is the Hudson variable pointing to the job’s workspace directory

NB: for the complete list of commands see the documentation. Keep in mind that the user ID and password are in clear text, and will be sent over the network as such (unless you are using DAC). To minimize the amount of hard-coded values, use include files in your script.

Here is an example of how the SQL code could be organized, in order of execution (I will link script files soon):

1. exec.sql: main controller of the database installation process
2. constants.config: contains the declarations of all variables to be used in the script; note that the file extension is irrelevant for execution
3. backupDB.sql: backs up the existing database (if present); note that the backup directory must exist on the remote computer
4. createDB.sql: creates the new database; note that all the paths must exist on the remote computer
5. createTables.sql: creates all tables in the database; it could include creation of indices and constraints as part of the script, but I would advise against it because of potential dependency conflicts
6. createFunctions.sql: creates all the user-defined functions for the database; the order in which objects are created in the database is important, and placing functions before [views] and [stored procedures] reflects a common dependency pattern, as both could use the functions
7. createViews.sql: creates all views
8. createProcedures.sql: creates all procedures
9. createConstraints.sql: adds constraints to the objects: primary keys, foreign keys, indices etc.
10. importData.sql: if your database has static data, this could be used to add it at creation time; you may want to switch 9 and 10, as your data might potentially violate constraints (e.g. orphaned records); this also could be used in unit testing strategies
11. createUsers.sql: adds all users; this script assumes that logins are already created (if not, add a script to create logins first)
12. grantPrivileges.sql: grants privileges on the objects (e.g. EXECUTE)
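
If your build step is a script rather than a one-line batch command, the SQLCMD call can be assembled programmatically. The following is a minimal Python sketch under stated assumptions: the environment variable names SQL_SERVER, SQL_USER and SQL_PASSWORD are hypothetical (WORKSPACE is the variable Hudson sets for a job), the function names are made up for illustration, and only the SQLCMD options discussed above are used.

```python
import os
import subprocess
from typing import List


def build_sqlcmd(server: str, user: str, password: str,
                 script_path: str, database: str = "master") -> List[str]:
    """Return the sqlcmd argument list for running one script file."""
    return [
        "sqlcmd",
        "-S", server,       # server address, optionally "<IP address>,<port>"
        "-U", user,         # SQL Server Authentication login
        "-P", password,     # still clear text on the command line
        "-d", database,     # [master] when the script creates the database
        "-i", script_path,  # main controller script, e.g. exec.sql
    ]


def run_install() -> None:
    """Run exec.sql from the Hudson workspace.

    Raises KeyError if a variable is missing and CalledProcessError if
    sqlcmd exits with a non-zero code.
    """
    workspace = os.environ["WORKSPACE"]
    command = build_sqlcmd(
        server=os.environ["SQL_SERVER"],      # hypothetical variable name
        user=os.environ["SQL_USER"],          # hypothetical variable name
        password=os.environ["SQL_PASSWORD"],  # hypothetical variable name
        script_path=os.path.join(workspace, "exec.sql"),
    )
    subprocess.run(command, check=True)
```

Reading the credentials from the environment keeps them out of the job configuration, though they remain visible to anyone who can inspect the build machine’s processes.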

Gotchas:

It is important to understand that the GO command completes the batch and flushes the buffer; it makes SQLCMD “forget” every T-SQL variable you might have declared prior to executing the command. In the above example, variables declared with DECLARE in [constants.config] are no longer part of the script once the GO command has been issued (sqlcmd scripting variables defined with :setvar, by contrast, persist across batches).

When creating scripts, keep in mind the differences between local (Hudson) directories and remote (SQL Server) ones. The former refer to the location of the SQL script files checked out by Hudson from your source control, understood by SQLCMD and Hudson only; the latter are directories that SQL Server understands – backup and database locations.

SQLCMD takes its arguments in clear text, which constitutes a potential security breach; use it in a fully trusted environment. An alternative would be to implement a workaround such as local batch files in secure directories with hard-coded user IDs and passwords, and to rely on the Hudson security matrix; only users with access to the server would be able to see them. This does increase maintenance but is relatively easy to implement.

If you want SQLCMD-generated messages to be displayed in the Hudson console output, do not specify an output file. Alternatively, I could envision a plugin that would parse the output file and present it nicely in the Hudson environment; I might take a stab at it, time permitting.

The successful execution of the scripts relies on the correct order of creation – you must figure out object dependencies and factor them into your scripts. Unfortunately, this is a classic Catch-22 – the reliable way to determine dependencies is to query SQL Server after the objects have been created… which means that you’d have to run all the scripts manually first, and then adjust them accordingly.

Here are some clever scripts that allow for discovery of the correct sequence for stored procedures and functions.
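
Once the dependencies are known, working out a valid creation order is a topological sort. The sketch below uses Python’s standard graphlib module (Python 3.9 or later); the object names and the DEPENDENCIES mapping are invented for illustration, and in practice the mapping would come from querying SQL Server as described above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical example: each object maps to the set of objects it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "tbCustomer": set(),
    "fnFormatName": set(),
    "vwCustomers": {"tbCustomer", "fnFormatName"},
    "spGetCustomer": {"vwCustomers", "fnFormatName"},
}


def creation_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return the objects in an order where every dependency comes first.

    graphlib raises CycleError if the objects depend on each other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For the sample mapping, any order returned places the table and the function before the view, and the view before the stored procedure, which matches the functions-before-views-before-procedures pattern noted in the script list.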

Sybase, an SAP company

May 16th, 2010

SAP has acquired Sybase for about $6bln… I might be missing something, but it seems to be an act of desperation – on both sides – all this talk about synergy and efficacy notwithstanding.

The way I see it, Sybase had been floundering for years: first squandering its RDBMS position by neglecting markets (as witnessed, for instance, by its pathetic TPC benchmarks and truly archaic dialect of Transact-SQL), then forgoing the initial success of PowerBuilder ($3,000 for an IDE?! This is what came out of the $1 billion deal with PowerSoft in 1994), unable to compete in the data access applications arena with the more nimble (though not necessarily more technologically advanced) Visual Basic and Delphi.

SAP was steamrolling businesses into what it defined as a set of “best practices” until its Borg-like message sank in, in the wake of high-profile implementation failures as well as increased competition from Oracle, Microsoft and others following the ERP market consolidation (Siebel, JD Edwards, Lawson). Then there are a number of departures from the SAP management team (Leo Apotheker being the latest)…

What will happen to the whole suite of Sybase products, and to the dozens of overlapping, competing technologies and solutions, is anybody’s guess. I would stay away from SAP stock for a while.

Losing browser wars

May 12th, 2010

Microsoft is losing the browser wars among the younger generation. The main culprit: Internet Explorer is slow. It is annoyingly slow to start up (what is it doing all those minutes while opening on my computer? Connecting to Microsoft to log my session? Initializing umpteen+ plugins and components?), it is slow to render graphics, it behaves erratically with downloads… Wikipedia supplies some stats on browser usage out there: IE @53%, Firefox @31% and Google Chrome @8%. Microsoft’s IE had almost 90% of the market as recently as 2005… A bit of anecdotal evidence: my 16-year-old hates Internet Explorer for all the reasons listed above – and he grew up with IE, using it exclusively (that’s 8+ years!) up until last year, when he ditched it for Chrome. “It does what I need, and it is sooo fast!”

I believe that Microsoft became too preoccupied with today’s corporate suits and is losing the younger generation; after all, it is in the business of selling Office products. Of course, it pays lip service with flops like Zune and occasional successes like Xbox… but it lacks Google’s razor-sharp focus. After all, Google is doing exactly what Microsoft did back at the beginning of the 1990s, when it faced an uphill battle against entrenched UNIX boxes with Windows 3.11 and languages like Visual Basic 3.0. These were FUN!

Microsoft is not fun anymore, it is a serious business. And this is the problem it will face in the future when today’s kids graduate into corporate boardrooms.

.Net as Will and Representation

May 4th, 2010

It’s been a long run for .Net in the wild… The experiment with letting go is about to end, and .Net is to become yet another Windows “component”.

The update to .NET Framework policy states that beginning with .NET Framework 3.5 Service Pack 1 (SP1) the .NET Framework will be defined as a “Component”. As a Component, .NET version 3.5 Service Pack 1 (SP1) will assume the same Support Lifecycle policy as its parent product or platform.

As Yogi Berra might have remarked: “It’s déjà vu all over again!” Yes, I am referring to Internet Explorer 4.0 being an “integral part of Microsoft Windows”.
I think this is a major blunder on Microsoft’s part, and an opening for Java to regain some of the lost ground (the last time I checked, the JVM was still a separate product…).

A bite of Apple

May 1st, 2010

It is official – Steve Jobs does not like Flash. The reasons explained in a long essay cite everything under the sun – from open standards to battery life to lack of H.264 standard support… Jobs’s assertion that “Flash is the number one reason Macs crash” is all but certain to anger many on both sides of the dispute.

While I am not discounting Apple’s anal urge to control the straw that saved it from a near-death experience at the beginning of the 2000s, I think Steve Jobs has a point: Flash is a crutch being obliterated by nascent open standards (the same probably goes for Microsoft’s Silverlight) and will be obsolete within a few short years. The pendulum has swung in the other direction – toward hardware processing.

[update] Microsoft decided to side with Apple on this, having probably given up its hopes for Silverlight to upend Flash. This is eerily reminiscent of the tactics applied to the Java rivalry when C# was released as an ECMA standard.

Forest behind trees: Story of Enterprise Architecture

April 29th, 2010

I was listening to the Story of Human Language audio course the other day. Dr. John McWhorter was explaining how European languages came up with the idea of gender for inanimate objects. The example he was using was silverware in German, with spoon being “he” (der Löffel), fork being “she” (die Gabel), and knife being of neuter gender (das Messer). The current theory maintains that this is the result of gradual changes, small steps taken one at a time that led to the situation as we see it now. And each of the steps made perfect sense to the people at the time. Yet the notion of gender in a language, let alone the attribution of a specific gender to an object, appears manifestly arbitrary to non-native speakers.

It occurred to me that this could be a perfect metaphor for ad-hoc Enterprise Architecture without roadmaps: a series of decisions that were a good idea at the time, leading to a sorry state of chaos because there was no lifeline stretching from “as-is” into the future state.

The second distinction made by the professor was that the complexity of a language inversely reflects the advancement of a society. Despite the popular notion that a more evolved society would have a more complex language, in fact the opposite is true. A language used in a fast-paced society loses many accoutrements considered necessary in less advanced societies (e.g. compare the etiquette of the French royal court of Louis XIV with that of modern France).

Applied to Enterprise Architecture, this would imply that in organizations with evolved EA programs the IT systems landscape will be less – not more – complex, and more efficient at the same time.