Why and How to Document Software

I have commented before on how important it is that programmers document the software they write. Working with code produced by numerous other programmers, as I currently am at the Digital Innovation Lab at UNC, only confirms for me how important documentation is and how rarely it is done well, if at all.

Why It Is Important

Just as writing good, clear expository prose is a sign of (and exercise for) good, clear critical thinking, so is writing good, clear documentation a sign of (and exercise for) good, clear software design and architecture.

Programming is almost never a write-once exercise. Documentation is a means to communicate your design and implementation to others who will be maintaining and extending your code, learning from and improving your ideas, and helping to sustain the initial investment in creating the software in the first place.

The worst-case scenario is that a critical programmer leaves, either because s/he gets a job at another company or institution (programmers tend to enjoy a lot of employment mobility) or dies in a tragic caffeine overdose accident. Someone unfamiliar with the code is then left with whatever s/he leaves behind, in whatever state of (in)completion or (im)perfection.

However, even the best-case scenario, in which the same programmer stays with the code, is challenging: code is complicated, a programmer isn’t going to remember all of the details of how s/he designed something, and his/her code is inevitably going to be seen and used by others. Just because it looks nice and runs doesn’t mean that all is well under the hood.

How The Internet Ruined Software Design

Even in the old days (I’ve been programming on and off for over 30 years!), software engineers needed a way to handle the growing complexities and inter-dependencies of computer systems, and object-oriented programming was a novel solution to managing complexity and encapsulating design details.

An object, in this conceptualization (developed at Xerox PARC in the Smalltalk language during the 1970s and widely publicized in the 1980s), presents a certain set of features and behaviors to the outside, but exactly how those things are implemented is hidden and up to the programmer. The object must be a self-contained black box that acts like a responsible citizen in its local environment: it simply makes good on the set of contracts it presents to its neighbors, accepting requests that act on its internals and providing useful information about itself when asked.
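
To make the black-box idea concrete, here is a minimal Java sketch. The Account class and its methods are hypothetical, invented only for illustration. Callers can use nothing but the public methods, which form the object’s contract; how the balance is actually stored is private and could change without affecting any caller.

```java
// Hypothetical example of encapsulation: the public methods are the
// object's "contract"; the private field is the hidden implementation.
class Account {
    // Hidden implementation detail: the balance is kept in cents to avoid
    // floating-point rounding. Callers never see or depend on this choice.
    private long balanceCents;

    // Contract: deposits must be positive; otherwise the request is refused.
    public void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceCents += cents;
    }

    // Contract: a withdrawal succeeds only if funds are sufficient.
    // Returns true if the money was withdrawn, false otherwise.
    public boolean withdraw(long cents) {
        if (cents <= 0 || cents > balanceCents) {
            return false;
        }
        balanceCents -= cents;
        return true;
    }

    // Provides information about the object's state when asked.
    public long getBalanceCents() {
        return balanceCents;
    }
}
```

Switching the internal representation (say, to a BigDecimal) would leave every caller untouched, which is exactly the point of the contract.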

Then along came the internet, which allowed users on their PCs to interact (exchange information) with servers and other PCs elsewhere. The result has been the fragmentation of code and data, breaking up the neat and tidy conceptualization of objects and the delegation of implementation responsibilities that existed in the OOP world.

Code on a server (often written in the crude PHP language) manages its data and generates data to pass to browsers on PCs. That data is then read and analyzed by completely different code (usually in JavaScript), which produces, and is interdependent on, CSS and HTML. Add the complexity that servers often run Content Management Systems like WordPress or Drupal, and the result is a rat’s nest of complexity and interdependencies that can be overwhelmingly difficult to understand and debug. This makes good, clear documentation all the more important.

No program is an island, and the network of interdependencies is as complex as ever. Code can break because the operating system changes, the programming language changes (PHP, JavaScript, …), the content management system changes, the browser (or the related HTML or CSS specification) is updated, the libraries (jQuery, Bootstrap, Foundation, YUI …) are updated, etc. It is reasonable to expect things to break and need to be fixed (sometimes requiring extensive changes) at least every two years. This means being clear about what your program uses, which version(s), why and how.
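
One way to make that record both human-readable and checkable is to state each dependency in a header comment and fail fast at run time when it isn’t met. Below is a minimal Java sketch under assumed conditions: the class, its helper, and its only documented dependency (Java 11 or later) are hypothetical, chosen just to illustrate the pattern.

```java
// DependencyCheck (hypothetical example)
// DEPENDS ON: Java 11 or later (uses String.isBlank(), added in Java 11).
//             The required version here is illustrative, not from a real project.
class DependencyCheck {
    // Minimum Java feature release this code is known to work with.
    static final int MIN_JAVA_VERSION = 11;

    // runtimeIsSupported()
    // PURPOSE: Reports whether the running JVM meets the documented minimum,
    //          so a version mismatch fails loudly instead of mysteriously
    // RETURNS: true if the runtime's feature release >= MIN_JAVA_VERSION
    static boolean runtimeIsSupported() {
        return Runtime.version().feature() >= MIN_JAVA_VERSION;
    }

    // isMissing()
    // PURPOSE: Example use of the documented dependency (String.isBlank())
    // INPUT:   value = any string, or null
    // RETURNS: true if value is null, empty, or only whitespace
    static boolean isMissing(String value) {
        return value == null || value.isBlank();
    }
}
```

A caller might check runtimeIsSupported() once at startup and refuse to continue with a clear message, rather than failing later with an obscure error.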

Why I Eschew JavaDoc Style

Any system that encourages and facilitates documenting software is probably good, but I’m disappointed that the JavaDoc style seems to have become the de facto industry standard. Why? It produces nice HTML output for browsing interfaces outside of the code itself, which is great for large-scale programming languages, libraries and systems used by large numbers of people. But it is ill-suited to the most common usage scenario: programmers reading documentation in the context of the software itself, in smaller code bases, to fix bugs, make changes, and figure out what is going on. JavaDoc is hardest for humans to read precisely where it is needed most: in the code itself.

Here’s an example of JavaDoc to contrast with a human-readable style of the sort I write. JavaDoc:

/**
 * Returns the localized preferences values for the key.
 * @param  key the preferences key
 * @param  useDefault whether to use the default language if no localization
 *         exists for the requested language
 * @return the localized preferences values, or empty array (if key undefined)
 */
public String[] getPreferencesValues(String key, boolean useDefault);

My style for the same function:

// getPreferencesValues()
// PURPOSE: Returns the localized preferences values for the key
// INPUT:   key = preferences key (PortletPreferences)
//          useDefault = true to use the default language if no localization exists
// RETURNS: the localized preferences values, or empty array (if key undefined)
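
To show how this style reads next to working code, here is a minimal sketch that attaches the same header to an implementation. The LocalizedPreferences class, its map-based storage, and the put() and lookup() helpers are all hypothetical; this is not the PortletPreferences API, only a stand-in store so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// LocalizedPreferences (hypothetical example)
// PURPOSE: A simple in-memory preferences store, used only to show the
//          comment style above attached to a working method
class LocalizedPreferences {
    // Outer key: language code (e.g. "en"); inner key: preferences key.
    private final Map<String, Map<String, String[]>> valuesByLanguage = new HashMap<>();
    private final String language;         // language requested by the user
    private final String defaultLanguage;  // fallback language

    LocalizedPreferences(String language, String defaultLanguage) {
        this.language = language;
        this.defaultLanguage = defaultLanguage;
    }

    // put()
    // PURPOSE: Stores values for a key in the given language
    void put(String lang, String key, String... values) {
        valuesByLanguage.computeIfAbsent(lang, l -> new HashMap<>()).put(key, values);
    }

    // getPreferencesValues()
    // PURPOSE: Returns the localized preferences values for the key
    // INPUT:   key = preferences key
    //          useDefault = true to use the default language if no localization exists
    // RETURNS: the localized preferences values, or empty array (if key undefined)
    String[] getPreferencesValues(String key, boolean useDefault) {
        String[] values = lookup(language, key);
        if (values == null && useDefault) {
            values = lookup(defaultLanguage, key);
        }
        return (values != null) ? values : new String[0];
    }

    // lookup()
    // PURPOSE: Returns the values for key in lang, or null if absent
    private String[] lookup(String lang, String key) {
        Map<String, String[]> byKey = valuesByLanguage.get(lang);
        return (byKey == null) ? null : byKey.get(key);
    }
}
```

Because each header sits directly above its method, a maintainer can check the stated contract (for example, the empty-array return) against the code without leaving the file.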

What to Document

At the beginning of each large entity (an object, a library, a JavaScript file, etc.), you should document, at a minimum:

  1. What is the purpose and responsibility of this entity?
  2. What code libraries is it dependent upon and which versions of those are known to work? (For example, “Uses jQuery 1.7”)
  3. What side-effects does this entity cause? For example, creating or modifying global variables, installing or removing hooks, passing JSON objects, etc.
  4. What variables are in the outermost scope, and what are their purposes and designs? This is particularly important where their contents can be created dynamically via JSON and other run-time methods, and no clear preset of fields is provided in code.
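
Applied to a whole class, the four points above might look like the following minimal sketch. The EventHooks class and its process-wide listener list are hypothetical, chosen because they involve exactly the kind of global state and hooks the list asks you to disclose; the header labels (PURPOSE, DEPENDS ON, SIDE-EFFECTS, GLOBALS) simply extend the comment style shown earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// ===========================================================================
// EventHooks (hypothetical example)
// PURPOSE:      Keeps a process-wide list of listeners ("hooks") that other
//               code can register and that fire when an event is published
// DEPENDS ON:   Java standard library only (java.util, java.util.function);
//               written against Java 8+ (uses lambdas and Consumer)
// SIDE-EFFECTS: register() and clear() modify the shared static list HOOKS,
//               so every caller in the JVM sees the change
// GLOBALS:      HOOKS = listeners in registration order; each receives the
//               event name as a String. Starts empty; not thread-safe
// ===========================================================================
class EventHooks {
    private static final List<Consumer<String>> HOOKS = new ArrayList<>();

    static void register(Consumer<String> hook) { HOOKS.add(hook); }

    static void clear() { HOOKS.clear(); }

    // Calls every registered hook, in order, with the event name.
    static void publish(String event) {
        for (Consumer<String> hook : HOOKS) {
            hook.accept(event);
        }
    }

    static int count() { return HOOKS.size(); }
}
```

The GLOBALS entry matters most here: nothing in the method signatures tells a newcomer that registering a hook in one place changes behavior everywhere else.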

A function or method should still be treated as a black box that offers and obeys a strict contract which is implicit in the parameters which come in and the results that come out. However, as I’ve implied above, the black box is more permeable than ever (especially in online contexts) thanks to the fragmentation of code and the use of libraries.

While exactly what you document at the beginning of each function or method will depend upon its language, purpose, context and size, I usually include the following whenever they are appropriate for the function/method’s “contract”:

  1. What is the purpose of this code?
  2. What are the input parameters and what is the allowable range of values (or special values with special meanings)?
  3. What value(s) or structure(s) does the code return?
  4. What are the assumptions of the code – that is, what values or conditions outside of the black box (particularly global states and variables) does it rely upon?
  5. What side-effects does it cause outside itself? That is, how does it affect the global context, and thus break the black box principle, by changing global variables or other aspects of the environment?
  6. What exceptions does this code trigger and/or handle?
  7. What is left to do in this code? Explain known bugs, shortcomings, code yet to be optimized or updated.
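
Here is a minimal sketch of a method header that covers all seven points. The Inventory class, its static fields and the reserve() method are hypothetical, and the ASSUMES, SIDE-EFFECTS, EXCEPTIONS and TO DO labels are one possible extension of the comment style shown earlier, not a fixed convention.

```java
import java.util.Map;

// Inventory (hypothetical example)
// PURPOSE: Holds shared stock counts, to show a method header covering
//          all seven points listed above
class Inventory {
    // Outer-scope state that reserve() relies on and changes.
    static Map<String, Integer> stock;   // item name -> units on hand
    static int totalReserved = 0;        // running count of units reserved

    // reserve()
    // PURPOSE:      Reserves units of an item and reports how many remain
    // INPUT:        item = item name; must be a key already in stock
    //               units = number to reserve; must be >= 1
    // RETURNS:      units remaining after the reservation
    // ASSUMES:      stock has been initialized (non-null) by the caller
    // SIDE-EFFECTS: decrements the stock entry for item; increments totalReserved
    // EXCEPTIONS:   throws IllegalArgumentException if item is unknown or units < 1;
    //               throws IllegalStateException if not enough units are on hand;
    //               handles none itself
    // TO DO:        not thread-safe; concurrent callers could oversell
    static int reserve(String item, int units) {
        Integer onHand = stock.get(item);
        if (onHand == null || units < 1) {
            throw new IllegalArgumentException("unknown item or bad count: " + item);
        }
        if (units > onHand) {
            throw new IllegalStateException("only " + onHand + " left of " + item);
        }
        stock.put(item, onHand - units);
        totalReserved += units;
        return onHand - units;
    }
}
```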

Digital Humanities Post-Script

One of the interesting branches of the emerging field of Digital Humanities critically examines the socially constructed nature of technology: how, as an artifact, it reflects, reinforces and participates in the privileging of certain groups, epistemologies and ideologies. (Postcolonial Digital Humanities is a great forum for some of these issues; there have also been interesting critiques from the vantage point of feminism, and volumes such as The Cultural Logic of Computation critique the ideology of “computationalism.”)

As a practicing humanities scholar, I’m quite aware of the cultural values and imperatives implicit in the “world-order” of software design: a tightly managed, well-behaved world of contracts, etc. I’m sure there is much more to say about this product and process from the vantage point of cultural criticism, even if it is unlikely to have much effect on the exploitation of computer-based technologies.

