Agile Architecture: Documentation

Sometimes I think that documentation is created following the principle "you can never have enough documentation". I wonder if this has been adopted from "you can never have enough of a good thing"?

When talking about documentation we are discussing knowledge transfer. And knowledge transfer is a good thing, right?

Sounds good, but what about "we have thousands of pages of documentation - and none of it is current"? Or "if people would just read MY documentation, then they'd know"? And the sighs when you point new recruits to your "internal knowledge base"?

These are all symptoms of ineffective knowledge transfer.

What does that have to do with architecture?

Agile promotes "working software over comprehensive documentation". This statement makes many people (usually the people who don't have to write documentation) a little uneasy. How can they distribute the same information to many people without it changing along the way? What can they do with a blob of code without knowing how it works, how to test it, how to deploy it, once the last of the team has left? Living software changes constantly - thus if it can't be changed it will die.

The architecture itself guides the implementation. It has an impact on design, execution and technology choices - which, although often neglected, have a huge impact on documentation requirements.

Face-to-face should be the default

Scott Ambler describes face-to-face communication (borrowing from Alistair Cockburn's Agile Software Development) as the most effective mode of communication - for co-located teams and stakeholders, that is.

What if that isn't possible - as the statements above already highlight for a few areas? Distilling these yields a few observations:

We document knowledge that can't be delivered person-to-person, because

  • it needs to be communicated to stakeholders we don't have access to
  • it is tied to a person or group that can disappear

We also document knowledge when face-to-face communication is not comprehensive enough, because

  • it is too complex to be described within a single context
  • it is likely to deviate over time, or when passed on from person to person

Documentation debt

All sorts of documentation are created over time - several views of the architecture for different stakeholders, presentations, detailed design snippets, firewall change request forms, another network diagram for the security guys ... whenever one of these is created the debt cycle begins - just like technical debt.

It is kept because ... well, the more we think about it, the stranger it gets: hoarding ambitions, personal accomplishments connected with it, "it has been paid for", data retention policies ... anyhow, it's certainly not being read. So, at least, hide it.

Systems architecture vs. solution architecture

Have you ever needed to read three architectures to understand the system? With some explanation of which chapters still apply, or are going to apply in three months' time?

I propose to make a distinction between systems architecture and solution architecture:

Systems architecture describes the concerns, business concepts and logical structure of one system/vertical (the why and what). It provides a map to decide where to look for and add a particular functionality. A systems architecture document is always project-independent and is tied to the lifecycle of a system - and it always describes the current state.

Solution architecture describes the architectural changes to (perhaps multiple) existing systems. While solution architecture documents may be structured similarly to systems architecture documents, they are tied to the lifecycle of a project. They are folded back into systems architecture documents upon completion.

Solution architectures make poor candidates for documentation: They either describe what isn't there (and perhaps never will be), or worse, incorrectly describe what is there, or what was there - as seen in the examples above. None of this is helpful.

Design is not documentation

Design and documentation are somewhat mixed up: Design describes the implementation specifics of a planned application; documentation describes the (design of the) current state.

Don't ever mix them (I speak from experience ...), and throw the design away as quickly as possible. Just like a solution architecture, it is a disposable artefact, like a pizza box - keeping it is documentation debt. I know it's hard, so much effort, so many good memories ... just do it.

Generation over documentation

The drift between actual state and "designed" state is one of the biggest causes of issues with documentation - which leads back to documentation debt: The greater the degree of separation between documentation and application, the higher the interest rate.

The code and configuration though hold all the keys:

  • Interfaces
  • Data Model
  • Patterns
  • Behaviour
  • Deployment
  • Operations

Shouldn't this be the best place for documentation - rather than writing it, let it write itself?
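As a minimal sketch of what "letting it write itself" could look like for the interface part, the following Python snippet renders a plain-text reference straight from the code. The OrderService class and the describe_interface helper are hypothetical names invented for this illustration; only the standard inspect module is assumed.

```python
import inspect


class OrderService:
    """Hypothetical service, used only to illustrate the idea."""

    def place_order(self, customer_id: str, items: list) -> str:
        """Create an order and return its identifier."""
        return f"order-for-{customer_id}"

    def cancel_order(self, order_id: str) -> bool:
        """Cancel an order; returns True if it existed."""
        return True

    def _audit(self) -> None:
        """Internal helper, not part of the interface."""


def describe_interface(cls: type) -> str:
    """Render the public methods of cls as a plain-text reference."""
    lines = [cls.__name__, inspect.getdoc(cls) or ""]
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # leading underscore: treated as internal, not documented
        signature = inspect.signature(member)
        summary = inspect.getdoc(member) or "(undocumented)"
        lines.append(f"  {name}{signature}")
        lines.append(f"      {summary}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_interface(OrderService))
```

Because the reference is derived from signatures and docstrings at the time it is generated, it cannot drift from the code the way a separately maintained document can; the price is that names and type hints have to carry the meaning.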

Let's have a closer look:

Show me your API ...

... and I will know how your software works. Besides documenting the interface to the application (which in itself should already make it worthwhile), the API reveals a lot about the internals of an application. It is a window into the data model, how state is handled and, from a consumer's point of view, the business logic [1].

Reversing this statement: All efforts to improve the data model, state and behaviour translate automatically into better documentation.

And you even get additional benefits:

  • The API doesn't need to be "translated" for consumers
  • Tests for the API act as documentation
  • The code is more descriptive and serves as documentation
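The point that tests act as documentation can be illustrated with Python's standard doctest module, where the examples a reader sees in the docstring are the same ones that get executed. This is only a sketch: reserve_stock and its behaviour are made up for the illustration.

```python
import doctest


def reserve_stock(available: int, requested: int) -> int:
    """Reserve stock and return what remains available.

    >>> reserve_stock(available=10, requested=3)
    7

    Requests beyond the available stock are rejected rather than truncated:

    >>> reserve_stock(available=2, requested=5)
    Traceback (most recent call last):
        ...
    ValueError: requested 5 but only 2 available
    """
    if requested > available:
        raise ValueError(f"requested {requested} but only {available} available")
    return available - requested


if __name__ == "__main__":
    # Run every example found in this module's docstrings.
    results = doctest.testmod()
    print(f"{results.attempted} examples checked, {results.failed} failed")
```

If the behaviour changes and the docstring is not updated, the check fails, so the description is forced to stay current.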

Why wouldn't you?

What does the code say?

Well, some code says nothing (or it says it in such complicated ways, it equates to nothing ...). Most code though says a lot: it describes patterns, data model, behaviour, security, dependencies, deployment, and occasionally even provides an insight into the disagreements and frustrations of the developers.

... your developers already document every class and method?

Code-level documentation, while neither good nor bad in itself, can only paraphrase what is already there - and most developers I have met instantly recognise it as paraphrasing, thus duplicate information, thus not worth updating ... If the code itself speaks through its structure, naming, dependencies, and build/deployment instructions, then additional code documentation is not required.

Self-documenting code requires an active investment in good coding practices, in defining domain concepts, contexts and boundaries, and in appropriate patterns - and it drives the selection of language, libraries and build/integration tools.

The cost of enterprise products

Enterprise products, I believe, invented a new category of required documentation: The "as-built" document. Endless sequences of configuration parameters, screenshots of installer screens ... I don't have suicidal tendencies, but that pushes even me to the brink ...

I subscribe very much to the buy-before-build paradigm. The nature of product development, though, is that it usually starts with a simple and lean product - but as product companies need to keep selling, and face competitive pressure, the product becomes more adaptable and loaded with features. As these products try to be more things to more people, complexity and configuration increase - and with them documentation.

I wrote before about the impact enterprise products have on delivery speed - see Agile Architecture: Dependency Management. The impact extends to documentation as well: the costs of documentation at times exceed the costs of purchase. And often the design of the product forces a high degree of separation between product and documentation - which is further worsened when this documentation becomes a project artefact, never to be updated afterwards.

And it gets worse ...

Enterprise resource management

And if installing and configuring a feature-rich product weren't enough, new abysses open up when trying to integrate products into a reasonably complex (enterprise) environment. Database schemas need to be allocated, IP addresses requested and registered, firewalls opened for these custom ports ... and while we are at it, perhaps a clean-up of those resources we "know" aren't in use anymore (or are they?).

The impact of products in an enterprise often spreads further than the product itself, into the operations area: The product uses infrastructure resources like databases, message queues, servers, network resources (e.g. IP addresses) - all of which need to be managed separately (good old Excel ...). These documentation requirements are driven by design choices which neglect the management (creation, maintenance, deletion) of external resources - which then need to be managed manually, which again requires ... documentation.
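One way to picture the alternative to the spreadsheet is a machine-readable manifest of external resources from which the inventory is generated. The sketch below is an assumption about how that might look, not a description of any particular product: ExternalResource, MANIFEST and inventory are all hypothetical names, and the entries are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalResource:
    kind: str         # e.g. "database-schema", "firewall-rule"
    name: str
    owner: str        # the application that requested the resource
    environment: str


# Hypothetical manifest; in practice it would live next to the deployment configuration.
MANIFEST = [
    ExternalResource("database-schema", "orders", "order-service", "prod"),
    ExternalResource("firewall-rule", "tcp/8443 from dmz", "order-service", "prod"),
    ExternalResource("message-queue", "order-events", "billing", "prod"),
]


def inventory(resources, owner=None):
    """Render a plain-text inventory, optionally limited to one owner."""
    rows = [r for r in resources if owner is None or r.owner == owner]
    rows.sort(key=lambda r: (r.owner, r.kind, r.name))
    lines = [f"{'OWNER':<15}{'KIND':<18}{'NAME':<22}ENV"]
    for r in rows:
        lines.append(f"{r.owner:<15}{r.kind:<18}{r.name:<22}{r.environment}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Everything that has to be cleaned up if order-service is decommissioned.
    print(inventory(MANIFEST, owner="order-service"))
```

Filtering by owner answers the "are they still in use?" question from the manifest itself, provided every resource request goes through it.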

Product developers (not only for enterprise) must step up their game and make products that document themselves during installation/update in a way that is easily re-usable for documentation, as well as manage their dependency on external resources - otherwise, they will also be left behind in yet another area. Purchasers of products need to and will make selections based not only on the purchase price but also the time costs and knowledge transfer costs.

We have our own data centre

I can't count how many times I have documented the structure of our clients' data centres, servers deployed, firewalls opened, load balancers configured, certificates ... too many times, for sure.

Data centres come with many twists, but within this context specifically the treatment of environments: Replicating everything for different environments (dev, QA, pre-prod, prod, ...) is expensive, so the development environment is often "simplified". No firewalls, all one network; and even QA & pre-prod look a little different to production ... sure, QA doesn't need high-availability, backup is simpler, monitoring too expensive ... and the security guys didn't like central logging integration either.

That somewhat makes sense ... except that now a lot of time is spent on documenting and re-documenting the environments, and also translating between these environments, sharing the knowledge ... documentation again. And what happens when you need to reverse the setup because your services are going to be decommissioned ... and those shared requests because you wanted to help this other project out? Documenting (and doing) this isn't only a mundane and costly exercise ... it's so difficult to get right that some call it "art".

In this context cloud environments (public and private; they can sit within your own data centre), especially with newer developments, start to look very appealing: deployment is configuration-based, it is always the same (which means less knowledge is required), and it documents itself. Your data centre operators must take these on board - or both time and cost will either stop you from deploying, or force business units to other data centre providers who can offer this service.
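To make "documents itself" a little more concrete: once environments are defined as configuration, the translation between them can be computed rather than written up. The following sketch assumes environments are available as nested dictionaries; the QA and PROD values and the helper names are invented for the illustration.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def environment_diff(left, right):
    """Return {setting: (left_value, right_value)} for every setting that differs.

    A setting missing on one side is reported with None on that side.
    """
    a, b = flatten(left), flatten(right)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Hypothetical environment definitions.
QA = {"replicas": 1, "backup": {"enabled": False}, "logging": {"central": False}}
PROD = {
    "replicas": 3,
    "backup": {"enabled": True, "retention_days": 30},
    "logging": {"central": False},
}

if __name__ == "__main__":
    for setting, (qa_value, prod_value) in environment_diff(QA, PROD).items():
        print(f"{setting}: qa={qa_value} prod={prod_value}")
```

The list of differences between QA and production is then always as current as the configuration it was derived from.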


Documentation is good. Knowledge transfer is good. The costs of not documenting effectively, though, are high, both immediate and long-term. The costs of documentation are too often not considered, and thus not factored into the choice of process for design, testing, development and deployment, or the selection of products and infrastructure.

Architecture drives a lot of these decisions. Even in a guiding role, architecture is responsible for the project and BAU costs (and delays) of documentation, and can't continue only to look at architectural patterns and practices - while delegating the problem to project managers, developers and operations.

  1. That is assuming your API is in the same context as your application; de-coupling the API from the application is not a great idea, as described earlier in "Design-first APIs".