Thu, 20 Aug 2009

How and Why to Pay Your (Technical) Debts

Disclaimer: While I relate stories about my work experiences, my views are my own and do not necessarily reflect those of my employer.

Recently, my company decided to shift from a deliberative, sequential, and often secretive software development process called "waterfall" (work is done without visibility to the stakeholders until it is ready to go out the door) to an iterative, adaptive ("agile"), and transparent process called Scrum, in which business owners are a key part of the team and participate in defining and accepting stories (units of work).

Technical Debt
Scrum describes a concept called "technical debt". Think of technical debt as the leftover work that goes undone because it is easy to define a "finished" product so loosely that simply stamping it approved counts as "good enough".

In other words, technical debt is work you will have to contend with eventually, but expediency, procrastination, or deadlines lead you to put it off.

If your house is a mess and a friend calls saying "hey, we're going out, wanna come?", you have three options: say "no, I really need to do some chores around the house"; go out and have a lousy time because you keep thinking about the mess you left behind; or go out and realize that when the weekend comes, you've got a huge mess on your hands.

Essentially, your ability to respond quickly to circumstances (a friend wants you to drop everything and come along), to respond appropriately (nobody wants to hang out with someone stressed about the chores waiting at home), and to do so without undue hardship (the weekend is shot if you don't do the dishes and cleaning as the need arises) depends on how far you let things slide day to day.

In this way, Agile encourages conversations between engineering and the business about what work should be handled now and what can be put off, expressed as relative priorities. For example, if your house is on fire, you don't tell the firemen "I'll be right out, I need to finish vacuuming". At the same time, you don't install hardwood floors (a customer feature request) over rotting wood (infrastructure or foundation) either.

Sins of the Past
Organizations that have historically done "waterfall" work also tend to lack clear "definitions of done" (another Scrum term: the standards a team agrees must be met before the product can be declared complete and ready to ship). As a result, shipped quality depends not on whether the product has reached an appropriate level of maturity, but on how well the team predicted how long it would take to build a quality, shippable product. That is, poor estimation (an endemic problem) equals poor quality. Definitions of done nip this in the bud by making the properties of a completed product, rather than a deadline whizzing by, the standard by which something is declared "ready to go".

This also means that teams moving from waterfall to Scrum have to contend with a lot of debt incurred in the past. The code they work on may be bug-ridden (or hard to prove bug-free, for lack of automated or repeatable functional tests), poorly documented, and hard to extend and maintain. (By contrast, definitions of done in Scrum typically require code that is well documented and well tested, using industry best practices to identify and address problems early and often.)

Engineering teams usually want to build software that way from the start, but their hands are often tied: the business expects a shippable product at the deadline and doesn't notice or care that the software lacks, for example, automated tests ("throw more QA people at it") or proper documentation.

The Scrum model says "ah, but they should care, and the rationale justifying the work should be conveyed to them." For example, convince them that poorly documented code is harder to fix later. This is why Agile processes assert that teams can increase their velocity as they pay off their technical debts and become unencumbered by the sins of the past.

Healthy Skepticism
So Scrum was introduced in my organization, and I was initially skeptical. The "sins of the past" seemed largely to have been incurred by an overeager business that wanted the product ASAP and didn't know or care about the relationship between available time, definitions of done, and product quality.

The "transparency" aspect of Scrum suggests that you "lay down your cards". If engineering management wants the teams to spend some time backfilling automated tests of individual units of code (repeatable assertions that the code does what you expect, so that when you change one behavior, everything else keeps working as expected), you write a story for that work, express its benefit to the organization, and let the business prioritize that story among all of its stories.

"Surely, this can't work", I told myself. "An organization that expressed its priorities as 'I want it ASAP' couldn't shift culturally fast enough to say 'sure, take the time to do it right'. I must 'hold back' a certain amount of my teams' time to ensure we do things the right way."

Our Scrum coaches argued against me, and I wouldn't budge... at first.

Articulation of Value
I began to come around when the business showed a willingness to accommodate technical stories. I knew, deep down, that I insisted on "doing it right" because there was, in fact, a benefit to the business in doing it that way; I just hadn't spent the time to articulate it in a way that allowed it to be prioritized against other stories in the queue.

As engineering explained why the work ought to be done, the business began to concede: "yes, we'd like to pay off a little of that debt to gain a little bit of velocity".

To help my teams translate technical stories into business value, I prepared a "so that" wiki article. Each user story (a unit of work whose desired outcome is described in a form like "as a business owner, I want to add blinkers to my car so that drivers behind and ahead of me can know that I intend to turn right or left") has an optional "so that" clause expressing the value of the work. Often the "I want" part is easy to write, but the benefit (the "so that") is much harder to articulate.

Transparency, Incentives and Planning
One of the patterns in Scrum involves maintaining an up-to-date "wall" of user stories and their progress for transparency. "Burn-down charts" show how much work remains in the one- to three-week window of a "sprint". Given daily reminders of its progress and its "velocity" against the goal, the team begins to take pride in its numbers and strives to resolve issues and increase the number of points it can complete in a sprint.

If I "held back" time for backfilling automated tests or documentation, for example, that would be work the teams were mandated to do but given no credit for in their velocity calculations. And because the teams are measured on the stories in the sprint, technical debt work tracked outside Scrum would constantly be pushed out or delayed.

In other words, this lack of transparency would also discourage employees from doing my technical debt work.

Sure, I could have created my own "backlog" and tried to run it alongside the business backlog, but the time commitment from both me and the team would have been non-trivial.

It also helped to have a backlog of technical stories "ready to go" in case a last-minute business priority decision or a delayed dependency (such as approval of creatives) would otherwise cost the team productivity. These stories would be added to the sprint, so the estimated "points" to complete them counted toward the team's velocity.

Finally, we gave the teams objectives that grade them partly on their velocity. Planned work counted toward velocity; unplanned work didn't. I was reminded of the adage, "your failure to plan does not constitute an emergency on my part".
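The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each story carries an estimated point value plus two flags: whether it was planned into the sprint (including technical stories pulled in from the ready backlog) and whether it met the Definition of Done. The `Story` class, the `velocity` function, and the sample stories are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """Hypothetical sprint story, for illustration only."""
    title: str
    points: int    # estimated story points
    planned: bool  # committed at sprint planning or pulled from the ready backlog
    done: bool     # met the Definition of Done by the end of the sprint


def velocity(stories: list[Story]) -> int:
    """Sum the points of planned, completed stories; unplanned work earns no credit."""
    return sum(s.points for s in stories if s.planned and s.done)


# Hypothetical sprint mixing business and technical debt stories
sprint = [
    Story("Add blinkers to the car", 5, planned=True, done=True),
    Story("Backfill unit tests for billing", 3, planned=True, done=True),
    Story("Hotfix P1 checkout bug", 2, planned=False, done=True),
    Story("Document payment API", 2, planned=True, done=False),
]

print(velocity(sprint))  # 8
```

In this made-up sprint, the P1 hotfix was finished but earns nothing because it was unplanned, and the documentation story earns nothing because it never met the Definition of Done. The technical test-backfill story, once planned into the sprint, counts just like the business story.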

Overcoming Challenges
That's not to say this was a panacea. Crucial conversations with team members, business owners, and stakeholders in both IT and the business had to happen. I had to accept some compromise when the business needed functionality done ASAP, in exchange for greater capacity in future sprints. We also had to ensure we had a process for handling high-urgency unplanned work (P1/P2 bugs, for example), since it could quickly derail sprints in progress.

In summary, if you can't trust the business to prioritize your technical debt stories, then either you need to become a better storyteller, or your business has more significant problems, because a key stakeholder's properly articulated concerns are not being heard. I assert that "if it's worth doing, it's worth covering in your Definition of Done, in your Acceptance Criteria, or as a prioritized item in the Product Backlog."

Scrum promises not only to speed up software lifecycle management but also to increase the capacity and problem-solving abilities of your teams and, through transparency, to bring the technology and business units into closer alignment than is ordinarily seen in a non-Scrum environment.

Khan Klatt
