
The Authority

An eclectic bunch of bastards. All credit to Wildstorm

The value and benefit that a software system provides to its users is heavily dependent on the decisions that it makes.

It might be weird to think of a piece of software making decisions, but that's really what a lot of algorithms come down to in the end: a programmatic way to arrive at a conclusion of some sort given a set of input parameters.

Of course, if you're not always making consistent decisions, then things can get a bit dicey.

We're Here To Give You A Second Chance

Sometimes a software system can seem to make inconsistent decisions.

For example, if you do not correctly isolate the decision-making engine, the same set of inputs may generate different outputs depending on unexpected factors, like the state of the system.

Of course, the engine is still technically making consistent decisions. It's just that the set of input parameters isn't limited to what was supplied, because some of those parameters are being drawn from the state of the system.
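As a rough sketch of that hidden-input problem, compare a decision function that quietly reads shared state with one that takes everything it needs as parameters. The names here (`choose_shard_hidden`, `choose_shard_explicit`, `current_load`) and the numbers are hypothetical, made up purely for illustration.

```python
# Hypothetical illustration: all names and numbers are invented.

# Module-level state that the first function quietly depends on.
current_load = {"shard-a": 10, "shard-b": 3}


def choose_shard_hidden(candidates: list[str]) -> str:
    """Looks like it depends only on `candidates`, but also reads global state."""
    return min(candidates, key=lambda shard: current_load[shard])


def choose_shard_explicit(candidates: list[str], load: dict[str, int]) -> str:
    """Every input that influences the decision is a parameter."""
    return min(candidates, key=lambda shard: load[shard])


candidates = ["shard-a", "shard-b"]

first = choose_shard_hidden(candidates)
current_load["shard-b"] = 50  # something elsewhere in the system changes
second = choose_shard_hidden(candidates)

# Same supplied input, different answers, because the real input set was bigger.
print(first, second)
```

The explicit version makes the same decision, but anyone calling it can see (and test) exactly what the decision depends on.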

That example isn't real inconsistency though.

Real inconsistency comes from having different algorithms make the same sort of decision at different times.

It's easy to get into this situation too, especially as a software system grows and changes over time. Even a small initial divergence around the making of a specific decision can fragment further and further as more and more engineers contribute.
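To make that concrete, here is a small, entirely hypothetical sketch of two call sites that each grew their own version of what is conceptually one decision. The function names, the host fields and the thresholds are all assumptions for the sake of the example.

```python
# Hypothetical illustration: two call sites that each grew their own version
# of the "is this host eligible?" decision.


def eligible_for_provisioning(host: dict) -> bool:
    # Original rule: a host needs at least 20% free capacity.
    return host["free_capacity"] >= 0.20


def eligible_for_migration(host: dict) -> bool:
    # Copied later, then tweaked: the threshold moved and a condition was added.
    return host["free_capacity"] >= 0.25 and not host["draining"]


host = {"free_capacity": 0.22, "draining": False}

# The same host gets two different answers to what is conceptually one question.
print(eligible_for_provisioning(host), eligible_for_migration(host))
```

Neither function is obviously wrong on its own, which is part of why this kind of divergence tends to go unnoticed.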

The effects can be devastating.

For example, if an issue is discovered, fixing that issue can become incredibly challenging and time-consuming. You might not even know that there are multiple places where the decision is being made, so you may fix one issue only to see a remarkably similar one crop up at some point in the future.

Like playing a game of whack-a-mole.

Even if you do know that there are multiple decision-making engines in play, you might not be able to synchronize the fixes properly, perhaps because the different engines belong to different teams who are working through their own priorities.

Extending or otherwise improving the decisions being made by the software system is similarly challenging, for the same reasons.

But all is not lost.

To Engineer A System Worth Extending

There are two main parts to fixing inconsistent decisions in a software system.

The first part is to clearly identify the decisions being made.

That may sound obvious, but the reality is that it is remarkably easy to have a bunch of disparate components in a software system making very similar decisions. Doing some analysis, reasoning about the decisions being made, connecting the dots and drawing everything together is a necessary step before moving forward with actually making things better.

If you can't reason about the decisions being made, if there isn't a common language that you can use to communicate with the engineers involved, you don't stand a chance at fixing the problem.

Speaking of fixing, the second part of the solution is to centralise the decision making.

I'm not saying that all decisions should be centralised into a single engine. That would be madness. What I mean is that for a specific decision, there should be a single source of truth that is well understood and well known across the entire cohort of engineers that are working on the system.

In reality, the root cause of the inconsistent decisions problem is duplication of logic.

The moment you duplicate the same conceptual logic in more than one place, things start to diverge, and divergence leads to differing behaviour, which leads to inconsistent decisions.

Just ensuring that those decisions all flow through a single, centralised engine is likely to resolve the problem entirely.
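A minimal sketch of what that might look like, assuming a made-up eligibility decision: one function owns the rule, and every caller asks it rather than carrying a private copy. `Host`, `is_eligible`, `pick_for_provisioning` and `migration_targets` are hypothetical names, and the threshold is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    name: str
    free_capacity: float  # fraction of capacity still available, 0.0 to 1.0
    draining: bool


def is_eligible(host: Host) -> bool:
    """The single source of truth for the (hypothetical) eligibility decision."""
    return host.free_capacity >= 0.25 and not host.draining


def pick_for_provisioning(hosts: list[Host]) -> Optional[Host]:
    # Provisioning wants the first eligible host, and asks the shared function.
    return next((h for h in hosts if is_eligible(h)), None)


def migration_targets(hosts: list[Host], source_name: str) -> list[str]:
    # Migration wants every eligible host except the source, using the same rule.
    return [h.name for h in hosts if h.name != source_name and is_eligible(h)]


hosts = [Host("a", 0.40, False), Host("b", 0.22, False), Host("c", 0.90, True)]

print(pick_for_provisioning(hosts))
print(migration_targets(hosts, source_name="b"))
```

The two callers still do different things with the answer, but a fix or extension to the rule itself only has to happen in `is_eligible`.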

Simple, right?

We Are The Authority

Like anything involving software engineering, it's obviously not that simple.

The first problem you'll likely run into is justifying the investment required to the business.

That's not to say that it's difficult to understand the cost/benefit trade-offs involved. It's more that they are not really a compelling story to people who are focused on maximising either customer or shareholder value.

Centralising the making of a specific decision is basically a refactor after all, and refactors are intended to improve the underlying quality of the system without actually impacting the functionality.

Of course, the moment there is a problem with the decision making, something concrete that has customer impact, and the fix for that problem explodes in both complexity and cost because of the lack of a centralised decision-making engine, the business will get real interested, real fast.

So use that, if it happens.

If not, clearly identify the risks involved in inconsistent decision making, frame them through a lens that the business understands, and make the case appropriately. If you're part of a good company they will listen, and if they don't, well, at least you tried.

The second problem is more on the delivery side and is agreeing on where the source of truth should be.

If your system is simple, this is barely a blip, because you can probably make some sort of unilateral decision and then just push forward with it.

Once the system gets larger and more complex, with more people, more teams and more politics, it can get a bit more difficult.

Really the only thing you can do here is find the most appropriate place and make a case for it, perhaps even accepting that maybe someone else should own the decision-making engine, even if you have a vested interest in its existence.

The last problem that I've run into is understanding the ramifications of centralisation.

Putting all of your eggs in one basket is definitely good from the perspective of making it easier to maintain and extend the decision-making engine, but there are downsides as well.

For example, if all decisions of a certain type have to go through a certain component, be it a service or a module or whatever, then you have just introduced a single point of failure to whatever capabilities rely on those decisions. That may be an unacceptable trade-off for your system, which means you'll need to do some extra engineering to plug the gap.
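One possible shape for that extra engineering, sketched under some loud assumptions: the central engine signals unavailability with an exception, and there is a conservative default that callers have agreed on ahead of time. `DecisionUnavailable`, `decide_with_fallback` and the two stand-in engines are hypothetical, not part of any real library.

```python
from typing import Callable


class DecisionUnavailable(Exception):
    """Raised by the (hypothetical) central engine when it cannot answer."""


def decide_with_fallback(
    ask_engine: Callable[[str], str],
    request_id: str,
    safe_default: str,
) -> str:
    """Ask the central engine; if it is down, return a pre-agreed default."""
    try:
        return ask_engine(request_id)
    except DecisionUnavailable:
        # The fallback is deliberately not a second copy of the decision logic;
        # it is a fixed, conservative answer agreed in advance.
        return safe_default


def healthy_engine(request_id: str) -> str:
    # Stand-in for a working central engine.
    return "region-1"


def broken_engine(request_id: str) -> str:
    # Stand-in for a central engine that is currently unavailable.
    raise DecisionUnavailable("engine is down")


print(decide_with_fallback(healthy_engine, "req-1", "hold"))
print(decide_with_fallback(broken_engine, "req-2", "hold"))
```

The important design choice is that the fallback stays dumb. If it starts re-implementing the real rules, you have reintroduced the duplication that centralising was meant to remove.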

Like everything in software engineering, there are always trade-offs. Just make sure you're making them consciously.

Behave

The idea of consolidating decisions into a single, easy-to-reason-about place is something that is close to my heart.

Within my department, we have three separate systems that are responsible for making placement decisions during the provisioning and migration of various internal entities.

That's like...two more than I would like.

We've known about the problem for quite a while, but it has been difficult to prioritise the consolidation when compared against other business priorities.

The good news is that it's become more of a focus recently, and we've started to make some meaningful headway into consolidating all of the decisions into a single place.

It's not because of my amazing persuasive skills though.

Well, not entirely.

Sure, we need to consolidate prior to building out brand new generic functionality to offer placement to more internal services within Atlassian, which is a long-term initiative that I'm running.

But really, the most convincing argument was when we suffered a series of incidents relating to placement during provisioning and a much larger group of people suddenly became aware of the split-brain situation and how much effort it was going to take to make things consistent.

There's nothing quite like a big old fire to convince people to fix something.