Buckets of Fun

Service sharding can be a difficult topic to wrap your head around, especially if it's not something you've been living and breathing for the last three years of your life.

The good news is that I have a handy dandy analogy for just this occasion.

Before we get to that though, you should know that this entire blog post is going to be a stream of consciousness inspired by a discarded thought from last week's post, so if you had any expectations, you should lower them.

Hopefully it will at least be entertaining.

Anyway, let's recap some concepts.

Service sharding is an architectural approach that allows you to horizontally scale your multi-tenanted software system, while also increasing its resiliency in the face of bad deployments, failures, or single users who are putting more load on the system than is expected.

Once your service has multiple shards, distributed into whatever topology is appropriate for your use case, you have to deal with a new set of problems, like knowing whether or not you need more shards and making good decisions about which shard a user should be located in.

The process through which you deal with that particular set of problems is called capacity management.

Well, at least that's what we call it within Atlassian.

With that boring old technical explanation out of the way, it's time to get to the good stuff.

The analogy.

Think of a shard as a bucket. A bucket is a familiar concept that is easy to understand. It can hold a certain amount of stuff, at which point it can't hold anything else without overflowing and causing issues.

Think of users as water. Water is a common thing for buckets to contain, and whenever you look at a bucket that is partially full of water you can get a bit of a sense of how much more water you can fit in.

That last bit is important, because when you want to make a decision about whether or not you can put more water into the bucket, you look at where the water is currently up to and compare that against how much water you think you need to deal with.

Also, whenever the bucket that is currently being filled gets full, or gets close to being full, you prepare a new bucket, maybe getting someone else to grab one from the storage room and situate it right beside the existing one, ready for use.
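If it helps to see the analogy written down, here is a minimal sketch of the bucket model in Python. Everything in it is made up for illustration: the `Bucket` class, its field names, and the 80% "nearly full" threshold are assumptions, not a description of any real capacity management system.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """A shard, in the naive analogy: fixed capacity, uniform users."""

    capacity: int   # how many users the bucket can hold (hypothetical unit)
    level: int = 0  # how many users are in it right now

    def has_room_for(self, incoming: int) -> bool:
        # Compare where the water is up to against how much more we expect.
        return self.level + incoming <= self.capacity

    def nearly_full(self, threshold_percent: int = 80) -> bool:
        # The threshold is an arbitrary assumption; integer maths avoids
        # floating point surprises right at the boundary.
        return self.level * 100 >= self.capacity * threshold_percent


def needs_new_bucket(current: Bucket, incoming: int) -> bool:
    # Time to fetch another bucket from the storage room?
    return current.nearly_full() or not current.has_room_for(incoming)
```

With a capacity of 100 and 70 users already present, `has_room_for(30)` is true and `has_room_for(31)` is not, which is about as sophisticated as the bucket model gets.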

Easy, right?

Just like capacity management.

Except, like all analogies, it is an approximation of reality. Good for illustrating a high-level concept, but with a strong tendency to fall apart entirely when you give it a good poke.

A shard is not really anything like a bucket.

Buckets tend to deform only slightly as you add water to them. As you add users to a shard, the shard may actually change its size and shape to respond to the load the users are causing. That means that the capacity of the shard, the number of users it can support, is not a static measure. It varies.

The behaviour of a bucket doesn't really change as you add water to it. It's not like all of the existing water suddenly turns red when you fill the bucket up to the 80% mark. Adding additional users to a shard, on the other hand, can actually change the experience for the users already on the shard. It's not quite non-deterministic, but it can be very hard to reason about in any sort of understandable way.

Users are not really anything like water.

Water is consistent. It is made up of molecules that are all basically the same size, making it a fluid. It fills a volume without any issues. Users are all different; some are big and strange and pointy, while others are small and normal and smooth. When you decide which shard to put a user in, you need to know how big or small the user is, because it changes the decision you make. Put a big user in a shard that is already struggling and you're going to give everyone a bad time.
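To make the "users have sizes" point concrete, here is a rough sketch of a placement decision that takes size into account. The `Shard` class, the `place` function, and the "most headroom wins" rule are all hypothetical choices for the sake of the example. A real system would have to work out what "size" even means for its users, which is the hard part.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shard:
    name: str
    capacity: int  # in some made-up unit of load
    load: int = 0  # load already placed on the shard

    @property
    def headroom(self) -> int:
        return self.capacity - self.load


def place(user_size: int, shards: List[Shard]) -> Optional[Shard]:
    """Put a user on the shard with the most spare room that can fit them.

    Returns None when no shard can take the user, which is the cue to go
    and get another shard.
    """
    candidates = [s for s in shards if s.headroom >= user_size]
    if not candidates:
        return None
    target = max(candidates, key=lambda s: s.headroom)
    target.load += user_size
    return target
```

Given one shard with 10 units of headroom and another with 60, a small user of size 5 lands on the roomier shard, and a big user of size 70 fits nowhere at all.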

Water also doesn't tend to change its size arbitrarily over time. You don't fill a bucket with water, move on to the next bucket, then look back to find the old bucket overflowing. That would be weird. Users change over time. A decision that you made a month ago might not be valid today, because the users are different now, or maybe the software is different and that has changed the impact that the users create when they use it.
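One way to picture this is as a periodic re-check: take the placements you already made, plug in what each user looks like today, and see which shards no longer add up. The sketch below assumes you have a current size measurement for every user and a fixed capacity per shard. Both are simplifications, and the function name and data shapes are invented for the example.

```python
from typing import Dict, List


def overloaded_shards(
    placements: Dict[str, str],     # user -> shard it was placed on
    current_sizes: Dict[str, int],  # user -> size as measured today
    capacities: Dict[str, int],     # shard -> capacity (assumed static here)
) -> List[str]:
    """Return the shards whose users have outgrown them, sorted by name."""
    loads = {shard: 0 for shard in capacities}
    for user, shard in placements.items():
        loads[shard] += current_sizes[user]
    return sorted(s for s, load in loads.items() if load > capacities[s])
```

Run it with last month's sizes and everything might look fine. Run it again with today's sizes, where one user has nearly doubled, and the same placements can suddenly report an overflowing shard.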

Honestly, I could keep going about the many ways that the buckets and water analogy isn't accurate.

But I won't, because the original analogy still has value.

Just because it falls apart when you poke it doesn't mean that you can't use it to get a foot in the door of understanding.

Just remember that once you're through that door, shards are not buckets and users are not water.

Not really.