Scalability Essentials

This blog post is based on one key thing I learnt from Chapter 1: Reliable, Scalable, and Maintainable Applications in Martin Kleppmann's Designing Data-Intensive Applications published in 2017 by O'Reilly.

The Scalability Imperative

Scalability in the Context of Data Growth

Scalability means considering questions like, 'if the system grows in a particular way, what are our options for coping with the growth?'

This is one of those words that we throw around, but what actually is scalability? In the summary of chapter 1, Kleppmann says:

As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.

That's what we're trying to do when we design a system for scalability.

What's complexity doing in that list? Isn't that a concern for maintainability (covered later in the chapter) instead of scalability? When I first read this, I could see how scalability could lead to complexity, but not how an increase in complexity would lead to a need to scale a system.

Eventually it clicked, we need to scale any time load is increased. And certain kinds of complexity will increase load. If, for example, we add features that require more complex relationships in our data, or we need to query that data more heavily, and so on.

Kleppmann goes on to give a simpler definition:

Scalability is the term we use to describe a system's ability to cope with increased load.

Scalability is not just the ability for a system to cope with increased load – we wouldn't say that a system had scaled if we did nothing but we were able to handle more load. Scalability involves making some sort of change to allow the system to continue function in the presence of increased load.

I've landed on the following as a working definition:

Scalability is the ability of a system to handle increased load (as a result of increased data, traffic, or functional complexity) by adding resources.

The Impact of Scalability on User Experience and Business Growth

We call an application data-intensive if data is its primary challenge—the quantity of data, the complexity of data, or the speed at which it is changing—as opposed to compute-intensive, where CPU cycles are the bottleneck.

In our definition of scalability, we said that increased load (the reason we'd be scaling a system) can come from more data, more traffic, or more functional complexity. The first and the third are primarily data-driven load increases. If we're working with a successful data system, we are likely to see growth in these areas which would push us to scale our system.

We've said that scalability is all about how well a system can handle increased load, but what does that mean? This is not explicit in the definition, but I think we can say that a system can only be said to handle increased load well if it can continue to operate performantly (i.e., without impacting the user experience) and where the costs scale at worst linearly.

If the load on our system has grown and either we slow down or our costs spiral out of control, our users will have a terrible experience and the business won't be able to sustain the growth over the medium-to-long term.

X (Twitter)'s Scalability Saga

The Real-World Challenges of Scaling

Kleppmann uses the X (Twitter) 'celebrity problem' as a practical example of a real-world scalability challenge. This information is based on a talk from a X (Twitter) engineer in 2012.

In brief, the celebrity problem arose due to how X (Twitter) had previously handled scaling in its system. They found that they had many more reads to a user's timeline (between 300k requests/sec) than writes of new tweets (between 4.6k and 12k requests/sec) so they cached the timeline and updated it when a new tweet was created. This is a classic example of data denormalisation – storing multiple copies of the data to speed up reads while sacrificing the performance of writes somewhat.

This was all well and good for most users who had a low number of followers (resulting in at most hundreds of additional writes per tweet) but when there were celebrities who had millions of followers (with the classic example being Elon Musk) the number of writes that would need to be made was much larger. This is called fan-out and is a known issue with denormalisation.

How X (Twitter)'s Hybrid Approach Addresses Fan-Out

Twitter addressed this by implementing a hybrid approach:

For the majority of users who had a reasonably low number of followers, write every tweet to every follower's timelines when the tweets are posted
For a select few celebrity users, don't write their tweets proactively, instead load them when users' timelines are being generated and inset them in to the timeline at the correct point

Let's take an example:

Regular users who post 4.6k tweets/sec and have on average 75 followers each
1,000 celebrity users who post, let's say, on average 20 new tweets per day and have on average 10 million followers each
For the first group of users using the fan-out approach, this would be about 345k writes/sec
For the second group, using the fan-out approach, this would be about 2.3M writes/sec
Using the hybrid approach, we can make many fewer writes per second

Measuring Scalability Success

The Role of Load Parameters in Scalability

How can we add computing resources to handle the additional load?

This is the question that we are asking when we ask ourselves, 'How does this system scale?' Sometimes the answer will be 'not well', and we'll need to adopt a different architecture so that we can handle a new level of growth. In order to have good visibility into how well our system is scaling, we need to understand what Kleppmann calls 'load parameters'.

Load can be described with a few numbers which we call load parameters. (emphasis original)

What are the best load parameters for a system? It's one of those it depends questions (isn't everything?). It depends on what your system is doing and what kind of load you're attempting to measure.

According to Kleppmann, it could be:

requests per second to a web server
the ratio of reads to writes in a database
the number of simultaneously active users in a chat room
the hit rate on a cache

For every system, you'll need to understand what are the load parameters you need to measure and pay attention to. For a chat system like WhatsApp, it might be the number of messages sent and the number of messages received (which won't be a one for one in the case of multiple devices and group chats – similar in some ways to the problems Twitter had to face).

Throughput and Response Time as Performance Indicators

In the previous section, I've included several of Kleppman's examples for load parameters that could be used by different systems. He summarises that for batch processing systems, we usually care about throughput while for online systems, we usually care about response time.

Conclusion

I hope you've enjoyed this brief look at one of the key takeaways I had from the first chapter of Martin Kleppmann's Designing Data-Intensive Applications book. If you have, I'd encourage you to buy a copy and have a read for yourself, there's a lot more in there than what I've covered here.