Chapter 5 Reactive Architecture

5.1 Motivation

In the last 10-15 years, the technical challenges of software have changed dramatically. Where large systems 10-15 years ago consisted of a few nodes, at most some tens of nodes, large systems today run on hundreds or thousands of nodes. Back then, a system could be down for maintenance for quite a while without it being a big deal. Such behaviour is no longer acceptable: users (and other systems) expect an uptime close to 100%. The nature of data processing has changed as well: 10-15 years ago data was at rest, which means it was stored and then consumed later in a batch process. Nowadays data is processed continuously, as it is being produced.

This dramatic shift occurred due to an interplay between rising user expectations and ever increasing bandwidth and computing power, which made it possible to satisfy these expectations. People have become increasingly dependent on critical online services for their work, for example various Google applications or GitHub. It is simply unacceptable for such a service to be down for even a few hours. Users also expect an immediate response from services: waiting a few seconds is no longer acceptable, and a service is expected to react ideally within a second. So there is a tremendous expectation of responsiveness.

However, increased bandwidth and computing power alone are not enough to deliver services that are available 24/7 with immediate responsiveness. What is required is a proper software architecture that exploits these technological advances to deliver the expected user experience. In recent years the so-called Reactive Architecture has turned out to be a viable and powerful architecture for meeting these requirements. Although Reactive Architecture deals with technical challenges, its primary goal in the end is to provide an experience that is responsive under all conditions.

Unresponsive services can have huge consequences:

  • Due to a security breach Sony decided to take their Playstation Network down for 23 days.
  • In 2015 HSBC (a British bank) had an outage of their electronic payment system, with the effect that people didn’t get paid before a holiday weekend.
  • In 2015, Bloomberg, a company active in High Frequency Trading, experienced software and hardware failures, which prevented critical trading for 2 hours.

5.2 Goals

The following goals are put into place to solve the problem of building responsive software. The question is how to build software that

  • Scales from 10 to 10,000,000 users, which happens in start-up scenarios. If the software is not architected for that, it will collapse.
  • Consumes only the resources necessary to support the current load. Although the system could handle 10 million users, we don’t want to consume the resources required to handle 10 million users when currently only 1000 users are accessing the system.
  • Handles failures with little to no effect on the user. Ideally there is no effect at all; this is not always possible, but the effect should be as small as possible.
  • Can be distributed across multiple machines. Scalability and failure tolerance are achieved through distribution, so the software must be able to run on 10s, 100s or even 1000s of machines.
  • Maintains a consistent level of quality and responsiveness despite the complexity of the software when scaling across a large number of machines. Even if the software is distributed across 10s or 100s of machines, the response time must not increase 10- or 100-fold but should stay roughly the same. Variations are ok, however the experience should be roughly the same, no matter how many users are currently accessing the system.

If a software system addresses all of these points, maintaining a consistent level of quality and responsiveness throughout, it can be said to be a reactive system.

5.3 Reactive Principles

The reactive principles were created in response to companies trying to cope with changes in the software landscape. Multiple groups independently developed similar patterns for solving similar problems. Aspects of reactive systems were initially recognised by these groups, and the Reactive Manifesto brought these common ideas together into a single set of principles. It was authored by Jonas Bonér, Dave Farley, Roland Kuhn and Martin Thompson, who looked at what was happening in the industry and reconciled it into a common set of principles. Inherent to reactive systems are four basic principles, see Figure 5.1.

Figure 5.1: Reactive Principles.

  1. Responsive: A reactive system consistently responds in a timely fashion. It is the most important principle, and all the other principles are there to ultimately realise it.
  2. Resilient: A reactive system remains responsive, even if failures occur.
  3. Elastic: A reactive system remains responsive, despite changes to system load. In older versions of the manifesto it was called Scalability but was subsequently renamed to also emphasise the need for a system to scale down after a spike in system load.
  4. Message Driven: A reactive system is built on a foundation of asynchronous, non-blocking messages. In older versions of the manifesto it was called Event-Driven but was subsequently renamed to avoid confusion with certain connotations of the term.

An example of responsive software is Git:

  • Message Driven through various asynchronous messages such as commit, checkout, diff, …
  • Resilient and elastic due to the working copies on the local machines, which are isolated from each other and are merged with each other later.

5.3.1 Responsive

Responsiveness is the cornerstone of usability. If a system is already responsive, then one is done. However, the large-scale systems of today rely on the other principles to enable responsiveness: it is basically not possible to provide a responsive user experience without resilience, elasticity and a system that is message driven.

Ultimately it means that systems must respond in a fast and consistent fashion if at all possible. There may be cases where this is not possible: if all of the hardware literally crashes, there is not a lot one can do. The goal is to make the system fast and responsive whenever possible and as often as possible.

Such responsive systems build user confidence. On the other hand, unresponsive systems cause users to walk away and look for alternatives, resulting in loss of business opportunity.

5.3.2 Resilient

Resilience provides responsiveness despite any failures. This is achieved through a number of techniques including replication (there are multiple copies of services running), isolation (services can function on their own), containment (failure does not propagate to other services) and delegation (recovery is handled by an external component).

The key here is that any failure is isolated in a single component. It does not propagate and bring down the whole system. This matters because a failure that brings down the whole system is a problem that occurs quite frequently in monolithic applications.
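Containment and delegation can be sketched in a few lines. The following is a minimal illustration, not a production pattern: the names `Worker` and `Supervisor` and the failure condition are assumptions made for this example. The supervisor is the external component that handles recovery, and the failure of the worker never reaches the caller.

```python
class Worker:
    """A component that fails on a bad input (hypothetical failure condition)."""

    def __init__(self):
        self.processed = 0

    def handle(self, item):
        if item < 0:
            raise ValueError(f"cannot process {item}")
        self.processed += 1
        return item * 2


class Supervisor:
    """Owns a worker and handles recovery on its behalf (delegation)."""

    def __init__(self, factory):
        self._factory = factory
        self._worker = factory()
        self.restarts = 0

    def handle(self, item):
        try:
            return self._worker.handle(item)
        except Exception:
            # Containment: the failure stops here instead of reaching the caller.
            # Recovery: the failed worker is replaced by a fresh instance.
            self._worker = self._factory()
            self.restarts += 1
            return None


supervisor = Supervisor(Worker)
results = [supervisor.handle(x) for x in [1, -1, 3]]
print(results, supervisor.restarts)
```

The second request fails, but the third one is served by the restarted worker, so the system as a whole stays responsive.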

5.3.3 Elastic

Elasticity provides responsiveness despite increases or decreases in load. It implies zero contention and no central bottlenecks. It is not possible to absolutely achieve this but the goal is to get as close as possible.

Predictive auto-scaling techniques can be used to support elasticity.

Scaling up provides responsiveness during peak, while scaling down improves cost effectiveness.
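As a rough illustration of scaling up and down with load, the sketch below computes how many instances are needed for a given load. The function name, the per-instance capacity and the instance limits are assumptions for this example. It reacts to the load it is given; a predictive scaler would pass in a forecast load instead of a measured one.

```python
import math


def desired_instances(current_load, capacity_per_instance,
                      min_instances=1, max_instances=100):
    """Return the number of instances needed to serve current_load.

    All parameters are hypothetical: load and capacity are in requests
    per second, and the result is clamped to the allowed range.
    """
    needed = math.ceil(current_load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


print(desired_instances(1_000, 500))        # small load, few instances
print(desired_instances(10_000_000, 500))   # peak load, capped at the maximum
print(desired_instances(0, 500))            # idle, scaled down to the minimum
```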

5.3.4 Message Driven

Being message driven enables all the other principles. Asynchronous messages provide loose coupling, isolation and location transparency.

5.4 Reactive Systems and Reactive Programming

When you’re investigating Reactive Systems or Reactive Architecture, you will come across another term fairly frequently, which is Reactive Programming. It’s important to understand the difference between Reactive Systems or Reactive Architecture and Reactive Programming, because they are different, see Figure 5.2.

Figure 5.2: Reactive Programming vs. Reactive Systems.

  • Reactive systems apply the reactive principles on an architectural level. Reactive systems are built using the principles from the Reactive Manifesto. In such systems all major architectural components are separated along asynchronous boundaries and interact in a reactive way.

  • Reactive programming can be used to support building reactive systems, but it is not a necessity for building reactive systems! Also, just because reactive programming is used, it does not mean you have a reactive system. It helps to break up the system into small discrete steps, which are then executed in an asynchronous, non-blocking fashion, for example with Futures/Promises, Streams or RxJava.
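To make the idea of small, non-blocking steps concrete, here is a minimal sketch using Python's `asyncio`, which stands in for the Futures/Promises mentioned above. The function names and the price table are assumptions for this example, and the `sleep` call merely simulates a non-blocking I/O operation.

```python
import asyncio


async def fetch_price(item):
    # Stands in for a non-blocking I/O call (hypothetical price lookup).
    await asyncio.sleep(0.01)
    return {"coffee": 3, "cake": 5}[item]


async def order_total(items):
    # All lookups are started at once; no thread blocks while waiting.
    prices = await asyncio.gather(*(fetch_price(i) for i in items))
    return sum(prices)


total = asyncio.run(order_total(["coffee", "cake", "coffee"]))
print(total)
```

Note that this is reactive programming inside a single process. On its own it says nothing about whether the surrounding architecture is a reactive system.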

In this course we introduced the Actor Model as a means to reactive programming. There are also other reactive programming techniques such as Streams and RxJava, which go beyond the scope of this course and are therefore not covered.

The actor model provides facilities to support all reactive principles. It is message driven by default. The location transparency is there to support elasticity and resilience through distribution. The elasticity and resilience then provide responsiveness under a wide variety of circumstances.

Note that it is still possible to write a system with the Actor Model that is not reactive. But with the Actor Model, and the tools that are based on it, it is easier to write a reactive system. In short: you can build reactive systems without the Actor Model, but starting from a strong foundation like the Actor Model makes a lot of things easier later on.
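The core of the Actor Model (private state, a mailbox, one message processed at a time) can be sketched with a thread and a queue. This is a simplified illustration and not the API of Akka or any other actor library; the class `CounterActor` and its message names are assumptions for this example.

```python
import queue
import threading


class CounterActor:
    """Minimal actor: private state, a mailbox, one message at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message, reply_to=None):
        # Fire-and-forget: the sender does not wait for processing.
        self._mailbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)
            elif message == "stop":
                break


actor = CounterActor()
for _ in range(3):
    actor.tell("increment")

reply = queue.Queue()           # the reply also arrives as a message
actor.tell("get", reply_to=reply)
count = reply.get(timeout=1)
actor.tell("stop")
print(count)
```

Because the sender only holds a reference it can send messages to, the actor could in principle live on another machine, which is the idea behind location transparency.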

5.5 Reactive Architecture and Domain-Driven Design

An important milestone in software engineering in the recent decade was the development and adoption of Domain-Driven Design (DDD). DDD refers to an architectural approach that is used to design large systems. The guidelines or the rules that are set out by domain driven design are actually highly compatible with those of reactive architecture. As it turns out many of the goals of the two are very similar, they’re very compatible, and so as a result you’ll often see them used together.

DDD originates from a book called Domain-driven design: tackling complexity in the heart of software that was written by Eric Evans [10]. One of the key goals in the book is to try to create a software implementation that is based on an evolving model that is understood by the domain experts. So it’s largely about creating a communication channel you could say between the domain experts and the software developers.

One of the key goals within DDD is to take a large system, or a large domain, and to break it into smaller and smaller pieces. This gives us a way to determine boundaries essentially between those smaller pieces within the larger domain.

Reactive microservices specifically have a similar goal: they need to be separated along clear boundaries. In the case of reactive microservices those boundaries need to be asynchronous: each microservice has to have a clearly defined API and a specific set of responsibilities. DDD can help design these very boundaries and responsibilities, as it gives us a set of guidelines and techniques that help us break larger domains into smaller domains.

The key point here is that DDD can be used in the absence of reactive microservices or a reactive architecture, and you can build reactive architectures without domain-driven design. But because the two are very compatible and have similar goals, you’ll often find them used together.

5.5.1 Domain

A domain is a sphere of knowledge. In the context of software it refers to the business or idea that we are trying to model. Typically when we build software there is some sort of concept or business that we’re trying to model, and that is our domain. Experts in the domain are therefore people who understand that business or idea, not necessarily the software.

The goal of domain driven design then is to build a model that the domain experts can understand.

It is important to understand that the model is not the software. The model represents our understanding of the domain. Our knowledge and understanding of the domain is the model. The software is an implementation of the model.

In order to do this we need a language that both the domain experts and the software developers can understand and can communicate with. This language is called Ubiquitous Language in DDD. Terminology in the ubiquitous language comes from the domain experts. In DDD we want to try to avoid taking software terms and introducing them into the ubiquitous language and forcing our domain experts to use them.

Large domains may be subdivided into subdomains, with subdomains grouping related ideas, actions and rules. Some concepts may exist in multiple subdomains. However, although these concepts exist in multiple subdomains, they might not be the same and may vary and evolve differently. Therefore, it is important to avoid the temptation to abstract these concepts across all subdomains. For example, a customer in the domain of a restaurant business might have a different meaning in the subdomain of reservation than in the subdomain of delivery: in reservation you might need the telephone number but not the address, whereas in delivery you rely much more on the address than on the telephone number.

Therefore each subdomain ends up with its own ubiquitous language and model. This language and model of a subdomain is called Bounded Context in DDD.

Bounded Context

Subdomains and bounded contexts are good starting points for building Reactive Microservices. Therefore, when initially defining microservices, it’s often a good idea to start with your bounded contexts: one bounded context = one microservice.

There is no universal answer to how to determine Bounded Contexts, but there are some guidelines:

  • Consider human culture and interaction. Different areas of the domain are handled by different groups of people, suggesting a natural division.
  • Look for changes in the Ubiquitous Language. If the use of language or the meaning of that language changes, that may suggest a new context.
  • Look for varying or unnecessary information. For example, an employee ID is very important for an employee but meaningless for a customer.
  • Strongly separated bounded contexts will result in smooth workflows. An awkward workflow may signal a misunderstanding of the domain. If a bounded context has too many dependencies it may be overcomplicated.

Originally, DDD focused on the objects within a domain, e.g. a Customer, User, Cook, Car, … However, in recent years the technique evolved into Event First DDD, which places the focus on the activities and events that occur in the domain, e.g. a Customer makes a reservation, Food is served to the customer, … In Event First DDD the activities are defined first and are then grouped to find logical system boundaries. A useful approach is called Event Storming [4], which is beyond the scope of this course.

To visualise which bounded contexts exist and how they depend on each other, context maps are used, see Figure 5.3.

Figure 5.3: Context map of bounded contexts.

Anti-Corruption Layer

Once bounded contexts have been separated along clear boundaries, those boundaries need to be maintained. Each bounded context may have unique domain concepts, and the purity of each bounded context needs to be preserved. Such concepts are not always compatible from one context to the other. To preserve purity, concepts of the same name therefore have to be translated between different bounded contexts. This is done using a so-called Anti-Corruption Layer. Such an anti-corruption layer prevents bounded contexts from leaking into each other and helps them to stand alone.

For example, when a customer representation from a customer context such as customer management is used in a reservation service, an anti-corruption layer translates the customer into a representation that is unique to the reservation service, for example by stripping out unnecessary information such as the address.

A more technical benefit of an anti-corruption layer would be to have a caching layer inside of that anti-corruption layer. If the customer context disappears for some reason, the anti-corruption layer still may have a cache of that information and still may be able to operate despite the fact that the customer context is gone now.

An anti-corruption layer is commonly implemented using abstract interfaces.
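The translation and the caching described above might look as follows. This is a sketch under assumptions: the class names (`ManagedCustomer`, `ReservationCustomer`, `CustomerLookup`, `CustomerManagementAdapter`), the fields, and the use of `ConnectionError` to signal that the customer context is unavailable are all chosen for this example only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedCustomer:
    """Customer as modelled by the (hypothetical) customer management context."""
    customer_id: str
    name: str
    phone: str
    address: str


@dataclass(frozen=True)
class ReservationCustomer:
    """Customer as the reservation context understands it: no address."""
    customer_id: str
    name: str
    phone: str


class CustomerLookup(ABC):
    """Abstract interface that the reservation context depends on."""

    @abstractmethod
    def find(self, customer_id: str) -> ReservationCustomer: ...


class CustomerManagementAdapter(CustomerLookup):
    """Anti-corruption layer: translates and caches foreign customers."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable returning a ManagedCustomer
        self._cache = {}

    def find(self, customer_id):
        try:
            foreign = self._fetch(customer_id)
            self._cache[customer_id] = ReservationCustomer(
                foreign.customer_id, foreign.name, foreign.phone)
        except ConnectionError:
            # Customer context unavailable: use the cached copy, if there is one.
            if customer_id not in self._cache:
                raise
        return self._cache[customer_id]


upstream = {"c1": ManagedCustomer("c1", "Ada", "555-0100", "1 Main St")}
acl = CustomerManagementAdapter(lambda cid: upstream[cid])
customer = acl.find("c1")
print(customer)
```

The reservation context only ever sees `CustomerLookup` and `ReservationCustomer`, so changes to the customer management model stay behind the adapter.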

An anti-corruption layer also helps when interfacing with a legacy system whose domain is messy or unclear. Using an anti-corruption layer prevents your own domain from having to deal with the mess of the legacy system. Note, however, that there may then be two anti-corruption layers: one in the host system and one in the legacy system.

5.5.2 Domain Activities

As noted before, many activities happen in a domain.

Commands

One of the types of activities that we deal with in domain driven design is called a command. A command is a type of activity that occurs in the domain. It represents a request to perform an action. As the action has not yet happened it can be rejected.

Commands are typically delivered to a specific destination. They usually have a specific recipient in mind, whether that is a specific microservice or something else. When an accepted command is carried out, it causes a change to the state of the domain: after the command has completed, the domain is no longer in the state it was in before the command was issued.

Examples of commands would be Add an Item to an Order, Pay a Bill, Prepare a Meal, … Again, it is important to understand that this is a request: the meal has not been served yet, and the receiver of the command might reject the request because it is too busy or has run out of resources to prepare the meal.

Events

In addition to commands, we also have events. An event represents an action that has happened in the past. The action has already completed and can therefore not be rejected. An event is often broadcast to many destinations. Whereas a command causes a change in the system, an event records a change in the system, often as the result of a command.

Examples of events would be An Item was added to the Order, The Bill was paid, The Meal was served, …

Queries

Queries represent a request for information about the domain. With a query you always expect a response, which is not necessarily the case with a command or an event. Usually a query is addressed to a specific recipient, for example a specific microservice that is asked for information about a domain entity. Queries should also not alter the state of the domain: if we issue the same query multiple times, we should always get the same response, assuming nothing else has changed.

Examples of queries are Get the details of an Order, Check if a Bill has been paid, …

Reactive Systems

In a reactive system, the commands, events and queries are the messages referred to in the message-driven principle. These messages basically form the API of a bounded context or a microservice. Because we are aiming for an asynchronous, message-driven approach, they are often issued in an asynchronous fashion. When you send a Make Reservation command with the details, you do not immediately get back a response saying that the reservation is completed. What you get back is maybe an acknowledgement saying that the request was received and will be processed later. The point is that we do not have to wait for a response right away.
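The three kinds of messages can be written down as plain data types. The sketch below is an assumption-laden illustration: the message names, the `ReservationContext` class and its seat-counting logic are invented for this example, and the messages are handled synchronously to keep the code short, whereas a reactive system would typically deliver them asynchronously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MakeReservation:      # command: a request that may be rejected
    guest: str
    seats: int


@dataclass(frozen=True)
class ReservationMade:      # event: a fact about something that has happened
    guest: str
    seats: int


@dataclass(frozen=True)
class GetFreeSeats:         # query: asks for information, changes nothing
    pass


class ReservationContext:
    def __init__(self, capacity):
        self._free = capacity
        self.events = []    # stands in for broadcasting to subscribers

    def handle(self, message):
        if isinstance(message, MakeReservation):
            if message.seats > self._free:
                return "rejected"
            self._free -= message.seats
            self.events.append(ReservationMade(message.guest, message.seats))
            return "accepted"
        if isinstance(message, GetFreeSeats):
            return self._free
        raise TypeError(f"unknown message: {message!r}")


ctx = ReservationContext(capacity=4)
first = ctx.handle(MakeReservation("Ada", 3))
second = ctx.handle(MakeReservation("Bob", 2))
free = ctx.handle(GetFreeSeats())
print(first, second, free, ctx.events)
```

Only the accepted command changes state and produces an event; the rejected command and the query leave the domain untouched.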

5.5.3 Domain Objects

In DDD the domain objects are classified into three types of objects.

Value Object

A Value Object (VO) is defined by its attributes. Two VOs are equivalent if their attributes are the same. VOs are immutable. In addition to immutable state, VOs can contain business logic.

Messages in reactive systems are implemented as Value Objects.

Entity

An entity is defined by a unique identity like an ID or a key. An entity is mutable: it may change its attributes, but not its identity. If the identity changes, it is a new entity, regardless of its attributes. An example is a customer with a uniquely identifying name.

Entities are the single source of truth for a particular id and can also contain business logic.

Actors in Akka are Entities.

Aggregate

An Aggregate is a collection of domain objects bound to a root Entity. The root Entity is called the Aggregate Root. Objects in an aggregate can be treated as a single unit. Access to objects in the aggregate always has to go through the aggregate root. Transactions should not span multiple aggregate roots, to avoid contention for locks.

Aggregates are good candidates for distribution in Reactive Systems.
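The three kinds of domain objects can be contrasted in one small sketch. All names here (`Money`, `OrderLine`, `Order`), the fields and the fixed currency are assumptions for this example. The value object is immutable and compared by its attributes, while the aggregate root is compared by its identity and is the only way to reach the objects inside the aggregate.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Money:
    """Value object: immutable, equal when its attributes are equal."""
    amount: int       # in cents, to avoid floating-point rounding
    currency: str

    def add(self, other):
        # Business logic on a value object returns a new value.
        if other.currency != self.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)


@dataclass(eq=False)
class OrderLine:
    """Entity inside the aggregate; reached only through the root."""
    line_id: int
    item: str
    price: Money


@dataclass(eq=False)
class Order:
    """Aggregate root: identified by order_id, its state may change."""
    order_id: str
    _lines: list = field(default_factory=list)

    def add_item(self, item, price):
        self._lines.append(OrderLine(len(self._lines) + 1, item, price))

    def total(self):
        total = Money(0, "EUR")   # assumed currency for this sketch
        for line in self._lines:
            total = total.add(line.price)
        return total

    def __eq__(self, other):
        return isinstance(other, Order) and other.order_id == self.order_id

    def __hash__(self):
        return hash(self.order_id)


order = Order("o-1")
order.add_item("coffee", Money(300, "EUR"))
order.add_item("cake", Money(450, "EUR"))
print(order.total())
print(Money(300, "EUR") == Money(300, "EUR"))   # same attributes
print(order == Order("o-1"))                    # same identity, different state
```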

Choosing an aggregate root is not always straightforward and can differ from context to context. It is common for a context to have one aggregate root, however some contexts may require multiple aggregate roots. Some questions that help with choosing an aggregate root are:

  • Is the entity involved in most operations in that bounded context?
  • If you delete the entity, does it require you to delete other entities?
  • Will a single transaction span multiple entities?

5.5.4 Domain Abstractions

In addition to the activities and objects that occur inside of our domain, there are certain abstractions that we leverage as well when building using domain driven design. These abstractions can be useful for a number of different reasons.

Service

One of the abstractions that we use is called a service. Entities and value objects can contain business logic as well as state. But sometimes there is a particular piece of business logic that doesn’t really fit into an entity or a value object. In that case it can be encapsulated by a service. Services have some special criteria:

  • They are stateless. If something includes state, it is either an entity or a value object.
  • They are often used to abstract away an anti-corruption layer. For example, with an email service the state is encapsulated in an email object, which the service takes as input in order to send the email. The actual implementation of the email service might then use SMTP for sending emails.

Too many services lead to an anemic domain. Look for a missing domain object before resorting to a service.

Factories

When we go to create a new domain object - often entities or aggregate roots - the logic to construct those domain objects may not be trivial. There may be a lot of work involved with creating that domain object. It may have to access external resources.

For example, when we create a new Reservation we may need to assign a unique identifier to it, and that may require going to the database and looking at a table. So creating the object may involve database access, files or REST APIs. All sorts of complexity may come along with building that new object. A factory allows us to abstract away that logic. It is again usually implemented as a domain interface with one or more concrete implementations.

Repositories

Repositories are similar to factories but instead of abstracting away creation they abstract away the retrieving of existing objects. Factories are used to get new objects. Repositories are used to get or modify existing objects. If we use the CRUD terminology (Create, Read, Update and Delete) then repositories are the RUD. They are the Read, the Update and the Delete. They often operate again as abstraction layers over top of databases but they can also work with files, REST API, etc.

Factories and Repositories are related. Factories correspond to the C in CRUD, Repositories to the RUD in CRUD. Therefore, they are often combined.
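A factory and a repository for reservations could be sketched as follows. The class names, the counter that stands in for a database sequence, and the in-memory dictionary that stands in for a database table are assumptions for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import itertools


@dataclass
class Reservation:
    reservation_id: int
    guest: str
    seats: int


class ReservationFactory:
    """Hides how a new Reservation gets its identifier (the C in CRUD)."""

    def __init__(self):
        self._ids = itertools.count(1)   # stands in for a database sequence

    def create(self, guest, seats):
        return Reservation(next(self._ids), guest, seats)


class ReservationRepository(ABC):
    """Domain interface for existing reservations (the RUD in CRUD)."""

    @abstractmethod
    def get(self, reservation_id): ...

    @abstractmethod
    def save(self, reservation): ...

    @abstractmethod
    def delete(self, reservation_id): ...


class InMemoryReservationRepository(ReservationRepository):
    """Concrete implementation; a database-backed one could replace it."""

    def __init__(self):
        self._rows = {}

    def get(self, reservation_id):
        return self._rows.get(reservation_id)

    def save(self, reservation):
        self._rows[reservation.reservation_id] = reservation

    def delete(self, reservation_id):
        self._rows.pop(reservation_id, None)


factory = ReservationFactory()
repo = InMemoryReservationRepository()

reservation = factory.create("Ada", 2)
repo.save(reservation)
reservation.seats = 4            # update the existing object
repo.save(reservation)
print(repo.get(reservation.reservation_id))
repo.delete(reservation.reservation_id)
print(repo.get(reservation.reservation_id))
```

The domain code depends only on `ReservationRepository`, so the storage technology behind it can change without touching the domain.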

5.5.5 Hexagonal Architecture

A particular technique that is often combined with domain-driven design is called hexagonal architecture. Hexagonal architecture is not directly related to domain driven design. You can use domain driven design without using hexagonal architecture however it is very compatible with domain driven design.

What is hexagonal architecture? It is also known as ports and adapters and was proposed by Alistair Cockburn. Basically it is an alternative to the layered or n-tier architecture, see Figure 5.4.

Figure 5.4: Hexagonal Architecture.

Hexagonal architecture takes a different approach than n-tier architecture. Instead of the domain sitting in the middle of the n-tier sandwich, the domain is at the core of a structure that more closely resembles a hexagon. The idea is that the domain is the core: it is the most important thing in your application and should therefore be at the center.

The domain exposes ports. These ports essentially act as an API: they are the preferred way of communicating with the domain. Adapters communicate with the domain through those ports. There are different sides to the hexagon: on one side we have something like the user-side API and on the other side the data-side API. See Figure 5.5 for an alternative view on hexagonal architecture.

Figure 5.5: Hexagonal Architecture as an Onion.

One of the keys here is the direction of the dependencies. The outer layers are allowed to depend on the inner layers but the reverse is not true. The infrastructure can depend on the API and the API can depend on the domain, but the domain has no knowledge that the API or the infrastructure exists.

This means that hexagonal architecture ensures a proper separation of infrastructure from domain. You can guarantee that the domain has no knowledge of the infrastructure. It doesn’t know about the database. It doesn’t know about the user interface.

This makes it possible to swap in a different database implementation. It also allows you to make changes in the domain, potentially without affecting the API or the infrastructure. Because of this clean separation you can make these kinds of substitutions without external clients of your application having to know about them: thanks to the isolation, they don’t notice that anything has changed. Systems designed around hexagonal architecture can therefore be very flexible.
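The dependency direction and the substitution of adapters can be illustrated with a small sketch. The port `Notifier`, the domain class `ReservationDesk` and the two adapters are assumptions for this example; the point is only that the domain class refers to the port and never to a concrete adapter.

```python
from typing import Protocol


class Notifier(Protocol):
    """Port: what the domain needs, stated in domain terms."""

    def notify(self, guest: str, text: str) -> None: ...


class ReservationDesk:
    """Domain logic: knows the port, not any concrete adapter."""

    def __init__(self, notifier: Notifier):
        self._notifier = notifier

    def confirm(self, guest: str, seats: int) -> None:
        self._notifier.notify(guest, f"Table for {seats} confirmed")


class ConsoleNotifier:
    """Adapter: infrastructure that prints to standard output."""

    def notify(self, guest, text):
        print(f"to {guest}: {text}")


class RecordingNotifier:
    """Adapter: keeps messages in memory, e.g. for tests."""

    def __init__(self):
        self.sent = []

    def notify(self, guest, text):
        self.sent.append((guest, text))


recorder = RecordingNotifier()
ReservationDesk(recorder).confirm("Ada", 2)           # adapter 1
ReservationDesk(ConsoleNotifier()).confirm("Bob", 4)  # adapter 2, same domain code
print(recorder.sent)
```

Swapping the adapter required no change to `ReservationDesk`, which is the flexibility described above.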

5.6 Microservice Architecture

In the early days of software development, a single developer, or a small group, would build an application to support a small number of users. In this environment building a monolithic style application made sense. It was the simplest thing we could do. However, over the years, software complexity has grown. Today, multiple teams are working with many different technologies, supporting millions of users. In this environment, the monolithic application becomes a burden, rather than a benefit.

In the modern world of software development, we need to find techniques that allow us to isolate the inherent complexity, not just of our software, but of our development processes as well. This is where microservices enter the picture.

However, it is not clear what exactly a microservice is or how big it should be. Microservices and monoliths exist on a spectrum, with monoliths on one end, microservices on the other, and actual applications somewhere in between. It is important to accept that a Microservices vs. Monolith mindset does not make sense, as both microservices and monoliths have advantages and disadvantages. A big part of a software architect’s job is to look at the differences between them, weigh the benefits and disadvantages of each, and try to balance them. In doing so the architect places the system somewhere on that spectrum, so a single system can have some characteristics of a Monolith and other characteristics of Microservices. To do that, though, we need to really understand the difference between microservices and monoliths.

5.6.1 Monoliths

A worst-case scenario of a monolith is the so-called Big Ball of Mud, see Figure 5.6. Many monoliths don’t look like this - in the Software Engineering Project you learned how to architect a clean and well designed monolith.

Figure 5.6: Big Ball of Mud.

In the big ball of mud there is no clear isolation in the application: everything depends on everything else. Due to these complex dependencies and the lack of isolation, the application is very difficult to understand and even harder to modify and maintain. Such architectural styles produce working solutions very quickly but then stagnate, and changes become exponentially expensive.

To clean up such a big ball of mud, isolation is introduced, which means that the application is divided along clear domain boundaries and libraries are used to help isolate pieces of code, for example ORM, DB access, see Figure 5.7.

Figure 5.7: Cleaning a Big Ball of Mud.

The key characteristics of a monolith are:

  • It is deployed as a single unit. The monolith could be deployed multiple times, but always comes as a single unit.
  • Single shared database, usually relational.
  • Communication with synchronous message calls. This is a request-response scenario, where the caller waits for the callee.
  • Deep coupling between libraries and components, often through the database, where all of them rely on the same data.
  • Big Bang style releases. Releasing and updating the application is all-or-nothing.
  • Teams have to carefully synchronise features and releases (as you have seen in the software engineering project).

Scaling a monolith is straightforward. Multiple independent copies are deployed, which do not communicate directly with each other, so a monolith does not know there are other copies deployed. The “communication” happens implicitly through the database, which provides consistency between deployed instances. See Figures 5.8 and 5.9 for diagrams of how to scale a monolith.

Figure 5.8: Scaling a Monolith.

Figure 5.9: Scaling a Monolith.

The advantages of a monolith are:

  • Easy cross-module refactoring. All code is in one project, therefore refactoring with the help of an IDE is quite easy.
  • Easier to maintain consistency due to a single “shared” database. For example, you can rely on reading back the same data that you wrote, as long as no other modifications were made in between.
  • Single deployment process. Although there is this big bang release, you are only doing it once.
  • Single thing to monitor.
  • Simple scalability model. Just deploy more copies.

The disadvantages of a monolith are:

  • Limited by the maximum size of a single physical machine. If a monolith grows, it is going to require more CPU and more memory, up to a point where it doesn’t fit on any physical machine anymore. Also, big physical machines are often much more expensive than small ones.
  • Scaling depends on the database. Although it is possible to scale up by deploying multiple copies of a monolith, often the (relational) database runs on a single machine, creating a bottleneck. Even if a master-slave database setup is used, the limits of the physical machine of the database can be hit as well.
  • Components have to be scaled as a group. Some components might need less resources, however as it is a single unit application it is not possible to differentiate.
  • Deep coupling leads to inflexibility. For example changing the structure of a table becomes hard due to database data dependencies.
  • Development tends to become slow with changes becoming increasingly difficult and build times increasing.
  • Failure in one component of the monolith brings down the whole application. Also when multiple copies are deployed and one fails, redistribution of load can cause cascading failures.

5.6.2 Service Oriented Architecture

When we talked about monoliths, we saw that one way to improve on a ball-of-mud scenario is to introduce additional isolation. You do that by separating your monolith along clear domain boundaries, usually in the form of different libraries for different areas of the domain.

There is another way to introduce additional isolation: Service Oriented Architecture (SOA). In a Service Oriented Architecture each of those domain boundaries, each of those libraries, represents what we call a service. The key idea is that those services don’t share a database. Each service has its own database, and anybody who wants information from a service has to go through that service’s API, see Figure 5.10.

Figure 5.10: Service Oriented Architecture.

For example, the Reservation service might need Customer information. In that case we cannot go directly to the Customers database; we have to go through the Customer API.
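The rule can be illustrated with a minimal Python sketch. All class and method names here (Customer, CustomerService, ReservationService, get_customer, reserve) are hypothetical and chosen only for illustration; the in-memory dictionaries stand in for each service's own database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


class CustomerService:
    """Owns the customer data; its storage is private to this service."""

    def __init__(self) -> None:
        self._db = {}  # stand-in for the Customers database

    def add_customer(self, customer: Customer) -> None:
        self._db[customer.customer_id] = customer

    def get_customer(self, customer_id: str) -> Customer | None:
        # The only way for other services to obtain customer data.
        return self._db.get(customer_id)


class ReservationService:
    """Depends on the Customer API, never on the Customers database."""

    def __init__(self, customers: CustomerService) -> None:
        self._customers = customers
        self._reservations = {}  # stand-in for the Reservations database

    def reserve(self, reservation_id: str, customer_id: str) -> str:
        customer = self._customers.get_customer(customer_id)
        if customer is None:
            raise KeyError(f"unknown customer {customer_id}")
        self._reservations[reservation_id] = customer.customer_id
        return f"Reserved {reservation_id} for {customer.name}"
```

Because ReservationService only sees get_customer, the storage behind CustomerService could be swapped for a different kind of database without touching the Reservation code.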

This creates isolation because each independent service can have its own database, which may also be of a different type than the others (as seen in Figure 5.10). There is also no coupling between different parts of the application and the database. In the example, the Reservation system isn’t coupled to the Customers database, so if the Customers database needs to be modified, we are free to do so.

Towards Microservices

Some people will say that Service Oriented Architecture is just microservices. However, Service Oriented Architecture doesn’t talk about deployment, which is a crucial point. SOA says that services don’t share a database and that they have to communicate through their APIs, but it doesn’t say how they are deployed.

As a result, some SOAs are built in a monolithic style which deploys all of those services as one application, communicating through some clearly defined API. Other SOAs go the other route, where each service is deployed as an independent application, in which case the result more closely resembles microservices.

So it’s true that if you are in the latter case, if you are deploying each of your services independently, then there is very little difference between Service Oriented Architecture and microservices. But if you’re on the other side and you’re deploying things in a more monolithic approach, then there actually is a fairly significant difference.

The bottom line is that Service Oriented Architecture isn’t necessarily the same as microservices, although in some cases it may be. From there we can start talking about what makes a service a microservice.

5.6.3 Microservices

Microservices are a subset of Service Oriented Architecture. The difference is that Service Oriented Architecture doesn’t dictate any requirements around deployment: you can deploy a Service Oriented Architecture as a monolith, or you can deploy each of your services independently. However, if you deploy them independently, that’s when you start to build microservices.

Therefore, microservices require that each service is deployed independently. This means that services can be put onto many different machines, and you can have any number of copies of an individual service depending on your requirements, see Figure 5.11.


Figure 5.11: Microservices.

We still keep all of the rules around Service Oriented Architecture: maintaining our own datastore, ensuring that our services communicate only through a clearly defined API, all of those things still apply. We’ve just added that extra requirement that we have to deploy the individual services independently. The base characteristics of microservices are:

  • Each service is deployed independently.
  • Multiple independent databases (just as in SOA).
  • Communication is synchronous or asynchronous (Reactive Microservices).
  • Loose coupling between components. There are no database dependencies, communication is possibly asynchronous, and there is no shared code, all of which leads to loose coupling.
  • Rapid deployment, possibly continuous deployment.
  • Teams release features when they are ready.
  • Teams often organise around a DevOps approach (Dev + Ops), where one team is responsible for the application from implementation (Dev) to deployment and operation (Ops), instead of handing it over to a separate Ops team that maintains it in production. With microservices one team is responsible for everything.

Each microservice is scaled independently and you can have as many microservices as are required. On one machine there could be one or more copies of a service deployed. A machine therefore hosts a subset of the entire system, see Figures 5.12 and 5.13.

Figure 5.12: Scaling Microservices.

Figure 5.13: Scaling Microservices.

Advantages of microservice systems are:

  • Individual services can be deployed / scaled individually as needed.
  • Increased availability. Serious failures are isolated to a single machine.
  • Isolation / decoupling provides more flexibility to evolve within a module.
  • Supports multiple languages and platforms.

Microservices do often come with an organizational change:

  • Teams operate more independently.
  • Release cycles are shorter.
  • Cross-team coordination becomes less necessary.
  • These changes facilitate an increase in productivity.

Disadvantages of microservice systems are:

  • May require multiple complex deployment and monitoring approaches. Given multiple platforms, languages, individual services, different databases… the complexity is substantially higher compared to monoliths. Monitoring is more complex as well.
  • Cross-service refactoring is (much) more challenging. There is no common, single code base and services are not going to be deployed at the same time.
  • Requires support for older API versions. This follows directly from independent deployment, as services cannot be deployed at exactly the same time. APIs also need to evolve without breaking changes.
  • Organisational change to microservices can be challenging.

Responsibility of a Microservice

A fundamental question is how big a microservice should be. A big part of understanding where to draw the lines between your microservices is understanding responsibility. A guideline that works very well for microservices is the Single Responsibility Principle. It originated in OOP and says that a class should have only one reason to change.

Applied to microservices, this means that a service should have only one responsibility. As a consequence, a change to the internals of one microservice should not necessitate a change to another microservice.

How do we decide where to draw those lines? How do we decide where to build our microservices and where the proper responsibilities are? Bounded Contexts from DDD are an excellent place to start building microservices. They define a context in which a specific model applies. Further subdivision of the bounded context is possible.

Principles of Isolation

In many ways, isolation is at the core of what reactive microservices are about: finding ways to create isolation between different microservices. So when you ask how big a microservice should be, you are really asking the wrong question. The right question is how you can isolate your microservices. If you can figure out how to isolate them appropriately, you will probably find that they end up at a good size. Reactive microservices aim to be isolated in as many ways as possible. The four main principles are:

  1. State: All access to a microservice must go through its API. There is no backdoor access via the database. This allows the microservice to evolve internally without affecting any outside dependencies. See Figure 5.14.

Figure 5.14: Isolation of State.

  2. Space: Microservices should not care where other microservices are deployed. It should be possible to move a microservice to another machine, possibly in another data center, without issue. This allows the microservice to be scaled up/down to meet demand. See Figure 5.15.

Figure 5.15: Isolation of Space.

  3. Time: Microservices should not wait for each other. Requests are asynchronous and non-blocking. This leads to more efficient use of resources, because resources can be freed immediately rather than waiting for a request to finish. Between microservices we expect eventual consistency,15 which provides increased scalability. Total consistency would require central coordination, which limits scalability. See Figure 5.16.

Figure 5.16: Isolation of Time.

  4. Failure: Reactive Microservices also isolate failure. A failure in one microservice should not cause another to fail. This allows the system to remain operational in spite of failure. See Figure 5.17.

Figure 5.17: Isolation of Failure.

5.6.4 Isolation Techniques

Bulkheading

We have seen that it is important to provide isolation in a number of different forms, failure being one of them. A particular technique for isolating failure is called bulkheading. The term comes from shipbuilding, see Figure 5.18.


Figure 5.18: Bulkheading.

Failures are isolated into failure zones. Failures in one service will not propagate to other services. The overall system can remain operational, possibly in a degraded state. For example, Netflix will hide certain entries in the UI if a given service fails. See Figure 5.19 for an example of bulkheading in microservices.


Figure 5.19: Bulkheading.

Circuit Breakers

Another technique for isolating failures is what we call a circuit breaker. We want to avoid those situations where we have things like cascading failures and we want to avoid having retries that put additional load on a system:

  • Calls to an overloaded service may fail.
  • The caller may not realise the service is under stress and retry.
  • The retry makes the load worse.
  • Callers need to be careful to avoid this.

The challenge is to know whether a system is overloaded or not. In general, a timeout does not guarantee that the system is overloaded. It could be a network failure or a temporary network glitch, in which case a retry might have succeeded. The caller cannot tell a timeout caused by overload from a timeout caused by something else happening in the system. Circuit breakers are used to deal with this situation, see Figure 5.20.

Figure 5.20: Circuit Breakers.

Circuit breakers are a way to avoid overloading a service. They do this by quarantining a failing service so that calls to it fail fast. In normal operation a circuit breaker is in the closed state. In the closed state any request that is sent simply goes through to the external service and everything works as normal.

However, in the event of a failure the circuit breaker moves into the open state, which causes calls to fail fast. When we make calls through the circuit breaker we don’t even try to talk to the external service; we immediately fail the call. As a result, the external service has a chance to recover.
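A minimal Python sketch of such a state machine follows, including the timed half-open probe that handles recovery. The names (CircuitBreaker, CircuitOpenError, reset_timeout) are hypothetical, and the breaker trips on the very first failure, as in the description here; real implementations typically count failures against a threshold before opening.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker fails fast without calling the service."""


class CircuitBreaker:
    def __init__(self, reset_timeout: float, clock=time.monotonic) -> None:
        self._reset_timeout = reset_timeout  # seconds to stay open
        self._clock = clock                  # injectable so tests can control time
        self._state = "closed"
        self._opened_at = 0.0

    @property
    def state(self) -> str:
        # An open breaker becomes half-open once the timeout has elapsed.
        if self._state == "open" and self._clock() - self._opened_at >= self._reset_timeout:
            self._state = "half-open"
        return self._state

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast: do not touch the external service at all.
            raise CircuitOpenError("circuit is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            # A failure in the closed or half-open state (re)opens the circuit.
            self._state = "open"
            self._opened_at = self._clock()
            raise
        # A success closes the circuit (this matters for the half-open probe).
        self._state = "closed"
        return result
```

Callers wrap every request to the external service in breaker.call(...) and treat CircuitOpenError as a signal not to retry immediately.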

What about when the service does recover? Circuit breakers usually have a timeout of some kind, and after that period the circuit breaker attempts to reset itself. At that point, it goes into what we call the half-open state. In the half-open state the next request that comes through is allowed to go to the external service. If this request succeeds, the circuit breaker goes to the closed state. If this request fails, it goes back to the open state.

Gateway Services

One of the disadvantages of microservices is they can create complexity, particularly in your client application. A single request may require information from multiple microservices, see Figure 5.21.

Figure 5.21: API complexity.

The client needs to make individual calls to each microservice, then aggregate the information and present it. That means that the client needs to know about each of the different microservices: it needs to know that they exist, and it needs to deal with potential failures from each of them, which is additional complexity that we don’t want to deal with. In effect, all the complexity that used to be handled by the monolith has moved into the client.

In general, clients are harder to update and it is impossible to force updates of the client. This has the consequence that older API versions must be kept around for compatibility reasons.

The question is how can we manage these complex APIs that may access many microservices? This is achieved by a Gateway Service, see Figure 5.22.

Figure 5.22: Gateway Service.

Gateway services are essentially just other microservices that we put between the individual microservices and the client. A gateway service sends the requests to the individual microservices and aggregates the responses. The logic for aggregation is therefore moved out of the client into the gateway service. The gateway handles failures from each service, and the client only needs to deal with possible failure of the gateway.
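The aggregation and failure handling described above could look roughly like the following Python sketch using asyncio. The downstream functions (fetch_customer, fetch_loyalty) and the endpoint customer_overview are hypothetical stand-ins; a real gateway would call the services over HTTP or messaging.

```python
import asyncio


async def fetch_customer(customer_id: str) -> dict:
    # Hypothetical downstream call that succeeds.
    return {"id": customer_id, "name": "Ada"}


async def fetch_loyalty(customer_id: str) -> dict:
    # Hypothetical downstream call that is currently failing.
    raise ConnectionError("loyalty service unavailable")


async def call_with_fallback(coro, fallback, timeout: float = 1.0):
    """Await one downstream call; on error or timeout return the fallback."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except (asyncio.TimeoutError, ConnectionError):
        return fallback


async def customer_overview(customer_id: str) -> dict:
    """Gateway endpoint: one client request fans out to several services."""
    customer, loyalty = await asyncio.gather(
        call_with_fallback(fetch_customer(customer_id), fallback=None),
        call_with_fallback(fetch_loyalty(customer_id), fallback={"points": None}),
    )
    # The client receives one aggregated, possibly degraded, response.
    return {"customer": customer, "loyalty": loyalty}
```

The client makes a single call, for example asyncio.run(customer_overview("c1")), and never learns that the loyalty service was down; it simply receives a degraded response.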

5.7 Building Scalable Systems

Building Reactive Systems is all about balance. Often, we talk about systems as though we have to make black or white choices, but the reality is that it’s rarely that clear cut. This becomes incredibly apparent when we begin to talk about the trade off between Consistency and Availability.

Often, we make the choice between Consistency and Availability, based solely on our choice of database. The problem is that from a business perspective, it’s rarely an either/or situation. The business wants both. We will learn, when we look at the CAP theorem, that both isn’t really an option, but that doesn’t mean we have to make the same choice in every case.

Before it is possible to talk about the balance between consistency and availability, we need to establish some definitions for consistency and availability and also scalability.

  • Scalability: A system is scalable if it can meet increases in demand while remaining responsive. For example, a restaurant could be considered scalable if it can meet an increase in customers and still respond to those customers’ needs in a fast and efficient way.

  • Consistency: A system is consistent if all members of the system have the same view or state. For example, if we ask multiple employees in the restaurant about the status of an order and we get the same answer from each, then the restaurant is consistent. Note that time is not factored into this definition. The sections below go into more detail on how time factors into consistency.

  • Availability: A system is considered available if it remains responsive despite any failures. For example, if the cook in the restaurant is cooking a steak, accidentally burns his hand, and has to go to the hospital, that is a failure. If the restaurant can continue to serve its customers when that happens, then the system is considered available.

5.7.1 Understanding Scalability

To better understand scalability it is important to distinguish between performance and scalability. These are related but different concepts, see Figure 5.23.

Figure 5.23: Performance vs Scalability.

Performance optimises response time. Scalability optimises ability to handle load.

One common metric is requests-per-second. However, this metric combines performance and scalability, and although it is very useful, it does not show how much each of the two contributes. For example, if we have a system which takes 1 second to process a request and can only process one request at a time, then it handles 1 request-per-second. If we improve the system to take 0.5 seconds per request, it will be able to handle 2 requests-per-second. On the other hand, if we take the same system which takes 1 second per request and parallelise it to handle 2 requests at a time, we again end up with 2 requests-per-second, but through different means. In the latter case scalability is improved (more requests can be handled simultaneously), in the former performance (less time is required to handle a single request). The bottom line is that if we make changes to the system and look only at requests-per-second, we cannot tell which of the two we improved. It is a valuable metric because it shows that we’ve made an improvement, but if we want to know where that improvement came from, we have to track each of these things individually.
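The arithmetic of this example can be written down in a few lines of Python. The function name and parameters are hypothetical, and the model is the idealised one used in the text (no queueing or coordination overhead).

```python
def requests_per_second(seconds_per_request: float, parallel_requests: int) -> float:
    """Throughput of a system that handles `parallel_requests` at a time."""
    return parallel_requests / seconds_per_request


baseline = requests_per_second(1.0, 1)  # 1 s per request, one at a time
faster = requests_per_second(0.5, 1)    # performance improved: 0.5 s per request
wider = requests_per_second(1.0, 2)     # scalability improved: two at a time
```

Both faster and wider evaluate to 2.0 requests-per-second even though the response time differs, which is why the two indicators have to be tracked separately.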

When we look at performance isolated, we can see on the graph that if performance is improved we improve our response time, but the number of requests (load) may have not changed, see Figure 5.24.


Figure 5.24: Performance.

Scalability improves the ability to handle load, which pushes the graph along the x-axis, but the performance of each request may not change, see Figure 5.25.


Figure 5.25: Scalability.

An interesting difference between the two indicators is that performance is theoretically limited, while scalability is not. Response time can never reach 0 or drop below it, due to the laws of physics. Scalability, on the other hand, theoretically has no limit and could be pushed along the x-axis forever. However, this would require an infinite amount of resources, which in practice (and physics) is simply not possible. Other practical limits on scalability are cost and, more importantly, bottlenecks that arise from the design of the system.

As a consequence, because scalability has no theoretical limit, the focus when building Reactive Microservices tends to be on improving scalability, since it can in theory be pushed forever. That doesn’t mean that performance is not important, but it tends to receive less attention.

Note that around 2005, CPU clocks hit a limit [28] and since then the main focus has become to increase the number of physical and virtual cores. This is due to this same problem that there are much more severe limitations to performance than to scalability.

5.7.2 Consistency

When we start building distributed systems there is a fundamental problem of physics that we have to deal with. Distributed systems are systems that are separated by space. The laws of physics say that information travels at a finite speed (at most the speed of light), so because of this separation there is always time required to reach a consensus.

So when two systems are separated by space it means that there is always time required to reach consensus. If you want two pieces of your system to agree on the state of the world then they have to communicate with each other in order to come to some sort of consensus. That takes time. In the time that it takes to transfer the information, the state of the original sender may have changed.

So we have one piece of the system that wants to inform another piece about its current state, so it sends that information over the network. By the time that information arrives the original sender may not be in the same state anymore. It may have changed state. This is the fundamental problem that we’re talking about.

The problem is that the receiver of information is always dealing with stale data. You send the information, it takes time to arrive, by the time it gets there it’s now stale. And this applies whether we’re talking about computer systems, people, or even the real physical world.

The key here is to recognize that when we are dealing with a distributed system we are always dealing with stale data. Reality is basically eventually consistent.

5.7.3 Eventual Consistency

The laws of physics put constraints on us when we start building distributed systems. They basically dictate that those systems have to be eventually consistent.

Eventual consistency guarantees that in the absence of new updates all accesses to a specific piece of data will eventually return the most recent value.

This implies that in order to reach a state of consistency you have to stop all updates, at least for some period of time, in order to reach that level of consistency.

Eventual Consistency breaks down into different forms:

  • Causal Consistency means that causally related items will be processed in a common order. For example if A causes B then A will always be processed before B.
  • Sequential Consistency processes all items in a sequential order regardless of whether they’re causally related. This is a stronger form than causal consistency.

These forms have trade-offs, and stricter forms come at a higher cost. For example, sequential consistency eliminates the ability to parallelise, because it requires that every event is processed in a specific order.

Traditional monolithic architectures are usually based around strong consistency.

5.7.4 Strong Consistency

Strong Consistency means that an update to a piece of data needs agreement from all nodes before it becomes visible.

Another definition is that all accesses are seen by all parallel processes (or nodes, processors, etc.) in the same order (sequentially).

Often, distributed systems seem to exhibit characteristics of strong consistency. The question is how this is possible, when the laws of physics forbid it. This is achieved by introducing mechanisms which simulate strong consistency. An example is a lock, which reduces a distributed problem to a non-distributed one, see Figure 5.26.

Figure 5.26: Strong consistency in an eventual consistent world.

In this example, a lock simulates strong consistency by confining the distributed problem to a single, non-distributed lock. The distributed system problem only exists when multiple things are responsible for the same piece of data. As long as only one thing is responsible for that data, as long as we only have one instance of the lock, it’s not a distributed system problem anymore.

When we introduce a lock like this it introduces overhead in the form of contention. That overhead has consequences to our ability to be elastic, it has consequences to our ability to be resilient, and it has other consequences as well. This contention has a number of effects:

  • Any two things that contend for a single limited resource are in competition.
  • This competition can have only one winner.
  • Others are forced to wait for the winner to complete.
  • As the number of things competing increases, the time until resources can be freed up increases.
  • As load increases, we will eventually exceed acceptable time limits. This leads to timeouts and users leaving the system.

Ultimately, contention reduces the ability to parallelise (see Amdahl’s law). The more parallel processors we add, the more contention we get, so each additional processor contributes less and we end up with diminishing returns.

This also introduces Coherence Delay:

  • In a distributed system, synchronising the state of multiple nodes is done using crosstalk or gossip.
  • Each node in the system sends messages to every other node, informing them of any state changes.
  • The time it takes for this synchronisation to complete is called the Coherence Delay.
  • Increasing the number of nodes, increases the coherence delay.

If coherence delay is factored in, increasing parallelism can actually result in negative returns: beyond a certain point, adding processors makes the system slower (Gunther’s law).

An example is adding people to a project that is late, which delays the project even further. Both Amdahl’s and Gunther’s laws explain this effect, see Figure 5.27.

Figure 5.27: Laws of scalability.
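Both laws have simple closed forms, sketched below in Python. Amdahl’s law gives the speedup on n processors when only a fraction of the work can run in parallel; Gunther’s Universal Scalability Law adds a coherence term that grows with the number of node pairs. The function names and the parameter values used in the usage note are illustrative assumptions, not measurements.

```python
def amdahl_speedup(n: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup on n processors, limited by the serial fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def usl_capacity(n: int, contention: float, coherence: float) -> float:
    """Gunther's law: relative capacity with contention and coherence delay.

    contention: cost of waiting for shared resources (the serial fraction)
    coherence:  cost of keeping n nodes in sync, paid per pair of nodes
    """
    return n / (1.0 + contention * (n - 1) + coherence * n * (n - 1))
```

With a contention of 0.05 and no coherence cost, capacity keeps rising towards a ceiling of 20 (diminishing returns). With an additional coherence cost of 0.001, capacity peaks at around 31 nodes and then falls, so usl_capacity(200, 0.05, 0.001) is lower than usl_capacity(31, 0.05, 0.001): the negative returns described above.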

Linear scalability requires total isolation. We basically need a system that’s stateless. And when we talk about stateless we don’t mean stateless in the sense that the state is stored in the database we mean truly stateless. This gets tricky because even if we had a system that was truly stateless, we probably would need something like a load balancer or something like that in front of it, and that then becomes a shared resource as well, and so we get back to the point where we don’t have total isolation anymore. It’s very difficult or impossible to design a system that is truly stateless.

Reactive systems understand the limitations that are imposed by these laws. We don’t try to avoid these limitations; we accept them and try to minimize their impact instead.

This happens by

  • Reducing contention
    • Isolating locks.
    • Eliminating transactions.
    • Avoiding blocking operations.
  • Mitigating coherence delay
    • Embracing eventual consistency
    • Building in Autonomy

Ultimately this results in higher scalability.

5.7.5 CAP Theorem

Distributed systems must account for the CAP Theorem. Any two separate systems that want to talk to each other have to take it into account.

The CAP Theorem states that in a distributed system we cannot provide more than two of the following: consistency, availability, and partition tolerance. See Figure 5.28.

Figure 5.28: CAP Theorem.

Ideally you would want to have CA: consistency and availability, sacrificing partition tolerance. However, as it turns out, this is not really a valid choice. To understand why, we need to define partition tolerance.

Partition tolerance means that the system continues to operate despite an arbitrary number of messages being dropped or delayed by the network.

The reality is that no distributed system is safe from partitions, and so far it is unknown how to build one that is. Networks can go down, nodes can go down; outages can be short or long lived. Sacrificing partition tolerance is therefore not an option, and we are ultimately left with the choice between consistency and availability. This leaves two options:

  1. AP: Sacrifice consistency, allowing writes to both sides of the partition. When the partition is resolved you will need a way to merge the data in order to restore consistency.

  2. CP: Sacrifice availability, disabling or terminating one side of the partition. During the partition, some or all of your system will be unavailable.
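The two options can be contrasted in a toy Python model with two replicas of a single value. Everything here is an illustrative assumption: the Replica class, the is_primary flag that decides which side stays writable under CP, and the last-writer-wins merge, which is only one of several possible ways to restore consistency after an AP partition.

```python
class Replica:
    """One side of a two-node system holding a single value."""

    def __init__(self, name: str, mode: str, is_primary: bool) -> None:
        self.name = name
        self.mode = mode              # "AP" or "CP"
        self.is_primary = is_primary  # under CP only this side stays writable
        self.value = None
        self.version = 0              # logical timestamp of the last write

    def write(self, value, version: int, partitioned: bool) -> bool:
        if partitioned and self.mode == "CP" and not self.is_primary:
            return False              # CP: this side is disabled during the partition
        self.value, self.version = value, version
        return True                   # AP (or no partition): the write is accepted


def merge(a: Replica, b: Replica) -> None:
    """After the partition heals, reconcile the replicas (last writer wins)."""
    winner = a if a.version >= b.version else b
    a.value = b.value = winner.value
    a.version = b.version = winner.version
```

Under AP both replicas accept writes and temporarily disagree until merge runs; under CP the non-primary side rejects writes, so the replicas never diverge but part of the system is unavailable.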

If a system claims to be CA, it ignores the network and partitioning, and with that the distributed nature of the system. However, as soon as another system communicates with this CA system, the CA system may fail or become unreachable, in which case availability is no longer given and we are back to a CP system rather than a CA system. Using replicas would not solve this problem either, because you would then run into the consistency problem, resulting in an AP system.

Note that systems are often consistent and partition tolerant, but have areas where they’re not fully consistent, such as in specific edge cases. For example, when they fail over to a replica they sacrifice consistency, so they’re not necessarily always consistent and just aim to be consistent in most cases. On the other hand, a system might be available and partition tolerant but still have areas where it’s not actually available. For example, it might be generally available, but if a certain number of nodes fail the system becomes unavailable.

In reality, most systems balance the two concerns usually favoring one side or the other.

5.7.6 Sharding

Reactive Systems can use a technique called Sharding to limit the scope of contention and reduce the communication needed for coordination. Sharding is applied within the application and is a technique that provides strong consistency.16

Sharding partitions Entities (or Actors) in the domain according to their id. Groups of entities are called a shard, each entity exists in only one shard and each shard exists only in one location. Because each entity lives in only one location, the distributed system problem is eliminated. An Entity acts as a consistency boundary. See Figure 5.29.


Figure 5.29: Sharding

In this case if you want information for reservation A the expert for that information only exists in one place. It’s only in one place in the cluster, and as a result we don’t have a distributed system anymore. This means that reservation A can act as a consistency boundary. This is sort of like having a lock, for example, when we have a lock we eliminate the distributed system problem by making the lock only exist in one place.

In order for Sharding to work we need to know where all of these entities live. This is achieved using a coordinator. The coordinator ensures that traffic for a particular entity is routed to the correct location. So if we have a message like CancelReservation and it is identified for reservation ID C, then we can see where reservation ID C lives, and we can direct that message as appropriate. See Figure 5.30.

Figure 5.30: Routing Messages to Sharded Entities.

The id of a particular entity is used to calculate the appropriate shard. All traffic for a particular id then goes through the same entity. Aggregate Roots from DDD are good candidates for sharding. It is important to choose an appropriate shard key and an appropriate algorithm for computing the shard from it. If we choose poorly, we get unbalanced shards, see Figure 5.31.

Figure 5.31: Balancing Shards.
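Routing by entity id can be sketched in Python as follows. This is a simplified illustration under stated assumptions: shard_for and Coordinator are hypothetical names, shards are assigned to nodes round-robin, and a stable hash (SHA-256) is used because Python’s built-in hash() for strings differs between processes.

```python
from __future__ import annotations

import hashlib


def shard_for(entity_id: str, num_shards: int) -> int:
    """Map an entity id to a shard using a stable, well-distributed hash."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Coordinator:
    """Knows which node hosts which shard and routes messages accordingly."""

    def __init__(self, nodes: list[str], shards_per_node: int = 10) -> None:
        self.num_shards = len(nodes) * shards_per_node
        # Round-robin assignment of shards to nodes (an assumption of this sketch).
        self._shard_to_node = {
            shard: nodes[shard % len(nodes)] for shard in range(self.num_shards)
        }

    def route(self, entity_id: str) -> str:
        """Return the node that hosts the entity with this id."""
        return self._shard_to_node[shard_for(entity_id, self.num_shards)]
```

A message such as CancelReservation for reservation C would be sent to coordinator.route("C"). Every message for the same id ends up on the same node, and a well-distributed hash keeps the shards reasonably balanced.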

A few more notes on Sharding:

  • Experience from the industry has shown that you should approximately aim at 10x as many shards as you have nodes.
  • It is important to understand that sharding does not eliminate contention, but isolates it to single entities.
  • Scalability is achieved by distributing the shards over multiple machines.
  • Strong consistency is achieved by isolating operations to a specific entity, which processes one message at a time.
  • Careful choice of shard keys is important to maintain good scalability.
  • Sharding is primarily a CP solution, therefore sacrificing availability. If a shard goes down then there is a period of time where it is unavailable. The shard will migrate to another node eventually.

Caching

One of the benefits of Sharding is that it reduces the back-and-forth traffic that you have with your database. This is achieved using caching, see Figure 5.32.

Figure 5.32: Caching with Shards.

In general, caching is problematic in a distributed system because a cache is distributed to each node and therefore needs to be kept in sync with the database. On top of that you also need to keep the cache in sync between each node, when a write happens to the cache in one node. Keeping those caches in sync isn’t really possible in a distributed system.

Sharding solves this problem by making each cache unique to one entity. The entity maintains an internal view of its own state in memory, and no other entity holds a copy of that view.

As a result, our operations can essentially become write-only with respect to the database: they write to the database but never have to read from it. The only time they read from the database is after a failure or a similar event, when the entity has to be recovered into memory and its cached state rebuilt. As most applications tend to be read-heavy, this makes for a very efficient system, because reads are always served straight from memory.
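
The write-only pattern can be sketched as follows. This is a simplified illustration under stated assumptions: `FakeDatabase` stands in for a real database and merely counts reads and writes, and `ReservationEntity` with its `handle` and `status` methods is a hypothetical entity, not an API of any specific framework.

```python
class FakeDatabase:
    """Stand-in for a real database; counts reads and writes."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def write(self, key, value):
        self.writes += 1
        self.rows[key] = value

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)


class ReservationEntity:
    """Hypothetical sharded entity: the only owner of its own state."""

    def __init__(self, reservation_id, db):
        self.reservation_id = reservation_id
        self._db = db
        # Recovery path: the only place where the entity reads from the database.
        self._state = db.read(reservation_id) or {"status": "new"}

    def handle(self, command):
        # Commands arrive one at a time, so no locking is needed here.
        if command == "confirm":
            self._state = {"status": "confirmed"}
        elif command == "cancel":
            self._state = {"status": "cancelled"}
        else:
            raise ValueError(f"unknown command: {command}")
        self._db.write(self.reservation_id, self._state)  # write through to the database

    def status(self):
        return self._state["status"]  # served from memory, no database read


db = FakeDatabase()
entity = ReservationEntity("C", db)
entity.handle("confirm")
for _ in range(1000):
    entity.status()
print(db.reads, db.writes)  # 1 1

# Simulate a failure: a new instance rebuilds its state from the database.
recovered = ReservationEntity("C", db)
print(recovered.status(), db.reads)  # confirmed 2
```

A thousand status queries cause no database reads at all. The database is read only when an entity is loaded into memory.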

5.7.7 CRDTs

So we have established sharding as the technique to reach for when strong consistency is important. However, what if we prefer availability over strong consistency?17

A technique called CRDTs (conflict-free replicated data types) provides a highly available solution based on asynchronous replication, see Figure 5.33.


Figure 5.33: CRDTs.

Data in CRDTs is stored on multiple replicas for availability. An update is applied on one replica and then copied asynchronously to the others. Updates are merged to determine the final state.
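
As a small illustration of the merge step, the sketch below implements a grow-only counter (G-Counter), one of the simplest well-known CRDTs. The text above does not single out a specific CRDT, so this choice, the class name `GCounter`, and the replica names are assumptions made for the example.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged with an element-wise maximum."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> number of increments seen from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        # Each replica only ever updates its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Taking the maximum per slot makes merging safe to repeat and to reorder.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)  # update applied on replica a only
b.increment(3)  # concurrent update applied on replica b only

# Asynchronous replication: states are exchanged later, in any order, possibly twice.
a.merge(b)
b.merge(a)
a.merge(b)
print(a.value(), b.value())  # 5 5
```

Both replicas accept updates without coordinating with each other, and both reach the same value once they have exchanged state.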

The specific details of CRDTs are quite complex and beyond the scope of this course. We refer interested students to the respective Lightbend Course.18

5.7.8 Consistency or Availability?

The CAP Theorem forces us to choose between strong consistency and high availability. This is a challenge because we really want both, but the CAP Theorem tells us that we cannot have both. So how do we decide whether to aim for consistency or for availability? Which one is more important?

The choice between consistency and availability is not really a technical decision; it is a business decision.

  • It is rarely an either/or situation; it is usually about finding a balance.
  • The decision of when and where to sacrifice consistency or availability should be discussed with the domain experts and product owners. Software developers should talk to these people and say “look, we can give you high availability in this situation, but it will cost you this. Or we can give you strong consistency here, but it’s going to cost you that.”
  • We need to factor in the impact on revenue if the system is unavailable vs. eventually consistent.


This chapter is based on material from the Lightbend Reactive Architecture Course.19


[4] Alberto Brandolini. 2013. Introducing event storming. Available at: goo.gl/GMzzDv [Last accessed: 8 July 2017].

[10] Eric Evans. 2004. Domain-driven design: Tackling complexity in the heart of software. Addison-Wesley Professional.

[28] Herb Sutter. 2005. The free lunch is over: A fundamental turn toward concurrency in software. Dr. Dobb’s journal 30, 3 (2005), 202–210.

  1. All communication between Actors is treated as remote communication, regardless of the location of the actor.

  2. Informally, it guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.

  3. Sharding also exists on the database level and is basically the same. However, in this course we focus on Sharding only on the application level.

  4. In a business that sells goods, such as Amazon, it is more important to sell something than to sell exactly the right thing.