Chapter 7 Command and Query Responsibility Segregation

Command and Query Responsibility Segregation (CQRS) builds on an elegant and deep principle introduced by Bertrand Meyer [24]: Command and Query Separation. It states that every method should either be a command that performs an action or a query that returns data to the caller, but not both. In other words, asking a question should not change the answer. More generally, the principle suggests dividing an object’s interface into two separate categories:

  • Queries return a result but must not change the observable state of the system; they are thus referentially transparent and free of side effects.20 Getters are a trivial example of Queries.

  • Commands are allowed to change the state of a system but do not return a value. They are also known as modifiers or mutators. Setters are a trivial example of Commands.

This boils down to the following: a method with a return value must not mutate state, and a method that mutates state must have the return type void. The benefit of this principle is that it makes reasoning about side effects in an application substantially easier. For example, Queries can be placed anywhere and reordered amongst each other; as long as no Command runs in between, they will always return the same value. With Commands one has to be more careful, but the clear separation helps there as well.

Let’s see how we can apply this to a more real-world example. Consider a CustomerService:

interface CustomerService {
  void MakeCustomerPreferred(CustomerId)
  Customer GetCustomer(CustomerId)
  CustomerSet GetCustomersWithName(Name)
  CustomerSet GetPreferredCustomers()
  void ChangeCustomerLocale(CustomerId, NewLocale)
  void CreateCustomer(Customer)
  void EditCustomerDetails(CustomerDetails)
}

When applying the separating principle, we get two services, a CustomerWriteService and a CustomerReadService:

interface CustomerWriteService {
  void MakeCustomerPreferred(CustomerId)
  void ChangeCustomerLocale(CustomerId, NewLocale)
  void CreateCustomer(Customer)
  void EditCustomerDetails(CustomerDetails)
}

interface CustomerReadService {
  Customer GetCustomer(CustomerId)
  CustomerSet GetCustomersWithName(Name)
  CustomerSet GetPreferredCustomers()
}

An example that violates this principle is the next method of the Iterator in Java, which returns the value at the current cursor and advances the cursor. Another example is the pop method of a Stack, which removes the element at the top of the stack and returns it. As with all patterns and principles, this one should not reign absolute, and one should know when to break it. We think that in the case of the Iterator and the Stack it is reasonable not to follow this principle.
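The trade-off can be made concrete with a stack. The following is a minimal Python sketch (the class and method names are illustrative and not taken from any library) that splits the combined pop into a query, top, and a command, remove_top, so that the query can be repeated without changing the answer.

```python
class CqsStack:
    """A stack whose interface follows Command and Query Separation."""

    def __init__(self):
        self._items = []

    # Query: returns a value and leaves the observable state untouched.
    def top(self):
        if not self._items:
            raise IndexError("top of empty stack")
        return self._items[-1]

    # Query: also free of side effects.
    def is_empty(self):
        return not self._items

    # Command: changes state and returns nothing.
    def push(self, item):
        self._items.append(item)

    # Command: changes state and returns nothing.
    def remove_top(self):
        if not self._items:
            raise IndexError("remove_top on empty stack")
        self._items.pop()


stack = CqsStack()
stack.push(1)
stack.push(2)
first_look = stack.top()    # asking twice ...
second_look = stack.top()   # ... gives the same answer
stack.remove_top()
```

The price is that taking the top element now needs two calls, which are not atomic when several threads share the stack. This is one reason why the combined pop is a reasonable exception.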

This principle, although not very interesting in and of itself, becomes extremely interesting when viewed from an architectural point of view, which we do in the next section.

7.1 Towards CQRS

How does this apply to architectures in the real world? The initial CustomerService reflects the mainstream approach for interacting with an information system, which is to treat it as a CRUD datastore. Interactions are all about storing and retrieving records: creating new records, reading records, updating existing records, and deleting records.

In general, such models tend to become quite sophisticated because they hold multiple representations of information, usually implemented using a layered architecture. Examples are validation rules that only allow certain combinations of data to be stored, or rules that infer data to be stored that is different from the data provided. When users interact with the information, they use various presentations of it, each of which is a different representation.

Developers typically build their own conceptual model, which they use to manipulate the core elements of the model. In the case of a Domain Model, this is usually the conceptual representation of the domain, and the persistent storage is typically made as close to the conceptual model as possible. See Figure 7.1 for a conceptual depiction of a single model with a CRUD approach.

Figure 7.1: Single model with a CRUD approach (Figure taken from https://martinfowler.com/bliki/CQRS.html)

Such a structure of multiple layers of representation can get quite complicated, but it still resolves down to a single conceptual representation, which acts as a conceptual integration point between all the presentations. CQRS assumes that for many problems, particularly in more complicated domains, having the same conceptual model for commands and queries leads to a more complex model that does neither well. Therefore, the change CQRS introduces is to split that conceptual model into separate models for reading and writing, following Command and Query Separation.

For example, in a web application a user looks at a web page that is rendered using the query model. If they initiate a change, that change is routed to the separate command model for processing, and the resulting change is communicated to the query model to render the updated state. See Figure 7.2 for a conceptual depiction of the model from Figure 7.1 split into two using a CQRS approach.

Figure 7.2: A split into two models using CQRS (Figure taken from https://martinfowler.com/bliki/CQRS.html)

Therefore, recognising that commands and queries have different architectural properties leads directly to the fundamental architectural concept of CQRS: breaking up the underlying data model into two separate models, one for reading and one for writing.

By separate models we most commonly mean different object models, as seen above in the CustomerService example. However, there’s room for considerable variation here. The in-memory models may share the same database, in which case the database acts as the communication channel between the two models. They may also use separate databases, effectively making the query side’s database a real-time ReportingDatabase; in this case there needs to be some communication mechanism between the two models or their databases. The two models might not even be separate object models: the same objects may have different interfaces for their command side and their query side, rather like views in relational databases.
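The last variation, one object exposed through two interfaces, can be sketched in Python with typing.Protocol. All names below (CustomerQueries, CustomerCommands, InMemoryCustomers and the two functions) are hypothetical and chosen for illustration only; the sketch assumes a simple in-memory store.

```python
from typing import Protocol


class CustomerQueries(Protocol):
    def get_locale(self, customer_id: str) -> str: ...


class CustomerCommands(Protocol):
    def change_locale(self, customer_id: str, new_locale: str) -> None: ...


class InMemoryCustomers:
    """One object that structurally satisfies both interfaces."""

    def __init__(self) -> None:
        self._locales = {}

    def get_locale(self, customer_id: str) -> str:
        return self._locales[customer_id]

    def change_locale(self, customer_id: str, new_locale: str) -> None:
        self._locales[customer_id] = new_locale


def render_profile(queries: CustomerQueries, customer_id: str) -> str:
    # Declared against the query side only; a static type checker would
    # reject a call to change_locale here.
    return f"locale={queries.get_locale(customer_id)}"


def handle_locale_change(
    commands: CustomerCommands, customer_id: str, new_locale: str
) -> None:
    # Declared against the command side only.
    commands.change_locale(customer_id, new_locale)


customers = InMemoryCustomers()
handle_locale_change(customers, "c-1", "de-CH")
profile = render_profile(customers, "c-1")
```

Note that the separation here is enforced only by static type checking, not at runtime; separate object models or separate databases give a stronger boundary.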

A key benefit of CQRS is that it allows the two types of services to be hosted differently. For example, if reads in the application are much more frequent than writes, the read service could be hosted on 10 servers and the write service on 2. This architectural separation recognises the fact that the processing of commands and queries is fundamentally asymmetrical, and that scaling the services symmetrically, as with a single CustomerService, therefore does not make a lot of sense and could result in a bottleneck.

7.2 Task-Based UI

A prime example of employing CQRS is the task-based UI, which we discuss here in a bit more depth to better understand CQRS in an applied context.

Many commercial software applications include user interfaces in which a screen presents a set of controls, but leaves it to the user to deduce the page’s purpose and how to use the controls to accomplish that purpose. (Microsoft Corporation, 2001)

This is known as a deductive UI, where users need to deduce how the workflow actually works. Without going into too much detail, this approach generally has the following problems:

  1. Users don’t seem to construct an adequate mental model of the product. These users aren’t dumb - they are just very busy and overloaded with information. They do not have the time, energy, or desire to wonder about a conceptual model for their software.
  2. Even many long-time users never master common procedures. Even when users perform tasks repeatedly, the next time they perform the same operation, they may stumble through it in exactly the same way.
  3. Users must work hard to figure out each feature or screen. For the majority of customers, each feature or procedure is a frustrating, unwanted puzzle.

An example of such a deductive UI for a deactivate inventory item use case can be seen in Figure 7.3. This deductive UI might have an editable data grid containing all of the inventory items. It would have editable fields for various data and perhaps a drop-down for the status of the inventory item, deactivated being one of the options. In order to deactivate an inventory item, the user would have to go to the item in the grid, type in a comment as to why they were deactivating it, and then change the drop-down to the status of deactivated. If the user attempts to submit an item that is “deactivated” without having entered a comment, they will receive an error saying that they must enter a comment, as it is a mandatory field for a deactivated item. Some UIs might be a bit more user-friendly: they may not show the comment field until the user selects deactivated from the drop-down, at which point it appears on the screen. This is far more intuitive to the user, as it is a cue that they should be putting data in that field, but one can do even better.

Figure 7.3: A CRUD screen of a deductive UI of a deactivate inventory item use case (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

An alternative to this kind of UI is an inductive UI, also known as a task-based UI. The basic idea behind it is to figure out how users want to use the software and to make the software guide them through those processes.

A task-based UI would take a different approach to the one seen in Figure 7.3. It would likely show a list of inventory items, and next to each inventory item there might be a link to “deactivate” the item, as seen in Figure 7.4.

Figure 7.4: A more task-based screen of an inductive UI of a deactivate inventory item use case (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

This link would take them to a screen that asks for a comment as to why they are deactivating the item, which is shown in Figure 7.5. The intent of the user is clear in this case, and the software guides them through the process of deactivating an inventory item. It is also very easy to build Commands representing the user’s intentions with this style of interface.

Figure 7.5: A more task-based screen of an inductive UI of a deactivate inventory item use case (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

Web, Mobile, and especially Mac UIs have been trending towards being task-based. The UI guides you through a process and offers contextually sensitive guidance, pushing you in the right direction. This is largely because the style is capable of offering a much better user experience. There is a solid focus on how and why the user is using the software; the user’s experience becomes an integral part of the process. Beyond this, there is also value in focusing more generally on how the user wants to use the software; this is a great first step in defining some of the verbs of the domain.

Because information flow and workflow are very different in a task-based UI, there are consequences for the underlying technology and architecture, which CQRS satisfies very well. In the next section we look at the technical details of CQRS and how task-based UIs and CQRS are connected.

7.3 Technical Details of CQRS

We have already introduced the broad architectural picture of CQRS in Figure 7.2. In this section we discuss the concepts of commands and queries in more technical depth.

TODO: https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf

7.3.1 Query Side

In the original architecture, DTOs (Data Transfer Objects) were built by projecting them off domain objects. There are two issues with this: code overhead and difficulty of mapping. If the DTOs are very close to the domain objects, this leads to code overhead (which can be alleviated by using code generators); if the DTOs form a different model than the domain, mapping can become tedious.

DTOs are used first of all to transfer data to the client while avoiding multiple round trips to the server; therefore they are ideally built to match the screens of the client. However, when a query goes through full-blown domain objects, a lot of domain objects are instantiated through an ORM, even if the only thing needed is plain data. This is one manifestation of the object-relational Impedance Mismatch. The problem is not the concept of DTOs; it is the need to materialise domain objects from data in the database through an ORM and then to project the data held in those domain objects back onto DTOs. This can become a serious performance bottleneck, also because the database side is difficult to optimise: with an ORM, optimising queries that map to objects requires very advanced knowledge.

Therefore a concept called the Thin Read Layer is introduced, see Figure 7.6. This read layer bypasses the whole domain: it reads the required data directly from storage and returns it in DTOs, with no materialisation of domain objects and no projection or mapping of domain objects to DTOs.

Figure 7.6: The query side as a thin read layer (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

One benefit of the separate read layer is that, by bypassing the domain model, it does not suffer from an impedance mismatch. It is connected directly to the data model, which makes queries much easier to optimize. Developers working on the Query side of the system also do not need to understand the domain model or whatever ORM tool is being used. At the simplest level they would need to understand only the data model (however, we do not encourage this).

The separation into the Thin Read Layer, with reads bypassing the domain, therefore allows the domain to specialise.
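As an illustration, a thin read layer can be as small as one SQL query whose rows are mapped straight onto a screen-shaped DTO. The following Python sketch uses the standard sqlite3 module; the customers table, its columns, and the CustomerListItemDto class are assumptions made up for this example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerListItemDto:
    """Shaped after a hypothetical customer list screen, not the domain."""
    customer_id: int
    name: str
    preferred: bool


def get_preferred_customers(conn):
    # Thin read layer: one SQL query, rows mapped straight to DTOs.
    # No domain objects are materialised and no ORM is involved.
    rows = conn.execute(
        "SELECT id, name, preferred FROM customers "
        "WHERE preferred = 1 ORDER BY name"
    )
    return [CustomerListItemDto(r[0], r[1], bool(r[2])) for r in rows]


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, name TEXT, preferred INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (id, name, preferred) VALUES (?, ?, ?)",
    [(1, "Ada", 1), (2, "Bob", 0), (3, "Cy", 1)],
)
preferred = get_preferred_customers(conn)
```

Because the query is plain SQL against the data model, it can be tuned (indexes, projections, denormalised tables) without touching the domain model.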

7.3.2 Command Side

Once the read layer has been separated, the domain focuses only on the processing of Commands, and several issues go away: domain objects no longer need to expose internal state, repositories have very few if any query methods aside from GetById, and a more behavioral focus can be had on Event Provider boundaries. See Figure 7.7.

Figure 7.7: The command side as a proper domain object layer (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

This change comes at lower or no additional cost in comparison to the original architecture. In many cases the separation will actually lower costs, as the optimization of queries is simpler in the thin read layer than it would be if implemented in the domain model. The architecture also carries lower conceptual overhead when working with the domain model, as the querying is separated; this too can lead to lower cost. In the worst case the cost should work out to be equal: all that has really been done is the moving of a responsibility, and it is even feasible to have the read side still use the domain.

It is good practice to incorporate a UUID (Universally Unique ID) in a command, generated on the client side, so that the server can refer to it in its reply and the client can send the command again if no response arrives within some time window. Because a resent command carries the same UUID, the server can detect duplicates, which makes it possible to implement idempotent services.
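A minimal Python sketch of this idea follows. The command name DeactivateInventoryItem, its fields, and the in-memory handler are assumptions for illustration; a real service would persist the set of processed command IDs.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeactivateInventoryItem:
    inventory_item_id: str
    comment: str
    # Generated on the client, so a retry carries the same identifier.
    command_id: uuid.UUID = field(default_factory=uuid.uuid4)


class InventoryCommandHandler:
    def __init__(self):
        self.deactivated = {}     # item id -> comment
        self.applied = []         # ids of commands whose effect was applied
        self._processed = set()   # ids of commands already seen

    def handle(self, command):
        # A command that was already processed is accepted again
        # without repeating its effect.
        if command.command_id in self._processed:
            return
        self.deactivated[command.inventory_item_id] = command.comment
        self.applied.append(command.command_id)
        self._processed.add(command.command_id)


handler = InventoryCommandHandler()
command = DeactivateInventoryItem("item-42", "discontinued by supplier")
handler.handle(command)
handler.handle(command)   # client retry after a lost response
```

The second call leaves the state unchanged, so the client can safely resend whenever it is unsure whether the first attempt arrived.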

Avoid commands such as ChangeAddress, CreateUser, or DeleteClass: they are only CRUD operations disguised as commands; they do not communicate the intent of a user and do not model the domain. Naming commands well can lead to a great deal of domain insight. Use case names should be used as commands instead, which very explicitly encodes the domain into the command structure: it becomes all about the what instead of the how. Therefore, avoid technical commands, which describe how to achieve a use case, and focus on semantic commands, named after use cases, which describe what to achieve.

7.4 Consistency

By applying CQRS, the concepts of Reads and Writes have been separated. This raises the question of whether the two should read and write the same data model, or whether they can be treated as two integrated systems; see Figure 7.8 for an illustration of this concept. There are many well-known integration patterns between multiple data sources for keeping them synchronised in either a consistent or an eventually consistent fashion. The two distinct data sources allow each data model to be optimized for the task at hand. As an example, the Read side can be modeled in 1NF (1st Normal Form) and the transactional model in 3NF (3rd Normal Form).

Figure 7.8: Separated data models with CQRS (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

7.4.1 Eventual Consistency

Data inconsistency in large-scale reliable distributed systems has to be tolerated for two reasons: improving read and write performance under highly concurrent conditions; and handling partition cases where a majority model would render part of the system unavailable even though the nodes are up and running. Whether or not inconsistencies are acceptable depends on the client application.

The CAP theorem [12] states that of the three properties of shared-data systems, (1) data consistency, (2) system availability, and (3) tolerance to network partitions, only two can be achieved at any given time.

A system that is not tolerant to network partitions can achieve data consistency and availability, and often does so by using transaction protocols. To make this work, client and storage systems must be part of the same environment; they fail as a whole under certain scenarios, and as such, clients cannot observe partitions.

However, a general observation21 is that in large-scale distributed systems network partitions are a given, which fixes property 3, tolerance to network partitions. As a consequence, consistency and availability cannot both be achieved at the same time, and when designing a distributed system it must be decided which of the two to keep and which to sacrifice: relaxing consistency allows the system to remain highly available under partitionable conditions, whereas making consistency a priority means that under certain conditions the system will not be available.

Both options require the client developer to be aware of what the system is offering. If the system emphasizes consistency, the developer has to deal with the fact that the system may not be available to take, for example, a write. If this write fails because of system unavailability, then the developer will have to deal with what to do with the data to be written. If the system emphasizes availability, it may always accept the write, but under certain conditions a read will not reflect the result of a recently completed write. The developer then has to decide whether the client requires access to the absolute latest update all the time. There is a range of applications that can handle slightly stale data, and they are served well under this model.

Now consider client-side consistency, which has to do with how and when observers see updates made to a data object in the storage system. Assume that there are processes A, B, and C, and that process A makes changes to some underlying storage system. There are three types of consistency:

  • Strong consistency: After the update completes, any subsequent access (by A, B, or C) will return the updated value.

  • Weak consistency: The system does not guarantee that subsequent accesses will return the updated value. A number of conditions need to be met before the value will be returned. The period between the update and the moment when it is guaranteed that any observer will always see the updated value is dubbed the inconsistency window.

  • Eventual consistency: This is a specific form of weak consistency; the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value. If no failures occur, the maximum size of the inconsistency window can be determined based on factors such as communication delays, the load on the system, and the number of replicas involved in the replication scheme. The most popular system that implements eventual consistency is DNS (Domain Name System). Updates to a name are distributed according to a configured pattern and in combination with time-controlled caches; eventually, all clients will see the update.

The eventual consistency model has a number of variations that are important to consider:

  • Causal consistency: If process A has communicated to process B that it has updated a data item, a subsequent access by process B will return the updated value, and a write is guaranteed to supersede the earlier write. Access by process C that has no causal relationship to process A is subject to the normal eventual consistency rules.

  • Read-your-writes consistency: This is an important model where process A, after it has updated a data item, always accesses the updated value and will never see an older value. This is a special case of the causal consistency model.

  • Session consistency: This is a practical version of the previous model, where a process accesses the storage system in the context of a session. As long as the session exists, the system guarantees read-your-writes consistency. If the session terminates because of a certain failure scenario, a new session needs to be created and the guarantees do not overlap the sessions.

  • Monotonic read consistency: If a process has seen a particular value for the object, any subsequent accesses will never return any previous values.

  • Monotonic write consistency: In this case the system guarantees to serialize the writes by the same process. Systems that do not guarantee this level of consistency are notoriously hard to program.

A number of these properties can be combined. For example, one can get monotonic reads combined with session-level consistency. From a practical point of view these two properties (monotonic reads and read-your-writes) are the most desirable in an eventually consistent system, but they are not always required. These two properties make it simpler for developers to build applications, while allowing the storage system to relax consistency and provide high availability.
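To make session-level read-your-writes more tangible, here is a small Python simulation. It is only a sketch under simplifying assumptions: a single primary, a single replica that catches up when replicate() is called (standing in for asynchronous replication), and a global version counter. The classes Store, ReplicatedStore, and Session are hypothetical.

```python
class Store:
    def __init__(self):
        self.data = {}
        self.version = 0   # number of writes applied so far


class ReplicatedStore:
    """A primary plus one replica that lags until replicate() is called."""

    def __init__(self):
        self.primary = Store()
        self.replica = Store()

    def write(self, key, value):
        self.primary.data[key] = value
        self.primary.version += 1
        return self.primary.version

    def replicate(self):
        # Closes the inconsistency window by copying the primary's state.
        self.replica.data = dict(self.primary.data)
        self.replica.version = self.primary.version


class Session:
    """Gives one client read-your-writes on top of the lagging replica."""

    def __init__(self, store):
        self._store = store
        self._last_written = 0   # version of this session's latest write

    def write(self, key, value):
        self._last_written = self._store.write(key, value)

    def read(self, key):
        replica = self._store.replica
        if replica.version >= self._last_written:
            return replica.data.get(key)
        # The replica has not caught up with this session's own write.
        return self._store.primary.data.get(key)


store = ReplicatedStore()
writer, other = Session(store), Session(store)
writer.write("locale", "de-CH")
seen_by_writer = writer.read("locale")        # own write is visible
seen_by_other = other.read("locale")          # still stale
store.replicate()
seen_by_other_later = other.read("locale")    # eventually consistent
```

The writing session always sees its own update, while another session observes the old state until replication has closed the inconsistency window.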

7.4.2 Eventual Consistency in CQRS

TODO

7.5 Events as Storage Mechanism

TODO maybe this already belongs to event sourcing

Most systems in production today rely on storing the current state of objects in order to process transactions. When most people consider storage for an object, they tend to think about it in a structural sense. That is, when considering how, for example, a Sale should be stored, they think about it as a Cart that has Line Items and perhaps some Shipping Information associated with it, see Figure 7.9.

Figure 7.9: A structural view on a Sale.

This is, however, not the only way the problem can be conceptualized, and other solutions offer different and often interesting architectural properties: consider deltas. The canonical example of delta usage is in the field of accounting. In a ledger such as the one in Figure 7.10, each transaction or delta is recorded.

Figure 7.10: A simplified ledger, using deltas (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

Next to each delta is a denormalized total of the state of the account after that delta. To calculate this number, the current delta is applied to the last known value. The last known value can be trusted because at any given point the transactions from the beginning of time for that account could be re-run in order to reconcile the validity of that value. Because all of the transactions or deltas associated with the account exist, they can be stepped through to verify the result. This mechanism of representing state has some other interesting properties; for example, it is possible to go back and look at what the state was at a given point in time. Applied to the Sale example above, we arrive at a conceptually different view, see Figure 7.11.

Figure 7.11: A transactional view of placing orders in a Sale (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)
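The replay idea behind the ledger can be sketched in a few lines of Python. The ledger entries below are made-up sample data, and the function names are illustrative.

```python
from decimal import Decimal

# Hypothetical ledger entries: (description, delta).
ledger = [
    ("opening deposit", Decimal("100.00")),
    ("groceries", Decimal("-30.50")),
    ("salary", Decimal("250.00")),
    ("rent", Decimal("-200.00")),
]


def balance_after(entries, count=None):
    """Replay the first `count` deltas (all of them if count is None)."""
    total = Decimal("0")
    for _, delta in entries[:count]:
        total += delta
    return total


def running_totals(entries):
    """The denormalized column: the balance after each delta."""
    totals, total = [], Decimal("0")
    for _, delta in entries:
        total += delta          # apply the delta to the last known value
        totals.append(total)
    return totals


totals = running_totals(ledger)
current = balance_after(ledger)
after_two = balance_after(ledger, 2)   # state at an earlier point in time
```

Here totals corresponds to the denormalized column of the ledger, and balance_after shows both uses of replay: verifying the current value and looking up the state at an earlier point in time.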

There is still a structural representation of the object, but it exists only by replaying previous transactions to return the structure to its last known state: data is not persisted as a structure but as a series of transactions. One very interesting consequence is that, unlike when storing current state in a structural way, there is no coupling between the representation of current state in the domain and in storage; the representation of current state in the domain can vary without thought of the persistence mechanism.

Such a delta approach is ideally suited to CQRS. The choice of integration model matters because translation and synchronization between the models (read/write, or query/command) can become a very expensive undertaking. CQRS therefore uses a delta approach by introducing domain events, which are a well-known integration pattern and offer the best mechanism for model synchronization.

A domain event in CQRS22 is something that happened in the past at a specific point in time and has no duration. Domain events should be named with verbs in the past tense, such as CustomerRelocated, CargoShipped, or InventoryLossageRecorded. Events normally correspond to use cases and/or events from the system context modeling. In terms of code, a domain event is simply a data-holding structure:

using System;

public class InventoryItemDeactivatedEvent {
  // Immutable fields: an event records something that has already happened.
  public readonly Guid InventoryItemId;
  public readonly string Comment;

  public InventoryItemDeactivatedEvent(Guid id, string comment) {
    InventoryItemId = id;
    Comment = comment;
  }
}

Such a listing looks very similar to that of a command. The main differences lie in significance and intent: commands express the intent of asking the system to perform an operation, whereas domain events are a recording of the action that occurred. A command therefore arrives as an imperative, for example PlaceSale, and domain events record the successful completion of the command in the past tense, for example SaleCompleted. This separation makes the language much clearer and, although subtle, tends to lead developers towards a clearer understanding of context based solely on the language being used.

It is fundamentally important to understand that there is no delete in such a transactional domain event model! It would not make sense to delete an event in the event sequence, which would inherently alter the past and cause inconsistencies (time traveling?). If a delete is necessary, for example removal of a Line Item from the Cart, this is modeled as any other domain event and recorded in the transactional model, see Figure 7.12. Note that this delete is conceptually very different from a database delete: it denotes a delete in the domain, not in the technical RDBMS sense!

Figure 7.12: A transactional view of placing and deleting orders in a Sale (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

In the event stream, the two pairs of socks were added and then removed later. The end state is equivalent to not having added the two pairs of socks. The data has not been deleted, however; new data has been added to bring the object to the state it would be in if the first event had not happened. This process is known as a Reversal Transaction. Placing a Reversal Transaction in the event stream returns the object to the state it would have had if the item had never been added, but the reversal leaves a trail showing that the object had been in that state at a given point in time.
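A reversal can be sketched as follows in Python. The event types ItemsAdded and ItemsRemoved and the sample stream are assumptions for illustration and are not taken from the figure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemsAdded:
    product: str
    quantity: int


@dataclass(frozen=True)
class ItemsRemoved:
    product: str
    quantity: int


def replay(events):
    """Rebuild the cart contents by applying every event in order."""
    cart = Counter()
    for event in events:
        if isinstance(event, ItemsAdded):
            cart[event.product] += event.quantity
        elif isinstance(event, ItemsRemoved):
            cart[event.product] -= event.quantity
    # Drop products whose quantity is back to zero.
    return {product: qty for product, qty in cart.items() if qty > 0}


stream = [
    ItemsAdded("shirt", 1),
    ItemsAdded("socks", 2),
    ItemsRemoved("socks", 2),   # reversal: appended, nothing is deleted
]
cart_now = replay(stream)
cart_before_reversal = replay(stream[:2])
```

The current cart no longer contains the socks, yet the stream still has three events, so the earlier state remains reconstructible.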

There are also architectural benefits to not deleting data. The storage system becomes an additive-only architecture, and it is well known that append-only architectures distribute more easily than updating architectures because there are far fewer locks to deal with (because it becomes a Monoid).

7.5.1 Performance and Scalability

As an append-only model, storing events is far easier to scale. There are, however, other benefits in terms of performance and scalability, especially compared with a stereotypical relational model. As an example, the storage of events is much simpler to optimize, as it is limited to a single append-only model.

7.5.2 Partitioning

A very common performance optimization in today’s systems is Horizontal Partitioning, lately also known as Sharding. With Horizontal Partitioning the same schema exists in many places, and some key within the data determines in which of these places a given row is stored.

One problem when attempting to use Horizontal Partitioning with a relational database is that it is necessary to define the key on which the partitioning should operate. This problem goes away when using events: no matter how many Event Providers exist or how their structures may change, the Event Provider ID associated with each event is the only partition point in the system. Horizontally partitioning (sharding) an Event Store is therefore a very simple process.
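One simple way to realise this, sketched here in Python, is to hash the Event Provider ID and take the result modulo the number of partitions. This is only one possible scheme (it ignores re-partitioning when the partition count changes), and the function names and in-memory lists are hypothetical.

```python
import hashlib


def partition_for(event_provider_id: str, partition_count: int) -> int:
    # A stable hash is used because Python's built-in hash() for strings
    # is randomised per process; the same id must always map to the
    # same partition.
    digest = hashlib.sha256(event_provider_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def append(partitions, event_provider_id, event):
    index = partition_for(event_provider_id, len(partitions))
    partitions[index].append((event_provider_id, event))


partitions = [[] for _ in range(4)]   # stand-ins for four shards
append(partitions, "cart-1", "ItemsAdded")
append(partitions, "cart-2", "ItemsAdded")
append(partitions, "cart-1", "ItemsRemoved")
cart_1_partition = partitions[partition_for("cart-1", 4)]
```

All events of one Event Provider land in the same partition, so loading it never has to touch more than one shard.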

7.5.3 Saving Objects

When dealing with a stereotypical system that uses relational data storage, it can be quite complex to figure out what has changed within the Event Provider. In general, ORMs keep a graph of objects to figure out the changes that have occurred: upon persisting an object, the graph is walked and compared with the object to determine what has changed, and these changes are then saved back to the RDBMS.

In a system that is domain-event centric, the Event Providers themselves track explicit events describing what has changed within them. Therefore, there is no need for complex checking of object graphs to find out what has changed.

7.5.4 Loading Objects

With relational storage, very often many queries must be issued to build the Event Provider. To help minimize the latency cost of these queries, many ORMs have introduced the heuristic of Lazy Loading, also known as Delayed Loading, where a proxy is given in lieu of the real object. The data is then loaded only when some code attempts to use that particular object.

When dealing with events as a storage mechanism, things are quite different. There is but one thing being stored: events. To build an Event Provider, simply load all of its events and replay them.
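Loading by replay can be sketched as follows. The `Account` class and the `BalanceChanged` event are hypothetical names chosen only for illustration; the point is that the current state is obtained by applying every stored event in order.

```java
import java.util.List;

/** Hypothetical event: the balance of an account changed by a delta. */
final class BalanceChanged {
    final long delta;

    BalanceChanged(long delta) {
        this.delta = delta;
    }
}

/** An Event Provider whose state is derived purely from its events. */
class Account {
    private long balance;
    private int version; // number of events applied so far

    /** Applies a single event to the current state. */
    void apply(BalanceChanged event) {
        balance += event.delta;
        version++;
    }

    /** Loading: start from the empty state and replay the full history in order. */
    static Account loadFromHistory(List<BalanceChanged> history) {
        Account account = new Account();
        for (BalanceChanged event : history) {
            account.apply(event);
        }
        return account;
    }

    long balance() {
        return balance;
    }

    int version() {
        return version;
    }
}
```

There is a single query (all events of one Event Provider) and a single loop, so lazy-loading proxies are not needed in this sketch.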

7.5.5 Rolling Snapshots

An objection to this domain event model is that, although a relational system requires more queries, storing events may lead to an ever-increasing number of events for some Event Providers, see Figure 7.13.

Figure 7.13: An event stream (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

Such an event stream can indeed become a problem, as there may be a very large number of events between the beginning of time and the current point. It is easy to imagine an event stream with a million or more events, which would be quite inefficient to load. However, this problem can be alleviated by using rolling snapshots.

A rolling snapshot is a denormalization of the current state of an Event Provider at a given point in time. It represents the state when all events up to that point in time have been replayed. Rolling snapshots are used as a heuristic to avoid having to load all events for the entire history of an Event Provider. It is then possible to replay only the events from that point in time forward in order to load the Event Provider, see Figure 7.14.

Figure 7.14: An event stream with snapshot (Figure taken from https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf)

The process for rebuilding an Event Provider changes when using Rolling Snapshots. Instead of reading from the beginning of time forward, the event stream is read backwards, pushing the events onto a stack until either no events are left or a snapshot is found. If a snapshot was found, it is applied first; the events are then popped off the stack and applied until the stack is empty. Such snapshots can be taken asynchronously by a process monitoring the Event Store.
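The backwards read with a stack can be sketched as below. This is a conceptual sketch only: `StreamEntry` and `SnapshotLoader` are hypothetical names, the state is reduced to a single number, and snapshots are assumed to be stored inline in the same stream as the events, which is a simplification for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** A stream entry (hypothetical): either a delta event or a snapshot of the full state. */
final class StreamEntry {
    final boolean isSnapshot;
    final long value; // the delta for an event, the complete state for a snapshot

    private StreamEntry(boolean isSnapshot, long value) {
        this.isSnapshot = isSnapshot;
        this.value = value;
    }

    static StreamEntry event(long delta) {
        return new StreamEntry(false, delta);
    }

    static StreamEntry snapshot(long state) {
        return new StreamEntry(true, state);
    }
}

class SnapshotLoader {
    /** Number of entries read by the most recent load, to make the saving visible. */
    int entriesRead;

    long load(List<StreamEntry> stream) {
        Deque<StreamEntry> pending = new ArrayDeque<>();
        long state = 0; // empty state, used when no snapshot exists
        entriesRead = 0;

        // Read backwards until a snapshot is found or the stream is exhausted.
        for (int i = stream.size() - 1; i >= 0; i--) {
            StreamEntry entry = stream.get(i);
            entriesRead++;
            if (entry.isSnapshot) {
                state = entry.value; // apply the snapshot first
                break;
            }
            pending.push(entry);
        }

        // Popping yields the events in their original (chronological) order.
        while (!pending.isEmpty()) {
            state += pending.pop().value;
        }
        return state;
    }
}
```

With a stream of two events, a snapshot and two further events, `load` reads only the last three entries; the events before the snapshot are never touched.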

It is important though to remember that Rolling Snapshots are just a heuristic and that conceptually the event stream is still viewed in its entirety.

TODO It is important to note that, although this is an easy way to conceptualize how Rolling Snapshots work, it is a less than ideal solution in a production system for various reasons. Further discussion on the implementation of Rolling Snapshots can be found in “Building an Event Storage”.

7.6 Building an Event Storage

TODO follow greg young https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf from page 41 TODO maybe this belongs already in event sourcing

7.7 CQRS and Event Sourcing

TODO follow greg young https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf from page 50 TODO maybe this belongs already in event sourcing

7.8 When to Use

Like any pattern, CQRS is useful in some places, but not in others. Many systems do fit a CRUD mental model, and so should be done in that style. CQRS is a significant mental leap for all concerned, so it should not be tackled unless the benefit is worth the jump. In particular, CQRS should only be used on specific portions of a system (a Bounded Context in DDD lingo) and not on the system as a whole. In this way of thinking, each Bounded Context needs its own decisions on how it should be modeled. When looking at a ledger such as in Figure 2, each transaction or delta is being recorded.

In general, it is recommended to use CQRS only in very complex domains, where Domain-Driven Design is applicable and where modeling in an event log generates unique business value, for example when there is a need to analyse past data. CQRS leads to higher costs for modeling every behaviour in a system, as well as higher costs in terms of disk space and of the thought process needed to store every event in the system. The question to answer is whether these costs are justified by the Return on Investment (ROI), which tends to be the case when the business derives a competitive advantage from the data.

7.9 Benefits

CQRS in itself is a very simple concept; its benefits revolve around the architectural flexibility and the decisions that can be made around it. Complex domains implemented with Domain-Driven Design (DDD) are very well suited to the application of CQRS, together with a task-based UI. Additionally, it fits very well with event-based programming and architectures as already introduced with Reactive Programming, Actors and the Blackboard Pattern. Finally, it combines very well with Event Sourcing (ES); however, care has to be taken in deciding whether ES should really be applied.

In terms of non-functional requirements it offers (TODO refine them):

  • Performance: CQRS allows you to separate the load from reads and writes, so that each can be scaled independently. If your application sees a big disparity between reads and writes, this is very handy. Even without that, you can apply different optimization strategies to the two sides, for example by using different database access techniques for read and update.
  • Scalability: the system can easily be scaled by adding more services.
  • Security: the separation allows different security rules for each type of service, for example only a certain role may write data, or only another role may view certain data.
  • Maintainability: commands with side effects are clearly separated from read-only queries.

There is no impedance mismatch between events and the domain model. The events are themselves a domain concept, and the idea of replaying events to reach a given state is also a domain concept. The entire system becomes defined in domain terms. Defining everything in domain terms not only lowers the amount of knowledge that developers need to have, it also limits the number of representations of the model that are needed, as the events are directly tied to the domain model itself.

7.10 Drawbacks

CQRS is regarded as adding risky complexity and as being difficult to use. In addition, having separate models raises the question of how hard it is to keep those models consistent, which raises the likelihood of having to use eventual consistency.

7.11 What it is not

  • Task-Based UI: CQRS is applicable to CRUD as well as task-based UIs. Because of its strength in Domain-Driven Design (DDD), which is the best way to implement a task-based UI, CQRS was often mistaken to be applicable only to task-based UIs.
  • Event Sourcing: During its hype, CQRS has often been used interchangeably with Event Sourcing, due to the compelling fact that events can be stored to keep the current state. However, the two concepts are not directly related.

References

[12] Seth Gilbert and Nancy Lynch. 2002. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33, 2 (June 2002), 51–59. DOI:https://doi.org/10.1145/564585.564601

[24] Bertrand Meyer. 1988. Object-oriented software construction. Prentice Hall, New York.


  1. In Java, this cannot be checked by the compiler, whereas in C++ it is possible to some extent through the const keyword. However, some functional programming languages, for example Haskell, allow side effects to be expressed in the type of a function.↩︎

  2. https://www.allthingsdistributed.com/2008/12/eventually_consistent.html↩︎

  3. There are also other, slightly different definitions of domain events, for example by Martin Fowler http://martinfowler.com/eeaDev/DomainEvent.html; however, these other definitions have a different context.↩︎