Active-Active Geo-Distribution allows you to place your Redis database cluster instances and centers close to your users, no matter where they are. Placing read replicas closer to your users is the right thing to enable real-time response. For write-heavy applications, that’s not enough. So how do you develop an Active-Active geo-distributed cluster?
(aka -free replicated types (CRDTs) or ) are a family of replicated types with a common set of properties that enable operations to always converge to a final state consistent among all replicas. To ensure that conflicts never happen (there is no concept of ) and cause problems in your applications, operations on types have to respect a specific set of algebraic properties.
CRDTs work in when you create a -enabled database because standard commands get swapped with an equivalent . Let’s look at a Redis with a equivalent and the nuances that Active-Active geo-distributed adds.
Redis Sets are similar to what your favorite programming language offers in the standard library, offering specialized operations to add and remove elements, check if an element is part of the set, and perform set intersection, union, and difference. Here’s how it looks like when calling actual Redis commands:
// Create a set:
> SADD fruits apple pear banana
// Test if it contains apple:
> SISMEMBER fruits apple
// Remove a fruit:
> SREM fruits banana
// Test if it’s still present:
> SISMEMBER fruits banana
Running the same commands in a CRDT-enabled Redis Enterprise database would display the same behavior, but what’s happening behind the scenes is completely different.
In a database, all nodes can independently apply changes to the “fruits” key. This enables you to write genuinely geo-distributed applications. The downside is that the “ lag” is also experienced when writing to a . This approach offers significant advantages for geo-distributed applications and achieving .
In a Redis Enterprise CRDT-enabled database, sets operations work under a few additional rules, the two most important of which get applied when merging operations coming from different nodes:
The second rule is sometimes referred to as the “observed remove” rule, meaning that you can delete only items that you were able to observe when the command was issued.
A note on replication lag and consistency model
While eventually all replica nodes will converge to the same final state, in the short term a command sent to ReplicaEU may not have yet been propagated to ReplicaUS, for example. This situation, albeit normally brief, is the reason why CRDT data structures must hardcode conflict-resolution strategies. All replicas in the same CRDT database will constantly sync their state to provide as consistent a view of the dataset as possible, but keep in mind that CRDTs are also useful for ensuring high availability in case of network partitions.
This means that to implement better resilience in your system you will need to account for the possibility of prolonged moments where the system has not yet been not fully synchronized. This makes CRDTs a form of eventual consistency. This is usually described as strong eventual consistency because it’s much more efficient than the more common types of replication based on quorum quotas. (For more information, see the Redis page on Active-Active Geo-Distribution.)
CRDTs perform as commutative operations. This has the desirable quality (for a ) that the order of does not matter. in an arbitrary order fundamentally reframes many race conditions, and its usefulness increases as asynchrony (e.g., distribution) increases. In many systems, copies of some need to be stored on replicas. Examples of such systems include
More CRDT data structures
For a complete list of the CRDT data structures that Redis Enterprise supports, take a look at the official documentation.
What are the practical requirements for developing an Active-Active application?
Let’s look at the main development aspects that could raise questions.
Client libraries for databases
One question is whether you need a special type of Redis client to interact with a CRDT database. The answer is no—any normal Redis client can connect to a CRDT database and execute commands directly. As mentioned earlier, the commands don’t change, it’s the underlying mechanics that do. The only thing that clients don’t offer out-of-the-box is the ability to connect to a different geo-distributed replica in case the one closest to the service instance becomes unavailable. We’re working on this issue, but in the meantime, in the event of a network split, you will have to decide at the application level when it’s appropriate to connect to a replica in a different region.
The advantage of Active-Active geo-distribution is the ability to share some state across service instances distributed globally, all while experiencing local latencies when manipulating that state. To fully exploit this capability, your services need to rely on CRDTs’ semantics as much as possible and, as such, avoid keeping an internal state that would get lost (or become un-mergeable) in the event of failures. -based databases are available even when the distributed database replicas cannot exchange the .
CRDTs can’t solve all problems because they hardcode specific merge rules that might not be appropriate for your particular problem. Counters are a typical : counters are great, but they can’t be used to model a bank account balance because merges could allow the to go negative-and there is no way at the application level to prevent this from happening. In other words, CRDTs are an efficient but nuanced form of that doesn’t apply properly to inherently transactional problems.
Testing Active-Active Geo-distributed Applications
It might seem that testing an Active-Active geo-distributed application must be much more complicated than testing normal single-master ones. While CRDTs certainly are a complex component of your model, their behavior is fully deterministic in terms of results. You have to account for the primary situation when the cluster is partitioned. As mentioned above, it’s OK to continue sending updates to a that has been disconnected from the rest of the cluster because the updates will eventually be merged successfully when the connection is reestablished. You need to ensure that your service can still operate correctly when disconnected from the whole Active-Active geo-distributed database cluster.
In other words, the only non-obvious extra testing required on your part should be about how the application behaves in the event of a network partition.
CRDTs allow you to create geo-distributed applications that can offer local latencies to your entire user base, all the while rendering your entire application more resilient to failure. While not all problems can be solved by CRDTs, there is a vast space of improvements that most companies can benefit from. As an example, please take a look at this recent blog post by Kyle Davis, where he shows how to implement a leaderboard using Sorted Sets—and hints at the benefits of a CRDT version.
is propagating any to all copies of . In the most simple case, a series of updates to the come from a single source, and all other entities that hold a of the are guaranteed to receive those updates in the same order.
When multiple copies of the need to be updated simultaneously, there needs to be a way to agree upon the correct version of the . With , once one entity makes any updates, all other replicas are locked to eliminate the need for until they’ve been updated to the same new version. The source of truth pushes updates to all entities in the same order to keep them ‘in step.’
The locking mechanism used in is inconsistent with the need for real-time performance. This is where and CRDTs come into play. Every cluster or node can potentially make and receive updates, so there is no way to ensure that each gets the same sequence of updates. is the property where the state of the is eventually reconciled, regardless of the order of events that reach each . CRDTs can resolve conflicts that arise from or on replicas existing in a decentralized environment.