Redis Enterprise is a self-managed, real-time data platform that unlocks the full potential of Redis at scale, with five-nines (99.999%) high availability. It is architected to provide automated database resilience and to mitigate the risk of hardware failures and cloud outages.
Redis Enterprise’s high availability is built around replication, but automatic failover, backup, and recovery also impact the ability to meet the application’s high availability service level agreements.
Replication is the process of storing identical copies of your data on multiple Redis Enterprise servers. It keeps your data safe and your application highly available, without downtime, even if one or more of your servers fail.
Like most NoSQL database deployments, open source Redis uses three replicas to ensure high availability. From a high-level perspective, the first replica is usually used to store your dataset, the second for failover purposes, and the third serves as a tiebreaker in case of a network split event. Because DRAM is expensive, maintaining three replicas can be extremely expensive. Redis Enterprise allows you to have a complete high availability (HA) system with only two replicas. The tiebreaker is determined at the node level by using an odd number of nodes in a cluster. The example below compares the infrastructure cost on Amazon Web Services of a highly available 90GB dataset running on open source Redis with three replicas versus a Redis Enterprise cluster that uses two replicas and a quorum node:
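The memory side of this comparison can be sketched with simple arithmetic. The snippet below is an illustration only: it assumes the quorum node holds no copy of the dataset, ignores per-node overhead, and uses no real pricing. The function name `total_ram_gb` is hypothetical.

```python
def total_ram_gb(dataset_gb: float, data_copies: int) -> float:
    """RAM needed to hold every full copy of the dataset (overhead ignored)."""
    return dataset_gb * data_copies


DATASET_GB = 90  # dataset size used in the example in the text

# Three full copies of the data.
three_copy_gb = total_ram_gb(DATASET_GB, 3)

# Two full copies; the quorum node is assumed to hold no data.
two_copy_gb = total_ram_gb(DATASET_GB, 2)

saved_gb = three_copy_gb - two_copy_gb
saved_fraction = saved_gb / three_copy_gb

print(f"three replicas: {three_copy_gb} GB of RAM")
print(f"two replicas + quorum node: {two_copy_gb} GB of RAM")
print(f"RAM saved: {saved_gb} GB ({saved_fraction:.0%})")
```

Actual cost savings depend on instance types and pricing, which this sketch deliberately leaves out.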
Redis Enterprise replication is based on diskless replication (pure in-memory replication) at both the primary server and replica, providing complete redundancy, as shown in the figure below:
In addition, Redis Enterprise uses PSYNC2 for its core operations, so the active replication link is maintained after planned failover or shard migration operations.
A Redis Enterprise cluster provides fault tolerance and resilience. In the case of a primary server or node outage, Redis Enterprise’s self-healing process automatically detects the hardware failure, elects a replica as a replacement, and promotes that replica to become the new primary server. Redis Enterprise also automatically switches all client connections. The entire failover process occurs in single-digit seconds, without manual intervention. A Redis Enterprise cluster uses two watchdog processes to detect failures:
These watchdog processes are part of the distributed cluster manager entity and reside on each node of the cluster. Failure detection needs to be managed by entities that run inside the cluster to avoid situations like the one shown on the left side of the figure below. In this example, the watchdog entity is located on the wrong side of the network split and cannot trigger the failover process:
Once a failure event is detected, the Redis Enterprise cluster automatically and transparently runs a set of internal distributed processes that fail over the relevant shard(s) and endpoint(s) (if needed) to healthy cluster nodes. These processes also reroute user traffic through a different proxy or proxies if necessary.
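The reason failure detection must live inside the cluster, and the reason an odd number of nodes works as a tiebreaker, can be illustrated with a generic majority-quorum check. This is a minimal sketch of the general idea, not Redis Enterprise's internal algorithm; the function name `has_quorum` is hypothetical.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True when one side of a network split sees a strict majority of nodes.

    reachable_nodes counts the local node plus every node it can still reach.
    """
    return reachable_nodes > cluster_size // 2


# Three-node cluster split 2 / 1: only the two-node side may act.
majority_side = has_quorum(reachable_nodes=2, cluster_size=3)
minority_side = has_quorum(reachable_nodes=1, cluster_size=3)

# Four-node cluster split 2 / 2: neither side has a majority,
# which is why an odd number of nodes is used as the tiebreaker.
even_split_side = has_quorum(reachable_nodes=2, cluster_size=4)

print(majority_side, minority_side, even_split_side)
```

A detector that sits on the minority side of the split fails this check, so it cannot safely trigger a failover.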
The Redis Enterprise cluster has out-of-the-box HA profiles for noisy (public cloud) and quiet (virtual private cloud, on-premises) environments. We have found that triggering failovers too aggressively can create stability issues. On the other hand, in a quiet network environment, a Redis Enterprise cluster can be easily tuned to support a constant single-digit (<10 sec) failover time in all failure scenarios.
With Redis Enterprise, you can choose from about 90 regions across AWS, Google Cloud, and Microsoft Azure. This lets you deploy your applications close to your users and helps deliver sub-millisecond response times.
Redis Enterprise is also designed to provide escalating geographic resilience in multi-zone, multi-region, and multi-cloud Redis Enterprise clusters. Redis Enterprise supports multi-availability zone/rack cluster configurations. In this mode, the cluster nodes are tagged with the zone/rack they have been deployed in, and Redis Enterprise ensures that primary server and replica Redis processes of the same shard are never hosted on nodes located in the same availability zone/rack. Running Redis Enterprise in a multi-availability zone/rack environment requires the following conditions:
An example of Redis Enterprise multi-availability zone configuration in the cloud is shown here:
As you can see, this example meets all the conditions discussed above:
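The placement rule described above, that the primary and replica of the same shard never share an availability zone or rack, can be expressed as a simple check. The sketch below is illustrative only: the node names, zone names, and data layout are hypothetical and do not reflect a Redis Enterprise API.

```python
from typing import Dict, List, Tuple


def same_zone_violations(
    node_zone: Dict[str, str],
    shard_placement: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Return the shards whose primary and replica sit in the same zone.

    node_zone maps a node name to its zone/rack tag.
    shard_placement maps a shard name to (primary_node, replica_node).
    """
    return [
        shard
        for shard, (primary, replica) in shard_placement.items()
        if node_zone[primary] == node_zone[replica]
    ]


# Hypothetical three-node cluster, one node per zone.
node_zone = {"node1": "zone-a", "node2": "zone-b", "node3": "zone-c"}

good_placement = {"shard1": ("node1", "node2"), "shard2": ("node2", "node3")}
bad_placement = {"shard1": ("node1", "node1")}

print(same_zone_violations(node_zone, good_placement))
print(same_zone_violations(node_zone, bad_placement))
```

An empty result means every shard would survive the loss of any single zone in this simplified model.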
Redis Enterprise geo-resilience with multi-cloud clusters
Cloud provider disruptions vary in severity, from temporary capacity constraints to complete outages that can devastate application deployments. By distributing data across multiple clouds, organizations can improve database and application resilience and prevent data loss. Redis Enterprise multi-cloud clusters let you distribute a Redis Enterprise cluster across multiple regions on different public cloud providers. With multi-cloud clusters, you can take advantage of unique tools and services native to AWS, Google Cloud, and Azure without the operational complexity of managing data replication and migration across clouds. You can expand your reach with low-latency access to more cloud regions and satisfy in-region data sovereignty requirements without sacrificing resilience. Redis Enterprise automatically distributes your data across clouds for increased fault tolerance, keeping your applications highly available.
In addition to database resilience, you may also want a plan for maximizing application resilience and fault tolerance in multi-cloud environments. Applications should be distributed across multiple clouds so they can fail over when necessary to meet high availability requirements.