Durable Redis: Data Persistence Storage

Data persistence concepts:

Redis Enterprise is a fully durable database. It supports the following data persistence mechanisms:

  • AOF (Append-Only File) data persistence: Every shard of a Redis database appends each new write to its persistence file and syncs that file to disk on one of the following schedules:
    • every second (fast but less safe)
    • every write (safer but slower)
  • Snapshot: The entire point-in-time view of the dataset is written to persistent storage, across all shards of the database. The snapshot time is configurable.
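The trade-off between the two AOF modes can be sketched with a toy append-only log. This is a minimal illustration, not how Redis Enterprise implements persistence: the AppendOnlyLog class, its policy names ("always" for every write, "everysec" for every second), and the replay and demo helpers are all hypothetical names chosen for this example. It only shows why syncing on every write costs more disk operations, and why syncing once per second leaves a window of up to about a second of writes that have not yet been forced to disk.

```python
import os
import tempfile
import time


class AppendOnlyLog:
    """Toy append-only log with a configurable fsync policy (illustration only)."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec"):
            raise ValueError("policy must be 'always' or 'everysec'")
        self.policy = policy
        self._file = open(path, "ab")
        self._last_sync = time.monotonic()
        self.fsync_calls = 0

    def _sync(self):
        # Push buffered bytes to the OS, then force them onto the device.
        self._file.flush()
        os.fsync(self._file.fileno())
        self._last_sync = time.monotonic()
        self.fsync_calls += 1

    def append(self, command):
        self._file.write(command.encode("utf-8") + b"\n")
        if self.policy == "always":
            # Every write: safest, but one disk sync per command.
            self._sync()
        elif time.monotonic() - self._last_sync >= 1.0:
            # Every second: simplified here to a check made on each append
            # (an assumption of this sketch, not a background timer).
            self._sync()

    def close(self):
        self._sync()
        self._file.close()


def replay(path):
    """Read the log back, as a recovery process would."""
    with open(path, "rb") as f:
        return [line.decode("utf-8").rstrip("\n") for line in f]


def demo():
    """Write 100 commands under each policy; report (syncs before close, commands recovered)."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for policy in ("always", "everysec"):
            path = os.path.join(tmp, policy + ".aof")
            log = AppendOnlyLog(path, policy)
            for i in range(100):
                log.append(f"SET key:{i} {i}")
            syncs_before_close = log.fsync_calls
            log.close()
            results[policy] = (syncs_before_close, len(replay(path)))
    return results
```

Calling demo() shows the "always" policy issuing one sync per command, while "everysec" issues far fewer. Both recover every command after a clean close. The difference only matters when the process or machine fails between syncs.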

Snapshots vs. backups

Snapshots and backups are designed for two different purposes. Snapshots support data durability (i.e., automatically recovering data when there is no copy of the dataset in memory), while backups support disaster recovery (i.e., when the entire cluster needs to be rebuilt from scratch).

Ephemeral vs. persistent storage

In cloud-native deployments such as a public cloud, private cloud, or virtual private cloud, ephemeral (instance) storage cannot be used for durability purposes. Instead, network-attached storage such as Amazon Elastic Block Store (EBS), Microsoft Azure Disk Storage, or Google Cloud Platform Persistent Disk is required. That’s because, just as it sounds, ephemeral storage is ephemeral! When a cloud instance fails (which is relatively common), the contents of its local disk are also lost.

The Redis Enterprise cluster is designed to work with network-attached storage for data persistence. By default, every node in the cluster is connected to a network-attached storage resource, making the cluster immune to data-loss events such as multiple node failures with no copies of the dataset left in DRAM. This durability-proven architecture is illustrated here:

As illustrated above, when there is no copy of the dataset left in DRAM, Redis Enterprise finds the most recent copy of the dataset on the network-attached devices that were connected to the failed node, and uses it to populate the Redis shard on the new cloud instance.

Data-persistence at the master or at the slave level?

By default, when data persistence is enabled, Redis Enterprise sets data persistence at the slave of each shard of the database. In this configuration there is no impact on performance, as the master shard is not affected by the slowness of the disk; on the other hand, replication adds latencies that may break the data persistence SLA.

Therefore, Redis Enterprise allows you to enable data persistence on both the master and slave shards. This is a more reliable configuration that doesn’t infringe on your data persistence SLA, but if the disk speed cannot cope with the throughput of writes, it will affect the latency of your database, as Redis delays its processing when it cannot commit to disk.

If you use Redis Enterprise DBaaS deployments (Cloud or VPC), your database is automatically tuned to work with a storage engine and the right shard configuration to support your persistent storage load; in an on-premises deployment, we recommend you consult with Redis solutions architects regarding your sizing. Data persistence options are shown here:

Enhanced storage engine

Redis Enterprise enhances the Redis storage engine to increase the throughput of the Redis core with data persistence enabled, and to better utilize cluster resources by allowing multiple Redis instances to run on the same cluster node without affecting performance:

  1. When AOF is used as a mechanism for data persistence, the size of the append-only file grows with every ‘write’ operation. An AOF rewrite process is then triggered to control the size of the file and reduce the recovery time from disk. By default (and configurable), OSS Redis triggers a rewrite operation when the size of the AOF has doubled relative to its size after the previous rewrite operation. In a ‘write’ intensive scenario, the rewrite operation can block the main loop of Redis (as well as other Redis instances that are running on the same cluster node) from executing ongoing requests to disk. Redis Enterprise uses a greedy AOF rewrite algorithm that attempts both to postpone AOF rewrite operations as much as possible without infringing the SLA for recovery time (a configurable parameter) and to prevent the AOF from reaching the disk space limits. Through optimal use of the rewrite process, the overall throughput of a persistent Redis instance is much higher than it otherwise would be.
  2. The Redis Enterprise storage layer allows multiple Redis instances to write to the same persistent storage in a non-blocking way, i.e. a busy shard that is constantly writing to disk (during an AOF rewrite) will not block other shards from executing durable operations.
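The two rewrite policies described in item 1 can be contrasted in a short sketch. The first function follows the OSS Redis trigger, governed by the auto-aof-rewrite-percentage and auto-aof-rewrite-min-size settings (defaults of 100 and 64mb). The second is only a guess at the shape of a "postpone as long as possible" policy based on the description above. It is not Redis Enterprise's actual algorithm, and every name and parameter in it (greedy_should_rewrite, replay_bytes_per_sec, recovery_sla_sec, disk_reserve) is hypothetical.

```python
MB = 1024 * 1024


def oss_should_rewrite(current_size, base_size, growth_pct=100, min_size=64 * MB):
    """OSS-style trigger: rewrite once the AOF has grown by growth_pct percent
    over its size after the last rewrite, and is larger than min_size."""
    if growth_pct == 0 or current_size <= min_size:
        return False
    base = base_size if base_size > 0 else 1
    growth = current_size * 100 // base - 100
    return growth >= growth_pct


def greedy_should_rewrite(aof_size, disk_free, replay_bytes_per_sec,
                          recovery_sla_sec, disk_reserve):
    """Hypothetical greedy policy: postpone the rewrite until either the
    estimated recovery time would exceed the SLA or free disk space runs low."""
    estimated_recovery_sec = aof_size / replay_bytes_per_sec
    sla_at_risk = estimated_recovery_sec >= recovery_sla_sec
    disk_at_risk = disk_free <= disk_reserve
    return sla_at_risk or disk_at_risk
```

Under these assumptions, a 200 MB AOF that was 100 MB after its last rewrite triggers the OSS-style rewrite. The greedy sketch would keep postponing, provided the file can still be replayed within the recovery-time SLA and enough disk space remains.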

A storage engine benchmark performed by Dell-EMC and Redis showed that when using Redis Enterprise’s enhanced storage engine with Dell-EMC VMAX, Redis performance is nearly unaffected by AOF every-write operation, as shown here:

Want to learn more?

Watch our recent Tech Talk on Buy vs Build: Disaster Recovery in Redis Open Source vs Redis Enterprise! 


More information on this benchmark can be found here:
