For those of you familiar with Redis, it should be relatively straightforward to create a configuration that guarantees ACID-ish (Atomicity, Consistency, Isolation, Durability) operations: merely create a single Redis instance with a ‘master’ role and configure it to fsync the append-only file (AOF) on every write (‘appendfsync always’) to a persistent storage device. This configuration provides ACID characteristics in the following ways:
That said, most Redis users prefer not to run Redis in this configuration, as it can dramatically degrade performance. For example, if the persistent storage is busy, Redis delays executing the request until the storage becomes available again.
With that in mind, we wanted to determine how fast a Redis Enterprise (Redise) cluster can process ACID transactions. We have made several built-in enhancements to the Redise architecture that enable better performance in an ACID configuration, including:
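To make the cost of syncing on every write concrete, here is a minimal Python sketch that appends small records to a log file with and without an fsync after each write. It is not Redis code and not how we ran our benchmarks; the function names (`timed_appends`, `compare`), the 100-byte payload, and the record count are illustrative assumptions. The absolute numbers depend entirely on the storage device underneath.

```python
import os
import tempfile
import time


def timed_appends(path, payload, count, fsync_each_write):
    """Append `payload` to `path` `count` times; return per-write latencies in seconds."""
    latencies = []
    # O_APPEND mimics an append-only log file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            if fsync_each_write:
                # Block until the OS reports the data has reached the device.
                os.fsync(fd)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def compare(count=200, size=100):
    """Return the mean per-write latency (seconds) for both modes."""
    payload = b"x" * size
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, sync in (("fsync_every_write", True), ("no_fsync", False)):
            path = os.path.join(tmp, label + ".log")
            latencies = timed_appends(path, payload, count, sync)
            results[label] = sum(latencies) / len(latencies)
    return results


if __name__ == "__main__":
    for label, mean in compare().items():
        print(f"{label}: mean {mean * 1e6:.1f} us per write")
```

On most devices the fsync-per-write mode is noticeably slower, because every write has to wait for the storage to acknowledge it before the next one can proceed.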
We deployed the following benchmark configuration inside an AWS VPC:
Although we tested multiple types of loads, including different read/write ratios, object sizes from 100B to 6KB, varying numbers of connections, and runs with and without pipelining, we couldn’t get below one millisecond of latency for durable ‘write’ operations. Latency was measured from the time the first byte of the request arrived at the cluster until the first byte of the ‘write’ response was sent back to the client. Finally, we tested a single request over a single connection, but still couldn’t get below 2-3 milliseconds of latency. We did a deeper analysis and found that there was no way to achieve less than two milliseconds of latency between any instance on the AWS cloud and EBS storage under an ACID configuration.
As most of our customers want <1 millisecond latency, we decided to look for alternatives.
In a nutshell, VMAX is a family of storage arrays built on the strategy of simple, intelligent, modular storage. It incorporates a Dynamic Virtual Matrix interface that connects and shares resources across all VMAX engines, allowing the storage array to seamlessly grow from an entry-level configuration into the world’s largest storage array.
Performance-wise, VMAX can scale from one up to eight engines (V-Bricks). Each engine consists of dual directors, each with 2-socket Intel CPUs, front-end and back-end connectivity, a hardware compression module, an Infiniband internal fabric, and a large mirrored and persistent cache. All writes are acknowledged to the host as soon as they are registered in the VMAX cache, and only later, perhaps after multiple updates, are they written to flash. Reads also benefit from the large VMAX cache. When a read is requested for data that is not already in cache, FlashBoost technology delivers the I/O directly from the back-end (flash) to the front-end (host), and the data is later staged in the cache for possible future access.
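The write path described above follows the general write-back caching pattern. The toy Python model below illustrates that pattern only: it is not VMAX's implementation, and the class name `WriteBackCache`, its methods, and the dict standing in for flash are all hypothetical. In particular, the real cache is mirrored and persistent, which is what makes an early acknowledgement safe; this sketch does not model that.

```python
class WriteBackCache:
    """Toy write-back cache: acknowledge on cache write, destage to the backend later."""

    def __init__(self, backend):
        self.backend = backend    # dict standing in for flash (assumption)
        self.cache = {}           # block id -> latest data
        self.dirty = set()        # blocks updated in cache but not yet destaged
        self.backend_writes = 0   # how many writes actually reached the backend

    def write(self, block, data):
        # The host is acknowledged as soon as the data is in cache.
        self.cache[block] = data
        self.dirty.add(block)
        return "ack"

    def read(self, block):
        if block in self.cache:
            return self.cache[block]
        # Cache miss: serve the data from the backend, then stage it in cache.
        data = self.backend[block]
        self.cache[block] = data
        return data

    def destage(self):
        # Only the latest version of each dirty block is written, so repeated
        # updates to the same block cost a single backend write.
        for block in sorted(self.dirty):
            self.backend[block] = self.cache[block]
            self.backend_writes += 1
        self.dirty.clear()


if __name__ == "__main__":
    flash = {7: b"cold"}
    cache = WriteBackCache(flash)
    for version in (b"v1", b"v2", b"v3"):
        cache.write(1, version)
    cache.destage()
    print(cache.backend_writes, flash[1])
```

In this model, three updates to the same block are acknowledged immediately and result in one write to the backend, which is the coalescing effect that "perhaps after multiple updates" refers to.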
We set up the following benchmark environment:
Below are some more details:
As expected, the ‘read’-intensive tests provided the best results. That said, we were very surprised to see over 660K ops/sec on the standard 1:1 read/write use case with 100B item_size, and only slightly lower throughput (i.e. 640K ops/sec) on the write-intensive scenario. We were also impressed with the 6000B results: even under a write-intensive scenario, a single cluster node delivered 80K ops/sec with sub-millisecond latency.
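For a rough sense of how much data these rates push toward persistent storage, the back-of-the-envelope Python sketch below multiplies the write rate by the item size. The helper name `write_payload_mb_per_sec` is hypothetical, and the estimate counts item payload only; it ignores keys, protocol framing, and any AOF overhead. The exact read/write ratio of the write-intensive runs is not restated here, so the 6000B figure treats every operation as a write and is therefore an upper bound.

```python
def write_payload_mb_per_sec(ops_per_sec, write_fraction, item_size_bytes):
    """Estimate write payload bandwidth in MB/s (decimal megabytes)."""
    writes_per_sec = ops_per_sec * write_fraction
    return writes_per_sec * item_size_bytes / 1_000_000


if __name__ == "__main__":
    # 1:1 read/write at 660K ops/sec with 100B items: half the ops are writes.
    print(write_payload_mb_per_sec(660_000, 0.5, 100))    # 33.0
    # 6000B items at 80K ops/sec, assuming (upper bound) every op is a write.
    print(write_payload_mb_per_sec(80_000, 1.0, 6000))    # 480.0
```

Under these assumptions, the large-item case moves far more payload per second than the small-item case despite the lower operation rate.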
We were surprised (and happy) to discover that with high-end persistent storage devices like Dell-EMC VMAX, a single Redise cluster node can support over 650K ACID ops/sec while keeping sub-millisecond database latency.
On the other hand, we were disappointed to see that we cannot run a single durable operation with sub-millisecond latency on state-of-the-art cloud storage infrastructure (i.e. AWS io1 EBS). With the multitude of advanced technologies and public cloud services available, there is still a way to go.
The full Redis with Dell-EMC VMAX Performance Assessment Tests and Best Practices can be found on our website here.
Each test was run with the following memtier_benchmark parameters