I briefly described what happens when a slave connects—that the master starts a snapshot and sends that to the slave—but that’s the simple version. Table 4.2 lists all of the operations that occur on both the master and slave when a slave connects to a master.
Step | Master operations | Slave operations |
---|---|---|
1 | (waiting for a command) | (Re-)connects to the master; issues the SYNC command |
2 | Starts BGSAVE operation; keeps a backlog of all write commands sent after BGSAVE | Serves old data (if any), or returns errors to commands (depending on configuration) |
3 | Finishes BGSAVE; starts sending the snapshot to the slave; continues holding a backlog of write commands | Discards all old data (if any); starts loading the dump as it’s received |
4 | Finishes sending the snapshot to the slave; starts sending the write command backlog to the slave | Finishes parsing the dump; starts responding to commands normally again |
5 | Finishes sending the backlog; starts live streaming of write commands as they happen | Finishes executing backlog of write commands from the master; continues executing commands as they happen |
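To make the ordering in table 4.2 concrete, here is a minimal in-memory model of it. The `Master` and `Slave` classes and all of their methods are hypothetical illustrations, not Redis internals: the fork behind BGSAVE is modeled as a plain dict copy, and only one sync is tracked at a time. What the sketch does show is the core idea of the table: writes that arrive after the snapshot is taken are backlogged and replayed after the dump is loaded, so the slave converges on the master’s data and its own stale data disappears.

```python
class Master:
    """Toy master: a dict of data plus a write backlog used during a sync."""

    def __init__(self):
        self.data = {}
        self.backlog = None  # None means no sync is in progress
        self.slaves = []     # slaves that have finished syncing

    def write(self, key, value):
        self.data[key] = value
        if self.backlog is not None:
            # Steps 2-4: remember writes made after the snapshot was taken.
            self.backlog.append((key, value))
        for slave in self.slaves:
            # Step 5: stream the write live to fully synced slaves.
            slave.apply(key, value)

    def start_bgsave(self):
        # Step 2: take a point-in-time copy (stand-in for the fork) and
        # start a fresh backlog.
        self.backlog = []
        return dict(self.data)

    def finish_sync(self, slave):
        # Steps 4-5: hand over the backlog, then switch to live streaming.
        backlog, self.backlog = self.backlog, None
        slave.replay(backlog)
        self.slaves.append(slave)


class Slave:
    """Toy slave that may hold stale data before it syncs."""

    def __init__(self, old_data=None):
        self.data = dict(old_data or {})

    def load_snapshot(self, snapshot):
        # Step 3: discard all old data, then load the dump.
        self.data = dict(snapshot)

    def replay(self, backlog):
        # Steps 4-5: execute the backlogged writes in their original order.
        for key, value in backlog:
            self.apply(key, value)

    def apply(self, key, value):
        self.data[key] = value


master = Master()
master.write("a", 1)
slave = Slave(old_data={"stale": True})
snapshot = master.start_bgsave()  # steps 1-2: SYNC arrives, BGSAVE starts
master.write("b", 2)              # lands in the backlog
slave.load_snapshot(snapshot)     # step 3: old slave data is flushed
master.write("c", 3)              # still backlogged
master.finish_sync(slave)         # steps 4-5: backlog replayed
master.write("d", 4)              # streamed live
print(slave.data == master.data)  # True
```

If the backlog were dropped, the writes to `b` and `c` would never reach the slave, because they happened after the snapshot was taken but before live streaming began.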
With the method outlined in table 4.2, Redis manages to keep up with most loads during replication, except when the network bandwidth between the master and slave instances isn’t sufficient, or when the master doesn’t have enough memory to fork and keep a backlog of write commands. Though it isn’t strictly necessary, it’s generally considered good practice to have Redis masters use only about 50–65% of the memory in our system, leaving approximately 30–45% spare for BGSAVE and command backlogs.
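As a quick worked example of that rule of thumb, the helper below turns a machine’s total memory into a target range for the master plus the memory left over for the fork and backlogs. The function name and its default fractions are illustrative assumptions taken from the 50–65% guideline above, not a Redis setting.

```python
def master_memory_budget(total_gb, low=0.50, high=0.65):
    """Return (min_use, max_use, min_spare) in GB for a Redis master.

    Follows the 50-65% rule of thumb: the master's data should fit in
    [low, high] of total memory, and whatever remains above `high` is
    headroom for the BGSAVE fork and write-command backlogs.
    """
    if not 0 < low <= high < 1:
        raise ValueError("expected 0 < low <= high < 1")
    min_use = total_gb * low
    max_use = total_gb * high
    return min_use, max_use, total_gb - max_use


print(master_memory_budget(16))
```

For a 16 GB machine this works out to roughly 8 to 10.4 GB for the master’s data, leaving at least about 5.6 GB free while a snapshot is being written.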
On the slave side of things, configuration is also simple. To configure the slave for master/slave replication, we can either set the configuration option SLAVEOF host port, or we can configure Redis during runtime with the SLAVEOF command. If we use the configuration option, Redis will initially load whatever snapshot/AOF is currently available (if any), and then connect to the master to start the replication process outlined in table 4.2. If we run the SLAVEOF command, Redis will immediately try to connect to the master, and upon success, will start the replication process outlined in table 4.2.
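Because SLAVEOF is an ordinary command at runtime, any client can send it. As a dependency-free sketch, the snippet below encodes the command in the Redis serialization protocol (RESP, an array of bulk strings) by hand and sends it over a plain TCP socket. The function names `encode_command` and `make_slave_of` are hypothetical, and the hosts and ports you pass in are your own; in a configuration file, the equivalent would be a `slaveof <host> <port>` line.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def make_slave_of(slave_host: str, slave_port: int,
                  master_host: str, master_port: int) -> bytes:
    """Ask the Redis server at slave_host:slave_port to replicate
    master_host:master_port, and return the raw one-line reply."""
    command = encode_command("SLAVEOF", master_host, str(master_port))
    with socket.create_connection((slave_host, slave_port), timeout=5) as sock:
        sock.sendall(command)
        reply = b""
        # Read until the reply line is complete (status replies end in CRLF).
        while not reply.endswith(b"\r\n"):
            chunk = sock.recv(1024)
            if not chunk:
                break
            reply += chunk
        return reply
```

Sending `encode_command("SLAVEOF", "NO", "ONE")` instead stops replication and turns the slave back into a standalone master, keeping the data it already has.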
DURING SYNC, THE SLAVE FLUSHES ALL OF ITS DATA
Just to make sure that we’re all on the same page (some users forget this the first time they try using slaves): when a slave initially connects to a master, any data that had been in memory will be lost, to be replaced by the data coming from the master.
WARNING: REDIS DOESN’T SUPPORT MASTER-MASTER REPLICATION
When shown master/slave replication, some people mistakenly conclude that because the SLAVEOF command lets us set slaving options after startup, we can get what’s known as multi-master replication by making two Redis instances SLAVEOF each other (some have even considered chaining more than two in a loop). Unfortunately, this does not work. At best, the two Redis instances will use as much processor time as they can and will communicate continually back and forth; depending on which server we connect to for reads and writes, we may get inconsistent data or no data at all.
When multiple slaves attempt to connect to Redis, one of two different scenarios can occur. Table 4.3 describes them.
When additional slaves connect | Master operation |
---|---|
Before step 3 in table 4.2 | All slaves will receive the same dump and same backlogged write commands. |
On or after step 3 in table 4.2 | While the master is finishing up the five steps for earlier slaves, a new sequence of steps 1–5 will start for the new slave(s). |
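Table 4.3 boils down to a single comparison against the step the master has reached for slaves that are already syncing. The function below is a hypothetical helper that encodes just that rule; the return labels are illustrative strings, not anything Redis reports.

```python
from typing import Optional


def plan_for_new_slave(current_step: Optional[int]) -> str:
    """Decide what the master does when another slave connects.

    current_step is the step (1-5 of table 4.2) reached for slaves that are
    already syncing, or None if no sync is in progress.
    """
    if current_step is None:
        # Nothing in flight: a fresh run of steps 1-5 begins.
        return "start new sync"
    if not 1 <= current_step <= 5:
        raise ValueError("step must be between 1 and 5")
    if current_step < 3:
        # The dump hasn't started going out yet, so the new slave can
        # share the same dump and the same backlogged write commands.
        return "share current dump"
    # The snapshot is already being sent (or sync has finished), so the
    # new slave needs its own run of steps 1-5, including another BGSAVE.
    return "start new sync"
```

This is why slaves that connect "on or after step 3" cost the master an extra snapshot, while slaves that arrive together ride along on the first one.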
For the most part, Redis does its best to avoid doing more work than necessary. In some cases, though, slaves connect at inopportune times (on or after step 3) and force the master to start another snapshot. And even when multiple slaves connect at the same time and share a single dump, the outgoing bandwidth used to synchronize all of them initially may make it difficult for other commands to get through, and could cause general network slowdowns for other devices on the same network.
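A rough back-of-the-envelope estimate shows why simultaneous slaves can crowd a network: even when they share one dump, the master still transmits a full copy to each slave, so outgoing traffic grows linearly with the number of slaves. The helper below is a hypothetical lower bound that ignores the write backlog, protocol overhead, and any other traffic on the link; the sizes and link speed are example values, not measurements.

```python
def initial_sync_seconds(dump_bytes: int, slaves: int,
                         link_bits_per_sec: float) -> float:
    """Lower bound on the time the master's outgoing link stays saturated
    while sending the same dump to every slave (backlog not included)."""
    if slaves < 0 or link_bits_per_sec <= 0:
        raise ValueError("need slaves >= 0 and a positive link speed")
    total_bits = dump_bytes * 8 * slaves
    return total_bits / link_bits_per_sec


# Example: a 2 GB dump sent to 4 slaves over a 1 Gbit/s link.
print(initial_sync_seconds(2_000_000_000, 4, 1e9))
```

With those example numbers the link is fully busy for at least 64 seconds, during which ordinary client commands and other hosts on the network compete for the same bandwidth.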