When confronted with system failures, we have tools to help us recover when either snapshotting or append-only file logging had been enabled. Redis includes two command-line applications for testing the status of a snapshot and an append-only file. These commands are redis-check-aof and redis-check-dump. If we run either command without arguments, we’ll see the basic help that’s provided:
$ redis-check-aof Usage: redis-check-aof [--fix] <file.aof> $ redis-check-dump Usage: redis-check-dump <dump.rdb> $
If we provide –fix as an argument to redis-check-aof, the command will fix the file. Its method to fix an append-only file is simple: it scans through the provided AOF, looking for an incomplete or incorrect command. Upon finding the first bad command, it trims the file to just before that command would’ve been executed. For most situations, this will discard the last partial write command.
Unfortunately, there’s no currently supported method of repairing a corrupted snapshot. Though there’s the potential to discover where the first error had occurred, because the snapshot itself is compressed, an error partway through the dump has the potential to make the remaining parts of the snapshot unreadable. It’s for these reasons that I’d generally recommend keeping multiple backups of important snapshots, and calculating the SHA1 or SHA256 hashes to verify content during restoration. (Modern Linux and Unix platforms will have available sha1sum and sha256sum command-line applications for generating and verifying these hashes.)
CHECKSUMS AND HASHESRedis versions including 2.6 and later include a CRC64 checksum of the snapshot as part of the snapshot. The use of a CRCfamily checksum is useful to discover errors that are typical in some types of network transfers or disk corruption. The SHA family of cryptographic hashes is much better suited for discovering arbitrary errors. To the point, if we calculated the CRC64 of a file, then flipped any number of bits inside the file, we could later flip a subset of the last 64 bits of the file to produce the original checksum. There’s no currently known method for doing the same thing with SHA1 or SHA256.
After we’ve verified that our backups are what we had saved before, and we’ve corrected the last write to AOF as necessary, we may need to replace a Redis server.