7.2.2 Non-numeric sorting with ZSETs

Typical comparison operations between strings will examine two strings character by character until one character is different, one string runs out of characters, or until they’re found to be equal. In order to offer the same sort of functionality with string data, we need to turn strings into numbers. In this section, we’ll talk about methods of converting strings into numbers that can be used with Redis ZSETs in order to sort based on string prefixes. After reading this section, you should be able to sort your ZSET search results with strings.

Our first step in translating strings into numbers is understanding the limitations of what we can do. Because Redis uses IEEE 754 double-precision floating-point values to store scores, we’re limited to at most 64 bits worth of storage. Due to some subtleties in the way doubles represent numbers, we can’t use all 64 bits. Technically, we could use more than the equivalent of 63 bits, but that doesn’t buy us significantly more than 63 bits, and for our case, we’ll only use 48 bits for the sake of simplicity. Using 48 bits limits us to prefixes of 6 bytes on our data, which is often sufficient.

To convert our string into an integer, we’ll trim our string down to six characters (as necessary), converting each character into its ASCII value. We’ll then extend the values to six entries for strings shorter than six characters. Finally, we’ll combine all of the values into an integer. Our code for converting a string into a score can be seen next.

Most of our string_to_score() function should be straightforward, except for maybe our use of -1 as a filler value for strings shorter than six characters, and our use< of 257 as a multiplier before adding each character value to the score. For many applications, being able to differentiate between hello and hello can be important, so we take steps to differentiate the two, primarily by adding 1 to all ASCII values (making null 1), and using 0 (-1 + 1) as a filler value for short strings. As a bonus, we use an extra bit to tell us whether a string is more than six characters long, which helps us with similar six-character prefixes.2

By mapping strings to scores, we’re able to get a prefix comparison of a little more than the first six characters of our string. For non-numeric data, this is more or less what we can reasonably do without performing extensive numeric gymnastics, and without running into issues with how a non-Python library transfers large integers (that may or may not have been converted to a double).

When using scores derived from strings, the scores aren’t always useful for combining with other scores and are typically only useful for defining a single sort order.

Exercise: Autocompleting with strings as scores

Back in section 6.1.2, we used ZSETs with scores set to 0 to allow us to perform prefix matching on user names. We had to add items to the ZSET and either use WATCH/MULTI/EXEC or the lock that we covered in section 6.2 to make sure that we fetched the correct range. But if instead we added names with scores being the result of string_to_score() on the name itself, we could bypass the use of WATCH/MULTI/EXEC and locks when someone is looking for a prefix of at most six characters by using ZRANGEBYSCORE, with the endpoints we had calculated before being converted into scores as just demonstrated. Try rewriting our find_prefix_range() and autocomplete_on_prefix() functions to use ZRANGEBYSCORE instead.

Exercise: Autocompleting with longer strings

In this section and for the previous exercise, we converted arbitrary binary strings to scores, which limited us to six-character prefixes. By reducing the number of valid characters in our input strings, we don’t need to use a full 8+ bits per input character. Try to come up with a method that would allow us to use more than a six-character prefix if we only needed to autocomplete on lowercase alphabetic characters.

This is because the score that we produced from the string doesn’t really mean anything, aside from defining a sort order.

Now that we can sort on arbitrary data, and you’ve seen how to use weights to adjust and combine numeric data, you’re ready to read about how we can use Redis SETs and ZSETs to target ads.

2 If we didn’t care about differentiating between hello and hello, then we wouldn’t need the filler. If we didn’t need the filler, we could replace our multiplier of 257 with 256 and get rid of the +1 adjustment. But with the filler, we actually use .0337 additional bits to let us differentiate short strings from strings that have nulls. And when combined with the extra bit we used to distinguish whether we have strings longer than six characters, we actually use 49.0337 total bits.