Hi. My name’s Guy Royse and I’m a brand-new Developer Advocate here at Redis. Normally I don’t introduce myself at the beginning of a blog post (I think it’s a bit gauche) but since this is my first post for Redis, it seemed appropriate.
This might come as a surprise, but I’ve never put Redis into an application before. Well, more accurately, I had never put Redis into an application until right before I started working at Redis. Before that I had never really given Redis more than a cursory examination. There are a lot of other things out there to learn and the opportunity never came up to use it. Plus, at the end of my consulting career, I was mostly doing frontend and mobile stuff—it seemed like Redis tends to be used more for backend stuff, and I just didn’t see the need.
So, I had a lot of learning to do, which seemed like an opportunity to share my take on the product in this beginner’s guide to Redis. What was my experience of learning it like? What resonated with me?
Sure, I came to Redis with practically no Redis experience and the product was essentially new to me. But I also came to Redis with 25 years of software development experience and a fair bit of developer advocacy background.
The proof of the pudding is in the eating, so this post lays out some key aspects of what I learned getting up to speed on Redis.
When I first looked at Redis, about a year ago, I saw it as primarily a cache. It’s in-memory and lets you store things. Said things have a time to live. Quacks like a duck: Must be a cache. A lot of the Internet appears to agree with me.
Now, I saw Redis as a nice cache. A really nice cache. It had useful data types, was extremely performant, and was easy to understand. If I needed a cache, I would dig deeper. But I never thought I needed that cache and so I never did.
A year or so later, I found myself interviewing at Redis. As part of the interview process I had to put together a presentation with time for Q&A and a demo.
Time to start digging.
First things first: Install Redis and poke it. I installed Redis from a Docker image, which was as easy as you think it is. But I needed the redis-cli to interact with it. The instructions involved downloading and compiling all of Redis and I didn’t want to mess with that. So, I took a stab and entered the following into my MacBook terminal:
$ brew install redis-cli
Nothing. I tried again:
$ brew install redis
Totally worked! It actually installed all of Redis, but I had redis-cli and that was all that mattered. In hindsight, I should have just downloaded and compiled Redis. It’s actually quick and easy.
Once I had Redis up and running I poked it. Hey, look! It stores things! This is pretty easy. And some of these data types are really interesting.
Now that I had Redis itself working, it was time to play with the RedisBloom module.
Modules are plugins for Redis that allow you to expand its capabilities. They let you add new data types and the commands required to interact with them. Out of the many interesting modules that have been created, I think my favorite is RedisGraph (primarily because graph databases are super cool), but the award for Most Amusingly Named goes to cthulhu.
RedisBloom is a module that expands Redis to include probabilistic data structures. Probabilistic data structures are, well, sort of like the TARDIS (bigger on the inside) and JPEG compression (a bit lossy). And, like both, they are fast and accurate enough for many purposes. More technically speaking, most probabilistic data structures use hashes to give you faster and smaller data structures in exchange for some accuracy. If you’ve got a mountain of data to process, this is super useful. (If you’d like to go deeper into the topic, we’ve got a great video that does just that.)
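To make the "hashes in exchange for accuracy" trade concrete, here is a minimal sketch of one well-known probabilistic structure, a Bloom filter, in plain Python. It is an illustration only and not how RedisBloom implements anything; the class name, the bit-array size, and the number of hashes are all assumptions picked for readability.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: a fixed-size bit array plus several hash functions."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive several positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for word in ["saucer", "lights", "triangle"]:
    seen.add(word)

print(seen.might_contain("saucer"))   # True: added items are always found
print(seen.might_contain("weather"))  # usually False, but a false positive is possible
```

The structure never grows no matter how many words go in. The price is that "yes" answers are only probably right.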
Top-K, of course, was the specific data type I started playing with. Top-K—finding the largest K elements (a.k.a. keyword frequency)—is conceptually simple enough. You keep feeding it items and it will give you the top K most frequent items, where K is an arbitrary number of your choosing. Want to know the Top 10 Most-Referenced Clickbait Articles Using This One Weird Trick? You can feed the titles into Top-K every time they are clicked and get that list. You can cram millions of entries in there and the structure will stay small and fast and still give you that list.
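For comparison, this is what the same question looks like when answered exactly in plain Python. It is a sketch of the query, not of RedisBloom's Top-K: an exact counter keeps one entry per distinct item, so its memory grows with the data, which is the cost the probabilistic version avoids. The function name and the sample titles are made up for illustration.

```python
from collections import Counter


def top_k_exact(items, k):
    """Return the k most frequent items with their counts, most frequent first."""
    return Counter(items).most_common(k)


clicks = ["weird-trick", "doctors-hate-it", "weird-trick", "top-10", "weird-trick", "top-10"]
print(top_k_exact(clicks, 2))  # [('weird-trick', 3), ('top-10', 2)]
```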
I decided to use Top-K to find the most frequently used words in UFO sightings. Yep. There is this really fun dataset of 112,092 UFO sightings—many of the eyewitness accounts are a delight to read. I figured I could parse the CSV file with the sightings, tokenize the accounts into individual words, and shove all the words into a Top-K data type in RedisBloom. Then I could build some sort of UI for the humans.
I ended up building a small command-line application in Python that uses Pandas to parse the CSV file and NLTK to tokenize the sightings and remove common words like “a”, “is”, and “the” from the text. Then I used the redisbloom-py library to send the words to Redis. By far, placing the data in Redis was the easiest part of the process. It took just a couple of lines of code.
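The snippet below is a rough reconstruction of that flow, not the code from the project. To stay self-contained it swaps Pandas and NLTK for a regular expression and a tiny hand-written stop-word list. It also assumes the redisbloom-py `Client` exposes `topkReserve`, `topkAdd`, and `topkList`, so check those names against the version you have installed. The key name `ufo:words` and the reserve parameters are hypothetical.

```python
import re

# Stand-in stop-word list; the original project used NLTK's list instead.
STOP_WORDS = {"a", "an", "and", "is", "it", "the", "was", "in", "of", "i"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def load_sightings(client, key, sightings):
    """Send every remaining word from every sighting to a Top-K key."""
    total = 0
    for text in sightings:
        words = tokenize(text)
        if words:
            client.topkAdd(key, *words)  # assumed redisbloom-py method name
            total += len(words)
    return total


def main():
    # Imported here so the helpers above can be used without redisbloom installed.
    from redisbloom.client import Client

    client = Client(host="localhost", port=6379)
    # Assumed parameters: keep the top 10 words; width, depth, and decay are illustrative.
    client.topkReserve("ufo:words", 10, 50, 4, 0.9)
    load_sightings(client, "ufo:words", ["A bright light was hovering in the sky"])
    print(client.topkList("ufo:words"))
```

Calling `main()` needs a running Redis server with the RedisBloom module loaded. The two helper functions work on their own with any object that has a `topkAdd` method.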
All the code for this is available on GitHub. The commands needed to get it running are in the NOTES.md file in the root of the project.
As I was doing all of this, I was still thinking of Redis as a cache, not as a primary database. I think this is because, to me at least, the idea of “in-memory” is so tightly coupled with the idea of “temporary” that a cache is the only thing I thought to use it for.
Of course, this is wrong.
About 15 years ago, I evaluated a limited piece of Java “freeware” called Prevayler. It was based on the idea that cheap and plentiful DRAM (this was 2004) meant we could keep Java objects in memory all the time and just write out a transaction log of changes. Then, when you restarted the system, you could replay the transactions and reconstitute the objects in memory. At the time, I thought this was super cool.
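The idea is simple enough to sketch in a few lines of Python. This is neither Prevayler's API (which is Java) nor how Redis implements persistence. It is only a toy showing the pattern of keeping state in memory, appending each change to a log, and replaying that log on startup. The class name, the JSON log format, and the file name are assumptions.

```python
import json
import os
import tempfile


class LoggedStore:
    """Keep a dict in memory and append every change to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild in-memory state by re-applying each logged change in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, entry):
        if entry["op"] == "set":
            self.data[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            self.data.pop(entry["key"], None)

    def _record(self, entry):
        # Write the change to disk first, then apply it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
        self._apply(entry)

    def set(self, key, value):
        self._record({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._record({"op": "delete", "key": key})


log_path = os.path.join(tempfile.mkdtemp(), "changes.log")
store = LoggedStore(log_path)
store.set("shape", "triangle")
store.set("color", "green")
store.delete("color")

restarted = LoggedStore(log_path)  # simulates a restart: state comes from the log
print(restarted.data)  # {'shape': 'triangle'}
```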
That’s how it clicked for me. Redis is more than a cache.
Redis is, in a sense, a grown-up version of Prevayler, with more sophisticated data types, an actual server, and better persistence. It’s written in C, not Java, which makes it fast. And most importantly, it works with any language, not just Java.
Redis is a primary database and not just a cache. It stores stuff and it caches. It’s a floor wax and a dessert topping.
Based on this epiphany, Redis is my new go-to database. When I need a database for a sample, a prototype, or to provide a backend for some other learning, Redis will be the thing I select first.
Hopefully the pudding was tasty. I’m looking forward to building all sorts of cool stuff with Redis as the primary database and sharing it with the community. Thanks for reading!