Full-text search is an information retrieval technique for finding documents or data based on the keywords or phrases that appear anywhere in their text. Unlike search methods that rely on simple keyword matching, a full-text search engine can also take context, synonyms, and word proximity into account to return more relevant results.
Full-text search engines use algorithms to index the content of documents or data sources, and allow users to query the index using natural language queries or Boolean operators to filter and refine results. Full-text search is commonly used in databases, search engines, content management systems, and other applications that require efficient and accurate searching of large volumes of text-based data.
Prior to the advent of modules, full-text search was implemented using native Redis commands. The RediSearch module provides much higher performance than this pattern. However, in some environments, RediSearch is not available. Additionally, this pattern is very interesting and can be generalized to other workloads for which RediSearch may not be ideal.
Let’s say you want to search a number of text documents. This may not be an obvious use case for Redis, as it accesses data via keys rather than tables. Yet Redis can be used to underpin a very novel full-text search engine.
First, let’s take some examples:
Let’s break down these items into sets of words, delimited only by spaces for simplicity:
> SADD ex1 redis is very fast
> SADD ex2 cheetahs are very fast
> SADD ex3 cheetahs have spots
Notice that we’re giving each line its own set (ex1…) and then adding one member to that set for each word. Even though it might look like we’re just adding the entire line, SADD is variadic, so it accepts multiple members. We’ve also turned all the words lowercase.
Next we need to invert this index and show which word is located in which document. To do this, we’ll make a set for each word and then put the document set names as members.
> SADD redis ex1
> SADD is ex1
> SADD very ex1 ex2
> SADD fast ex1 ex2
> SADD cheetahs ex2 ex3
> SADD are ex2
> SADD have ex3
> SADD spots ex3
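To make the inversion step concrete, here is a minimal sketch in Python that models the Redis sets with ordinary in-memory sets instead of talking to a server. The sample sentences are assumed from the words used in the SADD commands above, and `tokenize` and `build_index` are hypothetical helper names, not part of any Redis client.

```python
from collections import defaultdict

# Assumed sample documents, keyed by the set names used above.
documents = {
    "ex1": "Redis is very fast",
    "ex2": "Cheetahs are very fast",
    "ex3": "Cheetahs have spots",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return set(text.lower().split())


def build_index(docs):
    """Return (forward, inverted): doc -> words and word -> docs."""
    forward = {}
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        forward[doc_id] = words         # models: SADD ex1 redis is very fast
        for word in words:
            inverted[word].add(doc_id)  # models: SADD very ex1
    return forward, dict(inverted)


forward, inverted = build_index(documents)
print(sorted(inverted["very"]))  # ['ex1', 'ex2']
```

Against a real server, each assignment would become a SADD call, ideally batched in one transaction.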
For clarity, we’ve split this up into individual commands, but all the commands would normally be executed atomically within a MULTI/EXEC block.
To query our tiny full-text search index, we will use the SINTER command (set intersect). To find the documents with “very” and “fast”:
> SINTER very fast
1) "ex2"
2) "ex1"
In a situation where we don’t have any documents that match the query, we’ll get an empty result:
> SINTER cheetahs redis
(empty list or set)
If you want a logical OR search, you can substitute SUNION (set union) for SINTER.
> SUNION cheetahs redis
1) "ex2"
2) "ex3"
3) "ex1"
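The AND and OR queries can be sketched the same way in Python, again using in-memory sets as a stand-in for Redis. The `index` literal mirrors the word sets from this article, while `search_all` and `search_any` are hypothetical names chosen for illustration.

```python
# In-memory stand-in for the per-word sets built earlier.
index = {
    "redis": {"ex1"},
    "is": {"ex1"},
    "very": {"ex1", "ex2"},
    "fast": {"ex1", "ex2"},
    "cheetahs": {"ex2", "ex3"},
    "are": {"ex2"},
    "have": {"ex3"},
    "spots": {"ex3"},
}


def search_all(index, *words):
    """AND query, modelling SINTER: documents that contain every word."""
    sets = [index.get(word, set()) for word in words]
    return set.intersection(*sets) if sets else set()


def search_any(index, *words):
    """OR query, modelling SUNION: documents that contain any word."""
    return set().union(*(index.get(word, set()) for word in words))


print(sorted(search_all(index, "very", "fast")))       # ['ex1', 'ex2']
print(sorted(search_all(index, "cheetahs", "redis")))  # []
print(sorted(search_any(index, "cheetahs", "redis")))  # ['ex1', 'ex2', 'ex3']
```

As with Redis sets, the result order carries no meaning, which is why the sketch sorts before printing.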
Deleting an item from the index is a little more involved. First, we’ll get the document index members from the document set (SMEMBERS) then remove the document IDs from the word indexes.
> SMEMBERS ex3
1) "have"
2) "cheetahs"
3) "spots"
> SREM have ex3
> SREM cheetahs ex3
> SREM spots ex3
No single built-in Redis command does this, so you’ll need to get the results of SMEMBERS and then issue the SREM commands afterwards.
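The two-step delete can be modelled in Python as follows. This is a sketch over in-memory sets, not a Redis client call, and `remove_document` is a hypothetical helper. It also drops the document's own word set, which is an assumption: the commands above only clean up the word indexes.

```python
def remove_document(forward, inverted, doc_id):
    """Remove doc_id from every word set it appears in."""
    # Step 1, modelling SMEMBERS: fetch the document's words.
    # Popping also drops the document's own set (assumed, like DEL ex3).
    words = forward.pop(doc_id, set())
    # Step 2, modelling SREM: remove the document from each word set.
    for word in words:
        docs = inverted.get(word)
        if docs is None:
            continue
        docs.discard(doc_id)
        if not docs:
            # Redis deletes a set key once its last member is removed.
            del inverted[word]
    return words


forward = {
    "ex2": {"cheetahs", "are", "very", "fast"},
    "ex3": {"cheetahs", "have", "spots"},
}
inverted = {
    "cheetahs": {"ex2", "ex3"},
    "are": {"ex2"},
    "very": {"ex2"},
    "fast": {"ex2"},
    "have": {"ex3"},
    "spots": {"ex3"},
}

remove_document(forward, inverted, "ex3")
print(sorted(inverted["cheetahs"]))  # ['ex2']
print("have" in inverted)            # False
```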
Of course, this is a very simple full-text search. You can create a more representative index by using Sorted Set commands instead of Set commands. This way, for example, a document that contains a word more than once can “rank” higher than a document that has only one occurrence. The patterns above stay more or less the same, except that they use Sorted Set commands.
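As a rough illustration of the ranked variant, the Python sketch below models each word's sorted set as a dictionary from document ID to occurrence count, which is the score a ZADD would store. The documents and the `build_scored_index` and `ranked_search` helpers are hypothetical. Summing the scores across query words is one possible choice, similar to what ZINTERSTORE computes with its default SUM aggregation.

```python
from collections import Counter, defaultdict

# Hypothetical documents; "fast" occurs twice in doc1.
documents = {
    "doc1": "fast cars are fast",
    "doc2": "fast food",
}


def build_scored_index(docs):
    """Map each word to {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][doc_id] = count  # models: ZADD word count doc_id
    return dict(index)


def ranked_search(index, *words):
    """AND query whose results are ordered by summed occurrence counts."""
    postings = [index.get(word, {}) for word in words]
    if not postings:
        return []
    # Keep only the documents that appear under every query word.
    common = set(postings[0]).intersection(*postings[1:])
    totals = {doc: sum(p[doc] for p in postings) for doc in common}
    # Highest score first; ties are broken by document ID.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


index = build_scored_index(documents)
print(ranked_search(index, "fast"))  # [('doc1', 2), ('doc2', 1)]
```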