7.1.2 Sorting search results

  • Redis in Action – Home
  • Foreword
  • Preface
  • Part 1: Getting Started
  • Part 2: Core concepts
  • 1.3.1 Voting on articles
  • 1.3.2 Posting and fetching articles
  • 1.3.3 Grouping articles
  • 4.2.1 Configuring Redis for replication
  • 4.2.2 Redis replication startup process
  • 4.2.3 Master/slave chains
  • 4.2.4 Verifying disk writes
  • 5.1 Logging to Redis
  • 5.2 Counters and statistics
  • 5.3 IP-to-city and -country lookup
  • 5.4 Service discovery and configuration
  • 5.1.1 Recent logs
  • 5.1.2 Common logs
  • 5.2.2 Storing statistics in Redis
  • 5.3.1 Loading the location tables
  • 5.3.2 Looking up cities
  • 5.4.1 Using Redis to store configuration information
  • 5.4.2 One Redis server per application component
  • 5.4.3 Automatic Redis connection management
  • 8.1.1 User information
  • 8.1.2 Status messages
  • 9.1.1 The ziplist representation
  • 9.1.2 The intset encoding for SETs
  • Chapter 10: Scaling Redis
  • Chapter 11: Scripting Redis with Lua
  • 10.1 Scaling reads
  • 10.2 Scaling writes and memory capacity
  • 10.3 Scaling complex queries
  • 10.2.2 Creating a server-sharded connection decorator
  • 10.3.1 Scaling search query volume
  • 10.3.2 Scaling search index size
  • 10.3.3 Scaling a social network
  • 11.1.1 Loading Lua scripts into Redis
  • 11.1.2 Creating a new status message
  • 11.2 Rewriting locks and semaphores with Lua
  • 11.3 Doing away with WATCH/MULTI/EXEC
  • 11.4 Sharding LISTs with Lua
  • 11.5 Summary
  • 11.2.1 Why locks in Lua?
  • 11.2.2 Rewriting our lock
  • 11.2.3 Counting semaphores in Lua
  • 11.4.1 Structuring a sharded LIST
  • 11.4.2 Pushing items onto the sharded LIST
  • 11.4.4 Performing blocking pops from the sharded LIST
  • A.1 Installation on Debian or Ubuntu Linux
  • A.2 Installing on OS X
  • B.1 Forums for help
  • B.4 Data visualization and recording
  • Buy the paperback
  • Redis in Action – Home
  • Foreword
  • Preface
  • Part 1: Getting Started
  • Part 2: Core concepts
  • 1.3.1 Voting on articles
  • 1.3.2 Posting and fetching articles
  • 1.3.3 Grouping articles
  • 4.2.1 Configuring Redis for replication
  • 4.2.2 Redis replication startup process
  • 4.2.3 Master/slave chains
  • 4.2.4 Verifying disk writes
  • 5.1 Logging to Redis
  • 5.2 Counters and statistics
  • 5.3 IP-to-city and -country lookup
  • 5.4 Service discovery and configuration
  • 5.1.1 Recent logs
  • 5.1.2 Common logs
  • 5.2.2 Storing statistics in Redis
  • 5.3.1 Loading the location tables
  • 5.3.2 Looking up cities
  • 5.4.1 Using Redis to store configuration information
  • 5.4.2 One Redis server per application component
  • 5.4.3 Automatic Redis connection management
  • 8.1.1 User information
  • 8.1.2 Status messages
  • 9.1.1 The ziplist representation
  • 9.1.2 The intset encoding for SETs
  • Chapter 10: Scaling Redis
  • Chapter 11: Scripting Redis with Lua
  • 10.1 Scaling reads
  • 10.2 Scaling writes and memory capacity
  • 10.3 Scaling complex queries
  • 10.2.2 Creating a server-sharded connection decorator
  • 10.3.1 Scaling search query volume
  • 10.3.2 Scaling search index size
  • 10.3.3 Scaling a social network
  • 11.1.1 Loading Lua scripts into Redis
  • 11.1.2 Creating a new status message
  • 11.2 Rewriting locks and semaphores with Lua
  • 11.3 Doing away with WATCH/MULTI/EXEC
  • 11.4 Sharding LISTs with Lua
  • 11.5 Summary
  • 11.2.1 Why locks in Lua?
  • 11.2.2 Rewriting our lock
  • 11.2.3 Counting semaphores in Lua
  • 11.4.1 Structuring a sharded LIST
  • 11.4.2 Pushing items onto the sharded LIST
  • 11.4.4 Performing blocking pops from the sharded LIST
  • A.1 Installation on Debian or Ubuntu Linux
  • A.2 Installing on OS X
  • B.1 Forums for help
  • B.4 Data visualization and recording
  • Buy the paperback

    7.1.2 Sorting search results

    We now have the ability to arbitrarily search for words in our indexed documents. But searching is only the first step in retrieving information that we’re looking for. After we have a list of documents, we need to decide what’s important enough about each of the documents to determine its position relative to other matching documents. This question is generally known as relevance in the search world, and one way of determining whether one article is more relevant than another is which article has been updated more recently. Let’s see how we could include this as part of our search results.

    If you remember from chapter 3, the Redis SORT call allows us to sort the contents of a LIST or SET, possibly referencing external data. For each article in Fake Garage Startup’s knowledge base, we’ll also include a HASH that stores information about the article. The information we’ll store about the article includes the title, the creation timestamp, the timestamp for when the article was last updated, and the document’s ID. An example document appears in figure 7.4.

    Figure 7.4 An example document stored in a HASH

    With documents stored in this format, we can then use the SORT command to sort by one of a few different attributes. We’ve been giving our result SETs expiration times as a way of cleaning them out shortly after we’ve finished using them. But for our final SORTed result, we could keep that result around longer, while at the same time allowing for the ability to re-sort, and even paginate over the results without having to perform the search again. Our function for integrating this kind of caching and re-sorting can be seen in the following listing.

    Listing 7.5 A function to parse and search, sorting the results
    def search_and_sort(conn, query, id=None, ttl=300, sort="-updated",
                        start=0, num=20):
    

    We’ll optionally take a previous result ID, a way to sort the results, and options for paginating over the results.

        desc = sort.startswith('-')
        sort = sort.lstrip('-')
        by = "kb:doc:*->" + sort
    

    Determine which attribute to sort by and whether to sort ascending or descending.

        alpha = sort not in ('updated', 'id', 'created')
    
    

    We need to tell Redis whether we’re sorting by a number or alphabetically.

        if id and not conn.expire(id, ttl):
            id = None
    
    

    If there was a previous result, try to update its expiration time if it still exists.

        if not id:
            id = parse_and_search(conn, query, ttl=ttl)
    
    

    Perform the search if we didn’t have a past search ID, or if our results expired.

        pipeline = conn.pipeline(True)
    
     
        pipeline.scard('idx:' + id)
    

    Fetch the total number of results.

        pipeline.sort('idx:' + id, by=by, alpha=alpha,
            desc=desc, start=start, num=num)
    

    Sort the result list by the proper column and fetch only those results we want.

        results = pipeline.execute()
    
    
     
        return results[0], results[1], id
    

    Return the number of items in the results, the results we wanted, and the ID of the results so that we can fetch them again later.

     

    When searching and sorting, we can paginate over results by updating the start and num arguments; alter the sorting attribute (and order) with the sort argument; cache the results for longer or shorter with the ttl argument; and reference previous search results (to save time) with the id argument.

    Though these functions won’t let us create a search engine to compete with Google, this problem and solution are what brought me to use Redis in the first place. Limitations on SORT lead to using ZSETs to support more intricate forms of document sorting, including combining scores for a composite sort order.