RedisDays Available Now On-Demand.

7.3.3 Targeting ads

  • Redis in Action – Home
  • Foreword
  • Preface
  • Acknowledgments
  • About this Book
  • About the Cover Illustration
  • Part 1: Getting Started
  • Part 2: Core concepts
  • Part 3: Next steps
  • Appendix A
  • Appendix B
  • Buy the paperback
  • Redis in Action – Home
  • Foreword
  • Preface
  • Acknowledgments
  • About this Book
  • About the Cover Illustration
  • Part 1: Getting Started
  • Part 2: Core concepts
  • Part 3: Next steps
  • Appendix A
  • Appendix B
  • Buy the paperback

    7.3.3 Targeting ads

    As described earlier, when we receive a request to target an ad, we’re generally looking to find the highest eCPM ad that matches the viewer’s location. In addition to matching an ad based on a location, we’ll gather and use statistics about how the content of a page matches the content of an ad, and what that can do to affect the ad’s CTR. Using these statistics, content in an ad that matches a web page can result in bonuses that contribute to the calculated eCPM of CPC and CPA ads, which can result in those types of ads being shown more.

    Before showing any ads, we won’t have any bonus scoring for any of our web page content. But as we show ads, we’ll learn about what terms in the ads help or hurt the ad’s expected performance, which will allow us to change the relative value of each of the optional targeting terms.

    To execute the targeting, we’ll union all of the relevant location SETs to produce an initial group of ads that should be shown to the viewer. Then we’ll parse the content of the page on which the ad will be shown, add any relevant bonuses, and finally calculate a total eCPM for all ads that a viewer could be shown. After we’ve calculated those eCPMs, we’ll fetch the ad ID for the highest eCPM ad, record some statistics about our targeting, and return the ad itself. Our code for targeting an ad looks like this.

    Listing 7.11 Ad targeting by location and page content bonuses
    def target_ads(conn, locations, content):
        pipeline = conn.pipeline(True)
    
     
        matched_ads, base_ecpm = match_location(pipeline, locations)
    

    Find all ads that fit the location targeting parameter, and their eCPMs.

        words, targeted_ads = finish_scoring(
    

    Finish any bonus scoring based on matching the content.

            pipeline, matched_ads, base_ecpm, content)
    
    

    Get an ID that can be used for reporting and recording of this particular ad target.

        pipeline.incr('ads:served:')
    
     
        pipeline.zrevrange('idx:' + targeted_ads, 0, 0)
    

    Fetch the top-eCPM ad ID.

        target_id, targeted_ad = pipeline.execute()[-2:]
    
     
        if not targeted_ad:
            return None, None
    
    

    If there were no ads that matched the location targeting, return nothing.

        ad_id = targeted_ad[0]
    
     
        record_targeting_result(conn, target_id, ad_id, words)
    
    

    Record the results of our targeting efforts as part of our learning process.

        return target_id, ad_id
    

    Return the target ID and the ad ID to the caller.

     

    In this first version, we hide away the details of exactly how we’re matching based on location and adding bonuses based on terms so that we can understand the general flow. The only part I didn’t mention earlier is that we’ll also generate a target ID, which is an ID that represents this particular execution of the ad targeting. This ID will allow us to track clicks later, helping us learn about which parts of the ad targeting may have contributed to the overall total clicks.

    As mentioned earlier, in order to match based on location, we’ll perform a SET union over the locations (city, state, country) that the viewer is coming from. While we’re here, we’ll also calculate the base eCPM of these ads without any bonuses applied. The code for performing this operation is shown next.

    Listing 7.12 A helper function for targeting ads based on location
    def match_location(pipe, locations):
    
     
        required = ['req:' + loc for loc in locations]
    

    Calculate the SET key names for all of the provided locations.

        matched_ads = union(pipe, required, ttl=300, _execute=False)
    

    Calculate the SET of matched ads that are valid for this location.

        return matched_ads, zintersect(pipe,
            {matched_ads: 0, 'ad:value:': 1}, _execute=False)
    

    Return the matched ads SET ID, as well as the ID of the ZSET that includes the base eCPM of all of the matched ads.

     

    This code listing does exactly what I said it would do: it finds ads that match the location of the viewer, and it calculates the eCPM of all of those ads without any bonuses based on page content applied. The only thing that may be surprising about this listing is our passing of the funny _execute keyword argument to the zintersect() function, which delays the actual execution of calculating the eCPM of the ad until later. The purpose of waiting until later is to help minimize the number of round trips between our client and Redis.

    CALCULATING TARGETING BONUSES

    The interesting part of targeting isn’t the location matching; it’s calculating the bonuses. We’re trying to discover the amount to add to an ad’s eCPM, based on words in the page content that matched words in the ad. We’ll assume that we’ve precalculated a bonus for each word in each ad (as part of the learning phase), stored in ZSETs for each word, with members being ad IDs and scores being what should be added to the eCPM.

    These word-based, per-ad eCPM bonuses have values such that the average eCPM of the ad, when shown on a page with that word, is the eCPM bonus from the word plus the known average CPM for the ad. Unfortunately, when more than one word in an ad matches the content of the page, adding all of the eCPM bonuses gives us a total eCPM bonus that probably isn’t close to reality.

    We have eCPM bonuses based on word matching for each word that are based on single word matching page content alone, or with any one or a number of other words in the ad. What we really want to find is the weighted average of the eCPMs, where the weight of each word is the number of times the word matched the page content. Unfortunately, we can’t perform the weighted average calculation with Redis ZSETs, because we can’t divide one ZSET by another.

    Numerically speaking, the weighted average lies between the geometric average and the arithmetic average, both of which would be reasonable estimates of the combined eCPM. But we can’t calculate either of those averages when the count of matching words varies. The best estimate of the ad’s true eCPM is to find the maximum and minimum bonuses, calculate the average of those two, and use that as the bonus for multiple matching words.

    BEING MATHEMATICALLY RIGOROUSMathematically speaking, our method of averaging the maximum and minimum word bonuses to determine an overall bonus isn’t rigorous. The true mathematical expectation of the eCPM with a collection of matched words is different than what we calculated. We chose to use this mathematically nonrigorous method because it gets us to a reasonable answer (the weighted average of the words is between the minimum and maximum), with relatively little work. If you choose to use this bonus method along with our later learning methods, remember that there are better ways to target ads and to learn from user behavior. I chose this method because it’s easy to write, easy to learn, and easy to improve.

    We can calculate the maximum and minimum bonuses by using ZUNIONSTOREwith the MAX and MIN aggregates. And we can calculate their average by using them as part of a ZUNIONSTOREoperation with an aggregate of SUM, and their weights each being .5. Our function for combining bonus word eCPMs with the base eCPM of the ad can be seen in the next listing.

    Listing 7.13 Calculating the eCPM of ads including content match bonuses
    def finish_scoring(pipe, matched, base, content):
        bonus_ecpm = {}
    
     
        words = tokenize(content)
    

    Tokenize the content for matching against ads.

        for word in words:
    
     
            word_bonus = zintersect(
                pipe, {matched: 0, word: 1}, _execute=False)
            bonus_ecpm[word_bonus] = 1
    

    Find the ads that are location targeted that also have one of the words in the content.

        if bonus_ecpm:
    
     
            minimum = zunion(
                pipe, bonus_ecpm, aggregate='MIN', _execute=False)
            maximum = zunion(
                pipe, bonus_ecpm, aggregate='MAX', _execute=False)
    

    Find the minimum and maximum eCPM bonuses for each ad.

            return words, zunion(
                pipe, {base:1, minimum:.5, maximum:.5}, _execute=False)
    

    Compute the total of the base, plus half of the minimum eCPM bonus, plus half of the maximum eCPM bonus.

        return words, base
    

    If there were no words in the content to match against, return just the known eCPM.

     

    As before, we continue to pass the _execute parameter to delay execution of our various ZINTERSTORE and ZUNIONSTOREoperations until after the function returns to the calling target_ads(). One thing that may be confusing is our use of ZINTERSTORE between the location-targeted ads and the bonuses, followed by a final ZUNIONSTORE call. Though we could perform fewer calls by performing a ZUNIONSTOREover all of our bonus ZSETs, followed by a single ZINTERSTORE call at the end (to match location), the majority of ad-targeting calls will perform better by performing many smaller intersections followed by a union.

    The difference between the two methods is illustrated in figure 7.5, which shows that essentially all of the data in all of the relevant word bonus ZSETs is examined when we union and then intersect. Compare that with figure 7.6, which shows that when we intersect and then union, Redis will examine far less data to produce the same result.

    After we have an ad targeted, we’re given a target_id and an ad_id. These IDs would then be used to construct an ad response that includes both pieces of information, in addition to fetching the ad copy, formatting the result, and returning the result to the web page or client that requested it.

    Figure 7.5 The data that’s examined during a union-then-intersect calculation of ad-targeting bonuses includes all ads in the relevant word bonus ZSETs, even ads that don’t meet the location matching requirements.
    Figure 7.6 The data that’s examined during an intersect-then-union calculation of ad-targeting bonuses only includes those ads that previously matched, which significantly cuts down on the amount of data that Redis will process.

    Exercise: No matching content

    If you pay careful attention to the flow of target_ads() through finish_scoring() in listings 7.11 and 7.13, you’ll notice that we don’t make any effort to deal with the case where an ad has zero matched words in the content. In that situation, the eCPM produced will actually be the average eCPM over all calls that returned that ad itself. It’s not difficult to see that this may result in an ad being shown that shouldn’t. Can you alter finish_scoring() to take into consideration ads that don’t match any content?

     

    The only part of our target_ads() function from listing 7.11 that we haven’t defined is record_targeting_result(), which we’ll now examine as part of the learning phase of ad targeting.

    4 It may seem strange to be estimating an expectation (which is arguably an estimate), but everything about targeting ads is fundamentally predicated on statistics of one kind or another. This is one of those cases where the basics can get us pretty far.