RedisDays Available Now On-Demand.

7.3.4 Learning from user behavior

  • Redis in Action – Home
  • Foreword
  • Preface
  • Acknowledgments
  • About this Book
  • About the Cover Illustration
  • Part 1: Getting Started
  • Part 2: Core concepts
  • Part 3: Next steps
  • Appendix A
  • Appendix B
  • Buy the paperback
  • Redis in Action – Home
  • Foreword
  • Preface
  • Acknowledgments
  • About this Book
  • About the Cover Illustration
  • Part 1: Getting Started
  • Part 2: Core concepts
  • Part 3: Next steps
  • Appendix A
  • Appendix B
  • Buy the paperback

    7.3.4 Learning from user behavior

    As ads are shown to users, we have the opportunity to gain insight into what can cause someone to click on an ad. In the last section, we talked about using words as bonuses to ads that have already matched the required location. In this section, we’ll talk about how we can record information about those words and the ads that were targeted to discover basic patterns about user behavior in order to develop per-word, per-adtargeting bonuses.

    A crucial question you should be asking yourself is “Why are we using words in the web page content to try to find better ads?” The simple reason is that ad placement is all about context. If a web page has content related to the safety of children’s toys, showing an ad for a sports car probably won’t do well. By matching words in the ad with words in the web page content, we get a form of context matching quickly and easily.

    One thing to remember during this discussion is that we aren’t trying to be perfect. We aren’t trying to solve the ad-targeting and learning problem completely; we’re trying to build something that will work “pretty well” with simple and straightforward methods. As such, our note about the fact that this isn’t mathematically rigorous still applies.

    RECORDING VIEWS

    The first step in our learning process is recording the results of our ad targeting with the record_targeting_result() function that we called earlier from listing 7.11. Overall, we’ll record some information about the ad-targeting results, which we’ll later use to help us calculate click-through rates, action rates, and ultimately eCPM bonuses for each word. We’ll record the following:

    • Which words were targeted with the given ad
    • The total number of times that a given ad has been targeted
    • The total number of times that a word in the ad was part of the bonus calculation

    To record this information, we’ll store a SET of the words that were targeted and keep counts of the number of times that the ad and words were seen as part of a single ZSET per ad. Our code for recording this information is shown next.

    Listing 7.14 A method for recording the result after we’ve targeted an ad
    def record_targeting_result(conn, target_id, ad_id, words):
        pipeline = conn.pipeline(True)
    
    
     
        terms = conn.smembers('terms:' + ad_id)
        matched = list(words & terms)
    

    Find the words in the content that matched with the words in the ad.

        if matched:
    
     
            matched_key = 'terms:matched:%s' % target_id
    
     
            pipeline.sadd(matched_key, *matched)
            pipeline.expire(matched_key, 900)
    
    

    If any words in the ad matched the content, record that information and keep it for 15 minutes.

        type = conn.hget('type:', ad_id)
        pipeline.incr('type:%s:views:' % type)
    

    Keep a per-type count of the number of views that each ad received.

        for word in matched:
            pipeline.zincrby('views:%s' % ad_id, word)
        pipeline.zincrby('views:%s' % ad_id, '')
    
    

    Record view information for each word in the ad, as well as the ad itself.

        if not pipeline.execute()[-1] % 100:
            update_cpms(conn, ad_id)
    

    Every 100th time that the ad was shown, update the ad’s eCPM.

     

    That function does everything we said it would do, and you’ll notice a call to update_cpms(). This update_cpms() function is called every 100th time the ad is returned from a call. This function really is the core of the learning phase—it writes back to our per-word, per-ad-targeting bonus ZSETs.

    We’ll get to updating the eCPM of an ad in a moment, but first, let’s see what happens when an ad is clicked.

    RECORDING CLICKS AND ACTIONS

    As we record views, we’re recording half of the data for calculating CTRs. The other half of the data that we need to record is information about the clicks themselves, or in the case of a cost per action ad, the action. Numerically, this is because our eCPM calculations are based on this formula: (value of a click or action) x (clicks or actions) / views. Without recording clicks and actions, the numerator of our value calculation is 0, so we can’t discover anything useful.

    When someone actually clicks on an ad, prior to redirecting them to their final destination, we’ll record the click in the total aggregates for the type of ad, as well as whether the ad got a click and which words matched the clicked ad. We’ll record the same information for actions. Our function for recording clicks is shown next.

    Listing 7.15 A method for recording clicks on an ad
    def record_click(conn, target_id, ad_id, action=False):
        pipeline = conn.pipeline(True)
        click_key = 'clicks:%s'%ad_id
    
        match_key = 'terms:matched:%s'%target_id
    
        type = conn.hget('type:', ad_id)
    
     
        if type == 'cpa':
            pipeline.expire(match_key, 900) 
    

    If the ad was a CPA ad, refresh the expiration time of the matched terms if it’s still available.

            if action:
    
     
                click_key = 'actions:%s' % ad_id
    
    

    Record actions instead of clicks.

        if action and type == 'cpa':
    
     
            pipeline.incr('type:cpa:actions:' % type)
            pipeline.incr('type:%s:clicks:' % type)
    
    

    Keep a global count of clicks/ actions for ads based on the ad type.

        matched = list(conn.smembers(match_key))
        matched.append('')
        for word in matched:
            pipeline.zincrby(click_key, word)
    

    Record clicks (or actions) for the ad and for all words that had been targeted in the ad.

        pipeline.execute()
    
    
     
        update_cpms(conn, ad_id)
    

    Update the eCPM for all words that were seen in the ad.

     

    You’ll notice there are a few parts of the recording function that we didn’t mention earlier. In particular, when we receive a click or an action for a CPA ad, we’ll refresh the expiration of the words that were a part of the ad-targeting call. This will let an action following a click count up to 15 minutes after the initial click-through to the destination site happened.

    Another change is that we’ll optionally be recording actions in this call for CPA ads; we’ll assume e that this function is called with the action parameter set to True in that case.

    And finally, we’ll call the update_cpms() function for every click/action because they should happen roughly once every 100–2000 views (or more), so each individual click/action is important relative to a view.

    Exercise: Changing the way we count clicks and actions

    In listing 7.15, we define a record_click() function to add 1 to every word that was targeted as part of an ad that was clicked on. Can you think of a different number to add to a word that may make more sense? Hint: You may want to consider this number to be related to the count of matched words. Can you update finish_scoring() and record_click() to take into consideration this new click/action value?

     

    To complete our learning process, we only need to define our final update_cpms() function.

    UPDATING ECPMS

    We’ve been talking about and using the update_cpms() function for a couple of sections now, and hopefully you already have an idea of what happens inside of it. Regardless, we’ll walk through the different parts of how we’ll update our per-word, per-ad bonus eCPMs, as well as how we’ll update our per-ad eCPMs.

    The first part to updating our eCPMs is to know the click-through rate of an ad by itself. Because we’ve recorded both the clicks and views for each ad overall, we have the click-through rate by pulling both of those scores from the relevant ZSETs. By combining that click-through rate with the ad’s actual value, which we fetch from the ad’s base value ZSET, we can calculate the eCPM of the ad over all clicks and views.

    The second part to updating our eCPMs is to know the CTR of words that were matched in the ad itself. Again, because we recorded all views and clicks involving the ad, we have that information. And because we have the ad’s base value, we can calculate the eCPM. When we have the word’s eCPM, we can subtract the ad’s eCPM from it to determine the bonus that the word matching contributes. This difference is what’s added to the per-word, per-ad bonus ZSETs.

    The same calculation is performed for actions as was performed for clicks, the only difference being that we use the action count ZSETs instead of the click count ZSETs. Our method for updating eCPMs for clicks and actions can be seen in the next listing.

    Listing 7.16 A method for updating eCPMs and per-word eCPM bonuses for ads
    def update_cpms(conn, ad_id):
        pipeline = conn.pipeline(True)
    
     
            pipeline.hget('type:', ad_id)
            pipeline.zscore('ad:base_value:', ad_id)
            pipeline.smembers('terms:' + ad_id)
            type, base_value, words = pipeline.execute()
    
    

    Fetch the type and value of the ad, as well as all of the words in the ad.

            which = 'clicks'
            if type == 'cpa':
                which = 'actions'
    
    

    Determine whether the eCPM of the ad should be based on clicks or actions.

            pipeline.get('type:%s:views:' % type)
            pipeline.get('type:%s:%s' % (type, which))
            type_views, type_clicks = pipeline.execute()
    

    Fetch the current number of views and clicks/actions for the given ad type.

            AVERAGE_PER_1K[type] = (
                1000. * int(type_clicks or '1') / int(type_views or '1'))
    
    

    Write back to our global dictionary the clickthrough rate or action rate for the ad.

            if type == 'cpm':
                return
    
    

    If we’re processing a CPM ad, then we don’t update any of the eCPMs; they’re already updated.

            view_key = 'views:%s' % ad_id
            click_key = '%s:%s' % (which, ad_id)
    
            to_ecpm = TO_ECPM[type]
    
    
     
            pipeline.zscore(view_key, '')
            pipeline.zscore(click_key, '')
            ad_views, ad_clicks = pipeline.execute()
    

    Fetch the per-ad view and click/action scores.

            if (ad_clicks or 0) < 1:
                ad_ecpm = conn.zscore('idx:ad:value:', ad_id)
    

    Use the existing eCPM if the ad hasn’t received any clicks yet.

            else:
    
     
                ad_ecpm = to_ecpm(ad_views or 1, ad_clicks or 0, base_value)
                pipeline.zadd('idx:ad:value:', ad_id, ad_ecpm)
    
    

    Calculate the ad’s eCPM and update the ad’s value.

            for word in words:
    
     
                pipeline.zscore(view_key, word)
                pipeline.zscore(click_key, word)
                views, clicks = pipeline.execute()[-2:]
    
    

    Fetch the view and click/action scores for the word.

                if (clicks or 0) < 1:
                    continue
    
    

    Don’t update eCPMs when the ad hasn’t received any clicks.

                word_ecpm = to_ecpm(views or 1, clicks or 0, base_value)
    

    Calculate the word’s eCPM.

                bonus = word_ecpm - ad_ecpm
    

    Calculate the word’s bonus.

                pipeline.zadd('idx:' + word, ad_id, bonus)
    

    Write the word’s bonus back to the per-word, per-ad ZSET.

            pipeline.execute()
    
     

     

    Exercise: Optimizing eCPM calculations

    In listing 7.16, we perform a number of round trips to Redis that’s relative to the number of words that were targeted. More specifically, we perform the number of words plus three round trips. In most cases, this should be relatively small (considering that most ads won’t have a lot of content or related keywords). But even so, some of these round trips can be avoided. Can you update the update_cpms() function to perform a total of only three round trips?

     

    In our update_cpms() function, we updated the global per-type click-through and action rates, the per-ad eCPMs, and the per-word, per-ad bonus eCPMs.

    With the learning portion of our ad-targeting process now complete, we’ve now built a complete ad-targeting engine from scratch. This engine will learn over time, adapt the ads it returns over time, and more. We can make many possible additions or changes to this engine to make it perform even better, some of which are mentioned in the exercises, and several of which are listed next. These are just starting points for building with Redis:

    • Over time, the total number of clicks and views for each ad will stabilize around a particular ratio, and subsequent clicks and views won’t alter that ratio significantly. The real CTR of an ad will vary based on time of day, day of week, and more. Consider degrading an ad’s click/action and view count on a regular basis, as we did in section 2.5 with our rescale_viewed() call. This same concept applies to our global expected CTRs.
    • To extend the learning ability beyond just a single count, consider keeping counts over the last day, week, and other time slices. Also consider ways of weighing those time slices differently, depending on their age. Can you think of a method to learn proper weights of different ages of counts?
    • All of the big ad networks use second-price auctions in order to charge for a given ad placement. More specifically, rather than charging a fixed rate per click, per thousand views, or per action, you charge a rate that’s relative to the second-highest value ad that was targeted.
    • In most ad networks, there’ll be a set of ads that rotate through the highest-value slot for a given set of keywords as each of them runs out of money. These ads are there because their high value and CTR earn the top slot. This means that new ads that don’t have sufficiently high values will never be seen, so the network will never discover them. Rather than picking the highest-eCPM ads 100% of the time, fetch the top 100 ads and choose ads based on the relative values of their eCPMs anywhere from 10%-50% of the time (depending on how you want to balance learning true eCPMs and earning the most money).
    • When ads are initially placed into the system, we know little about what to expect in terms of eCPM. We addressed this briefly by using the average CTR of all ads of the same type, but this is moot the moment a single click comes in. Another method mixes the average CTR for a given ad type, along with the seen CTR for the ad based on the number of views that the ad has seen. A simple inverse linear or inverse sigmoid relationship between the two can be used until the ad has had sufficient views (2,000–5,000 views is typically enough to determine a reliable CTR).
    • In addition to mixing the average CTR for a given ad type with the CTR of the ad during the learning process, ads that are in the process of reaching an initial 2,000–5,000 views can have their CTR/eCPM artificially boosted. This can ensure sufficient traffic for the system to learn the ads’ actual eCPMs.
    • Our method of learning per-word bonuses is similar to Bayesian statistics. We could use real Bayesian statistics, neural networks, association rule learning, clustering, or other techniques to calculate the bonuses. These other methods may offer more mathematically rigorous results, which may result in better CTRs, and ultimately more money earned.
    • In our code listings, we recorded views, clicks, and actions as part of the calls that return the ads or handle any sort of redirection. These operations may take a long time to execute, so should be handled after the calls have returned by being executed as an external task, as we discussed in section 6.4.

    As you can see from our list, many additions and improvements can and should be made to this platform. But as an initial pass, what we’ve provided can get you started in learning about and building the internet’s next-generation ad-targeting platform.

    Now that you’ve learned about how to build an ad-targeting platform, let’s keep going to see how to use search tools to find jobs that candidates are qualified for as part of a job-search tool.