We are now, simply, Redis

Learn More

e-Book - Redis in Action

This book covers the use of Redis, an in-memory database/data structure server

  • Redis in Action – Home
  • Foreword
  • Preface
  • Acknowledgments
  • About this Book
  • About the Cover Illustration
  • Part 1: Getting Started
  • Part 2: Core concepts
  • Part 3: Next steps
  • Appendix A
  • Appendix B
  • Buy the paperback
  • Redis in Action – Home
  • Foreword
  • Preface
  • Acknowledgments
  • About this Book
  • About the Cover Illustration
  • Part 1: Getting Started
  • Part 2: Core concepts
  • Part 3: Next steps
  • Appendix A
  • Appendix B
  • Buy the paperback

    7.3.4 Learning from user behavior

    As ads are shown to users, we have the opportunity to gain insight into what can cause
    someone to click on an ad. In the last section, we talked about using words as bonuses to ads that have already matched the required location. In this section, we’ll talk about
    how we can record information about those words and the ads that were targeted to
    discover basic patterns about user behavior in order to develop per-word, per-adtargeting

    A crucial question you should be asking yourself is “Why are we using words in the
    web page content to try to find better ads?” The simple reason is that ad placement is
    all about context. If a web page has content related to the safety of children’s toys, showing
    an ad for a sports car probably won’t do well. By matching words in the ad with
    words in the web page content, we get a form of context matching quickly and easily.

    One thing to remember during this discussion is that we aren’t trying to be perfect.
    We aren’t trying to solve the ad-targeting and learning problem completely; we’re trying
    to build something that will work “pretty well” with simple and straightforward methods.
    As such, our note about the fact that this isn’t mathematically rigorous still applies.


    The first step in our learning process is recording the results of our ad targeting with
    the record_targeting_result() function that we called earlier from listing 7.11.
    Overall, we’ll record some information about the ad-targeting results, which we’ll
    later use to help us calculate click-through rates, action rates, and ultimately eCPM
    bonuses for each word. We’ll record the following:

    • Which words were targeted with the given ad
    • The total number of times that a given ad has been targeted
    • The total number of times that a word in the ad was part of the bonus calculation

    To record this information, we’ll store a SET of the words that were targeted and keep
    counts of the number of times that the ad and words were seen as part of a single ZSET
    per ad. Our code for recording this information is shown next.

    Listing 7.14A method for recording the result after we’ve targeted an ad
    def record_targeting_result(conn, target_id, ad_id, words):
        pipeline = conn.pipeline(True)
        terms = conn.smembers('terms:' + ad_id)
        matched = list(words & terms)

    Find the words in the content that matched with the words in the ad.

        if matched:
            matched_key = 'terms:matched:%s' % target_id
            pipeline.sadd(matched_key, *matched)
            pipeline.expire(matched_key, 900)

    If any words in the ad matched the content, record that information and keep it for 15 minutes.

        type = conn.hget('type:', ad_id)
        pipeline.incr('type:%s:views:' % type)

    Keep a per-type count of the number of views that each ad received.

        for word in matched:
            pipeline.zincrby('views:%s' % ad_id, word)
        pipeline.zincrby('views:%s' % ad_id, '')

    Record view information for each word in the ad, as well as the ad itself.

        if not pipeline.execute()[-1] % 100:
            update_cpms(conn, ad_id)

    Every 100th time that the ad was shown, update the ad’s eCPM.

    That function does everything we said it would do, and you’ll notice a call to
    update_cpms(). This update_cpms() function is called every 100th time the ad is
    returned from a call. This function really is the core of the learning phase—it writes
    back to our per-word, per-ad-targeting bonus ZSETs.

    We’ll get to updating the eCPM of an ad in a moment, but first, let’s see what happens
    when an ad is clicked.


    As we record views, we’re recording half of the data for calculating CTRs. The other
    half of the data that we need to record is information about the clicks themselves, or
    in the case of a cost per action ad, the action. Numerically, this is because our eCPM
    calculations are based on this formula: (value of a click or action) x (clicks or actions)
    / views. Without recording clicks and actions, the numerator of our value calculation
    is 0, so we can’t discover anything useful.

    When someone actually clicks on an ad, prior to redirecting them to their final
    destination, we’ll record the click in the total aggregates for the type of ad, as well as
    whether the ad got a click and which words matched the clicked ad. We’ll record the
    same information for actions. Our function for recording clicks is shown next.

    Listing 7.15A method for recording clicks on an ad
    def record_click(conn, target_id, ad_id, action=False):
        pipeline = conn.pipeline(True)
        click_key = 'clicks:%s'%ad_id
        match_key = 'terms:matched:%s'%target_id
        type = conn.hget('type:', ad_id)
        if type == 'cpa':
            pipeline.expire(match_key, 900) 

    If the ad was a CPA ad, refresh the expiration time of the matched terms if it’s still available.

            if action:
                click_key = 'actions:%s' % ad_id

    Record actions instead of clicks.

        if action and type == 'cpa':
            pipeline.incr('type:cpa:actions:' % type)
            pipeline.incr('type:%s:clicks:' % type)

    Keep a global count of clicks/ actions for ads based on the ad type.

        matched = list(conn.smembers(match_key))
        for word in matched:
            pipeline.zincrby(click_key, word)

    Record clicks (or actions) for the ad and for all words that had been targeted in the ad.

        update_cpms(conn, ad_id)

    Update the eCPM for all words that were seen in the ad.

    You’ll notice there are a few parts of the recording function that we didn’t mention
    earlier. In particular, when we receive a click or an action for a CPA ad, we’ll refresh
    the expiration of the words that were a part of the ad-targeting call. This will let an
    action following a click count up to 15 minutes after the initial click-through to the
    destination site happened.

    Another change is that we’ll optionally be recording actions in this call for CPA
    ads; we’ll assume e that this function is called with the action parameter set to True in
    that case.

    And finally, we’ll call the update_cpms() function for every click/action because
    they should happen roughly once every 100–2000 views (or more), so each individual
    click/action is important relative to a view.

    Exercise: Changing the way we count clicks and actions

    In listing 7.15, we define a record_click() function to add 1 to every word that
    was targeted as part of an ad that was clicked on. Can you think of a different number
    to add to a word that may make more sense? Hint: You may want to consider
    this number to be related to the count of matched words. Can you update
    finish_scoring() and record_click() to take into consideration this new click/
    action value?

    To complete our learning process, we only need to define our final update_cpms()


    We’ve been talking about and using the update_cpms() function for a couple of sections
    now, and hopefully you already have an idea of what happens inside of it.
    Regardless, we’ll walk through the different parts of how we’ll update our per-word,
    per-ad bonus eCPMs, as well as how we’ll update our per-ad eCPMs.

    The first part to updating our eCPMs is to know the click-through rate of an ad by
    itself. Because we’ve recorded both the clicks and views for each ad overall, we have
    the click-through rate by pulling both of those scores from the relevant ZSETs. By combining
    that click-through rate with the ad’s actual value, which we fetch from the ad’s
    base value ZSET, we can calculate the eCPM of the ad over all clicks and views.

    The second part to updating our eCPMs is to know the CTR of words that were
    matched in the ad itself. Again, because we recorded all views and clicks involving the
    ad, we have that information. And because we have the ad’s base value, we can calculate
    the eCPM. When we have the word’s eCPM, we can subtract the ad’s eCPM from it
    to determine the bonus that the word matching contributes. This difference is what’s
    added to the per-word, per-ad bonus ZSETs.

    The same calculation is performed for actions as was performed for clicks, the only
    difference being that we use the action count ZSETs instead of the click count ZSETs.
    Our method for updating eCPMs for clicks and actions can be seen in the next listing.

    Listing 7.16A method for updating eCPMs and per-word eCPM bonuses for ads
    def update_cpms(conn, ad_id):
        pipeline = conn.pipeline(True)
            pipeline.hget('type:', ad_id)
            pipeline.zscore('ad:base_value:', ad_id)
            pipeline.smembers('terms:' + ad_id)
            type, base_value, words = pipeline.execute()

    Fetch the type and value of the ad, as well as all of the words in the ad.

            which = 'clicks'
            if type == 'cpa':
                which = 'actions'

    Determine whether the eCPM of the ad should be based on clicks or actions.

            pipeline.get('type:%s:views:' % type)
            pipeline.get('type:%s:%s' % (type, which))
            type_views, type_clicks = pipeline.execute()

    Fetch the current number of views and clicks/actions for the given ad type.

            AVERAGE_PER_1K[type] = (
                1000. * int(type_clicks or '1') / int(type_views or '1'))

    Write back to our global dictionary the clickthrough rate or action rate for the ad.

            if type == 'cpm':

    If we’re processing a CPM ad, then we don’t update any of the eCPMs; they’re already updated.

            view_key = 'views:%s' % ad_id
            click_key = '%s:%s' % (which, ad_id)
            to_ecpm = TO_ECPM[type]
            pipeline.zscore(view_key, '')
            pipeline.zscore(click_key, '')
            ad_views, ad_clicks = pipeline.execute()

    Fetch the per-ad view and click/action scores.

            if (ad_clicks or 0) < 1:
                ad_ecpm = conn.zscore('idx:ad:value:', ad_id)

    Use the existing eCPM if the ad hasn’t received any clicks yet.

                ad_ecpm = to_ecpm(ad_views or 1, ad_clicks or 0, base_value)
                pipeline.zadd('idx:ad:value:', ad_id, ad_ecpm)

    Calculate the ad’s eCPM and update the ad’s value.

            for word in words:
                pipeline.zscore(view_key, word)
                pipeline.zscore(click_key, word)
                views, clicks = pipeline.execute()[-2:]

    Fetch the view and click/action scores for the word.

                if (clicks or 0) < 1:

    Don’t update eCPMs when the ad hasn’t received any clicks.

                word_ecpm = to_ecpm(views or 1, clicks or 0, base_value)

    Calculate the word’s eCPM.

                bonus = word_ecpm - ad_ecpm

    Calculate the word’s bonus.

                pipeline.zadd('idx:' + word, ad_id, bonus)

    Write the word’s bonus back to the per-word, per-ad ZSET.


    Exercise: Optimizing eCPM calculations

    In listing 7.16, we perform a number of round trips to Redis that’s relative to the
    number of words that were targeted. More specifically, we perform the number of
    words plus three round trips. In most cases, this should be relatively small (considering
    that most ads won’t have a lot of content or related keywords). But even so,
    some of these round trips can be avoided. Can you update the update_cpms() function
    to perform a total of only three round trips?

    In our update_cpms() function, we updated the global per-type click-through and
    action rates, the per-ad eCPMs, and the per-word, per-ad bonus eCPMs.

    With the learning portion of our ad-targeting process now complete, we’ve now
    built a complete ad-targeting engine from scratch. This engine will learn over time,
    adapt the ads it returns over time, and more. We can make many possible additions or
    changes to this engine to make it perform even better, some of which are mentioned
    in the exercises, and several of which are listed next. These are just starting points for
    building with Redis:

    • Over time, the total number of clicks and views for each ad will stabilize around
      a particular ratio, and subsequent clicks and views won’t alter that ratio significantly.
      The real CTR of an ad will vary based on time of day, day of week, and
      more. Consider degrading an ad’s click/action and view count on a regular
      basis, as we did in section 2.5 with our rescale_viewed() call. This same concept
      applies to our global expected CTRs.
    • To extend the learning ability beyond just a single count, consider keeping
      counts over the last day, week, and other time slices. Also consider ways of
      weighing those time slices differently, depending on their age. Can you think of
      a method to learn proper weights of different ages of counts?
    • All of the big ad networks use second-price auctions in order to charge for a
      given ad placement. More specifically, rather than charging a fixed rate per
      click, per thousand views, or per action, you charge a rate that’s relative to the
      second-highest value ad that was targeted.
    • In most ad networks, there’ll be a set of ads that rotate through the highest-value
      slot for a given set of keywords as each of them runs out of money. These ads are
      there because their high value and CTR earn the top slot. This means that new
      ads that don’t have sufficiently high values will never be seen, so the network will
      never discover them. Rather than picking the highest-eCPM ads 100% of the
      time, fetch the top 100 ads and choose ads based on the relative values of their
      eCPMs anywhere from 10%-50% of the time (depending on how you want to balance
      learning true eCPMs and earning the most money).
    • When ads are initially placed into the system, we know little about what to
      expect in terms of eCPM. We addressed this briefly by using the average CTR of
      all ads of the same type, but this is moot the moment a single click comes in.
      Another method mixes the average CTR for a given ad type, along with the seen
      CTR for the ad based on the number of views that the ad has seen. A simple
      inverse linear or inverse sigmoid relationship between the two can be used until
      the ad has had sufficient views (2,000–5,000 views is typically enough to determine
      a reliable CTR).
    • In addition to mixing the average CTR for a given ad type with the CTR of the
      ad during the learning process, ads that are in the process of reaching an
      initial 2,000–5,000 views can have their CTR/eCPM artificially boosted. This
      can ensure sufficient traffic for the system to learn the ads’ actual eCPMs.
    • Our method of learning per-word bonuses is similar to Bayesian statistics. We
      could use real Bayesian statistics, neural networks, association rule learning,
      clustering, or other techniques to calculate the bonuses. These other methods
      may offer more mathematically rigorous results, which may result in better
      CTRs, and ultimately more money earned.
    • In our code listings, we recorded views, clicks, and actions as part of the calls
      that return the ads or handle any sort of redirection. These operations may take
      a long time to execute, so should be handled after the calls have returned by
      being executed as an external task, as we discussed in section 6.4.

    As you can see from our list, many additions and improvements can and should be
    made to this platform. But as an initial pass, what we’ve provided can get you started
    in learning about and building the internet’s next-generation ad-targeting platform.

    Now that you’ve learned about how to build an ad-targeting platform, let’s keep
    going to see how to use search tools to find jobs that candidates are qualified for as
    part of a job-search tool.