As ads are shown to users, we have the opportunity to gain insight into what can cause someone to click on an ad. In the last section, we talked about using words as bonuses to ads that have already matched the required location. In this section, we’ll talk about how we can record information about those words and the ads that were targeted to discover basic patterns about user behavior in order to develop per-word, per-ad-targeting bonuses.
A crucial question you should be asking yourself is “Why are we using words in the web page content to try to find better ads?” The simple reason is that ad placement is all about context. If a web page has content related to the safety of children’s toys, showing an ad for a sports car probably won’t do well. By matching words in the ad with words in the web page content, we get a form of context matching quickly and easily.
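To illustrate this kind of context matching, here is a minimal sketch that intersects an ad’s words with the words on a page. The content_words() tokenizer is a hypothetical stand-in, not the tokenizer from earlier in the chapter; it lowercases the text and keeps runs of letters and apostrophes.

```python
import re

def content_words(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters/apostrophes.
    return set(re.findall(r"[a-z']+", text.lower()))

page = "New safety rules for children's toys"
toy_ad_terms = {'toys', 'safety', 'recall'}
car_ad_terms = {'sports', 'car'}

# The toy-safety ad shares words with the page; the sports car ad shares none.
print(sorted(content_words(page) & toy_ad_terms))  # ['safety', 'toys']
print(content_words(page) & car_ad_terms)          # set()
```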
One thing to remember during this discussion is that we aren’t trying to be perfect. We aren’t trying to solve the ad-targeting and learning problem completely; we’re trying to build something that will work “pretty well” with simple and straightforward methods. As such, our earlier note that this approach isn’t mathematically rigorous still applies.
The first step in our learning process is recording the results of our ad targeting with the record_targeting_result() function that we called earlier from listing 7.11. Overall, we’ll record some information about the ad-targeting results, which we’ll later use to help us calculate click-through rates, action rates, and ultimately eCPM bonuses for each word. In particular, we’ll record which of the ad’s words matched the content, how many times each type of ad has been viewed, and how many times the ad, and each of its matched words, has been viewed.
To record this information, we’ll store a SET of the words that were targeted and keep counts of the number of times that the ad and words were seen as part of a single ZSET per ad. Our code for recording this information is shown next.
```python
def record_targeting_result(conn, target_id, ad_id, words):
    # Assumes redis-py 3.0+ with decode_responses=True, so SMEMBERS returns str.
    pipeline = conn.pipeline(True)
    # Find the words in the content that matched with the words in the ad.
    terms = conn.smembers('terms:' + ad_id)
    matched = list(words & terms)
    if matched:
        # If any words in the ad matched the content, record that
        # information and keep it for 15 minutes.
        matched_key = 'terms:matched:%s' % target_id
        pipeline.sadd(matched_key, *matched)
        pipeline.expire(matched_key, 900)
    # Keep a per-type count of the number of views that each ad received.
    ad_type = conn.hget('type:', ad_id)
    pipeline.incr('type:%s:views:' % ad_type)
    # Record view information for each matched word, as well as the ad itself
    # (the '' member). redis-py 3.0+ argument order: zincrby(name, amount, value).
    for word in matched:
        pipeline.zincrby('views:%s' % ad_id, 1, word)
    pipeline.zincrby('views:%s' % ad_id, 1, '')
    # Every 100th time that the ad was shown, update the ad's eCPM.
    if not pipeline.execute()[-1] % 100:
        update_cpms(conn, ad_id)
```
That function does everything we said it would do, and you’ll notice that it calls update_cpms() every 100th time the ad is viewed, that is, whenever the ad’s total view count reaches a multiple of 100. The update_cpms() function really is the core of the learning phase: it writes back to our per-word, per-ad-targeting bonus ZSETs.
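To make the bookkeeping in record_targeting_result() concrete without a running Redis server, the following sketch models the same keys with Python dictionaries and Counter objects. InMemoryAdStats and its attributes are hypothetical names; the model ignores the 15-minute expiration and pipelining, and only mirrors which counts get incremented.

```python
from collections import Counter, defaultdict

class InMemoryAdStats:
    """Hypothetical in-memory model of the Redis keys (not real Redis).

    views[ad_id] stands in for the 'views:<ad_id>' ZSET, type_views for the
    'type:<type>:views:' counters, and matched for the
    'terms:matched:<target_id>' SETs (no expiration is modeled).
    """

    def __init__(self, ad_terms, ad_types):
        self.ad_terms = ad_terms           # ad_id -> set of words in the ad
        self.ad_types = ad_types           # ad_id -> 'cpc', 'cpa', or 'cpm'
        self.views = defaultdict(Counter)  # ad_id -> {word or '': views}
        self.type_views = Counter()        # ad type -> total views
        self.matched = {}                  # target_id -> matched words

    def record_targeting_result(self, target_id, ad_id, words):
        matched = words & self.ad_terms[ad_id]
        if matched:
            self.matched[target_id] = set(matched)
        self.type_views[self.ad_types[ad_id]] += 1
        for word in matched:
            self.views[ad_id][word] += 1
        self.views[ad_id][''] += 1
        # Mirrors the listing: True when an eCPM update would be triggered.
        return self.views[ad_id][''] % 100 == 0


stats = InMemoryAdStats({'ad1': {'toy', 'safety'}}, {'ad1': 'cpc'})
stats.record_targeting_result('t1', 'ad1', {'toy', 'car'})
stats.record_targeting_result('t2', 'ad1', {'news'})
print(dict(stats.views['ad1']))  # {'toy': 1, '': 2}
```

Counting the ad itself under the empty-string member keeps the ad’s total views and each word’s views in one structure, which is what later lets a single ZSET lookup give both the ad’s and a word’s view counts.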
We’ll get to updating the eCPM of an ad in a moment, but first, let’s see what happens when an ad is clicked.
As we record views, we’re recording half of the data for calculating CTRs. The other half of the data that we need to record is information about the clicks themselves, or in the case of a cost per action ad, the action. Numerically, this is because our eCPM calculations are based on this formula: 1000 x (value of a click or action) x (clicks or actions) / views, where the factor of 1000 expresses the value per thousand views. Without recording clicks and actions, the numerator of our value calculation is 0, so we can’t discover anything useful.
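As a quick check of the formula, here is a minimal sketch, assuming eCPM is expressed per 1,000 views (the “M” in eCPM). The ecpm() name is illustrative only; it plays the same role as the per-type conversion functions in TO_ECPM used later.

```python
def ecpm(value, events, views):
    """Illustrative eCPM: 1000 * value * events / views.

    value is the price of one click (CPC) or one action (CPA), events is
    the number of clicks or actions, and views is the number of views.
    """
    return 1000.0 * value * events / views

# A $0.25-per-click ad with 2 clicks over 1,000 views earns $0.50 per 1,000 views.
print(ecpm(0.25, 2, 1000))  # 0.5
# With no recorded clicks, the numerator, and so the eCPM, is 0.
print(ecpm(0.25, 0, 1000))  # 0.0
```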
When someone actually clicks on an ad, prior to redirecting them to their final destination, we’ll record the click in the total aggregates for the type of ad, as well as whether the ad got a click and which words matched the clicked ad. We’ll record the same information for actions. Our function for recording clicks is shown next.
```python
def record_click(conn, target_id, ad_id, action=False):
    # Assumes redis-py 3.0+ with decode_responses=True.
    pipeline = conn.pipeline(True)
    click_key = 'clicks:%s' % ad_id
    match_key = 'terms:matched:%s' % target_id
    ad_type = conn.hget('type:', ad_id)
    if ad_type == 'cpa':
        # If the ad was a CPA ad, refresh the expiration time of the
        # matched terms if they're still available.
        pipeline.expire(match_key, 900)
        if action:
            # Record actions instead of clicks.
            click_key = 'actions:%s' % ad_id
    # Keep a global count of clicks/actions for ads based on the ad type.
    if action and ad_type == 'cpa':
        pipeline.incr('type:%s:actions:' % ad_type)
    else:
        pipeline.incr('type:%s:clicks:' % ad_type)
    # Record clicks (or actions) for the ad and for all words that had
    # been targeted in the ad.
    matched = list(conn.smembers(match_key))
    matched.append('')
    for word in matched:
        pipeline.zincrby(click_key, 1, word)
    pipeline.execute()
    # Update the eCPM for all words that were seen in the ad.
    update_cpms(conn, ad_id)
```
You’ll notice there are a few parts of the recording function that we didn’t mention earlier. In particular, when we receive a click or an action for a CPA ad, we’ll refresh the expiration of the words that were part of the ad-targeting call. This lets an action that follows a click still be credited to those words for up to 15 minutes after the click-through to the destination site.
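The effect of refreshing the expiration can be sketched with a simulated clock. ExpiringSet is a hypothetical stand-in for a Redis SET with a TTL; its expire() follows EXPIRE in that only a key that still exists can have its TTL reset. Times are in seconds and passed explicitly, as an assumption for the sake of the example.

```python
class ExpiringSet:
    """Hypothetical model of a Redis SET with a TTL, using explicit times."""

    def __init__(self, members, ttl, now):
        self.members = set(members)
        self.deadline = now + ttl

    def expire(self, ttl, now):
        # Like EXPIRE: only a key that still exists gets a new TTL.
        if self.alive(now):
            self.deadline = now + ttl
            return True
        return False

    def alive(self, now):
        return now < self.deadline


matched = ExpiringSet({'toy', 'safety'}, ttl=900, now=0)  # ad shown at t=0
matched.expire(900, now=600)                              # CPA click at t=600
print(matched.alive(1200))  # True: an action at t=1200 can still be credited
print(matched.alive(1500))  # False: 15 minutes have passed since the click
```

Without the refresh at t=600, the matched words would have disappeared at t=900, and an action at t=1200 could no longer be attributed to them.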
Another change is that we’ll optionally be recording actions in this call for CPA ads; we’ll assume that this function is called with the action parameter set to True in that case.
And finally, we’ll call the update_cpms() function for every click/action because they should happen roughly once every 100–2000 views (or more), so each individual click/action is important relative to a view.
In listing 7.15, we define a record_click() function to add 1 to every word that was targeted as part of an ad that was clicked on. Can you think of a different number to add to a word that may make more sense? Hint: You may want to consider this number to be related to the count of matched words. Can you update finish_scoring() and record_click() to take into consideration this new click/action value?
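One possible direction for this exercise, offered as an assumption rather than the chapter’s answer, is to split a single click evenly across the matched words while the ad itself (the '' member) still receives a full click. The click_weights() helper is hypothetical; each weight could then be passed as the amount to ZINCRBY instead of 1.

```python
def click_weights(matched_words):
    """Hypothetical weighting: split one click evenly across matched words.

    The ad itself ('' member) still receives a full click of 1.0.
    """
    weights = {}
    if matched_words:
        share = 1.0 / len(matched_words)
        for word in matched_words:
            weights[word] = share
    weights[''] = 1.0
    return weights

print(click_weights(['toy', 'safety', 'kids', 'recall']))
# {'toy': 0.25, 'safety': 0.25, 'kids': 0.25, 'recall': 0.25, '': 1.0}
```

If you adopt weights like these, the per-word view counts would likely need the same weighting; otherwise every word’s CTR shrinks by the number of matched words and its bonus relative to the ad’s eCPM becomes misleadingly negative.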
To complete our learning process, we only need to define our final update_cpms() function.
We’ve been talking about and using the update_cpms() function for a couple of sections now, and hopefully you already have an idea of what happens inside of it. Regardless, we’ll walk through the different parts of how we’ll update our per-word, per-ad bonus eCPMs, as well as how we’ll update our per-ad eCPMs.
The first part to updating our eCPMs is to know the click-through rate of an ad by itself. Because we’ve recorded both the clicks and views for each ad overall, we can calculate the click-through rate by pulling both of those scores from the relevant ZSETs. By combining that click-through rate with the ad’s actual value, which we fetch from the ad’s base value ZSET, we can calculate the eCPM of the ad over all clicks and views.
The second part to updating our eCPMs is to know the CTR of words that were matched in the ad itself. Again, because we recorded all views and clicks involving the ad, we have that information. And because we have the ad’s base value, we can calculate the eCPM. When we have the word’s eCPM, we can subtract the ad’s eCPM from it to determine the bonus that the word matching contributes. This difference is what’s added to the per-word, per-ad bonus ZSETs.
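As a worked example of these two steps (the ad’s eCPM from its '' entry, then each word’s bonus as its eCPM minus the ad’s), here’s a small sketch over plain dictionaries. word_bonuses() and the sample counts are hypothetical; it assumes a CPC ad worth $0.50 per click, uses 1000 x value x clicks / views for eCPM, and assumes the ad itself has at least one click.

```python
def word_bonuses(views, clicks, base_value):
    """Return (ad_ecpm, {word: bonus}) for one CPC or CPA ad.

    views and clicks map each word, plus '' for the ad itself, to a count,
    mirroring the 'views:<ad_id>' and 'clicks:<ad_id>' ZSETs. Assumes the ad
    itself has at least one click; words without clicks get no bonus.
    """
    def to_ecpm(view_count, click_count):
        # 1000 x value per click (or action) x clicks / views
        return 1000.0 * base_value * click_count / view_count

    ad_ecpm = to_ecpm(views[''], clicks[''])
    bonuses = {}
    for word, word_views in views.items():
        if word == '' or clicks.get(word, 0) < 1:
            continue  # skip the ad itself and words that were never clicked
        bonuses[word] = to_ecpm(word_views, clicks[word]) - ad_ecpm
    return ad_ecpm, bonuses


views = {'': 1000, 'toy': 200, 'car': 300, 'news': 50}
clicks = {'': 10, 'toy': 4, 'car': 1}
ad_ecpm, bonuses = word_bonuses(views, clicks, base_value=0.5)
print(ad_ecpm)                                       # 5.0
print({w: round(b, 2) for w, b in bonuses.items()})  # {'toy': 5.0, 'car': -3.33}
```

Here “toy” was clicked at twice the ad’s overall rate (2% versus 1%), so it earns a positive bonus, while “car” underperforms the ad and gets a negative one; “news” was never clicked, so it gets no bonus at all.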
The same calculation is performed for actions as was performed for clicks, the only difference being that we use the action count ZSETs instead of the click count ZSETs. Our method for updating eCPMs for clicks and actions can be seen in the next listing.
```python
def update_cpms(conn, ad_id):
    # Assumes redis-py 3.0+ with decode_responses=True. AVERAGE_PER_1K and
    # TO_ECPM are the module-level globals defined earlier in the chapter.
    pipeline = conn.pipeline(True)
    # Fetch the type and value of the ad, as well as all of the words in the ad.
    pipeline.hget('type:', ad_id)
    pipeline.zscore('ad:base_value:', ad_id)
    pipeline.smembers('terms:' + ad_id)
    ad_type, base_value, words = pipeline.execute()

    # Determine whether the eCPM of the ad should be based on clicks or actions.
    which = 'clicks'
    if ad_type == 'cpa':
        which = 'actions'

    # Fetch the current number of views and clicks/actions for the ad type.
    # Key names (trailing colon included) match the ones incremented in
    # record_targeting_result() and record_click().
    pipeline.get('type:%s:views:' % ad_type)
    pipeline.get('type:%s:%s:' % (ad_type, which))
    type_views, type_clicks = pipeline.execute()
    # Write the per-type click-through (or action) rate per 1,000 views
    # back to our global dictionary.
    AVERAGE_PER_1K[ad_type] = (
        1000. * int(type_clicks or '1') / int(type_views or '1'))

    # CPM ads have a fixed eCPM, so there are no eCPMs to update.
    if ad_type == 'cpm':
        return

    view_key = 'views:%s' % ad_id
    click_key = '%s:%s' % (which, ad_id)
    to_ecpm = TO_ECPM[ad_type]

    # Fetch the per-ad view and click/action scores.
    pipeline.zscore(view_key, '')
    pipeline.zscore(click_key, '')
    ad_views, ad_clicks = pipeline.execute()
    if (ad_clicks or 0) < 1:
        # Use the existing eCPM if the ad hasn't received any clicks yet.
        ad_ecpm = conn.zscore('idx:ad:value:', ad_id)
    else:
        # Calculate the ad's eCPM and update the ad's value.
        ad_ecpm = to_ecpm(ad_views or 1, ad_clicks or 0, base_value)
        pipeline.zadd('idx:ad:value:', {ad_id: ad_ecpm})

    for word in words:
        # Fetch the view and click/action scores for the word; the [-2:]
        # slice skips the result of any ZADD queued on an earlier pass.
        pipeline.zscore(view_key, word)
        pipeline.zscore(click_key, word)
        views, clicks = pipeline.execute()[-2:]
        # Don't update the word's bonus if it hasn't received any clicks.
        if (clicks or 0) < 1:
            continue
        # Calculate the word's eCPM, then its bonus over the ad's eCPM.
        word_ecpm = to_ecpm(views or 1, clicks or 0, base_value)
        bonus = word_ecpm - ad_ecpm
        # Write the word's bonus back to the per-word, per-ad ZSET.
        pipeline.zadd('idx:' + word, {ad_id: bonus})
    pipeline.execute()
```
In listing 7.16, we perform a number of round trips to Redis that’s proportional to the number of words in the ad. More specifically, we perform one round trip for each word, plus four more (five if the ad hasn’t received any clicks yet). In most cases, this should be relatively small (considering that most ads won’t have a lot of content or related keywords). But even so, some of these round trips can be avoided. Can you update the update_cpms() function to perform a total of only three round trips?
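One ingredient of this exercise is batching: queuing the per-word ZSCORE calls in a single pipeline and calling execute() once. The sketch below shows only that step, against a hypothetical CountingPipeline, an in-memory stand-in that counts execute() calls as round trips. fetch_word_scores() uses only zscore() and execute(), which a real redis-py pipeline also provides; it does not cover the rest of the exercise.

```python
class CountingPipeline:
    """Hypothetical in-memory stand-in for a Redis pipeline over ZSETs.

    Queues ZSCORE calls and counts execute() calls, since each execute()
    is one round trip to the server.
    """

    def __init__(self, zsets):
        self.zsets = zsets  # key -> {member: score}
        self.queued = []
        self.round_trips = 0

    def zscore(self, key, member):
        self.queued.append((key, member))
        return self

    def execute(self):
        self.round_trips += 1
        # Missing members score as None, as ZSCORE returns nil in Redis.
        results = [self.zsets.get(k, {}).get(m) for k, m in self.queued]
        self.queued = []
        return results


def fetch_word_scores(pipeline, view_key, click_key, words):
    # Queue the view and click scores for every word, then fetch them all
    # in one round trip instead of one round trip per word.
    words = list(words)
    for word in words:
        pipeline.zscore(view_key, word)
        pipeline.zscore(click_key, word)
    results = pipeline.execute()
    # Results alternate view, click, view, click, ... in queue order.
    return {word: (results[2 * i], results[2 * i + 1])
            for i, word in enumerate(words)}


zsets = {'views:ad1': {'toy': 200.0, 'car': 300.0},
         'clicks:ad1': {'toy': 4.0}}
pipe = CountingPipeline(zsets)
scores = fetch_word_scores(pipe, 'views:ad1', 'clicks:ad1', ['toy', 'car'])
print(scores)            # {'toy': (200.0, 4.0), 'car': (300.0, None)}
print(pipe.round_trips)  # 1
```

The per-word ZADD writes can be queued the same way and sent with a final execute(), so the number of round trips no longer grows with the number of words.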
In our update_cpms() function, we updated the global per-type click-through and action rates, the per-ad eCPMs, and the per-word, per-ad bonus eCPMs.
With the learning portion of our ad-targeting process now complete, we’ve built a complete ad-targeting engine from scratch. This engine learns from user behavior and adapts the ads it returns over time. We can make many possible additions or changes to this engine to make it perform even better, some of which are mentioned in the exercises, and several of which are listed next. These are just starting points for building with Redis:
As you can see from our list, many additions and improvements can and should be made to this platform. But as an initial pass, what we’ve provided can get you started in learning about and building the internet’s next-generation ad-targeting platform.
Now that you’ve learned about how to build an ad-targeting platform, let’s keep going to see how to use search tools to find jobs that candidates are qualified for as part of a job-search tool.