One of the principles of Privacy by Design, as advocated in the FTC Privacy Report, is that when you design a business process, you should think carefully about how to minimize the information you collect, retain, and use in that process. Often you can implement the feature you want with a smaller privacy footprint if you consider your design alternatives.
As an example, let’s look at frequency capping in the online ad market. Advertisers want to limit the number of times a particular user sees ads from a particular ad campaign. This limit is called a “frequency cap” for the campaign. The more times a user has seen an ad, the less likely it is that one more viewing will get them to buy, and the more likely it is that they’ll find the repeated ad annoying.
One way to implement frequency caps is to use third-party tracking. The ad network assigns each user a unique userID (a pseudonym), stored in a cookie on the user’s computer, and the ad network records which userIDs saw which ads on which sites. The ad network uses these records to keep a count of how many times each userID has seen each ad, and to avoid repeating ads too many times. This approach works, but it gathers a lot of data: full tracking of user activities across all sites served by the ad network.
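To make the privacy cost of the tracking approach concrete, here is a minimal sketch in Python. The class and method names are hypothetical, chosen for illustration, and not taken from any real ad network. The point to notice is that the frequency check only needs a count, yet the log it is computed from also reveals every site each userID visited.

```python
class TrackingAdNetwork:
    """Hypothetical ad network that logs every impression in full."""

    def __init__(self):
        # Full history of (user_id, campaign_id, site) records.
        self.log = []

    def record_impression(self, user_id, campaign_id, site):
        self.log.append((user_id, campaign_id, site))

    def times_seen(self, user_id, campaign_id):
        # The frequency count is derived from the full log.
        return sum(1 for u, c, _ in self.log
                   if u == user_id and c == campaign_id)

    def may_show(self, user_id, campaign_id, cap):
        return self.times_seen(user_id, campaign_id) < cap

    def sites_visited(self, user_id):
        # A side effect of the design: browsing history is recoverable.
        return {s for u, _, s in self.log if u == user_id}
```

Nothing in `may_show` uses the site, which is a hint that the design collects more than the feature requires.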
There are at least two ways to do frequency capping without gathering so much data.
The first way is to move information storage to the client (i.e., the user’s computer). The idea is to keep a count of how many ads the user has seen from each campaign, and store those counts on the client’s computer rather than on the ad network’s computers. A blog post by Jonathan Mayer and Arvind Narayanan gives more details. The main advantage of this approach is that, because the information is stored on the user’s own computer, the user can always delete the information if they’re concerned about the privacy implications. The main drawback is that the ad network would have to re-engineer how they choose which ads to place, because ad placement decisions are normally made on the ad network’s servers but the frequency information will now be stored elsewhere.
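As a rough illustration of the client-side idea, the sketch below keeps the counts in a JSON string that stands in for a cookie or other browser storage. This is an assumption-laden toy, not necessarily the design described in the Mayer and Narayanan post: the `ClientStore` class, the `choose_ad` method, and the idea that the ad network sends a preference-ordered list of candidate campaigns with their caps are all hypothetical.

```python
import json


class ClientStore:
    """Stands in for storage on the user's own computer (hypothetical)."""

    def __init__(self):
        # JSON text mapping campaign_id -> number of views so far.
        self.cookie = "{}"

    def choose_ad(self, candidates):
        """Pick the first candidate campaign that is still under its cap.

        candidates: list of (campaign_id, cap) pairs offered by the ad
        network, in the network's order of preference.
        """
        counts = json.loads(self.cookie)
        for campaign_id, cap in candidates:
            if counts.get(campaign_id, 0) < cap:
                counts[campaign_id] = counts.get(campaign_id, 0) + 1
                self.cookie = json.dumps(counts)
                return campaign_id
        return None  # every candidate has hit its cap

    def clear(self):
        # The user can wipe the counts at any time.
        self.cookie = "{}"
```

The sketch also shows the drawback mentioned above: the final choice among candidates happens on the client, so the ad network's server-side placement logic would have to change to accommodate it.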
The second way to do frequency capping with less information collection is to store information on the ad network’s server, but to think carefully about how to minimize what is stored and how to reduce its linkability back to the user. In this approach, the user still gets a unique pseudonym, stored in a cookie, but the ad network does not store a complete record of what the user did online. Instead, the ad network just keeps a count of how many times each pseudonym has seen ads from each campaign.
For example, if you see an ad for the new Monster Mega Pizza, the ad network will remember that you (i.e., your pseudonym) have seen that ad, but it won’t remember which site you were reading when you saw it. And for ads that aren’t frequency-capped, it won’t store anything at all. Of course, the data about you seeing the Monster Mega Pizza ad campaign can be deleted once that campaign is over.
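The second approach might look something like the following sketch, again with hypothetical names (`MinimalAdNetwork`, `try_show`, `end_campaign`). It assumes the ad network knows each capped campaign's limit up front and treats any campaign without a cap as one for which nothing is stored.

```python
from collections import defaultdict


class MinimalAdNetwork:
    """Hypothetical ad network that stores only per-campaign counts."""

    def __init__(self, caps):
        # campaign_id -> cap; campaigns not listed here are uncapped.
        self.caps = dict(caps)
        # (pseudonym, campaign_id) -> number of views so far.
        self.counts = defaultdict(int)

    def try_show(self, pseudonym, campaign_id, site):
        # `site` is available at serving time but is deliberately
        # never written to storage.
        cap = self.caps.get(campaign_id)
        if cap is None:
            return True  # uncapped campaign: store nothing at all
        key = (pseudonym, campaign_id)
        if self.counts[key] >= cap:
            return False
        self.counts[key] += 1
        return True

    def end_campaign(self, campaign_id):
        # Once the campaign is over, its counts can be deleted.
        self.caps.pop(campaign_id, None)
        for key in [k for k in self.counts if k[1] == campaign_id]:
            del self.counts[key]
```

Compared with full tracking, the stored data no longer says where the user was, and it disappears when the campaign does.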
In practice, an ad network might want to collect and retain more information, in order to make other uses of that information later. But users will probably want the ad network to be straightforward about what it is doing, and to admit that it is collecting more information than it needs for frequency capping, because it wants to make other uses of the data.
[Bonus content for geeks: The ad network can use crypto to store information with even better privacy properties. Rather than using the pseudonymous userID as a key for storing and retrieving the frequency counts, the ad network can hash the userID together with the advertiser's campaignID and use the resulting value as the storage key. Then (assuming the userID is neither recorded nor guessable) the ad network won't be able to determine whether the person who saw the Monster Mega Pizza ad also saw some other ad from a different campaign. This is easy to do and provides some extra protection for the user's privacy, while still allowing frequency capping.]
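[For the curious, here is a minimal sketch of the hashing trick, assuming SHA-256 as the hash and a zero byte as the separator between the two IDs. Both are my choices for illustration, as are the names `storage_key` and `HashedCounter`. As noted above, the protection only holds if the userID is a long random value and the server never logs it alongside the key.

```python
import hashlib
from collections import defaultdict


def storage_key(user_id: str, campaign_id: str) -> str:
    """Hash the userID together with the campaignID.

    Assumes neither ID contains a zero byte, so the separator keeps
    different (user_id, campaign_id) pairs from colliding.
    """
    h = hashlib.sha256()
    h.update(user_id.encode("utf-8"))
    h.update(b"\x00")
    h.update(campaign_id.encode("utf-8"))
    return h.hexdigest()


class HashedCounter:
    """Hypothetical count store keyed only by the hashed value."""

    def __init__(self):
        # hashed key -> number of views so far
        self.counts = defaultdict(int)

    def try_show(self, user_id, campaign_id, cap):
        key = storage_key(user_id, campaign_id)
        if self.counts[key] >= cap:
            return False
        self.counts[key] += 1
        return True
```

The stored keys for one user's pizza count and shoe count look unrelated, so the table alone does not reveal that the same person saw both campaigns.]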
[Thanks to participants in the W3C Tracking Protection Working Group for suggesting the second approach, including the hashing trick.]
[Extra-credit homework for serious geeks: How can you use Bloom filters to store this information more efficiently? Assume it's acceptable to refuse to show an ad to a user even though that user hasn't yet hit the cap for that ad, as long as the probability that this happens is small.]