Sunday, August 12, 2007

You Are What You Look For – Search Engines And Data Profiling

Recently there was a flurry of search engine companies touting changes to their cookie lifetime policies. After Google decided to go to 18 months for its cookie policy, other search providers followed suit or tried to one-up it with shorter retention periods. The thing to understand here is that, although this sounds like a privacy win for consumers, it is really fairly unimportant. Cookies help sites know who you are. This is especially important for sites that hold personally identifiable information (PII) on their users. The major search companies all have email services that hold their users’ information (both from signing up and in the message contents). Although it is true that people can give false information, most people are honest. The new policies say that a year and a half after the last time you visit one of their sites, the cookie they put on your machine will expire. This means that every time you go to one of that company’s properties, the year-and-a-half clock gets reset. So what’s the catch? Well, remember that the majority of advertising on the net (and advertising is the monetization strategy for most of the web) is run by these companies. This means that even if you don’t go to Google (or Gmail, Orkut, etc.) for a year and a half, the cookie still isn’t gone, since you almost surely went to a site in that time that carries Google AdWords ads (full disclosure: Blogger, where this blog is hosted, is owned by Google). Yahoo and MSN/Live are much the same (though Google has the vast majority of the search and ad traffic on the net). So although in theory there is a way for this data to go away, in reality it is quite unlikely that it will. Also, remember that a cookie is just an identifier; the real information on you is stored on the company’s servers.
This means that if you choose to delete your cookies (all browsers have this option, and most can even block cookies entirely), you are still tracked, since the next time you visit one of these sites a cookie will be put back on your computer. The real key is what happens with the information.
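To make the "reset" behavior concrete, here is a rough sketch of a cookie whose expiry slides forward on every visit. It is only an illustration under stated assumptions: the 548-day lifetime stands in for "a year and a half", the function name simulate_visits is made up for this post, and a "visit" means any page served by the company, whether its own site or one carrying its ads. It does not describe any company's actual implementation.

```python
from datetime import datetime, timedelta

# Assumption: roughly 18 months, standing in for the policy described above.
COOKIE_LIFETIME = timedelta(days=548)


def simulate_visits(visit_dates):
    """Return (final expiry date, number of cookies issued).

    Hypothetical model: every visit to any page served by the company
    pushes the expiry forward by the full lifetime. A new cookie is
    issued only when no unexpired cookie exists at the time of a visit.
    """
    expires = None
    cookies_issued = 0
    for visit in sorted(visit_dates):
        if expires is None or visit >= expires:
            cookies_issued += 1  # first visit, or the old cookie lapsed
        expires = visit + COOKIE_LIFETIME  # each visit resets the clock
    return expires, cookies_issued


# Occasional visits, each less than 18 months apart: the cookie never lapses.
regular = [datetime(2007, 8, 12), datetime(2008, 6, 1), datetime(2009, 3, 1)]
final_expiry, issued = simulate_visits(regular)
print(final_expiry.date(), issued)
```

In this model the only way the original cookie ever expires is a gap of more than 548 days with no contact at all, which is exactly the case the ad networks make unlikely.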

Search/ad companies really aren’t that interested in tracking “you”; they are interested in tracking what kind of person you are. They do this not out of some Orwellian desire to observe and control, but because it helps them sell more relevant ads to their clients, who pay more money to get their ads shown to the people they want to target. By seeing where you go and what you write in your emails (yes, Gmail does an automated scan of all of your email), these companies can build a profile of who you are and what you are interested in. Much as the old market research companies used to pay people to see what was in their pantry and form customer profiles based on this information, ad companies are now doing the same thing. In and of itself, this may bother people who jealously guard their privacy, but most people seem to feel OK with it. Where this gets more concerning is when these data stores are used for personal tracking or monitoring. For example, assume you search for bomb-making materials while researching a book, or for child pornography to help the police track down those who take part in such acts. Such systems may be used by the police or FBI to flag you as a person who needs to be watched. The analogy would be someone knowing what you buy and what you’re reading at the library. Prior to the USA PATRIOT Act, such searches by the government were illegal. The reasoning was clear: it was an unreasonable invasion of privacy, and the Constitution forbade such dragnet searches. When information is gathered and tracked, even for innocuous reasons, it becomes a rich target for those who would want to exploit it (in Filegate, FBI records about Republican rivals in Congress “mistakenly” ended up in the Clinton White House, and the K Street Project used lobbying donations to manipulate political donors to give only to Republican candidates).
As with many things in the privacy world, it is the aggregation of information, and then the ability of others to access it, that makes the implications scary. When AOL released a large block of its search data, the implications became immediately apparent.

If you are interested in limiting the information kept on you by these companies, you should think about whether you want to allow cookies from those sites. Furthermore, since IP addresses and MAC addresses can be used to identify a machine, using an anonymizing network, like Tor, would add another layer of protection. There are also a couple of decent articles on hiding your search tracks here and here.
