Showing posts with label tor. Show all posts

Monday, March 3, 2008

IP address; Your Home on the Net

OK, so here’s a quick primer on Internet traffic. Much like the traffic on the streets, it finds its way to its destination via an address (well, except for male traffic, which wanders randomly around until it sees its destination; luckily for us all, Internet traffic is androgynous). The world of computer technology (especially early technology in the space) used very descriptive naming, and IP (or Internet Protocol) is one of those amazingly descriptive names. Every time you communicate with another machine on the Internet (e.g. every time you send an email, visit a web site or IM someone) your IP address is communicated to the machine at the other end. Don’t believe me? Go to www.WhatsMyIPAddress.com and it will tell you what your IP address is.

The current version of IP addresses is called IPv4 (for version 4). The problem with IPv4 is that as the number of devices connected to the Internet has expanded (think of every server, Internet-capable cell phone, desktop, laptop, etc.) the number of available addresses is getting pretty slim (much like with telephone numbers). Also like telephone numbers (or street addresses), addresses are given out in blocks (blocks of numbers or just street blocks). To deal with the lack of addresses, organizations (probably like your workplace or school) take a small set of public IP addresses and pass traffic on to internal addresses that only they know within their own network. This is called network address translation (NAT), and the internal addresses are usually handed out dynamically via DHCP. Don’t worry about the tech parts of this; just accept that your IP address changes periodically so that others can use that address when you aren’t. Your Internet Service Provider (ISP) most likely does this as well (just like you might have an office number at work that the post office knows nothing about). This is all changing.
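If you want to see the difference between an internal address and a public one for yourself, here is a minimal sketch using Python’s standard `ipaddress` module. The three sample addresses and the `describe` helper are my own picks for illustration, not anything specific to your network.

```python
import ipaddress

def describe(addr: str) -> str:
    """Say whether an address is internal-only or routable on the public Internet."""
    ip = ipaddress.ip_address(addr)
    if ip.is_private:
        return "private (only meaningful inside a local network)"
    return "public"

# Sample addresses chosen for illustration only.
for addr in ["192.168.1.20", "10.0.0.5", "8.8.8.8"]:
    print(addr, "->", describe(addr))
```

Addresses in the private ranges are the “office numbers” from the analogy above: many different networks can reuse them at the same time because the outside world never sees them.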

Two things are changing this system. First off is IPv6. IPv6 has much more “addressing space”, which means that if this were a city, you just built a ton of new roads and everyone can easily have their own address. This means that there is no need for dynamic addressing, and thus people may keep their IP addresses for long periods of time (effectively making them personally identifiable). The second change is around data aggregation.
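To get a feel for how much more room IPv6 has, here is a rough back-of-the-envelope comparison. It only relies on the fact that IPv4 addresses are 32 bits long and IPv6 addresses are 128 bits long; the variable names are mine.

```python
import ipaddress

# The "whole Internet" network of each version contains every possible address.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses   # 2**32, about 4.3 billion
ipv6_total = ipaddress.ip_network("::/0").num_addresses        # 2**128

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")
print(f"IPv6 addresses for every single IPv4 address: {ipv6_total // ipv4_total:,}")
```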

Storage has become cheap enough that keeping massive amounts of data is now practical. Right now I can go buy a terabyte of space (that’s 1,000,000 megabytes (MB)) for a couple hundred dollars (US $). Since storage is cheap, organizations have started to store this information and associate it with other information. An IP address can be linked to a user: if you log into a website, pairing your login time with your IP address ties that address to your identity, and anyone who then sees the same IP address in other sites’ logs knows where you have been. You can even use this information to get a person’s physical location (or at least the location of the machine/access point they are using). Search engines use this information to build a profile of each user for marketing purposes. This is where Google has found itself on the bad side of the European Union’s privacy initiatives.
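The linking step is simple enough to sketch in a few lines. Everything here is made up for illustration: the user names, the sites, the IP addresses (taken from the ranges reserved for documentation) and the `link_visits_to_users` helper. Real log formats will differ, but the idea is the same.

```python
from collections import defaultdict

# Hypothetical login records from one site: (user, IP address).
login_log = [("alice", "203.0.113.7"), ("bob", "192.0.2.9")]

# Hypothetical visit records gathered from other sites: (IP address, site).
visit_log = [
    ("203.0.113.7", "news.example"),
    ("203.0.113.7", "shop.example"),
    ("198.51.100.4", "forum.example"),
]

def link_visits_to_users(logins, visits):
    """Attach a user name to every visit whose IP address was seen at login."""
    ip_to_user = {ip: user for user, ip in logins}
    history = defaultdict(list)
    for ip, site in visits:
        user = ip_to_user.get(ip)
        if user is not None:
            history[user].append(site)
    return dict(history)

print(link_visits_to_users(login_log, visit_log))
```

One login is all it takes: after that, every log line carrying the same address can be filed under a name.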

Recently the EU decided that IP addresses are personal information (called PII, or Personally Identifiable Information, in the US). Google in particular is fighting this, arguing that IP addresses aren’t personally identifiable. If comments on that blog are any indication, the net community isn’t buying this line any more than the EU is. In fairness to Google, they really don’t care whether it personally identifies you as long as it uniquely identifies a person (since that is what their targeted ad business, the core of how they make money, depends on). Google is taking several steps to convince people they aren’t keeping info that is personally identifiable, but in reality anyone who is storing IP addresses (even without things like search histories, which invariably have PII in them) is going to have this issue. Using ISP records, IP addresses can be linked to users, and from a government standpoint this is the magic connection.

If you are concerned about such tracking, I can recommend two actions to take. The first is to use a service like Scroogle. You can make a search plug-in for your browser for them or just go to their homepage. They proxy searches to Google but take out the ads and the tracking cookies. In this way you can get the value of a search engine (like Google) without worrying about the nasty tracking aspects of such a company. The second option is to use an anonymizing service like TOR. TOR sends all your traffic through at least three other nodes. The data is thus anonymized with respect to its original source, but it is NOT confidential (e.g. if you log into a website that is not using SSL (the little lock icon in your browser), then the person running the last server in that chain could capture your login and password). This is just as true if you aren’t using TOR, but it is a reminder that anonymity is different from privacy.
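Here is a toy model of why the last node can read a login sent without SSL. To be clear about the assumptions: this is not how TOR is implemented and there is no real encryption in it. Each “layer” is just a labeled envelope, and the relay names, the sample login and the `wrap`/`peel` helpers are all invented for illustration.

```python
import json
from typing import List

def wrap(payload: str, route: List[str]) -> str:
    """Put the payload in one envelope per relay, outermost envelope for the first relay."""
    packet = payload
    for relay in reversed(route):
        packet = json.dumps({"for": relay, "inner": packet})
    return packet

def peel(packet: str, relay: str) -> str:
    """A relay opens only the envelope addressed to it and passes on what is inside."""
    layer = json.loads(packet)
    if layer["for"] != relay:
        raise ValueError(f"{relay} cannot open an envelope meant for {layer['for']}")
    return layer["inner"]

route = ["entry", "middle", "exit"]            # hypothetical relay names
packet = wrap("user=me&password=hunter2", route)

for relay in route:
    packet = peel(packet, relay)

# After the exit relay removes the last envelope, it holds the original request.
print("exit relay sees:", packet)
```

The relays in the middle only ever see another sealed envelope, but whoever opens the last one gets exactly what you sent. If that was a plain-text password, they have it.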

Sunday, August 12, 2007

You Are What You Look For – Search Engines And Data Profiling

Recently there was a flurry of search engine companies all touting changes to their cookie lifetime policies. After Google decided to go to 18 months for its cookie policy, other search providers followed suit or tried to one-up them with shorter retention periods. The thing to understand here is that, although this sounds like a privacy win for consumers, it is really fairly unimportant. Cookies help sites know who you are. This is especially important for sites that hold personally identifiable information (PII) on their users. The major search companies all have email services that many people use, and those services hold users’ information (both from signing up and in the message contents). Although it is true that people can give false information, the truth is that most people are honest. The new policies say that a year and a half after the last time you visit one of their sites, the cookie that they put on your machine will expire. This means that every time you go to one of that company’s properties, this year-and-a-half timeline gets reset. So what’s the catch here? Well, remember that the majority of advertising on the net (and advertising is the monetization strategy for most of the web) is run by these companies. This means that even if you don’t go to Google (or Gmail, Orkut, etc.) for a year and a half, this cookie still isn’t gone, since you almost surely went to a site in that time period that carries Google ads (full disclosure: Blogger, where this blog is hosted, is owned by Google). Yahoo and MSN/Live are much the same (though Google has the vast majority of the search and ad traffic on the net). So although in theory there is a way for this data to go away, in reality it is quite unlikely that it will. Also, remember that a cookie is just an identifier; the real information on you is stored on the company’s servers.
This means that if you choose to delete your cookies (all browsers have this option and most can even let you block them), your activity is still tracked, since the next time you visit one of these sites a cookie will be put back on your computer. The real key is what happens with the information.
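The resetting timeline is easy to see in a small simulation. This is only a sketch of the policy as described above: I am assuming 18 months is roughly 548 days, that any visit to the company’s sites or ad network counts, and the visit dates and the `simulate` helper are made up.

```python
from datetime import date, timedelta

COOKIE_LIFETIME = timedelta(days=548)  # assumption: roughly 18 months

def simulate(visits):
    """Return (final expiry date, number of cookies issued) for a list of visit dates."""
    expiry = None
    cookies_issued = 0
    for day in sorted(visits):
        if expiry is None or day > expiry:
            cookies_issued += 1          # no live cookie, so a fresh one is set
        expiry = day + COOKIE_LIFETIME   # every visit pushes the expiry forward
    return expiry, cookies_issued

# Hypothetical user who touches one of the company's sites every ten months or so.
regular = [date(2007, 8, 1), date(2008, 6, 1), date(2009, 4, 1), date(2010, 2, 1)]
print(simulate(regular))

# Hypothetical user who really does stay away for more than 18 months.
absent = [date(2007, 8, 1), date(2010, 2, 1)]
print(simulate(absent))
```

The regular visitor keeps the same cookie the whole time, and its expiry date just keeps sliding into the future. Only the user who stays away completely ever gets a new one.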

Search/ad companies really aren’t that interested in tracking “you”; they are interested in tracking who you are. They do this not out of some Orwellian desire to observe and control; they do it because it helps them sell more relevant ads to their clients, who pay more money to get their ads shown to the people they want to target. By seeing where you go and what you write in your emails (yes, Gmail does an automated scan of all of your email), these companies can build a profile of who you are and what you are interested in. Much as the old market research companies used to pay people to see what was in their pantry and form customer profiles based on this information, ad companies are now doing the same thing. In and of itself, this may bother people who jealously guard their privacy, but most people seem to feel OK with this. Where this can get more concerning is when these data stores are used for personal tracking/monitoring purposes. For example, assume you want to search for bomb-making materials while researching a book, or for child pornography to help the police track down those who take part in such acts. Such systems may be used by the police/FBI to flag you as a person who needs to be watched. The analogy would be someone knowing what you buy and what you’re reading at the library. Prior to the USA PATRIOT Act, such searches by the government were illegal. The reasoning was clear: it was an unreasonable invasion of privacy, and the Constitution forbade such dragnet searches. When information is gathered and tracked, even for innocuous reasons, it becomes a rich target for those who would want to exploit it (FBI records about Republican rivals in Congress “mistakenly” ended up in the Clinton White House (Filegate), and the K Street Project used lobbying donations to manipulate political donors to give only to Republican candidates).
Like many things in the privacy world, it is the aggregation of information, and then the ability of others to access it, that makes the implications scary. When AOL released a large block of its search data, the implications became immediately apparent.

If you are interested in limiting the information kept about you by these companies, you should think about whether you want to allow cookies from those sites. Furthermore, since IP addresses and MAC addresses can be used to identify a machine, using an anonymizing network, like Tor, would add another layer of protection. There are also a couple of decent articles on hiding your search tracks here and here.