When word went out earlier this month that AOL had publicly released user search data, resulting in the identities and sometimes lurid interests of thousands of users being laid bare for the world to see, it was actually good news for those people interested in privacy.

While embarrassing for AOL – not to mention those users whose searches for dates, porn, and cures for various intimate medical maladies were exposed – the incident finally proved in an unforgettable way something that privacy advocates have long known: “anonymizing” data doesn’t necessarily make it safe.

As is often the case, the road to AOL’s search data debacle was paved with good intentions. As a service to academic researchers working to create faster and more accurate search algorithms, AOL’s own in-house researchers took a snapshot of some 20 million search queries over a three-month period.


AOL’s search experts then stripped away user account information, replacing overt identifiers such as user names and IP addresses with random numbers designed to protect the anonymity of the user. Once anonymized in that way, the data file was posted on a website promoting AOL’s computer research activities.

Proving the old maxim that no good deed goes unpunished, many of the researchers who began looking at the contents of the file realized very quickly that, despite having the obvious identifying information stripped away, the anonymized data remained a treasure trove of personal information about those AOL customers whose data was captured.

In perhaps the most striking and detailed example, reporters at the New York Times looked at a sequence of searches for someone known only as “User 4417749.” Even though this person’s identity had been reduced to a random number, the reporters were able to examine 4417749’s searches and easily determine that she was Thelma Arnold, a 62-year-old widow in Lilburn, Ga.
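To see why swapping a name for a number offers so little protection, consider a toy sketch in Python. It is not AOL's actual process, which was never published in detail; the log entries, account names, and function names below are all invented for illustration. The point it tries to show is simply that a consistent random number still ties every one of a person's searches together.

```python
import random
from collections import defaultdict

# Invented raw log of (account name, query) pairs; none of this is real data.
RAW_LOG = [
    ("user_a", "landscapers in springfield"),
    ("user_b", "cheap flights to denver"),
    ("user_a", "homes sold in maple grove subdivision"),
    ("user_a", "dog that pees on everything"),
    ("user_b", "denver hotel deals"),
]


def pseudonymize(log, seed=0):
    """Swap each account name for a random number, one number per account."""
    rng = random.Random(seed)
    ids = {}
    released = []
    for user, query in log:
        if user not in ids:
            candidate = rng.randrange(1_000_000, 10_000_000)
            # Make sure two accounts never share the same number.
            while candidate in ids.values():
                candidate = rng.randrange(1_000_000, 10_000_000)
            ids[user] = candidate
        released.append((ids[user], query))
    return released


def build_profiles(released):
    """What any reader of the released file can do: group queries by number."""
    profiles = defaultdict(list)
    for pseudonym, query in released:
        profiles[pseudonym].append(query)
    return dict(profiles)


released = pseudonymize(RAW_LOG)
profiles = build_profiles(released)
for pseudonym, queries in profiles.items():
    print(pseudonym, queries)
```

The account names are gone from the output, but each number still carries a complete search history. A town, a subdivision, and a pet problem sitting side by side under one number are often all a curious reader needs.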

Once the reality of the situation dawned on the powers that be at AOL, the data was quickly taken off the website and profuse apologies were issued. Of course, the Internet being what it is, within minutes of the removal more than a dozen other websites had copies of the file available for download.

AOL is now facing investigations from the Federal Trade Commission, and will no doubt see some inquiries from state Attorneys General and probably a few lawsuits from class-action attorneys. But before all three rings of the legal circus begin, I think it is important to understand what AOL’s sin really was.

AOL’s Real Sin

Unfortunately, many clueless news reporters and media outlets portrayed this as some sort of accidental breach in the security of user records. It’s hard to blame them, given the rash of actual security breaches, lost laptops, and other data losses that have made news in recent months. But if anybody sees this as a failure of security, they will have misunderstood the critically important lesson of this fiasco.

AOL’s real sin was buying into the B.S. that it and other major Internet companies have been peddling for years about how anonymizing search data could insulate the data subjects – folks like you and me – from any privacy risks. For years, marketers have argued that as long as you anonymize the data, there’s almost no risk to privacy in having companies like Google, Yahoo, MSN, and AOL keep detailed log files about everything users do online.

For nearly a decade, folks in the privacy advocacy community (myself included) have warned that search data can, and often does, contain far more personal information than people realize, and that anonymizing the data creates very little impediment to someone determined to ferret out a user’s identity. And for most of that same decade, marketers and their lobbyists have pooh-poohed those concerns as overblown and unduly paranoid.

More importantly, the AOL search data incident reminded us all that our privacy is continuously at the mercy of those who run the tools, such as search engines, that we depend upon every day of our Internet-connected lives. Our privacy is only as assured as the products and services on whose goodwill we depend, and in that regard the record isn’t encouraging.

For example, we learned just a few months ago that each of the major search engine companies – with the exception of Google – happily and without a moment’s hesitation turned over massive volumes of anonymized search records to federal investigators who claimed they needed them to prove that there’s porn on the Internet.

Fodder For Fishing Expeditions

Many of us in the Unduly Paranoid Wing of the privacy nuthouse warned at the time that even anonymized data could be used by overzealous law enforcement investigators for fishing expeditions of unprecedented breadth.

You see, once that search data becomes part of the court records, it can be used by law enforcement to see if anybody is up to any mischief.

While you may think you have nothing to hide from such a fishing expedition, think again! Are you sure that IRS auditors wouldn’t look for likely audit targets if they could cull through search records looking for people who searched for barely legal tax shelters or other tax-cutting tips?

The possibilities are endless for law enforcement personnel who are handy with a computer and have a quota of citizens they need to make cry.

Although it is unfortunate that so many AOL customers had their personal information thrown into public view, the release of the AOL search data – which I believe is actually Google search data, given that Google powers AOL’s search services – may well prove to be a watershed moment in Internet privacy.

By laying bare the online activities of sweet little old ladies in Georgia, the AOL release has forcefully and publicly debunked the myth about the protective power of anonymizing search data.

We owe these unwitting privacy pioneers a debt of gratitude. And since we now know their addresses, interests, and deepest personal desires, it should be a cinch to shop for them all this Christmas!