
Why Anonymized Data Isn't

Posted by kdawson
from the can't-keep-good-PII-down dept.
Ars has a review of recent research, and a summary of the history, in the field of reidentification — identifying people from anonymized data. Paul Ohm's recent paper is an elaboration of what Ohm terms a central reality of data collection: "Data can either be useful or perfectly anonymous but never both." "...in 2000, [researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex. ... For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm. ... Reidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization."
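A minimal sketch of the uniqueness measurement behind the Sweeney figure quoted above: count how many records are the only ones with their combination of ZIP code, birthdate, and sex. The function name `unique_fraction`, the field names, and the four sample records are illustrative assumptions, not data or code from the paper.

```python
from collections import Counter


def unique_fraction(records, keys=("zip", "birthdate", "sex")):
    """Return the fraction of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    # Count how often each (zip, birthdate, sex) combination occurs.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    # A record is re-identifiable on these keys if nobody else shares its combination.
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Hypothetical "anonymized" rows: names removed, quasi-identifiers kept.
records = [
    {"zip": "02139", "birthdate": "1965-07-31", "sex": "F"},
    {"zip": "02139", "birthdate": "1965-07-31", "sex": "F"},
    {"zip": "02139", "birthdate": "1971-02-14", "sex": "M"},
    {"zip": "20500", "birthdate": "1970-01-01", "sex": "M"},
]

print(unique_fraction(records))
```

In this toy set, two of the four rows are unique on the three fields, so anyone holding a second list with the same fields plus names could attach a name to those two rows.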

  • by Anonymous Coward on Tuesday September 08, 2009 @03:05PM (#29355135)

    The only way to make sure that data remains truly anonymous is for it to start out as anonymous data. "Scrubbed" data will always be traceable, and the non-scrubbed source data will often leak into the wild.

    All hail the glorious Hypno-Google.

  • Duh. (Score:4, Informative)

    by SatanicPuppy (611928) * <Satanicpuppy@@@gmail...com> on Tuesday September 08, 2009 @03:10PM (#29355193) Journal

    Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?

    I mean, seriously. They don't need to know. Why would I give 'em the right numbers? They're lucky I even allow them to have rough demographic data.

  • Re:Paul Ohm? (Score:5, Informative)

    by natehoy (1608657) on Tuesday September 08, 2009 @03:12PM (#29355231) Journal

    Nonsense, it could be an extension of the current Law:

    "In electrical circuits, Ohm's law states that the current through a conductor between two points is directly proportional to the potential difference or voltage across the two points, and inversely proportional to the resistance between them. In data anonymity, the law states that the general usefulness of any set of data that originally contained personally-identifiable information is inversely proportional to the degree of anonymity applied to said data."

    See, one simple law to memorize, and now data analysts learn just a teensy bit about electricity and EEs learn just a teensy bit about data anonymization.

  • by RevWaldo (1186281) * on Tuesday September 08, 2009 @03:23PM (#29355355)
    If you ever wonder why people view the privacy of your records in the hands of third parties as important, and don't just hop on the "privacy is dead" bandwagon, this is the sort of scenario they have in mind.

    http://en.wikipedia.org/wiki/Mother_Earth_(magazine) [wikipedia.org]

    Mother Earth was an anarchist journal that described itself as "A Monthly Magazine Devoted to Social Science and Literature," edited by Emma Goldman. Alexander Berkman, another well-known anarchist, was the magazine's editor from 1907 to 1915. It published longer articles on a variety of anarchist topics including the labor movement, education, literature and the arts, state and government control, and women's emancipation, sexual freedom, and was an early supporter of birth control. Its subscribers and supporters formed a virtual "who's who" of the radical left in America in the years prior to 1920.

    In 1917, Mother Earth began to openly call for opposition to American entry into World War I and specifically to disobey government laws on conscription and registration for the military draft. On June 15, 1917, Congress passed the Espionage Act. The law set punishments for acts of interference in foreign policy and espionage. The Act authorized stiff fines and prison terms of up to 20 years for anyone who obstructed the military draft or encouraged "disloyalty" against the U.S. government. After Emma Goldman and Alexander Berkman continued to advocate against conscription, Goldman's offices at Mother Earth were thoroughly searched, and volumes of files and detailed subscription lists from Mother Earth, along with Berkman's journal The Blast, were seized. As a Justice Department news release reported:

    "A wagon load of anarchist records and propaganda material was seized, and included in the lot is what is believed to be a complete registry of anarchy's friends in the United States. A splendidly kept card index was found, which the Federal agents believe will greatly simplify their task of identifying persons mentioned in the various record books and papers. The subscription lists of Mother Earth and The Blast, which contain 10,000 names, were also seized."

    Mother Earth remained in monthly circulation until August 1917. Berkman and Goldman were found guilty of violating the Espionage Act, were imprisoned for two years, and were later deported.

  • Re:Duh. (Score:1, Informative)

    by Anonymous Coward on Tuesday September 08, 2009 @03:31PM (#29355461)

    Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?

    But be careful. Using the same fake data consistently still allows someone to correlate across different records. For instance the aggregate data from various websites where you've filled-in data would identify you (with reasonably high probability) as being a single person. Then all it takes is one database that has enough info to link back to your real identity for your anonymity to be gone again.

    I'm not saying that the average company would go to that much effort. I'm just saying that if you're going to be paranoid about anonymity, you should vary the data you provide somewhat randomly.
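A minimal sketch of the correlation described in this comment, assuming each site keeps the same self-reported birthdate and ZIP code. The site names, field names, and rows are hypothetical; the point is only that a repeated fake value still works as a join key.

```python
from collections import defaultdict


def link_records(datasets, keys=("birthdate", "zip")):
    """Group rows from different sites by the values they share on `keys`."""
    groups = defaultdict(list)
    for site, rows in datasets.items():
        for row in rows:
            groups[tuple(row[k] for k in keys)].append((site, row))
    return groups


# Hypothetical per-site records. The first two reuse the same fake birthdate and ZIP.
datasets = {
    "forum": [{"birthdate": "1970-01-01", "zip": "20500", "handle": "puppy42"}],
    "shop": [{"birthdate": "1970-01-01", "zip": "20500", "name": "J. Example"}],
    "news": [{"birthdate": "1984-03-09", "zip": "98101", "handle": "reader7"}],
}

linked = link_records(datasets)
profile = linked[("1970-01-01", "20500")]

# Sites that share the fake values, and any real name attached to one of them.
sites = sorted(site for site, _ in profile)
names = [row["name"] for _, row in profile if "name" in row]
print(sites, names)
```

A single row carrying a real name is enough to label every row in the same group. In practice a popular fake value such as this one would be shared by many people, which weakens the link; that is the "reasonably high probability" caveat in the comment.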

  • by Daniel_Staal (609844) <DStaal@usa.net> on Tuesday September 08, 2009 @03:38PM (#29355571)

    That Paradox ignores the year. Add that in and it starts to become harder.
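A rough check of this point, assuming the comment refers to the birthday paradox and that birthdates are uniformly distributed. The group size of 23 and the 80-year span are illustrative assumptions.

```python
def shared_birthday_probability(group_size, possible_dates):
    """Probability that at least two people in the group share a date,
    assuming every date is equally likely."""
    p_all_distinct = 1.0
    for i in range(group_size):
        # The i-th person must avoid the i dates already taken.
        p_all_distinct *= (possible_dates - i) / possible_dates
    return 1.0 - p_all_distinct


# Day and month only: 365 possible values.
day_only = shared_birthday_probability(23, 365)

# Day, month, and year over an assumed 80-year span of birth years.
with_year = shared_birthday_probability(23, 365 * 80)

print(day_only, with_year)
```

With day and month alone, a group of 23 has roughly even odds of containing a shared birthday. Once the year is included the chance drops below one percent, so a full birthdate is far more identifying than the classic paradox suggests.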

  • Re:Duh. (Score:3, Informative)

    by mmkkbb (816035) on Tuesday September 08, 2009 @03:42PM (#29355643) Homepage Journal

    (like the scanning of IDs at liquor stores or bars to check age--there is a birthdate listed on IDs for a fucking reason people--not that they can scan my rare earth magnet swiped ID anyway)

    That's not to check age; that's to check for counterfeits with mismatched mag data, or mismatched 2-D barcode data, or missing UV ink prints, or missing holograms, etc. etc.

  • by OrigamiMarie (1501451) on Tuesday September 08, 2009 @03:49PM (#29355753)
    Perhaps they meant zip + 4, which gets you down to very few households. But most people can't rattle off their zip + 4, so this information wouldn't actually apply to the questions posed by cashiers. On the other hand, I have heard that data mining on web-surfing habits can usually pick up your zip + 4, so yeah, it would be pretty trivial to put that together with birth date (which is asked for at various places to determine that you're of-age -- though of course you can lie) and sex, which can probably be guessed at even if you don't click one of the radio buttons.
  • Re:Paul Ohm? (Score:5, Informative)

    by Beardo the Bearded (321478) on Tuesday September 08, 2009 @03:54PM (#29355819)

    Okay, let's take a road. The speed at which traffic can travel depends on the quality of the surface, gradient, camber, zoning, etc. Let's call this the "road conditions", with a lower number being better roads.

    The number of cars that want to get through that road is a primary unit, which we can refer to as the "volume of traffic".

    The third major criterion is the speed at which the traffic actually flows. This is the "actual flow" of traffic -- in other words, the "influence of other cars" on the traffic congestion.

    In other words:
    volume = influence of traffic * road conditions

    or:
    V = IR

  • Ohm is overwrought (Score:3, Informative)

    by feenberg (201582) on Tuesday September 08, 2009 @04:13PM (#29356189)

    I have worked with anonymized government data extensively, and birthdate and zipcode are always considered personally identifiable information. Sometimes birth year is available, and sometimes state or (rarely) county is available, but I have never even heard of a dataset with both. Datasets with month and day of birth are never considered to be anonymized, and are not released. The author of the paper is much overwrought.

  • Re:Duh. (Score:1, Informative)

    by Anonymous Coward on Tuesday September 08, 2009 @04:37PM (#29356593)
    And after that, it's to keep a list of everyone who has entered the bar for the history of its operation. Much easier to identify "troublemakers" when you have a list of people who like to have fun once in a while.
  • by Anonymous Coward on Wednesday September 09, 2009 @03:39AM (#29362869)

    Well, recently in the Netherlands a guy tried to do just that: he wanted to use video tapes to prove he was somewhere else. The problem here was that the DA 'lost' those video tapes. I tried to find the link again but was unsuccessful. Any other Dutch news finders up to the task?
