
Why Anonymized Data Isn't

Posted by kdawson
from the can't-keep-good-PII-down dept.
Ars has a review of recent research, and a summary of the history, in the field of reidentification — identifying people from anonymized data. Paul Ohm's recent paper is an elaboration of what Ohm terms a central reality of data collection: "Data can either be useful or perfectly anonymous but never both." "...in 2000, [researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex. ... For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm. ... Reidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization."
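The uniqueness claim in the summary can be illustrated with a small counting exercise. The sketch below is not Sweeney's method or data; it assumes a hypothetical list of (ZIP code, birthdate, sex) tuples and simply measures what fraction of records are the only one with their combination of values.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, birthdate, sex). Invented for illustration.
records = [
    ("02139", "1961-07-15", "F"),
    ("02139", "1961-07-15", "F"),
    ("02139", "1984-02-29", "M"),
    ("02140", "1984-02-29", "M"),
    ("02140", "1975-11-03", "F"),
]


def unique_fraction(rows):
    """Return the fraction of rows whose quasi-identifier tuple occurs exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


fraction = unique_fraction(records)
print(f"{fraction:.0%} of the toy records are unique on (ZIP, birthdate, sex)")
```

Any record counted as unique here could be matched to a named person by anyone holding a second list (a voter roll, say) that carries the same three fields.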

  • by Ethanol-fueled (1125189) * on Tuesday September 08, 2009 @03:03PM (#29355107) Homepage Journal

    For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm.

    ...And this is the first thing that the author(s) thought of regarding data-mining? Okay, but how would this happen? Why go through all the trouble to gather all that data when you could just hire a P.I. or know (or bribe) a law-enforcement official or an ISP employee? It reminds me of a conversation I had with a guy who bragged that he could get anybody's info because a very good friend of his worked at the DMV. There were a couple semi-profile firings at the State Department because some employees snooped through celebrities' records for no reason other than voyeurism..er..curiosity.

    Those types, the ones with the direct access to the info, are the weakest link. They're only human. "Hey, Bob, there's this guy I really hate. Look up his IP logs and tell me what you see!"

    It all boils down to voyeurism. People would rather bring others down than bring their own lives up. It's the nature of the beast! Pathetic.

  • Mission Impossible (Score:5, Insightful)

    by im_thatoneguy (819432) on Tuesday September 08, 2009 @03:12PM (#29355219)

    I've pretty much given up any hope of being anonymous. It's just going to get exponentially more difficult as time goes on.

    I had my credit card stolen once. It was stolen from the CC company. How is a business supposed to entrust me with thousands of dollars in credit if they don't know who I am? How is a credit card company supposed to function without a worldwide network which authorizes transactions?

    If someone wants to find me they'll find me.

    If someone wants to use my identity to frame me for a crime then they're just going to encounter a mountain of evidence from numerous sources which contradicts their fabrication.

    "My G1 was on a Starbucks Wifi at the time of the crime. I used my CC to purchase the drink. I received a text from a nearby tower. I posted a comment on a breaking news story that is written in my style of writing. I was seen on 8 security cameras walking to the Starbucks from my car. I used an automatic toll card 5 miles away from the coffee shop...." Good luck coming up with a large mountain of evidence to put me somewhere else.

  • by Anonymous Coward on Tuesday September 08, 2009 @03:16PM (#29355277)

    Even if the data is completely and irreversibly anonymized, it is still invasive. Look at the story yesterday about the marketers data-mining kids' online private conversations for consumer gadget preferences. Even if there's no way from that data to infer the preferences of any particular kid, they should still be able to talk to each other without having their conversation be part of a marketing survey.

    Think also of a cafe that sells two kinds of food: apple pie (eaten by freedom-loving patriots), and felafel (eaten by terrorists and their supporters and sympathizers). Of course it would be invasive for the cafe to disclose which of its customers ordered which kind of food. But even releasing aggregate statistics is bad. An increase in felafel sales can lead to a bullshit FBI investigation [wired.com] even if individual customers aren't identified.

    People sitting on private data constantly search for self-serving justifications to disclose as much as they can without getting clobbered by the sources of the data. It is bullshit. Private should mean no disclosure, not anonymized disclosure, not aggregate disclosure, just plain no disclosure period.

  • by causality (777677) on Tuesday September 08, 2009 @03:20PM (#29355329)

    But the voyeurism slant isn't newsworthy.

    Then how do you explain shows like Entertainment Tonight and all of these magazines and Web sites devoted entirely to completely useless celebrity trivia? Y'know, the ability to obsess over the personal life of someone you have never met and will never personally know, merely because they can sing or act, should be recognized as a pathology. Voyeurism only seems to partly explain it; much of it seems to come from an empty and unsatisfying life that leads to an attempt to live vicariously through some sort of idol which is perceived to be successful, in that sense that "most men lead lives of quiet desperation". However stupid and useless it may be, I can't deny that many do consider it newsworthy and much of "the news" includes such elements.

  • Re:Duh. (Score:4, Insightful)

    by garcia (6573) on Tuesday September 08, 2009 @03:25PM (#29355383) Homepage

    Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?

    I use 1/1/1979 (it's closer to my real age) and 90210 instead. I get a lot of crosseyed looks and many times the cashier (or whatever human I'm dealing with) will end up entering in a local zip code instead but people are no longer arguing w/me about what I choose to provide them when pressured for information (I always politely reply, "no thanks," when asked for that type of information but will give them false shit when they ask again and whine that they'll be fired).

    Why would I give 'em the right numbers? They're lucky I even allow them to have rough demographic data.

    Because the majority of people have absolutely no problems handing over any and all information they're prompted for up to and including their e-mail address, phone number or even SSN! Because most people don't even blink, those of us that don't feel like it should be anyone's business (like the scanning of IDs at liquor stores or bars to check age--there is a birthdate listed on IDs for a fucking reason people--not that they can scan my rare earth magnet swiped ID anyway) are looked at like assholes when we refuse to provide information that no one really needs anyway.

  • Re:20500 (Score:5, Insightful)

    by natehoy (1608657) on Tuesday September 08, 2009 @03:25PM (#29355385) Journal

    Because everyone knows that EVERYONE in DC lies.

  • by Applekid (993327) on Tuesday September 08, 2009 @03:28PM (#29355435)

    So, despite the Birthday Paradox [wikipedia.org], they can still identify 87% of Americans? For some reason I'm under the impression that there are a lot more zip codes with more than 366 people (heck, even 1000 to call upon 3 or 4 duplicates that should cover gender differences) than there are zip codes under that amount.
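One way to sanity-check the birthday-paradox objection is a back-of-the-envelope model. The sketch below assumes (this is not from Sweeney's paper) that each resident of a ZIP code falls uniformly at random into one of 365 x 80 x 2 (birthdate, sex) cells, and computes the chance that a given person shares a cell with nobody else. The 80-year span and the uniformity are simplifying assumptions; real age distributions are lumpier, which lowers uniqueness.

```python
def expected_unique_fraction(population, birth_years=80, sexes=2):
    """Chance that one resident is alone in their (birthdate, sex) cell.

    Assumes residents are spread uniformly at random over the cells,
    which is a simplification chosen for illustration.
    """
    cells = 365 * birth_years * sexes
    # Each of the other (population - 1) residents misses this cell
    # with probability (1 - 1/cells).
    return (1 - 1 / cells) ** (population - 1)


for population in (1_000, 10_000, 50_000):
    share = expected_unique_fraction(population)
    print(f"ZIP with {population:>6} residents: about {share:.0%} unique")
```

Under these assumptions the birthday paradox still holds (some collision becomes likely once a ZIP has several hundred residents), but most individuals stay alone in their cell until the population approaches the number of cells. The paradox is about whether any pair collides, not about how many people are part of a collision.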

  • Couple of things.. (Score:5, Insightful)

    by hansraj (458504) on Tuesday September 08, 2009 @03:31PM (#29355469)

    Potential nitpick, but here goes.

    The summary (not surprisingly for a /. summary) omits a couple of details that give the reader a rather partial picture.

    For one, Paul Ohm is an Assistant Professor of law, and although the summary makes it sound like the linked article would be from a technical perspective, (mostly) it is not.

    A quote like:

    "Data can either be useful or perfectly anonymous but never both."

    needs a bit of background about the qualification of the person making that claim. Why? Simply because it sounds like a rather technical remark. If some computer science researcher made this claim, I would tend to take it more at face value, otherwise I would take it with a grain of salt.

    Now obviously this statement was not meant to be taken quite literally because the notion of "useful" is not precise. I can get reasonably useful information like "most of the people in my country like to buy branded stuff" or "most people who rent videos of actor X regularly, also rent the videos of actor Y regularly" without needing the underlying data to contain *any* personally identifiable information. The fact that extra data is stored is a different thing.

    I personally believe that instead of claiming that some researcher has argued X, it can be more informative to actually say what kind of researcher it is who made a claim. Not because only researchers in a certain area can be trusted, but because a little bit of background puts the claims in the right perspective.

  • by mea37 (1201159) on Tuesday September 08, 2009 @03:33PM (#29355491)

    Do you mean, you think you could've gotten an individual's medical records in MA for less than $20? Or maybe you can't see why someone would dig up an individual's medical records? (I can think of many... but then my employer was extorted by someone who'd stolen a bunch of medical-related data from them not that long ago.)

    I think I hear a bit of "nobody would go to all that trouble" in your message. If in the early days of WiFi networks I described to you in tedious yet vague terms how to compromise WEP encryption, you probably would've thought the same thing. Today anyone who cares to can break WEP using readily available tools - it's really no bother at all if you're even slightly inclined to do it.

    I've seen companies with contractual and regulatory obligations to protect data privacy make half-gestures to make it look like they're honoring privacy while still engaging in whatever easy-money scheme or shortcut they want. Shedding light on why those half-gestures don't work is a big deal.

  • by EasyTarget (43516) on Tuesday September 08, 2009 @03:37PM (#29355539) Journal

    Data can either be useful or perfectly anonymous but never both

    What a load of bolaks....

    Supposing you have a list of -just- birth dates for every citizen at the census. You have -only- been given one piece of data per person: the date, nothing more. Just a huge list of dates, sorted chronologically.
    1) The data has been totally anonymised.
    2) You can do all kinds of meaningful analysis on the age demographics of the population. And make policy decisions based on that.

    Fully anonymous data producing useful results.
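The kind of analysis described in this comment can be sketched in a few lines. The example assumes a hypothetical list of bare birth dates and a census day of 8 September 2009 (both invented here), and buckets ages by decade; nothing in the input links a date to a person or to any other attribute.

```python
from collections import Counter
from datetime import date

# Hypothetical birth dates, invented for illustration.
birth_dates = [
    date(1950, 3, 1),
    date(1972, 6, 15),
    date(1975, 1, 9),
    date(1999, 12, 31),
    date(2004, 8, 20),
]


def age_on(birth, census_day):
    """Whole years lived as of census_day."""
    years = census_day.year - birth.year
    # Subtract one if the birthday has not yet occurred in the census year.
    if (census_day.month, census_day.day) < (birth.month, birth.day):
        years -= 1
    return years


def age_histogram(dates, census_day, bucket=10):
    """Count people per age bucket, keyed by the bucket's lowest age."""
    return Counter((age_on(d, census_day) // bucket) * bucket for d in dates)


histogram = age_histogram(birth_dates, date(2009, 9, 8))
for start in sorted(histogram):
    print(f"ages {start}-{start + 9}: {histogram[start]}")
```

The aggregate is useful precisely because the list carries a single attribute; the reidentification results discussed in the story concern releases where several attributes travel together in one record.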

  • by riqtare (264681) on Tuesday September 08, 2009 @03:37PM (#29355547)

    If access to the evidence you just stated was available to the framer it makes it very easy to find a likely fall guy according to their habits. Makes the alibi of overwhelming evidence evaporate into prime suspicion.
    The best lies are those that are mostly truth.

  • by interkin3tic (1469267) on Tuesday September 08, 2009 @03:43PM (#29355667)

    I did think that was an overstatement that undermined the main point. None of my prescriptions would be embarrassing to anyone but a holistic medicine believer, I've told some tasteless jokes online. If someone were to send that information to my family along with what porn I looked at, that would be awkward at most. And that's assuming it's credible, which it wouldn't be.

    How exactly would this blackmail work? Bob, the evil co-worker threatens to tell your wife and boss you have had a sex change, a running prescription to anti-psychotic medication, were arrested for something that they don't know about and you weren't legally obligated to inform them of, and look at gay porn approximately 30% of your waking hours. For this hypothetical situation, assume that information is true. Do you do what he wants? If you don't and he does tell your wife and boss, do they actually believe him?

    I think privacy is good for privacy's sake, overstatements such as this undermine the point.

  • by Anonymous Coward on Tuesday September 08, 2009 @03:58PM (#29355871)

    "My G1 was on a Starbucks Wifi at the time of the crime. I used my CC to purchase the drink. I received a text from a nearby tower. I posted a comment on breaking news story that is written in my style of writing. I was seen on 8 security cameras walking to the starbucks from my car. I used an automatic toll card 5 miles away from the coffee shop...." Good luck coming up with a large mountain of evidence to put me somewhere else.

    They won't need luck. If they're trying to frame you (and especially if you're helping them by being so cavalier about your privacy), then they'll know all of the above, and thus how to avoid conflicts between their evidence and yours. You'll simply have committed the crime at a time and place for which you have no alibi, or in a way that makes the time and location irrelevant.

  • by causality (777677) on Tuesday September 08, 2009 @04:02PM (#29355981)

    Except that most people who watch Entertainment Tonight and such aren't "obsessed" with celebrity trivia. Interest =/= obsession.

    Dear AC, perhaps we are using different definitions of "obsession." Here's mine: when something cannot possibly benefit your life in any measurable way whatsoever, and you devote energy to pursuing it anyway, this is something of an obsession. To me, an interest is something different. The RIAA has an interest in strong copyright laws. Why? Because the RIAA is benefitted by strong copyright laws. Therefore, it's not a surprise that the RIAA tries to bring them about. However, it doesn't do a damned thing for me to know that $ACTRESS is thinking of divorcing her husband. I don't benefit from knowing this, therefore I can accurately say that it is not in my interests. Her family and personal friends might have an interest in this, and with good reason, but then they wouldn't need to find out second-hand from a TV show either.

    Think about it this way. If we treated all industries equally, in the sense that all industries were treated just like the entertainment industry, then anytime you bought a car or a computer it would come with a big long list containing the names of all the members of management, designers, and factory workers who produced it as well as the truck drivers who shipped it and the advertisers who marketed it. We would then have TV shows and magazines talking about the personal private lives of those people who produced your cars and computers, whom they marry, how many times they divorce and why, what goes on behind closed doors in their homes, and paparazzi would follow them around and try to get "exclusive" or embarrassing photos of them. Additionally, average people who never met any of them would talk about them fondly as though they personally knew them.

    Now if this happened for the automobile or computer industries, and I said it was obsessive behavior, on what grounds would you dispute that? Real question. I'd like to know.

  • by mosb1000 (710161) <mosb1000@mac.com> on Tuesday September 08, 2009 @04:05PM (#29356035)
    How is this any different than articles about rockets and space travel (after all, most of us will never travel into space, or work for NASA)? Or any other in a myriad of technical subjects that most of us are not, and will not be directly involved in or use directly.

    People are curious. They are curious about everything. It's an exercise in futility to pick and choose useful information over non-useful information since none of us knows what tomorrow holds. If someone wants to read celebrity gossip more power to them. In truth, the gossip is more likely to be both true and useful than news about a new process that may produce titanium at half the cost or an article about NASA's next big toy. We on slashdot find the technical news more interesting, normal people who are interested in interpersonal relationships find the gossip more interesting. It's two sides of the same coin.
  • by causality (777677) on Tuesday September 08, 2009 @04:12PM (#29356179)

    Do you mean, you think you could've gotten an individual's medical records in MA for less than $20? Or maybe you can't see why someone would dig up an individual's medical records? (I can think of many... but then my employer was extorted by someone who'd stolen a bunch of medical-related data from them not that long ago.)

    I think I hear a bit of "nobody would go to all that trouble" in your message. If in the early days of WiFi networks I described to you in tedious yet vague terms how to compromise WEP encryption, you probably would've thought the same thing. Today anyone who cares to can break WEP using readily available tools - it's really no bother at all if you're even slightly inclined to do it.

    I've seen companies with contractual and regulatory obligations to protect data privacy make half-gestures to make it look like they're honoring privacy while still engaging in whatever easy-money scheme or shortcut they want. Shedding light on why those half-gestures don't work is a big deal.

    That's the thing that I also think people don't understand. With good reason, I am not satisfied merely that someone probably wouldn't want to abuse my information. I am satisfied only when I know that they cannot do so.

    I think the solution is to have the concept of "intellectual property" work both ways. Obviously your private information has value, otherwise advertisers and other companies wouldn't go to such great lengths to obtain and use it. The problem is that they obtain it without your consent and without directly compensating you. For example, if I don't actively block web bugs, cookies, HTTP "ping", analytics tools, and other similar attempts, then that data will be gathered whether or not I like it.

    The reason why I actively go out of my way to prevent companies from gathering data on me is simple. No one asked me if I wanted to be data-mined. I refuse to honor agreements in which I did not participate. Why anyone else would do so is a mystery to me.

    So make each individual's private data their personal property. They can set whatever value they like, and if that value is more than a company thinks it is worth, the company is free to decline the sale. Most importantly, any attempt to just take that data will be theft, and anyone who does this can be prosecuted in a criminal court. I mean, think about it: why is it "marketing" when a company helps itself to my information against my will and "piracy" or "industrial espionage" if I helped myself to THEIR zeroes and ones against their will?

  • by causality (777677) on Tuesday September 08, 2009 @04:32PM (#29356499)

    Especially if you're outed by a friend who posts RL info without your permission that you can't retract.

    Had this happen to me once.

    That's why you have to be very careful about who your friends are. I am no longer surprised by someone who "suddenly changed" because it is not really sudden at all, it is merely subtle before it becomes bleedin' obvious. Sorry to hear you got screwed.

    And yes, I have trusted people I should not have trusted and gotten screwed. What I learned from it is that I ignored red flags and warning signs that should have tipped me off and so I set myself up for what should have been a predictable outcome. Usually this is because I denied the reality of what that person was telling me about themselves in favor of seeing only what I wanted to see. In that way, the person who meant me harm actually made me sharper, wiser, and more difficult to deceive. That could not have happened if 100% of my reaction was to blame that other person for doing something wrong, because that's for victims. Whether a victim gets screwed or not depends entirely on the other person. Whether I get screwed or not depends entirely on whether I make good decisions.

    Victims have another significant disadvantage: they don't profit from their negative experiences, because when everything is someone else's fault, there is no incentive to examine yourself and find out why you should have known better. Thus, where they should be learning and growing they are stagnant. The popularity of this mentality is really amazing.

  • by causality (777677) on Tuesday September 08, 2009 @05:20PM (#29357437)

    Dear AC, perhaps we are using different definitions of "obsession." Here's mine: when something cannot possibly benefit your life in any measurable way whatsoever, and you devote energy to pursuing it anyway, this is something of an obsession.

    Sorry, no; an obsession is just a troubling preoccupation. Benefit has nothing to do with it.

    Arguing semantics is much more useful when you are able to suggest a more suitable word. However, even then it's of little use, because even if I admit that "obsession" was a terrible choice of words, it doesn't do anything to change a single point I have made. Likewise, I can't help but notice you have not addressed any of the points I have made. So, I must conclude that this is the only fault you were able to find with my reasoning, which is pretty good, because it has nothing to do with my reasoning at all.

  • by shentino (1139071) on Tuesday September 08, 2009 @06:14PM (#29358343)

    By "outing" I mean the publication of ANY personal information that you'd prefer to keep secret.

    Sexual orientation, which btw doesn't apply in this case to this 24 year old virgin, is only one of many possible "secrets" that can be blown.

    Also, consider if I were an Iranian dissident who had hard proof that the elections were rigged? Isn't it foreseeable that, with protesters being deemed to be against Allah by the powers that be, that outing me would put me in grave danger?

    Anonymity is a precious thing that should be respected. Whether it protects your ego or your life.

  • CT scans (Score:3, Insightful)

    by Cajun Hell (725246) on Tuesday September 08, 2009 @07:30PM (#29359313) Homepage Journal

    Have you ever thought about how a "cat" scan works? Forget the 3D aspects and let's just think about how the cross-sectional pictures work.

    Every given reading is just shooting a ray through the target and getting a single number out. This is analogous to aggregate summaries of personal details in data. You know the average income of people in zip code 12345, but no specifics. The trick is, later, just as that CT scan is going to shoot a ray through a certain point again from a different direction, your personal details are going to be summarized again by someone else, in a different way.

    A picture will emerge. The CT scan is going to "see" the bone as distinct from the tissue right here at this pixel, and this person's data will be un-summarized. It just takes enough rays, and eventually all ambiguity goes away.
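The simplest version of the "second ray" idea is a differencing attack on two averages. The sketch below is not a CT reconstruction algorithm; it assumes two hypothetical releases about the same ZIP code, one covering every resident and one covering everyone except a single newcomer, with invented names and incomes. Each release is only an average and a head count, yet together they pin down one person's value.

```python
# Hypothetical incomes for one ZIP code; names and figures are invented.
incomes = {"alice": 52_000, "bob": 49_000, "carol": 175_000, "dave": 61_000}


def summarize(values):
    """Return an aggregate release: (average, head count)."""
    values = list(values)
    return sum(values) / len(values), len(values)


# Release 1: everyone in the ZIP code.
release_all = summarize(incomes.values())

# Release 2, from a different publisher: long-term residents only.
# Assumption for this example: carol is the only newcomer.
release_long_term = summarize(v for name, v in incomes.items() if name != "carol")


def recover_individual(release_a, release_b):
    """Turn each average back into a total; the gap is the excluded person's value."""
    avg_a, n_a = release_a
    avg_b, n_b = release_b
    return avg_a * n_a - avg_b * n_b


print("Recovered income of the excluded resident:",
      recover_individual(release_all, release_long_term))
```

With more overlapping summaries the same arithmetic generalizes to solving a system of linear equations, which is closer in spirit to the many-rays picture in the comment.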

    A long time ago (about 20 years ago, I think?) there was a neato explanation of a cat scan algorithm in Scientific American. I wish I could find it. Because I bet you could show that article to any "database guy" these days, and they'd nod and smile.
