Why Anonymized Data Isn't
Ars has a review of recent research, and a summary of the history, in the field of reidentification — identifying people from anonymized data. Paul Ohm's recent paper is an elaboration of what Ohm terms a central reality of data collection: "Data can either be useful or perfectly anonymous but never both." "...in 2000, [researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex. ... For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm. ... Reidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization."
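The claim about ZIP code, birthdate, and sex can be checked on any dataset by counting how many rows have a combination of those fields that nobody else shares. The following is a minimal sketch, not Sweeney's method; the records and the `unique_fraction` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, but ZIP code,
# birthdate, and sex retained. All values are made up.
records = [
    {"zip": "02138", "birthdate": "1945-07-31", "sex": "M"},
    {"zip": "02138", "birthdate": "1962-03-14", "sex": "F"},
    {"zip": "02138", "birthdate": "1962-03-14", "sex": "F"},
    {"zip": "02139", "birthdate": "1980-11-02", "sex": "M"},
]


def unique_fraction(rows, keys=("zip", "birthdate", "sex")):
    """Return the fraction of rows whose key combination appears exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)


# Two of the four rows share a combination, the other two are unique.
print(unique_fraction(records))  # 0.5
```

A row that is unique on these fields is only "anonymous" until someone holds another list with the same fields and a name attached.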
Damn voyeurism is all it is (Score:4, Insightful)
For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm.
...And this is the first thing that the author(s) thought of regarding data-mining? Okay, but how would this happen? Why go through all the trouble to gather all that data when you could just hire a P.I. or know (or bribe) a law-enforcement official or an ISP employee? It reminds me of a conversation I had with a guy who bragged that he could get anybody's info because a very good friend of his worked at the DMV. There were a couple of semi-high-profile firings at the State Department because some employees snooped through celebrities' records for no reason other than voyeurism..er..curiosity.
Those types, the ones with the direct access to the info, are the weakest link. They're only human. "Hey, Bob, there's this guy I really hate. Look up his IP logs and tell me what you see!"
It all boils down to voyeurism. People would rather bring others down than bring their own lives up. It's the nature of the beast! Pathetic.
Re: (Score:3, Insightful)
Do you mean, you think you could've gotten an individual's medical records in MA for less than $20? Or maybe you can't see why someone would dig up an individual's medical records? (I can think of many... but then my employer was extorted by someone who'd stolen a bunch of medical-related data from them not that long ago.)
I think I hear a bit of "nobody would go to all that trouble" in your message. If in the early days of WiFi networks I described to you in tedious yet vague terms how to compromise WEP e
Re:Damn voyeurism is all it is (Score:5, Insightful)
Do you mean, you think you could've gotten an individual's medical records in MA for less than $20? Or maybe you can't see why someone would dig up an individual's medical records? (I can think of many... but then my employer was extorted by someone who'd stolen a bunch of medical-related data from them not that long ago.)
I think I hear a bit of "nobody would go to all that trouble" in your message. If in the early days of WiFi networks I described to you in tedious yet vague terms how to compromise WEP encryption, you probably would've thought the same thing. Today anyone who cares to can break WEP using readily available tools - it's really no bother at all if you're even slightly inclined to do it.
I've seen companies with contractual and regulatory obligations to protect data privacy make half-gestures to make it look like they're honoring privacy while still engaging in whatever easy-money scheme or shortcut they want. Shedding light on why those half-gestures don't work is a big deal.
That's the thing that I also think people don't understand. With good reason, I am not satisfied merely that someone probably wouldn't want to abuse my information. I am satisfied only when I know that they cannot do so.
I think the solution is to have the concept of "intellectual property" work both ways. Obviously your private information has value, otherwise advertisers and other companies wouldn't go to such great lengths to obtain and use it. The problem is that they obtain it without your consent and without directly compensating you. For example, if I don't actively block web bugs, cookies, HTTP "ping", analytics tools, and other similar attempts, then that data will be gathered whether or not I like it.
The reason why I actively go out of my way to prevent companies from gathering data on me is simple. No one asked me if I wanted to be data-mined. I refuse to honor agreements in which I did not participate. Why anyone else would do so is a mystery to me.
So make each individual's private data their personal property. They can set whatever value they like, and if that value is more than a company thinks it is worth, the company is free to decline the sale. Most importantly, any attempt to just take that data will be theft, and anyone who does this can be prosecuted in a criminal court. I mean, think about it: why is it "marketing" when a company helps itself to my information against my will and "piracy" or "industrial espionage" if I helped myself to THEIR zeroes and ones against their will?
Re: (Score:3, Interesting)
"why is it "marketing" when a company helps itself to my information against my will and "piracy" or "industrial espionage" if I helped myself to THEIR zeroes and ones against their will?"
Re:Damn voyeurism is all it is (Score:5, Insightful)
Then how do you explain shows like Entertainment Tonight and all of these magazines and Web sites devoted entirely to completely useless celebrity trivia? Y'know, the ability to obsess over the personal life of someone you have never met and will never personally know, merely because they can sing or act, should be recognized as a pathology. Voyeurism only seems to partly explain it; much of it seems to come from an empty and unsatisfying life that leads to an attempt to live vicariously through some sort of idol which is perceived to be successful, in that sense that "most men lead lives of quiet desperation". However stupid and useless it may be, I can't deny that many do consider it newsworthy and much of "the news" includes such elements.
Some perspective please. (Score:3, Insightful)
People are curious. They are curious about everything. It's an exercise in futility to pick and choose useful information over non-useful information since none of us knows what tomorrow holds. If someone wants to read celebrity gossip mor
Re: (Score:3, Interesting)
That's not really a valid question and I'll explain why. The difference is that rockets and space travel are about the actual technology. If the entertainment industry operated that way, then all of the discussion would be about photography/
Re: (Score:3, Insightful)
Except that most people who watch Entertainment Tonight and such aren't "obsessed" with celebrity trivia. Interest =/= obsession.
Dear AC, perhaps we are using different definitions of "obsession." Here's mine: when something cannot possibly benefit your life in any measurable way whatsoever, and you devote energy to pursuing it anyway, this is something of an obsession. To me, an interest is something different. The RIAA has an interest in strong copyright laws. Why? Because the RIAA is benefitted by strong copyright laws. Therefore, it's not a surprise that the RIAA tries to bring them about. However, it doesn't do a damned
Re: (Score:2)
Dear AC, perhaps we are using different definitions of "obsession." Here's mine: when something cannot possibly benefit your life in any measurable way whatsoever, and you devote energy to pursuing it anyway, this is something of an obsession.
Tonight, I spent some time pursuing a better view of the beautiful sunset here in Paris, and then observing it, which could not possibly benefit my life in any measurable way whatsoever. I will do it again, the next time such an opportunity arises. I recommend that you
Re: (Score:2)
Dear AC, perhaps we are using different definitions of "obsession." Here's mine: when something cannot possibly benefit your life in any measurable way whatsoever, and you devote energy to pursuing it anyway, this is something of an obsession.
Tonight, I spent some time pursuing a better view of the beautiful sunset here in Paris, and then observing it, which could not possibly benefit my life in any measurable way whatsoever. I will do it again, the next time such an opportunity arises. I recommend that you should be cautious before you label people obsessed.
Yes, but you saw the sunset for yourself. You didn't hear second-hand about someone else who saw a beautiful sunset. That's why I know you have missed my point.
I made a clear distinction when I was talking about celebrities. The distinction was between actually personally knowing someone, versus going to lengths to obtain second-hand information about a person you have not met, do not personally know, and in all likelihood are never going to meet. Tell me, when you read my post did you overlook this
Re: (Score:3, Insightful)
Dear AC, perhaps we are using different definitions of "obsession." Here's mine: when something cannot possibly benefit your life in any measurable way whatsoever, and you devote energy to pursuing it anyway, this is something of an obsession.
Sorry, no; an obsession is just a troubling preoccupation. Benefit has nothing to do with it.
Arguing semantics is much more useful when you are able to suggest a more suitable word. However, even then it's of little use, because even if I admit that "obsession" was a terrible choice of words, it doesn't do anything to change a single point I have made. Likewise, I can't help but notice you have not addressed any of the points I have made. So, I must conclude that this is the only fault you were able to find with my reasoning, which is pretty good, because it has nothing to do with my reasoning a
Paul Ohm? (Score:5, Funny)
Great, another Ohm's law [wikipedia.org] to learn.
Re:Paul Ohm? (Score:5, Informative)
Nonsense, it could be an extension of the current Law:
"In electrical circuits, Ohm's law states that the current through a conductor between two points is directly proportional to the potential difference or voltage across the two points, and inversely proportional to the resistance between them. In data anonymity, the law states that the general usefulness of any set of data that originally contained personally-identifiable information is inversely proportional to the degree of anonymity applied to said data."
See, one simple law to memorize, and now data analysts learn just a teensy bit about electricity and EEs learn just a teensy bit about data anonymization.
Re:Paul Ohm? (Score:5, Informative)
Okay, let's take a road. The speed at which traffic can travel depends on the quality of the surface, gradient, camber, zoning, etc. Let's call this the "road conditions", with a lower number being better roads.
The number of cars that want to get through that road is a primary unit, which we can refer to as the "volume of traffic".
The third major criterion is the speed at which the traffic actually flows. This is the "actual flow" of traffic -- in other words, the "influence of other cars" on the traffic congestion.
In other words:
volume = influence of traffic * road conditions
or:
V = IR
Duh. (Score:4, Informative)
Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?
I mean, seriously. They don't need to know. Why would I give 'em the right numbers? They're lucky I even allow them to have rough demographic data.
Re:Duh. (Score:4, Funny)
I put "please!" and it doesn't seem to help either.
Re: (Score:2)
And you wonder why you never get laid when you go to a bar.
Re:Duh. (Score:4, Funny)
And you wonder why you never get laid when you go to a bar.
Usually it's better to wait until you leave the bar.
Re:Duh. (Score:4, Funny)
I never pay for drinks, I know the password for the Wi-fi, and it never closes.
Problem is, the only girl who ever shows up is my sister.
20500 (Score:2)
Re:20500 (Score:5, Insightful)
Because everyone knows that EVERYONE in DC lies.
Re: (Score:2)
That sounds like a setup for a joke:
Bill Clinton, Marion Barry, and Scooter Libby walk into a bar...
Re:Duh. (Score:4, Insightful)
Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?
I use 1/1/1979 (it's closer to my real age) and 90210 instead. I get a lot of crosseyed looks and many times the cashier (or whatever human I'm dealing with) will end up entering in a local zip code instead but people are no longer arguing w/me about what I choose to provide them when pressured for information (I always politely reply, "no thanks," when asked for that type of information but will give them false shit when they ask again and whine that they'll be fired).
Why would I give 'em the right numbers? They're lucky I even allow them to have rough demographic data.
Because the majority of people have absolutely no problems handing over any and all information they're prompted for up to and including their e-mail address, phone number or even SSN! Because most people don't even blink, those of us that don't feel like it should be anyone's business (like the scanning of IDs at liquor stores or bars to check age--there is a birthdate listed on IDs for a fucking reason people--not that they can scan my rare earth magnet swiped ID anyway) are looked at like assholes when we refuse to provide information that no one really needs anyway.
Re: (Score:3, Informative)
(like the scanning of IDs at liquor stores or bars to check age--there is a birthdate listed on IDs for a fucking reason people--not that they can scan my rare earth magnet swiped ID anyway)
That's not to check age; that's to check for counterfeits with mismatched mag data, or mismatched 2-D barcode data, or missing UV ink prints, or missing holograms, etc. etc.
Re: (Score:2)
The way to defeat this is to use an out-of-state fake ID. Or to use an ID of somebody who looks like you.
The whole ID checking process has gotten asinine really...
Re: (Score:2)
An out-of-state fake ID will not necessarily work. There are interstate standards for the content of mag stripes and 2-D barcodes, for example.
Re: (Score:3, Interesting)
An out-of-state fake ID will not necessarily work. There are interstate standards for the content of mag stripes and 2-D barcodes, for example.
But nowhere near all states follow those standards. All you gotta do is make a fake-id for one of those states. Even if the state does follow those standards, if you pick a state far enough away you can make up pretty much anything, call it an id card (rather than a driver's license) and the person using the machine will have to make the human decision to accept the id anyway or not. As someone who made such a fake-id for a girl who wanted to appear younger than she was (got tired of the bouncers at the
Re:Duh. (Score:4, Funny)
And after that, it's to keep a list of everyone who has entered the bar for the history of its operation. Much easier to identify "troublemakers" when you have a list of people who like to have fun once in a while.
You DO know that in many states, a bartender is legally responsible for anything you do while drunk from the moment you take a drink until you're finally sober, right?
Re: (Score:3, Funny)
This makes me think of a probably not unique idea. Most places that ask for my phone number are the same places asking over and over again. Radio Shack, Toys-R-Us, and Sears for example. What would be great is to memorize one of their phone numbers from the phone book and always give them that. Perhaps a number from a different store. Let their telemarketers waste time calling their own stores.
Re: (Score:2)
Your "account" is indexed under your phone number - they are looking it up to know what offers they should let you in on, check to see if you have a store credit card or should have one and of course to build their profile on you.
They don't care about your phone number other than that it is a unique identifier.
Re: (Score:3)
Your "account" is indexed under your phone number - they are looking it up to know what offers they should let you in on, check to see if you have a store credit card or should have one and of course to build their profile on you.
They don't care about your phone number other than that it is a unique identifier.
I have the money, they have the goods, we make an exchange. I like it when it remains that simple. Their mistake is assuming that I want to establish an "account" without first asking me. When it comes to my personal information, everyone is on a need-to-know basis. Almost no one needs to know. If they have an entitlement mentality that prevents them from respecting that, then I have no moral qualms whatsoever about giving them false information.
Re:Duh. (Score:5, Funny)
I once gave a GameStop employee my zip as 12345. He said, "It's OK if you don't want to give it." My reply was, "No, I am from Schenectady, NY."
Re: (Score:2)
I use this exact same zip code. On web forms I usually put in 123 Fake Street.
Oh and bob@hotmail.com? I am really, really sorry about that man.
Re:Duh. (Score:5, Funny)
Yes you are. I always put 90210. Phone number 867-5309. If anyone tries to find me, they're at least going to have that song stuck in their head and recall with disgust the shows they watched in the early 90's. Hopefully that will demoralize them enough to give up.
Re:Duh. (Score:4, Funny)
I would think 90210 is a more common choice for zip code. It's probably the most densely populated area on the planet according to dataminers.
Re: (Score:2)
No, I use 90210 because I know that's a valid code.
I've given out random birthdays so many times that I have to check my DL before I order a cake.
Re: (Score:2)
Yah, I do that too. I have AARP invitations on my wall because they mined some database that shows I'm in my 70s. I also have lots of high school age directed mail because other databases show me as a teenager. Oh, and Medicaid and insurance scams and political propaganda targeted at seniors -- I get literally dozens of those a week.
Re: (Score:2)
I know, I know, don't feed the trolls......
Re: (Score:2)
I will take your post at face value.
>> One is where the facts are embarrassing because of hypocrisy. That scenario is alleviated by loss of anonymity and increased transparency.
I have never heard this argument against one's ability to commit hypocrisy. This is interesting, although I do not support it because I believe in free speech.
>> Another is where the person is engaged in anti-social behavior because they are in a reactionary state and need assistance because they are going off the rails i
Re: (Score:2)
So you'll be posting your full name, home address, and telephone number to Slashdot?
Note, I hope you don't do that because it would be a very bad move on your part. However, it would also be consistent with your position.
I'm perfectly anonymous! (Score:3, Funny)
-- Anonymous Coward
Only three bits? (Score:5, Funny)
Holy hell forget about that anonymized data crap, I want to learn how she can compress that much data into three bits!
Re: (Score:2)
For values of "all" equal to or less than 8, to index a table...
Re: (Score:2)
I think you mean equal to or less than 7. You forgot to count zero.
Re: (Score:2)
No, I mean equal to or less than 8. When counting Americans, zero isn't a useful number, because if we used zero Americans for our research then we wouldn't have anything to publish.
Re: (Score:2)
...by which I mean, an empty table is != a table with one record, index # 000.
Re: (Score:2)
It's not like he's some kind of neuromancer.
Re: (Score:3, Funny)
Why, that's the simple part! You just use very big bits and hope they don't notice!
Mission Impossible (Score:5, Insightful)
I've pretty much given up any hope of being anonymous. It's just going to get exponentially more difficult as time goes on.
I had my credit card stolen once. It was stolen from the CC company. How is a business supposed to entrust me with thousands of dollars in credit if they don't know who I am? How is a credit card company supposed to function without a worldwide network which authorizes transactions?
If someone wants to find me they'll find me.
If someone wants to use my identity to frame me for a crime then they're just going to encounter a mountain of evidence from numerous sources which contradict their fabrication.
"My G1 was on a Starbucks Wifi at the time of the crime. I used my CC to purchase the drink. I received a text from a nearby tower. I posted a comment on breaking news story that is written in my style of writing. I was seen on 8 security cameras walking to the starbucks from my car. I used an automatic toll card 5 miles away from the coffee shop...." Good luck coming up with a large mountain of evidence to put me somewhere else.
Re: (Score:2, Insightful)
If access to the evidence you just stated was available to the framer it makes it very easy to find a likely fall guy according to their habits. Makes the alibi of overwhelming evidence evaporate into prime suspicion.
The best lies are those that are mostly truth.
Re: (Score:2)
So you're saying you robbed the coffee shop?
Re: (Score:2)
"You'll simply have committed the crime at a time and place for which you have no alibi, or in a way that makes the time and location irrelevant."
Given the existence of Twitter and GPS location, 'a time and place for which you have no alibi' would have to mean 'more than five minutes after my last tweet'.
I'm having trouble wrapping my head around that concept.
At least one fact about them could be used (Score:2)
[citation needed]
I can't think of anything I've done online (even my shemale midget fetish on youpron) that could be used to blackmail me. Now, I get that others are more ashamed about what they do online, but "almost everybody"?
Re: (Score:3, Insightful)
I did think that was an overstatement that undermined the main point. None of my prescriptions would be embarrassing to anyone but a holistic medicine believer, and I've told some tasteless jokes online. If someone were to send that information to my family along with what porn I looked at, that would be awkward at most. And that's assuming it's credible, which it wouldn't be.
How exactly would this blackmail work? Bob, the evil co-worker threatens to tell your wife and boss you have had a sex change, a runni
Re: (Score:2)
Or, maybe, Bob, the evil co-worker, threatens to tell my wife and boss that I am a Nigerian prince who has obtained $ 1 billion USD in oil money that needs a US bank account to be successfully deposited...
(Unfortunately, I know that there is a fair amount of spam sent in my name. I get the backscatter from it.)
Re: (Score:2)
I'm skeptical about that claim, too, but I think the author also intended it to include real-world activities. For example, you've called in sick to work, but records of your activity suggest that you were actually at a job interview / romantic liaison / midget convention over on the other side of town.
Re: (Score:2)
I can't think of anything I've done online (even my shemale midget fetish on youpron) that could be used to blackmail me
Same here. However, the next bit of text is more relevant:
discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm.
There's almost certainly something that can be used to discriminate against you, harass you, or steal your identity, causing legally cognizable harm. Blackmail is just for the people ashamed of what they do; the rest affects everyone.
Bah, humbug. (Score:3, Funny)
Forget anonymity. I'm better off living in a glass house, so it's easier for me to know when I need to yell "Get off my lawn!"
anonymization is bullshit (Score:2, Insightful)
Even if the data is completely and irreversibly anonymized, it is still invasive. Look at the story yesterday about the marketers data-mining kids' online private conversations for consumer gadget preferences. Even if there's no way from that data to infer the preferences of any particular kid, they should still be able to talk to each other without having their conversation be part of a marketing survey.
Think also of a cafe that sells two kinds of food: apple pie (eaten by freedom-loving patriots), and f
Re: (Score:3, Interesting)
"Private should mean no disclosure, not anonymized disclosure, not aggregate disclosure, just plain no disclosure period."
The profit motive and privacy are at odds: trying to make the most money and sell the most stuff means you want to know everything about everyone so that you can one-up your competitors. It's a race to the bottom. Ideals in the real world always submit to the pragmatic concerns of making money in a capitalist society.
Remember "Mother Earth" and the Espionage Act (Score:3, Informative)
http://en.wikipedia.org/wiki/Mother_Earth_(magazine) [wikipedia.org]
Mother Earth was an anarchist journal that described itself as "A Monthly Magazine Devoted to Social Science and Literature," edited by Emma Goldman. Alexander Berkman, another well-known anarchist, was the magazine's editor from 1907 to 1915. It published longer articles on a variety of anarchist topics including the labor movement, education, literature and the arts, state and government control, and women's emancipation, sexual freedom, and was an early supporter of birth control. Its subscribers and supporters formed a virtual "who's who" of the radical left in America in the years prior to 1920.
In 1917, Mother Earth began to openly call for opposition to American entry into World War I and specifically to disobey government laws on conscription and registration for the military draft. On June 15, 1917, Congress passed the Espionage Act. The law set punishments for acts of interference in foreign policy and espionage. The Act authorized stiff fines and prison terms of up to 20 years for anyone who obstructed the military draft or encouraged "disloyalty" against the U.S. government. After Emma Goldman and Alexander Berkman continued to advocate against conscription, Goldman's offices at Mother Earth were thoroughly searched, and volumes of files and detailed subscription lists from Mother Earth, along with Berkman's journal The Blast, were seized. As a Justice Department news release reported:
"A wagon load of anarchist records and propaganda material was seized, and included in the lot is what is believed to be a complete registry of anarchy's friends in the United States. A splendidly kept card index was found, which the Federal agents believe will greatly simplify their task of identifying persons mentioned in the various record books and papers. The subscription lists of Mother Earth and The Blast, which contain 10,000 names, were also seized."
Mother Earth remained in monthly circulation until August 1917. Berkman and Goldman were found guilty of violating the Espionage Act, were imprisoned for two years, and were later deported.
Re: (Score:2)
Labor organization is not Anarchism.
Most Anarchists aren't really Anarchists, they just oppose the current form of governance and want to replace it with something else.
Three things? Really? (Score:3, Insightful)
So, despite the Birthday Paradox [wikipedia.org], they can still identify 87% of Americans? For some reason I'm under the impression that there are a lot more zip codes with more than 366 people (heck, even 1000 to call upon 3 or 4 duplicates that should cover gender differences) than there are zip codes under that amount.
Re:Three things? Really? (Score:5, Informative)
That Paradox ignores the year. Add that in and it starts to become harder.
Re: (Score:3)
Your birthdate includes the year. Your birthday does not (at least for this discussion).
The party trick of finding two people with the same birthday (a good probability in any group of 30 people or more) doesn't require them to have the same year of birth (although in most gatherings there's a good chance of this as well since often it's already somewhat segregated by age).
Re: (Score:2)
Date of Birth != Annual Birthday
one being month/day/year the other being just month/day.
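The difference between a birthday and a birthdate can be put in numbers with the standard birthday-problem product. This is a rough sketch that assumes every value is equally likely and that birthdates are spread over 80 years; both are simplifying assumptions, and `collision_probability` is a name made up here.

```python
def collision_probability(people, slots):
    """Probability that at least two of `people` share one of `slots`
    equally likely values (the classic birthday-problem calculation)."""
    if people > slots:
        return 1.0  # pigeonhole: a shared value is guaranteed
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (slots - i) / slots
    return 1.0 - p_all_distinct


birthdays = 365         # month/day only
birthdates = 365 * 80   # month/day/year, assuming an 80-year spread

print(round(collision_probability(30, birthdays), 3))   # 0.706
print(round(collision_probability(30, birthdates), 3))  # 0.015
```

Note that this is the chance that some pair in the group matches, which says little about how many individuals end up sharing their value with someone else.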
Re: (Score:2)
Well, as other people have pointed out, adding the year limits the number of collisions. So factor in year and maybe you need 80x the people to get the same obscurity. And you said 366 people. That eno
Couple of things.. (Score:5, Insightful)
Potential nitpick, but here goes.
The summary (not surprisingly for a /. summary) omits a couple of details that give the reader a rather partial picture.
For one, Paul Ohm is an Assistant Professor of law, and although the summary makes it sound like the linked article would be from a technical perspective, (mostly) it is not.
A quote like:
"Data can either be useful or perfectly anonymous but never both."
needs a bit of background about the qualification of the person making that claim. Why? Simply because it sounds like a rather technical remark. If some computer science researcher made this claim, I would tend to take it more at face value, otherwise I would take it with a grain of salt.
Now obviously this statement was not meant to be taken quite literally because the notion of "useful" is not precise. I can get reasonably useful information like "most of the people in my country like to buy branded stuff" or "most people who rent videos of actor X regularly, also rent the videos of actor Y regularly" without needing the underlying data to contain *any* personally identifiable information. The fact that extra data is stored is a different thing.
I personally believe that instead of claiming that some researcher has argued X, it can be more informative to actually say what kind of researcher it is who made a claim. Not because only researchers in a certain area can be trusted, but because a little bit of background puts the claims in the right perspective.
Err.. (Score:2)
English is not my first language, so I probably didn't catch the whole meaning, but...
The idea was that everyone can be identified with only the birth date, gender and ZIP code? So... err... There is, in fact, not even one ZIP code that has two people living there of the same gender that happen to share a birthday? Sure, to have the year coincide would take a bit more than just the date itself but it's hard for me to imagine that this could be true.
So... what did I miss?
Re: (Score:2)
You missed the 87% figure. For 13%, this data is insufficient. (6.5% will share their birthdate, zip, and gender with another 6.5%)
365*2 = 730 children per day per zip code could be uniquely identified using this information. If I understand this correctly, it implies that 730 / 93.5% = about 781 babies born per day per zip code (on a national average).
Re: (Score:2)
English is not my first language, so I probably didn't catch the whole meaning, but...
The idea was that everyone can be identified with only the birth date, gender and ZIP code? So... err... There is, in fact, not even one ZIP code that has two people living there of the same gender that happen to share a birthday? Sure, to have the year coincide would take a bit more than just the date itself but it's hard for me to imagine that this could be true.
So... what did I miss?
It takes more than just these three items. What was meant was that if you take these three items, and run them against a database of known items, you end up knowing more from the combination than from the two separately. In this case, if you have a database with redacted information, and a second, unrelated database that happens to have the redacted elements from the first, by selecting a good set of common keys to run a join of the two, you can "un-redact" the missing information. Nothing new here. The
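The "un-redact" step described here amounts to a lookup from one table into another on the shared fields. A minimal sketch follows, with invented tables and an invented `link` helper; real linkage work would also have to cope with typos, missing values, and stale addresses.

```python
# Hypothetical data: every name and value below is invented for illustration.
medical = [  # "anonymized": no names
    {"zip": "02138", "birthdate": "1945-07-31", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "birthdate": "1980-11-02", "sex": "F", "diagnosis": "asthma"},
]
voter_roll = [  # a second, unrelated list that does carry names
    {"name": "A. Example", "zip": "02138", "birthdate": "1945-07-31", "sex": "M"},
    {"name": "B. Sample", "zip": "02139", "birthdate": "1980-11-02", "sex": "F"},
    {"name": "C. Placeholder", "zip": "02139", "birthdate": "1975-01-20", "sex": "F"},
]

KEYS = ("zip", "birthdate", "sex")


def link(anonymized, identified, keys=KEYS):
    """Join the two tables on `keys`; keep only unambiguous matches."""
    index = {}
    for row in identified:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    matches = []
    for row in anonymized:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate: the record is re-identified
            matches.append((names[0], row["diagnosis"]))
    return matches


print(link(medical, voter_roll))
# [('A. Example', 'hypertension'), ('B. Sample', 'asthma')]
```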
Re: (Score:2)
perhaps the 87% part?
Re: (Score:2)
Well, there is some collision, 13% of the people have one. But 87% don't. You pass the 87% likelihood that there will be at least one pair (ass
Re: (Score:2)
Sorry to respond to myself... but "You pass the 87% likelihood that there will be at least one pair (assuming equal birthdates over the last 80 years and equal gender ratios) at 58,438 people in a zip code." is a mistake.
Should be: You pass the 87% likelihood that there will be at most one pair (assuming equal birthdates over the last 80 years and equal gender ratios) at 58,438 people in a zip code.
Anonymous can be useful.. (Score:5, Insightful)
Data can either be useful or perfectly anonymous but never both
What a load of bolaks....
Supposing you have a list of -just- birth dates for every citizen at the census. You have -only- been given one piece of data per person, the date, nothing more. Just a huge list of dates, sorted chronologically.
1) The data has been totally anonymised.
2) You can do all kinds of meaningful analysis on the age demographics of the population. And make policy decisions based on that.
Fully anonymous data producing useful results.
Re: (Score:2)
Well, I think what your example demonstrates is that *application-specific* anonymization and, in your case, aggregation, can produce data that's both useful and actually anonymous. But I happen to agree with the article that, in the *general* case, it's impossible to take data and anonymize it in a way that retains its usefulness across a large domain of potential applications while simultaneously protecting the anonymity of those in the database.
'course, when you think about it, that's common sense: To
Perfectly? (Score:2)
This is much too extreme. There are many good examples of useful data that is for almost all intents and purposes anonymous. Consider the example of anonymous lending libraries [wayner.org] from my book, Translucent Databases.
The simplest version just pushes the book title through a one-way function. The more complex version also hides the name in a similar way.
Can the anonymity be stripped away? There are coincidences and connections as Sweeney's examples and the Netflix examples show, but they can be fought by addi
Levels of anonymity? (Score:2)
Data can either be useful or perfectly anonymous but never both
I'm not sure I entirely agree with this statement. While it's technically correct, I believe it's misleading...
It's perfectly possible to hash personally identifiable information into an MD5 sum, to ensure that your records are unique, and then to generate useful statistics based on the resulting aggregate data without releasing significant personal information.
For instance:
Key = Hash(Your name + Your Zip + Your Birthday)
Zipcode
Birth Decade
Hobbie
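A rough sketch of the record layout proposed above, assuming Python's standard `hashlib`. The function names, the `|` separator, the date format, and the `hobby` field are my own assumptions for illustration; the comment only names the hash inputs and the published columns.

```python
import hashlib


def record_key(name: str, zip_code: str, birthday: str) -> str:
    """Hash the identifying fields into one opaque key (MD5, as in the comment)."""
    raw = f"{name}|{zip_code}|{birthday}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


def published_row(name: str, zip_code: str, birthday: str, hobby: str) -> dict:
    """Build the row that would be released: key plus coarse attributes."""
    birth_year = int(birthday[:4])  # assumes an ISO-style "YYYY-MM-DD" string
    return {
        "key": record_key(name, zip_code, birthday),
        "zipcode": zip_code,
        "birth_decade": birth_year // 10 * 10,
        "hobby": hobby,
    }


row = published_row("Alice Example", "12345", "1962-03-14", "chess")
```

Note that the key is deterministic: anyone who can guess a name, zip, and birthday can recompute the same hash and check it against the published rows.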
Re: (Score:2)
At what point is it cheaper and more effective to hire a PI to follow me around and root through my garbage?
Re: (Score:2)
Yeah, but the whole point is, given a zip code, birth date, household income, and hobbies, I can probably figure out who you are.
Fundamentally, the issue is very simple: Given some sort of identifier, and a series of properties about that identifier, if you have enough dimensions of detail, you end up narrowing down your sample so much that you end up with a population of one, that being the person the identifier "hides". It's just that simple.
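To make the narrowing concrete, here is a small sketch under made-up assumptions: a five-person `population` and a `narrow` helper, both hypothetical. It applies one known attribute at a time and records how many candidates are left.

```python
# Hypothetical population; every value is invented for illustration.
population = [
    {"id": 1, "zip": "12345", "birth_year": 1962, "income_band": "50-75k", "hobby": "chess"},
    {"id": 2, "zip": "12345", "birth_year": 1962, "income_band": "50-75k", "hobby": "sailing"},
    {"id": 3, "zip": "12345", "birth_year": 1962, "income_band": "75-100k", "hobby": "chess"},
    {"id": 4, "zip": "12345", "birth_year": 1980, "income_band": "50-75k", "hobby": "chess"},
    {"id": 5, "zip": "12346", "birth_year": 1962, "income_band": "50-75k", "hobby": "chess"},
]


def narrow(rows, known):
    """Filter on each known attribute in turn; return survivors and the size after each step."""
    sizes = [len(rows)]
    for field, value in known.items():
        rows = [r for r in rows if r[field] == value]
        sizes.append(len(rows))
    return rows, sizes


known = {"zip": "12345", "birth_year": 1962, "income_band": "50-75k", "hobby": "chess"}
candidates, sizes = narrow(population, known)
```

In this toy data each extra dimension removes one candidate, so four attributes are enough to go from five people to one.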
The only way to prevent this is to generate crosscuts of data
Re: (Score:2)
We go through the same basic process to find information through a search engine -- we attempt to find ways to narrow down the data in such a way that the information we are looking for exist w
Ohm is overwrought (Score:3, Informative)
I have worked with anonymized government data extensively, and birthdate and zipcode are always considered personally identifiable information. Sometimes birth year is available, and sometimes state or (rarely) county is available, but I have never even heard of a dataset with both. Datasets with month and day of birth are never considered to be anonymized, and are not released. The author of the paper is much overwrought.
I don't have the problem you suckers have (Score:2)
I have a twin brother living with me. Now try to identify me, Haha!
CT scans (Score:3, Insightful)
Have you ever thought about how a "cat" scan works? Forget the 3D aspects and let's just think about how the cross-sectional pictures work.
Every given reading is just shooting a ray through the target and getting a single number out. This is analogous to how aggregate summaries relate to personal details in data. You know the average income of people in zip code 12345, but no specifics. The trick is, later, just as that CT scan is going to shoot a ray through a certain point again from a different direction, your personal details are going to be summarized again by someone else, in a different way.
A picture will emerge. The CT scan is going to "see" the bone as distinct from the tissue right here at this pixel, and this person's data will be un-summarized. It just takes enough rays, and eventually all ambiguity goes away.
A long time ago (about 20 years ago, I think?) there was a neato explanation of a cat scan algorithm in Scientific American. I wish I could find it. Because I bet you could show that article to any "database guy" these days, and they'd nod and smile.
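A toy sketch of the "enough rays" idea, assuming three invented people and a hypothetical `group_total` helper standing in for an aggregate-only release. It is not a real CT reconstruction algorithm, just the same principle at the smallest possible scale: three overlapping totals pin down every individual value.

```python
# Hypothetical incomes; names and numbers are invented.
incomes = {"ann": 52000, "ben": 61000, "cal": 48000}


def group_total(*people):
    """An aggregate-only release: the summed income of a group, no individual values."""
    return sum(incomes[p] for p in people)


# Three "rays", each summarizing a different overlapping pair.
ab = group_total("ann", "ben")
bc = group_total("ben", "cal")
ac = group_total("ann", "cal")

# Solving the three sums recovers each person's value exactly.
ann = (ab + ac - bc) // 2
ben = ab - ann
cal = ac - ann
recovered = {"ann": ann, "ben": ben, "cal": cal}
```

No single total reveals anyone's income, but the combination leaves no ambiguity.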
Re: (Score:2)
Especially if you're outed by a friend who posts RL info without your permission that you can't retract.
Had this happen to me once.
Re: (Score:3, Insightful)
Especially if you're outed by a friend who posts RL info without your permission that you can't retract.
Had this happen to me once.
That's why you have to be very careful about who your friends are. I am no longer surprised by someone who "suddenly changed" because it is not really sudden at all, it is merely subtle before it becomes bleedin' obvious. Sorry to hear you got screwed.
And yes, I have trusted people I should not have trusted and gotten screwed. What I learned from it is that I ignored red flags and warning signs that should have tipped me off and so I set myself up for what should have been a predictable outcome. Usua
Re: (Score:3, Insightful)
By "outing" I mean the publication of ANY personal information that you'd prefer to keep secret.
Sexual orientation, which btw doesn't apply in this case to this 24 year old virgin, is only one of many possible "secrets" that can be blown.
Also, consider if I were an Iranian dissident who had hard proof that the elections were rigged? Isn't it foreseeable that, with protesters being deemed to be against Allah by the powers that be, that outing me would put me in grave danger?
Anonymity is a precious thing tha
Re: (Score:2)
There may be only 365 possible birthdays, but birth *date* normally includes the year of birth as well as the day.
Re: (Score:2)
Everyone was not born in the same year...
You have about 30,000 birth dates (that's about 80 years, so it's even bigger since people live longer than that often enough), 60,000 zip codes, and 2 sexes. So 3,600,000,000 combinations, 12 times the number of people.
Of course given demographics (see the baby boomers...) there's not exactly a good distribution of those birth dates. Far fewer living people were born on 1/1/1910 than on 1/1/1970.
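The arithmetic above can be checked with a back-of-the-envelope sketch. It assumes a population of 300 million (implied by "12 times the number of people") and, unrealistically, that every combination is equally likely; the function name is my own.

```python
import math

birth_dates = 30_000
zip_codes = 60_000
sexes = 2
population = 300_000_000  # assumption: implied by "12 times the number of people"

combinations = birth_dates * zip_codes * sexes


def expected_unique_fraction(people, bins):
    """Chance that nobody else shares a given person's bin, if all bins are equally likely."""
    return (1 - 1 / bins) ** (people - 1)


uniform_estimate = expected_unique_fraction(population, combinations)
# The same quantity via the usual exponential approximation, exp(-people / bins).
approx = math.exp(-population / combinations)
```

Under the uniform assumption roughly 92% of people would be unique. The uneven spread of birth years and zip code sizes mentioned above would be expected to pull that down, which puts it in the same neighborhood as the 87 percent figure quoted in the summary.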
Re: (Score:2)
http://en.wikipedia.org/wiki/Who
http://en.wikipedia.org/wiki/One_time_pad
Re: (Score:2)
It isn't a computer science article.
"Bits" is being used perfectly correctly for English, given the "a small piece or quantity of anything" definition my dictionary has.