Encrypted DNA Storage Investigated by DOE Researchers (darkreading.com) 42
Biological engineers at a Department of Energy lab "are experimenting with encrypted DNA storage for archival applications." Slashdot reader ancientribe shares an article from Dark Reading:
Using this method, the researchers could theoretically store 2.2 petabytes of information in one gram of DNA. That's 200 times the printed material at the Library of Congress... Instead of needing a 15,000 square-foot building to store 35,000 boxes of inactive records and archival documents, Sandia National Laboratories can potentially store information on much less paper, in powder form, in test tubes or petri dishes, or even as a bacterial cell... "Hard drives fail and very often the data can't be recovered," explains Bachand. "With DNA, it's possible to recover strands that are 10,000 to 20,000 years old... even if someone sneezes and the powder is lost, it's possible to recover all the information by just recovering one DNA molecule."
Mutation (Score:1, Insightful)
You'd need robust error detection and correction because of mutation and damage.
But copying seems trivial.
Re:Mutation (Score:4, Insightful)
I think mutation isn't really that much of an issue if the DNA isn't actually doing anything (being duplicated or transcribed to RNA).
It's supposed to be one of the more stable ways of storing data, much better than tape in fact [splice-bio.com]. What I'd be more worried about is reading it again - current ways of reading DNA can misread it and have problems with long runs of the same base, so some kind of encoding to avoid those would be needed.
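One way such an encoding could look, as a rough sketch rather than what any lab actually uses: turn the data into base-3 digits and let each digit pick one of the three bases that differ from the previously written base, so the same base never appears twice in a row. The function names, the six-digits-per-byte layout and the virtual start base "A" are assumptions made for illustration.

```python
# Sketch of a rotating ternary code: each base-3 digit selects one of the
# three bases that differ from the previous base, so no base ever repeats.
BASES = "ACGT"
START = "A"  # assumed virtual start base; it is never written to the strand


def bytes_to_trits(data: bytes) -> list[int]:
    """Expand each byte into six base-3 digits (3**6 = 729 >= 256)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))  # most significant digit first
    return trits


def encode(data: bytes) -> str:
    """Map bytes to a strand with no two identical adjacent bases."""
    prev = START
    strand = []
    for trit in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]
        prev = choices[trit]
        strand.append(prev)
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Invert encode(); assumes an error-free strand."""
    prev = START
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


strand = encode(b"DNA")
print(strand, decode(strand))
```

The price is density: this sketch spends six bases per byte, about 1.33 bits per base instead of the theoretical 2, and it does nothing about read errors by itself.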
Re: (Score:2)
It claims DNA can remain stable for more than 500 years. And the life expectancy of tape only appears to be 30 years or so (found from other sources).
What I find surprising is that (printed) CDs don't have a much longer lifespan, but it seems they are prone to corrosion.
Re:Mutation (Score:5, Funny)
You'd need robust error detection and correction because of mutation and damage.
We already have that [wikipedia.org]. There are a few billion years of prior art.
But copying seems trivial.
The hard part is writing the device driver to interface the ribosome to /dev/dna.
Re: (Score:2)
But copying seems trivial.
The hard part is writing the device driver to interface the ribosome to /dev/dna.
Will we rephrase Darwin Awards as storing to /dev/null?
The sheer scale of it (Score:3)
Re: (Score:2)
At a storage symposium a couple of years ago I met my equivalent in the DNA research sphere, his data requirements blew me away. And all encoded in my cells.
If he is storing human DNA data, he is doing something wrong. A human has about 4 billion base pairs, which are roughly 2 bits each, so that is 500 MB. You could fit that on a CDROM with room to spare. But humans share 99% of their DNA, so you would really only need to store the diffs. 1% is 5 MB. But even that is overkill, since humans don't differ from each other randomly, but in common sequences where you have either one sequence or another across wide segments of the population. So a human's DNA c
Re: (Score:2)
A human has about 4 billion base pairs, which are roughly 2 bits each, so that is 500 MB. You could fit that on a CDROM with room to spare. But humans share 99% of their DNA, so you would really only need to store the diffs. 1% is 5 MB.
A copy of the (haploid) reference genome encoded as 2 bits per base pair comes in at about 800MB:
http://hgdownload.soe.ucsc.edu... [ucsc.edu]
Run that through something like Z-zip and you can store it in less than 640MB, so it will indeed fit on a CD. Each of us has a diploid genome, though (a copy from each parent), so you really need to store double that if you take no account of the high level of similarity between both copies. If we assume a known reference genome, however, the 'diffs' are as you suggest very small
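The size arithmetic in this exchange is easy to check. A minimal sketch, assuming a haploid genome of roughly 3.1 billion base pairs (an approximate figure, not one given in the thread) and megabytes of 10^6 bytes; the function name is made up for illustration:

```python
def two_bit_size_mb(base_pairs: float) -> float:
    """Uncompressed size in megabytes (10**6 bytes) at 2 bits per base pair."""
    return base_pairs * 2 / 8 / 1e6


HAPLOID_BP = 3.1e9  # assumed approximate haploid human genome length

print(two_bit_size_mb(HAPLOID_BP))      # haploid copy, close to the ~800MB file
print(two_bit_size_mb(2 * HAPLOID_BP))  # diploid, stored naively as two copies
print(two_bit_size_mb(4e9))             # the "4 billion base pairs" figure
```

Under these assumptions a haploid copy comes to about 775 MB and a naive diploid copy to about 1,550 MB. Note that 4 billion base pairs at 2 bits each works out to about 1,000 MB rather than 500 MB.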
Harry Harrison had it decades ago (Score:2)
see subject
DNA storage capacity seems to be wildly overstated (Score:1)
Whenever the press covers the "data storage in DNA" topic, they boast about huge storage capacities based on the assumption that you can basically store 2 bits per base pair. But DNA has not quite evolved to be a long-term mass-storage device. DNA is rather an energy-efficient way to store relatively small amounts of data (~0.8 GB of very redundant data in a human) that exists in so many copies (billions in a human) that it doesn't matter too much if millions of those billions of copies suffer some "bit rot" ov
Re: (Score:2)
True enough. Although looking at the figures given in the summary, there's one hell of a lot of redundancy in their 2.2 petabyte/gram estimate. Looking up the molecular masses of the base pairs plus the sugar chain to make up a DNA molecule and assuming 2 bits per base pair, I get approximately 160,000 petabytes per gram of material (no redundancy), so the estimate given in the summary has a redundancy factor of about 73,000.
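A back-of-the-envelope version of that estimate, as a sketch only: it assumes an average mass of about 650 g/mol per base pair (a common rule of thumb for double-stranded DNA, not a number from the comment), 2 bits per base pair, and petabytes of 10^15 bytes. The names are illustrative.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
BP_MASS_G_PER_MOL = 650.0      # assumed average mass of one base pair
SUMMARY_PB_PER_GRAM = 2.2      # figure quoted in the summary


def raw_petabytes_per_gram(bits_per_bp: float = 2.0) -> float:
    """Raw capacity of one gram of DNA with no redundancy, in petabytes."""
    base_pairs = AVOGADRO / BP_MASS_G_PER_MOL
    return base_pairs * bits_per_bp / 8 / 1e15


raw_pb = raw_petabytes_per_gram()
print(f"raw capacity: {raw_pb:,.0f} PB per gram")
print(f"implied redundancy: {raw_pb / SUMMARY_PB_PER_GRAM:,.0f}x")
print(f"with the 160,000 PB figure: {160_000 / SUMMARY_PB_PER_GRAM:,.0f}x")
```

With these assumptions the raw figure lands in the low hundreds of thousands of petabytes per gram, the same order of magnitude as the 160,000 above; the exact value depends on the mass assumed per base pair and on how a petabyte is defined. Either way the 2.2 petabyte figure implies a redundancy factor in the tens of thousands or more.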
Too long to handle? (Score:2)
1 gram of DNA * 1 complete strand of DNA / (3.59 x 10^-12 grams) = 278 x 10^9 strands = 278,000,000,000 strands of DNA.
Length of human DNA stretched out: about 2 meters
(278 x 10^9 strands) * (2 meters / strand) = 556 x 10^9 meters
I can't conceive of how you can organize that in order to read it.
Then again, I don't know the length of a Blu-ray, if you could unravel it and stretch it out straight. Or that of a record.
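The same calculation as a short script, using only the two inputs from the comment (3.59 x 10^-12 grams per complete strand and about 2 meters per strand); the function names are made up for illustration:

```python
STRAND_MASS_G = 3.59e-12   # mass of one complete strand, from the comment
STRAND_LENGTH_M = 2.0      # stretched-out length of one strand, from the comment


def strands_per_gram(mass_per_strand_g: float = STRAND_MASS_G) -> float:
    """Number of complete strands in one gram of DNA."""
    return 1.0 / mass_per_strand_g


def total_length_m(grams: float = 1.0) -> float:
    """Combined stretched-out length of all strands in the given mass."""
    return grams * strands_per_gram() * STRAND_LENGTH_M


print(f"{strands_per_gram():.3e} strands per gram")
print(f"{total_length_m():.3e} meters end to end")
```

Without rounding the strand count first, this gives about 2.79 x 10^11 strands and about 5.57 x 10^11 meters.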
Already done (Score:2)
Quartz Glass (Score:4, Informative)
DNA degrades after just a few years (Score:3)
I work for a DNA lab. After about 10 years, DNA samples that have been sent to us are basically unusable because they degrade over time. Sure, it might be possible to still read some strands of the remaining DNA, but significant percentages are lost. DNA archaeologists don't mind, because they are looking for whatever fragments they can still read. But if they required most of the DNA to be readable after long periods of time, they would be out of luck.
Re: (Score:1)
Reading around a bit I think you must be receiving DNA from live samples? Every article on this subject refers to various materials in organisms that will be mixed with samples that will cause DNA to degrade.
You do however read about ideal conditions. Those would be the conditions these DNA data storage schemes are talking about. The DNA is synthesized and the end product is just the DNA.
Also, the lengths involved aren't going to be huge in the schemes I've read about. You're going to have lots of short
DNA reading techniques require massive redundancy (Score:2)
Today's DNA reading techniques begin with PCR, a process that multiplies small amounts of DNA so that millions of copies are made. These copies are needed to be accurately read by the equipment, in order to distinguish between "good" copies and noise. Getting the results amounts to statistical analysis of the number of A, T, C, or G results read at a certain location; a "call" can be made only if a high enough percentage of the results agree.
The bit density claims are massively overstated, and reading the
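The "call" step described above can be sketched as a per-position majority vote. This is a toy illustration, not how any particular sequencer works: it assumes the reads are already aligned and of equal length, uses a made-up agreement threshold of 80%, and writes "N" where no call can be made.

```python
from collections import Counter


def call_base(observations: str, min_fraction: float = 0.8) -> str:
    """Return the most common base at one position, or 'N' if agreement is too low."""
    base, count = Counter(observations).most_common(1)[0]
    return base if count / len(observations) >= min_fraction else "N"


def consensus(reads: list[str], min_fraction: float = 0.8) -> str:
    """Call every position across a set of aligned, equal-length reads."""
    return "".join(
        call_base("".join(column), min_fraction) for column in zip(*reads)
    )


reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TCGT"]
print(consensus(reads))        # every position reaches 80% agreement
print(consensus(reads, 0.9))   # stricter threshold leaves two positions uncalled
```

The more copies that have to agree before a call is made, the more physical DNA each stored bit needs, which is the point about redundancy.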
10,000 times slower/costly to write than read DNA (Score:2)