Data Storage Biotech Science

Researchers 'Upgrade' DNA Alphabet Beyond A, C, G, T to Expand Data Storage (cnet.com) 75

"Every day, several petabytes of data are generated on the internet," says Kasra Tabatabaei, a researcher at the Beckman Institute for Advanced Science and Technology. "Only one gram of DNA would be sufficient to store that data."

So the Institute is now announcing the results of a project Tabatabaei worked on "to transform the double helix into a robust, sustainable data storage platform." CNET reports: Tabatabaei is the co-author of a new study, published in last month's edition of the journal Nano Letters... Essentially, the study team is the first to artificially extend the DNA alphabet, which could allow for massive storage capacities and accommodate a pretty extreme level of digital data.... DNA encodes genetic information with four molecules called nucleotides. There's adenine, guanine, cytosine and thymine, or A, G, C and T. In a sense, DNA has a four-letter alphabet, and different letter combinations represent different bits of data....

But what if we had a longer alphabet? Presumably, that'd give us a much deeper capacity. Following this line of thought, the team behind the new study artificially added seven new letters to the DNA repertoire.... "Instead of converting zeroes and ones to A, G, C and T, we can convert zeroes and ones to A, G, C, T and the seven new letters in the storage alphabet."

One of the study's co-principal investigators said their work "provides an exciting proof-of-principle demonstration of extending macromolecular data storage to non-natural chemistries, which hold the potential to drastically increase storage density in non-traditional storage media."
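
As a rough illustration of the quoted idea of converting zeroes and ones into a longer alphabet, the Python sketch below rewrites a byte string in base 4 and in base 11. It is only a sketch under stated assumptions: the summary does not describe the study's actual encoding scheme, and the names `1` through `7` for the seven extra letters are placeholders invented here, not the modified nucleotides from the paper.

```python
import math

NATURAL = "ACGT"
# Hypothetical placeholder names for the seven added letters; the summary
# does not name the chemically modified nucleotides used in the study.
EXTENDED = NATURAL + "1234567"


def encode(data: bytes, alphabet: str) -> str:
    """Write the bytes as one big integer in base len(alphabet)."""
    base = len(alphabet)
    # Prefix a 0x01 byte so that leading zero bytes survive the round trip.
    number = int.from_bytes(b"\x01" + data, "big")
    letters = []
    while number:
        number, digit = divmod(number, base)
        letters.append(alphabet[digit])
    return "".join(reversed(letters))


def decode(text: str, alphabet: str) -> bytes:
    """Invert encode(): rebuild the integer, then strip the 0x01 prefix."""
    base = len(alphabet)
    number = 0
    for letter in text:
        number = number * base + alphabet.index(letter)
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:]


def bits_per_letter(alphabet: str) -> float:
    """Ideal information content of one letter, ignoring all chemistry."""
    return math.log2(len(alphabet))
```

Under these assumptions a four-letter alphabet carries 2 bits per letter and an eleven-letter one about 3.46, so the same payload needs fewer letters in the extended alphabet. Real systems would also need error correction and would have to respect synthesis and sequencing constraints, none of which is modelled here.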

  • Comment removed based on user account deletion
    • One - can this technique ever unintentionally give rise to a new kind of life?

      No. A biological ribosome would have no way to interpret the new codons and no mechanism to synthesize the new nucleic acid monomers.

      Two - how long before someone intentionally incorporates this DNA into living cells (because you know someone will)?

      Someone may do it out of curiosity, but such a life form would be unlikely to survive and would have no obvious benefit. Life uses 64 possible nucleic acid triplets to specify 22 different amino acids during protein synthesis. So there is already unused capacity. If life could benefit from additional DNA monomers, it would have evolved them eons ago.

      • I'd like to add emphasis to something you said for anyone reading: the 22 different amino acids.
        You can have a quadruple helix with 12 monomers and 20,736 quadruplets.
        If the 22 amino acids that the rest of the chemistry has evolved to build are all there is, then there's not going to be a significant difference in the biochemistry of the life (minus the fact that it now needs to waste an enormous amount of energy on its DNA). While human DNA only codes for 22 proteinogenic amino acids, the body produces hu
        • by HiThere ( 15173 )

          The thing is, your supposition is false. There are lots more amino acids. Many of them are even used, just not in genetic coding.

          FWIW, there CAN be advantages to adding additional amino acids to the code, if only to disrupt viral activity. But they don't appear to be huge, and would be difficult to evolve, as you need to evolve several different pieces of the mechanism at the same time. There are a couple of bacteria that have been created of this nature, and their advantage isn't to the bacteria, it's

          • The thing is, your supposition is false. There are lots more amino acids. Many of them are even used, just not in genetic coding.

            The thing is, you seem to have difficulty reading.
            Allow me to quote myself:

            While human DNA only codes for 22 proteinogenic amino acids, the body produces hundreds.

            FWIW, there CAN be advantages to adding additional amino acids to the code, if only to disrupt viral activity.

            Objection, speculation.
            I argue that if there were an advantage, life would have evolved the ability to code for them, rather than methods of catalyzing them after ribosome assembly.
            Evolution would suggest that encoding more is less efficient.
            I'd love to see some evidence backing up your assertion, though.

            But they don't appear to be huge, and would be difficult to evolve, as you need to evolve several different pieces of the mechanism at the same time.

            Ya, that's not backed up by any kind of evidence whatsoever.
            All kinds of life exist with all kinds of goofy metabo

            • by DRJlaw ( 946416 )

              This is nonsensical.
              Life does not rely on the existence of amino acids. It produces them.

              If you're attempting to imply that they create an expanded genetic code (I don't believe you) and that it couldn't produce the amino acids that the codons coded for due to ribosomal insufficiency, then, well duh.

              But until you provide a citation for that, I'm going to call bullshit.

              Citation [nature.com].

              We have therefore carried out directed evolution experiments with an orthogonal translation system that inserts 3-nitro-L-tyrosine a

              • This news is four years old. If you're going to pretend to be an authority, at least be up to date.

                If you're going to pretend to be a lawyer, at least learn to follow a discussion.

                FWIW, there CAN be advantages to adding additional amino acids to the code, if only to disrupt viral activity. But they don't appear to be huge, and would be difficult to evolve, as you need to evolve several different pieces of the mechanism at the same time. There are a couple of bacteria that have been created of this nature, and their advantage isn't to the bacteria, it's to the researchers, as the bacteria can't live without being supplied with amino acids that aren't found in the normal ecosystem.

                The life created didn't contain an expanded genetic code, per se. It contained a codon translation ambiguity (something that doesn't exist in any of the several dozen known genetic codes), and they then watched to see how the organism would resolve the ambiguity.
                Which is really cool, but not at all related to what the person I was replying to was talking about.

                • by DRJlaw ( 946416 )

                  If you're going to pretend to be a lawyer, at least learn to follow a discussion.

                  Oh, I can. The problem is that you can't.

                  HiThere's comment:
                  FWIW, there CAN be advantages to adding additional amino acids to the code... There are a couple of bacteria that have been created of this nature, and their advantage isn't to the bacteria, it's to the researchers, as the bacteria can't live without being supplied with amino acids that aren't found in the normal ecosystem. (So one of the things a life form would need

                  • Oh, I can. The problem is that you can't.

                    Clearly not.

                    FWIW, there CAN be advantages to adding additional amino acids to the code... There are a couple of bacteria that have been created of this nature, and their advantage isn't to the bacteria, it's to the researchers, as the bacteria can't live without being supplied with amino acids that aren't found in the normal ecosystem. (So one of the things a life form would need to ensure was that it could synthesize the new amino acids.)

                    The problem is that is incorrect, and the citation you use does not describe anything relating to that.

                    It's exactly what the person you were replying to was writing about. It may not have been what you were writing about, but that would be because you've mentally substituted "nucleic acid" for "amino acid," which is a pretty fundamental error.

                    No, it is not. The person above clearly did not describe creating a codon ambiguity and seeing how life attempted to survive this nearly-universally-fatal situation.
                    Ask yourself, when you create an ambiguity between a stop codon, and any amino acid, proteinogenic or otherwise, what's the outcome?
                    Precisely nowhere were "nucleic acid" and "amino acid" misused or misunderstood. You're the only per

                    • by DRJlaw ( 946416 )

                      Clearly, as you can see, we're talking about ncAAs. Both of us are.
                      However, the article you cite does not describe a process of this nature.
                      It does not describe the creation of a bacteria that codes for an ncAA that it cannot produce, which is what this person is describing.

                      Ummm, yes it does. You're just a moron.

                      Here we utilize an engineered [beta]-lactamase (bla) that is structurally dependent on OTS incorporation of the ncAA 3-nitro-L-tyrosine (3nY)9. This 'addicted' bla has allowed us to overcome fitnes

                    • You literally quote how you're wrong, and you don't even see it.
                      They literally describe right there how the organism is capable of synthesizing the ncAA, in this case ncAA 3-nitro-L-tyrosine (3nY)9.
                      Further, in order to enforce it staying in the genome, they engineered beta-lactamase to use the ncAA, because the organism would lose it otherwise. Groups showed successful replication both in the absence and in the presence of environmentally available ncAA.

                      When you're in a court room, and a lawyer proves that yo
                    • by DRJlaw ( 946416 )

                      You literally quote how you're wrong, and you don't even see it.
                      They literally describe right there how the organism is capable of synthesizing the ncAA, in this case ncAA 3-nitro-L-tyrosine (3nY)9.

                      Wrong. They literally describe how the organism is capable of synthesizing [beta]-lactamase by incorporating the ncAA 3-nitro-L-tyrosine (3nY). You know, the non-canonical amino acid that they repeatedly describe adding to the growth medium.

                      Again,
                      Here we utilize an engineered [beta]-lactamase (bla) that is struc

                    • Wrong. They literally describe how the organism is capable of synthesizing [beta]-lactamase by incorporating the ncAA 3-nitro-L-tyrosine (3nY). You know, the non-canonical amino acid that they repeatedly describe adding to the growth medium.

                      Heh. Wow. Alright, back to Cite 1.

                      Here we utilize an engineered [beta]-lactamase (bla) that is structurally dependent on OTS incorporation of the ncAA 3-nitro-L-tyrosine (3nY)9

                      Expanding the standard set of proteinogenic amino acids can be accomplished through changes to the underlying translational machinery. Orthogonal translation systems (OTSs) comprising aminoacyl-tRNA synthetase (aaRS)/suppressor tRNA pairs have been developed that do not significantly interact with the host translational machinery or interfere with already occupied portions of the genetic code8,9,10.

                      I'm sorry that you're scientifically illiterate. Really, I am.

                      The ncAA was added to the growth medium to test whether that altered the outcome of the ambiguity of the codon translation, and the dependence of the genome on the ncAA. This is why they give results for both the ncAA existing, and not existing, in the growth medium.

                      Each media condition was supplemented with 10 mM 3nY, matching the concentration of L-serine, the most abundant amino acid in RDM.

                      That's out of context, of course, but yes, it's close to correct.
                      Let's look at the test group instead of the media they evolved samples in:

                      In general, doubling times were measured in the absence of ceftazidime, and in three different amino acid environments: without ncAA, with 3nY, or with 3iY. Additionally, doubling times were measured with 2 or 22 µg mL⁻¹ CAZ (progenitor and evolved cells, respectively), both with and without 3nY.

                      If you look at

                    • by DRJlaw ( 946416 )

                      I'm sorry that you're scientifically illiterate. Really, I am.

                      Again with the projecting.

                      If you look at Fig. 3, you will see that growth without supplementation was also tested, of course, and showed the lowest doubling time (highest growth) of all groups.
                      Further, you'll see that neither OTP nor supplementation of 3nY made an appreciable difference between the control and addicted groups, minus RDM-13.

                      If you grossly misinterpret the text concerning Fig. 3 and its contents, sure. If you bothered to look at Supplement [springer.com]

                    • by DRJlaw ( 946416 )

                      So, should I assume from your sudden interest in Russian grammar [slashdot.org] that you haven't found a novel way to reinterpret the growth curves [springer.com] of Supplemental Figure 1 in citation 1 [nature.com] to explain away how the pADDICTED strains depend upon exogenous 3-nitro-L-tyrosine to inactivate the antibiotic? You know, strains dependent upon "an ncAA that [they] can synthesize just fine," but in each and every case don't?

                      For someone who recently instructed me [slashdot.org] that:

                      You made a citation without understanding its content. It's ok, it

                    • by DRJlaw ( 946416 )

                      So, should I assume from your sudden interest in Russian grammar and internet connectivity [slashdot.org] that you haven't found a novel way to reinterpret the growth curves [springer.com] of Supplemental Figure 1 in citation 1 [nature.com] to explain away how the pADDICTED strains depend upon exogenous 3-nitro-L-tyrosine to inactivate the antibiotic? You know, strains that you claim are dependent upon "an ncAA that [they] can synthesize just fine," but in each and every case don't?

                      For someone who recently instructed me [slashdot.org] that:

                      You made a citation wit

      • by Viol8 ( 599362 )

        I suspect there's probably a good reason evolution settled on 4 bases, and I wouldn't be surprised if it's to do with reducing complexity, copying efficiency and/or long-term stability.

        • by HiThere ( 15173 )

          That's plausible, but it could also be happenstance. Like five fingers. We know that's happenstance by looking at, e.g., horses. Certain purposes are advantaged by certain configurations, others by others. The one that survives is the one that's "good enough" in the various environments that it encounters. (Do you think you really get much value out of your little toe?) But once a choice is "fixed" in a certain gene line, other processes evolve to act optimally in its presence. So happenstance is a r

          • Comment removed based on user account deletion
            • by HiThere ( 15173 )

              IF you assume that that's the only change, then yes. But evolution doesn't work that way. Most cats have four toes on their hind feet. But there are exceptions. https://www.halifaxvethospital... [halifaxvethospital.com]

              • Comment removed based on user account deletion
                • by Viol8 ( 599362 )

                  Or a quadruped becomes aquatic with no ability to walk at all, e.g. cetaceans.

                  • by HiThere ( 15173 )

                    Well, that has lots of intermediate stages. Losing a toe and then adapting to that loss are much simpler changes that one can see ongoing right now by looking at variations within a single species. (Although domestic cats aren't really a good example. And that example is more about adding a toe.)

          • Like five fingers. We know that's happenstance by looking at, e.g., horses

            Acanthostega and Ichthyostega would be better comparisons than horses. Horses have (today) one (rarely three) digits because they descended from ancestors with five digits. (And sometimes the suppression of development of those other digits goes wrong ... which is why you get the occasional three-toed horse.) It's one of the better documented sets of "transitional species" in the record.

            Acanthostega [wikipedia.org] and Ichthyostega [wikipedia.org] on the other h

      • A biological ribosome would have no way to interpret the new codons and no mechanism to synthesize the new nucleic acid monomers.

        Life... hum.... finds a way. /JurassicParkMeme

    • No & never.

    • You should watch X-Files. It will give you all the answers. ;)
      • Nah, they clearly state their disclaimer "the truth is out there", so not _in_ there. Look elsewhere.
    • This is going to give me nightmares. Never never must 4chan be allowed to be coded into DNA if it became alive!!

    • Someone will do it immediately if they include the right letters . . .
      D, E, Z, N, U and S specifically, or B, O, S

      Followed by corporate sponsorship . . . actually, this seems less stupid than NFTs now that I think of it.
  • This has already been done by a group in Florida and a few other studies. https://www.nature.com/article... [nature.com]

    I would rather have my long-term storage done in something like a crystal that can't be wiped out by a grill or a bottle of bleach. Perhaps the storage density of DNA is so superior that it's better, but the equipment to read the DNA would seem to be huge.

  • does 11 decode to 10d? what even is the 10th dimension?

  • Are you telling me the breakthrough was to store A, T, G, C not as ASCII but rather as 2-bit binary?
    • Comment removed based on user account deletion
    • Almost, I think they also applied some simple principles of compression by looking for the most common combinations as well (they say they use AI for it, not sure why AI was needed, but I guess it makes it more fancy). I think what we see here is a lack of cross-discipline collaboration, which is surprisingly common - one branch of science is struggling with a problem which has been solved long ago in another branch, then they make the "invention" and follow a similar path, just years behind, purely because they
      • You have this all completely wrong. It's not about representing DNA in computer memory - it's about storing information in DNA. At the moment there are 4 possible molecules, so to encode for example 1 byte of data as DNA you need 4 molecules (4*4*4*4 = 256 possibilities). They are saying they have a way to make 7 more molecules (that are artificial) that work and can go into DNA the same as the naturally occurring ones, which extends it to 11 possibilities per molecule.
        • Thanks, that makes much better sense.
        • Ah, I completely misunderstood the article then. So they are using DNA as computer storage then? Is it still self-replicating with the new artificial molecules?
          • So they are using DNA as computer storage then?

            That is one of the directions people are going in. Because it approaches 1 bit per (approx) 100 atoms, the data density is high (as TFS says). A 1nm microprocessor fab would, OTOH, store data at about 1 bit per 250 atoms - and we're a long way from 1nm fabs at the moment.

            DNA is also not terrible on the chemical stability front. Keep it at low oxygen levels and liquid nitrogen temperatures, and it's quite resilient.

            Is it still self-replicating with the new

        • Comment removed based on user account deletion
    • Nah, it's related to how nucleotides self-identify. "I'm a G but I self-identify as a Q, personal pronoun 'flibbertigibbet'".
  • They shouldn't have announced with an odd number of new "letters" to add. If they plan to use them all, that means that at least two will be bonding on one side to the same letter on the other side, which would lead to encoding errors. If they want a robust system they would need 8 new letters, 4 pairs of 2. 6 would be better than 7 as long as they are 3 pairs that each uniquely fit with only one other, like our natural DNA does. They have a lot more work to do if 7 is really where they are at. You'd think

    • by Zocalo ( 252965 )
      I'd assume they are still working on adding further letters, but it really depends on their precise structure and just how the new molecules bind into the DNA chain; DNA molecules basically have three connectors, across the chain and either side along it, so introducing new combinations of connectors makes things a lot more flexible in what sequences are valid.

      For instance, with some spare letters you have the possibility of being able to have two letters represent the same binary pair, which might e
    • by Entrope ( 68843 )

      I was wondering the same thing, and disappointed (but not surprised) that TFA doesn't answer the question. Maybe one of the bases pairs with itself.

    • CHKSUM bit my friend!
  • A base 11 number system is just what we needed

    • by Snard ( 61584 )
      11? That's ridiculous, it's not even funny. (https://www.youtube.com/watch?v=QDmWYVdN8ug&ab_channel=NegativlandVideo)
    • But why don't you just make the 10 number system store more?
  • I know non-coders thinking now that I'm dissing the data structures but I implore you, ladies and gentlemen.
  • "...of extending macromolecular data storage to non-natural chemistries..."
    Great! What could go wrong?

  • What's the point of using DNA for data storage? Unless you're writing science fiction stories and need a plot device, anyway...

    • In theory it is very compact. Most digital storage technologies are either 2d, or just multiple layers of 2d. With DNA you get a string of linear data, but it can pack itself in dense 3d space.
      I think that practical difficulties (access time, error rate, lack of durability, etc) will make it mostly useless anyway.

  • I do hope no VIRUS (Covid-19????) MALWARE or TROJAN gets into your DNA fiddling you understand so little....
  • Excellent a new super battery! This will solve all our problems. This is a new super battery, right? Where's the daily super battery article?
  • ... AND poured freon on my workstation when the fan died!

    Meet the BOFH with 7 extra nucleotides.

    --
    A world "economy" based on the automated delivery of natural resources and manufactured goods to areas that want them is surely preferable to a global oligarchy of bankers charging us interest on breathing.

  • This is not a great way to improve storage density. Biological DNA is essentially base 4 (symbols A,C,G,T). You would need twelve more letters to double storage density, i.e. to get to base 16 or hexadecimal. (You can represent any hex digit as two base-4 digits.) Adding that many letters, each a distinct amino acid, along with the necessary 12 transfer RNAs and suitable transcription enzymes would be complicated and could increase error rates. Biological DNA already has very high data storage density.
  • Thought they tried that in the Species franchise and only ended up in a lot of bad sex.

  • by Wolfrider ( 856 )

    So much for the movie "Gattaca" ;-)

  • The standard 4 nucleotides work well for compressing efficiently to a binary alphabet (2 bits per nucleotide). Why not expand to a total of 16 nucleotides, for efficient compression of 1 nibble per nucleotide? btw: This study has nothing to do with "life," because just making some rando macromolecules doesn't confer them with life (e.g., the means of sustaining themselves and self-replicating). Most people don't appreciate the high specificity of enzymatic reactions and biomolecular interactions. This spec
  • I call dibs on 'B'.

  • I first heard about DNA storage back in the 1990s. Why can't I buy a DNA hard disk by now?
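
Several comments above argue over how much an 11-letter alphabet actually buys compared with 4 or 16 letters. The arithmetic can be sketched in a few lines of Python. This is an idealized calculation only: the function names are invented for illustration, and it ignores base pairing, error correction, and read/write error rates, all of which the commenters raise as open questions.

```python
import math


def letters_per_byte(alphabet_size: int) -> int:
    """Smallest n such that alphabet_size ** n covers all 256 byte values."""
    n = 1
    while alphabet_size ** n < 256:
        n += 1
    return n


def density_gain(alphabet_size: int) -> float:
    """Ideal bits per letter, relative to the 2 bits of a 4-letter alphabet."""
    return math.log2(alphabet_size) / 2


for size in (4, 11, 16):
    print(size, letters_per_byte(size), round(density_gain(size), 2))
# 4 4 1.0
# 11 3 1.73
# 16 2 2.0
```

So, under these assumptions, going from 4 to 11 letters cuts a byte from four letters to three (about a 1.73x ideal gain), while an exact doubling would indeed need 16 letters.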
