Pocket-Sized DNA Reader Used To Scan Entire Human Genome Sequence (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: A few years back, a company called Oxford Nanopore announced it was developing a radically different way of sequencing DNA. Its approach involved taking single strands of the double helix and stuffing them through a protein pore. With a small bit of current flowing across the pore, the four bases of DNA each created a distinct (if tiny) change in the voltage as it passed through. These could be used to read the DNA one base at a time as it wiggled through the pore. After several years of slow progress, Oxford Nanopore announced that its sequencing hardware would be as distinctive as its wetware: a USB device that could fit comfortably in a person's hand. As the first devices went out to users, it became clear that the device had some pros and cons. On the plus side, the device was quick and could be used without requiring a large facility to support it. It could also read very long stretches of DNA at once. But the downside was significant: it made lots of mistakes.
With a few years of experience, people are now starting to learn to make the most of the devices, as demonstrated by a new paper in which researchers use it to help sequence a human genome. By using the machine's long reads -- in one case, nearly 900,000 bases from one DNA molecule -- the authors were able to get data out of areas of the human genome that resisted characterization before. And they were able to distinguish between the two sets of chromosomes (one from mom, one from dad) and locate areas of epigenetic control in many areas of the genome. In light of all the distinct information it can provide, the machine's error rate is seeming like less of a problem.
GATTACA, anyone? (Score:1)
Just remember it was a cautionary tale and NOT an operations manual.
Re:GATTACA, anyone? (Score:4, Insightful)
Just remember it was a cautionary tale and NOT an operations manual.
Don't be ridiculous! I mean, everyone knows that 1984 is the real instruction manual. ;)
Re: (Score:1, Troll)
Don't be ridiculous! I mean, everyone knows that 1984 is the real instruction manual. ;)
No, it isn't. Orwell was an optimist.
Re: (Score:2)
our brains already do the gross DNA analysis with sexism, racism and stereotypes, this is just a fine tuning.
Re: (Score:1)
Not quite.
The problem with stereotypes is that the same inductive reasoning that goes into creating superstitions is the basis for stereotypes. And while that kind of reasoning has its uses, well. The turkey that walks up to the farmer gets a treat every day of its life... until Thanksgiving.
In particular there are two ways in which stereotypes fail to be useful:
1. They presume correlation is causation.
The problem here is if (for example) it isn't that black people are dumb and prone to criminal behavior
Re: (Score:2)
actually, many stereotypes are VERY useful, if something is true 95% of the time it's a useful generalization.
Also, the turkey was doomed anyway, going for treat or not. So being friendly to farmer made life happier.
Who remembers the Human Genome Project? (Score:2)
This reminds me of the Human Genome Project. After a few years of trying to get funding for a fifteen year project to sequence the entire human genome, the Reagan administration allocated $3 billion to get started. It was "finished" 13 years later. Now this iPhone doohickey does it in seconds or minutes.
Easy Fix (Score:2, Insightful)
Just do multiple passes and match the commonalities. Should be an easy way to sort out the errors and make it much more accurate
Re:Easy Fix (Score:5, Informative)
That's what we do now with short reads. It kind of works, but only because we understand in a lot of detail about how errors happen.
For example, 454 sequencing tends to get the number of nucleotides in a repeat sequence wrong. So, for example, CTAAAGT might be read as CTAAAAGT. Illumina sequencing doesn't have that problem, but tends to degrade along the length of the read. So the last few nucleotides are more likely to be wrong than the first few.
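To make the repeat-miscount example concrete, here is a minimal Python sketch of homopolymer compression: collapse every run of identical bases to a single base, so that two reads differing only in run length compare equal. The function names are made up for illustration, and this is not a description of what any particular pipeline does with 454 data.

```python
from itertools import groupby


def compress_homopolymers(seq):
    """Collapse each run of identical bases to a single base."""
    return "".join(base for base, _ in groupby(seq))


def run_lengths(seq):
    """Return (base, run_length) pairs, one per run of identical bases."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


true_read = "CTAAAGT"
miscounted = "CTAAAAGT"  # one extra A in the repeat

# Both reads collapse to the same string, so the miscount no longer
# shows up as a mismatch.
print(compress_homopolymers(true_read))
print(compress_homopolymers(miscounted))

# The run lengths show where the two reads actually disagree.
print(run_lengths(true_read))
print(run_lengths(miscounted))
```

The trade-off is that compression throws away the true repeat length, so it helps with matching reads but not with calling how many bases the repeat really has.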
And this is just read errors; with short-read sequencing, there are also PCR amplification errors, which is why we think nanopore sequencing will do better. When you start "unwinding" a chromosome, the parts that you unwind first tend to get amplified more than the parts that you unwind nearer to the end. Some sequences are amplified more than others for chemical reasons, and the relative error might depend on the specific revision of reagent chemicals.
We don't really understand enough about nanopore sequencing to be able to develop appropriate algorithms to match long-read sequences together. We don't even know what the right number of multiple passes is yet. And that's important, because genomics and transcriptomics are important, but the bigger issue for researchers is economics.
Re: (Score:1)
Isn't the Illumina problem fixed by paired end reads on the rather short fragments?
Re: (Score:3)
Yes and no. Paired end reads give you either longer reads or longer range information. The problem isn't fixed because as the technology gets better we just push up the read length.
Re: (Score:2)
With machine learning, you can theoretically use a known set of good DNA reads to determine what needs adjustment. That, of course, requires a human to train the machine learning algorithm to better-interpret the data and learn properly. It also requires a lot of manual setup reading and rereading known DNA, as well as making adjustments to the hardware to decrease its error rate as you discover particular error conditions for which you can correct directly.
Even with all the manual work involved, it's g
Re: (Score:3)
"Just do multiple passes and match the commonalities. Should be an easy way to sort out the errors and make it much more accurate"
Just like an idiot calculating stuff: make him do it multiple times and he'll be a stable genius.
I would be interested (Score:2)
Re: (Score:2)
Your own, if you play "pocket pool"
Re: (Score:1)
Is that your fantasy about me; sorry I'm straight.
Meanwhile, we found several faggots' DNA in your underwear's rear panel. You're what the Navy calls "a friendly port".
Re: (Score:2)
You can find out yourself for the low, low price of $1000 USD. ... or wait a few months for SmidION to come out, which will be a bit cheaper, and plug into your iPhone or Android device.
Re: (Score:2)
Can't wait for Apple/Google to have my sequenced DNA information... what could possibly go wrong?!
Re: (Score:3)
String or nothing!
Wrong use. (Score:2)
Just need to ID marker DNA sequences not the whole thing.
Scanning for Flu. Searching for H1N1, Negative, H1N2 Negative........... H2N3 POSITIVE! Confirmation Scan? Y/N?
Re:Wrong use. (Score:5, Informative)
Direct RNA sequencing can be done with the MinION as well, no hardware change needed:
https://store.nanoporetech.com... [nanoporetech.com]
Depending on how important it is to sequence all RNA, polyadenylation prior to sequencing might also be needed.
Re: (Score:1)
EBOLAIDS ... POSITIVE! Confirm plane reservation to Madagascar? Y/N?
I have one of these... (Score:5, Interesting)
... and it (kinda) works as advertised. It is also VERY low cost (compared to the previous generation of sequencing machines which cost 700K and up, it costs about $1K). The main disadvantages are that 1) it's still inaccurate, maybe only in the ~90% accuracy rate (not a good thing when you're reading 3B base pairs) and 2) the reagents and flow cell used are expensive (so on big jobs you're almost better off using a traditional sequencer). Still, it does do LONG reads which gets over one of the big disadvantages of the previous gen. machines.
Even with a high error rate, if the errors are UNBIASED then you can overcome them by simply sequencing the same area over and over again to come up with a consensus. This is called "coverage" and usually a factor of 10X is used but if the sequencing technology is cheap enough why not do it 30X or 100X or more?
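A toy sketch of that consensus idea in Python, under strong simplifying assumptions: the reads are already aligned, all the same length, and contain substitution errors only (real nanopore reads also have insertions and deletions, which this ignores). The `consensus` helper is hypothetical, not taken from any real tool.

```python
from collections import Counter


def consensus(reads):
    """Majority vote per column over equal-length, pre-aligned reads."""
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be pre-aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


# 5X coverage where each error hits a different position (unbiased):
unbiased = [
    "GATTACA",
    "GATTACA",
    "GCTTACA",  # error at position 1
    "GATTAGA",  # error at position 5
    "GATTACT",  # error at position 6
]
print(consensus(unbiased))  # GATTACA

# Same coverage, but most reads make the same error (biased):
biased = ["GATTAGA"] * 3 + ["GATTACA"] * 2
print(consensus(biased))  # GATTAGA
```

The second case is why the UNBIASED part matters: when the same mistake recurs at the same position, more coverage just makes the vote more confident in the wrong base.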
For us citizen scientists, you'll still need a way of processing and purifying your DNA; I'm trying to get a Bento Lab (hopefully shipping in a month or two). Also, the technology will hopefully get better and better: the next version will supposedly have the nanopore membrane separate from the flow cell, so the whole thing won't have to be replaced when the membrane is used up. (The version after THAT will supposedly be a tiny device directly attachable to an iPhone with an even tinier replaceable membrane, so maybe it'll become really cheap to sequence DNA; at parties even :). Finally, I think they may be moving to freeze-dried or otherwise non-perishable reagents, so the storage requirements will become a little easier (I have a dedicated battery-backed freezer at home).
Now with CRISPR kits for only $40, there's no end to the fun (and disasters) that we can do with our basement genetic experiments!
I should mention you'll need a little lab experience and know how to use a pipette and have steady hands! Go take some courses at the local community college and you'll be good to go. (Of course in order to interpret your results you'll need to study BioInformatics, my specialty :)
Re: (Score:3)
Those algorithms, largely based on De Bruijn graph methods, are specifically designed to handle the short-read, high-coverage case. There's no reason to think that they will work well on the long-read low-coverage case.
You might be better off just BLASTing them together.
Re: (Score:2)
minimap2 [github.com] works better for long reads, and can be used in the Canu [readthedocs.io] assembler as the overlapper component for doing read correction.
Re: I have one of these... (Score:1)
I work for a major company pursuing orders of magnitude synthesis and sequencing more than just about anybody else in the world. We have a bunch of these things in addition to the more traditional sequencers. They fit our long read pipeline very nicely but I'd hesitate to use them on their own.
Re:I have one of these... (Score:4, Interesting)
it's still inaccurate, maybe only in the ~90% accuracy rate (not a good thing when you're reading 3B base pairs)
Former de novo assembly software writer here. Do we have a good handle on the kinds of errors that you tend to find? You know how 454 reads tend to miscount repeat sequences and Illumina tends to decline in quality along the read. Do we understand where the errors come from?
Also, are the errors correlated? If you try to sequence the same 500k read twice, will it make errors in the same places?
Re: (Score:2)
Right now regular Illumina short reads with a little bit of long reads are enough to get phased SNP information from most relevant parts of a human genome. It's also cheaper at scale, human genome sequencing at 3x can be done for less than $500.
With de-novo assembly it's a bit different. Nanopores pr
Re: (Score:2)
Right. So for de novo (as noted, that was my field) it seems to me that the best approach might be to build and clean up a de Bruijn graph from short reads, and then align long reads to the graph to get contigs.
Re: I have one of these... (Score:2)
Re: (Score:2)
The benefit of doing it the other way is you can use existing efficient graph cleanup algorithms like tour bus.
It will be interesting.
Re: (Score:2)
The benefit of creating scaffolds first from long reads is that it's a lot easier to capture regions where there is a Very-long Complex Tandem Repeat (VeCTR). These regions are collapsed in scaffolds assembled from short reads.
Re: (Score:2)
That case would still work because CTRs correspond to a loop in the de Bruijn graph. The theory is that all true contigs are paths in the graph, and you can use the long reads to find each one.
But I agree that you could do it either way and we don't know which one would be better until we have more experience.
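For anyone following along, the loop claim can be seen in a toy Python sketch: build a de Bruijn graph over (k-1)-mers from short reads and check it for a directed cycle. The sequences, the read length of 6, and k=4 are all made up for illustration, and real assemblers do far more (error cleanup, reverse complements, coverage tracking).

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def has_cycle(graph):
    """Depth-first search for a directed cycle in the graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = defaultdict(int)  # every node starts WHITE (unvisited)

    def visit(node):
        state[node] = GREY
        for nxt in graph.get(node, ()):
            if state[nxt] == GREY:
                return True  # back edge: we returned to a node on the current path
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in list(graph))


def short_reads(seq, length=6):
    """All overlapping substrings of the given length (idealised, error-free reads)."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


tandem = "TTGACGACGACGTT"  # GAC repeated in tandem
unique = "TTGACGTCAATG"    # no repeated 3-mer

print(has_cycle(de_bruijn(short_reads(tandem), k=4)))  # True
print(has_cycle(de_bruijn(short_reads(unique), k=4)))  # False
```

Note that the repeat unit here is much shorter than the reads, which is the easy case for the graph to represent.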
Re: (Score:2)
Not necessarily. If the unit length of the repeat is greater than the fragment length (I've seen tandem repeats with unit lengths of 40 kb), then the region will not be detected as repetitive.
Re: (Score:1)
Fuck the iPhone accessory thing.
I want my DNA sequenced, but I don't want to hand it to some bullshit Cloud AI IoT App company that will sell my DNA to advertisers.
"Hi! I see your sequence here is AGTAGG, would you like some hard liquor?"
Re: Use this to scan a woman's vagina (Score:3)
Slashdot trolls in 2017 are fucking lame. Come on man put some effort into it. This is not /b/, it's Slashdot and our trolls traditionally put effort into their work
Re: (Score:2)
They're even worse this year:)
Re: Use this to scan a woman's vagina (Score:2)
Fair point!
Re: (Score:2)
Allow me to be the first to wish you a happy New year:)
Cheers!
Q&A from a previous time (Score:5, Insightful)
I did a Q&A on this sequencer on SoylentNews a couple of years ago:
https://soylentnews.org/articl... [soylentnews.org]
The technology has improved substantially since then. Feel free to ask me any more questions about the sequencing. Although I'm not an author on this paper, I'm fairly familiar with the sequencing project that was done, and am happy to answer any general questions you might have on this technology.
Re: (Score:2)
Are the errors random, or are they consistent? That is, can we just run strands through enough times to get the error rates down to acceptable levels?
Re:Q&A from a previous time (Score:5, Informative)
Some errors are random, some are systematic. The systematic errors tend to be either small shifts in long stretches of the same base, or interesting features of the DNA (e.g. methylation), and there are a few people trying to work out what those interesting features are.
A key obstacle to getting people interested in nanopore sequencing (or other types of observational sequencing) is that we have been locked in for so long to the idea of DNA as a sequence of letters that we forget there are other things attached to it that also have functional roles. Nanopore is more accurate when matching sequences at the signal/electrical level, but almost no one is doing that yet.
Reminds me of proteomics (Score:2)
Nanopore is more accurate when matching sequences at the signal/electrical level, but almost no one is doing that yet.
Reminds me of matching peptide sequences at the mass-spectrometry level in proteomics (Disclaimer: used to work at GeneBio).
Re: (Score:1)
Re: (Score:3)
It's already done:
https://en.wikipedia.org/wiki/... [wikipedia.org]
Misuse around the corner (Score:2)
Employers will use such under the table to screen candidates for medical and/or genetic problems. I've worked for slimebags who would happily cheat at anything to gain an edge.
Re: (Score:2)
Should I add this to the ACA?
usb dna reader (Score:2)
Assembly and errors (Score:2)
There are two plagues in current WGS: errors in sequence (frameshifts on monomer runs, flaky stop codons in the middle of ORFs, etc.) and the problem of assembling short reads in repeated sequences.
This method helps the second problem.
Errors in sequence can be minimized by doing things several times.
The Back Story To 454 (Score:1)
Someone mentioned 454?
To the best of my knowledge, 454 was a small company on the East Coast, maybe New Hampshire? They were acquired by Roche; the whole operation was moved west.
Did I say the whole operation? Well, they picked and chose who they wanted and who they didn't. In the case of the IT department, they brought exactly one guy west, and, I infer, laid off everyone else.
I came in as a contractor - I gathered the impression that part of the deal involved a two-week-long, all-expenses-paid vacation in
Re: (Score:2)
Or just Cassava-killing viruses:
How a TED Fellow is working to save African cassava from whiteflies [ted.com]
Re: (Score:2)
The MinION is really good at finding structural variants (i.e. large-scale changes in DNA sequence), but not so good for single point variants (accuracy for single base-called sequences is 85-95%, getting to about 99% in consensus; accuracy is much higher at the signal level, but there are no well-developed programs that do variant matching/detection at the signal level).
I try to encourage people to use the first $1000 for a pilot run, just to see if the MinION is suitable for what they want.
Finding causal