Microsoft Researching Storage Based on Biological DNA

SAN JOSE -- @Scale -- Microsoft is researching DNA-based storage, a technology that promises to compact a data center's worth of information into a space the size of a few sugar cubes.

DNA-based storage would require less space than other media, and consume less power -- important in the age of web-scale data centers. But the medium is really slow, with data retrieval rates of about 10 MBytes per week.

"Most of this, by the way, is dominated by FedEx moving test tubes around," said Luis Ceze, a Microsoft researcher and a professor at the University of Washington, who presented on DNA storage during the keynote at Facebook's @Scale conference today.

Reads can be sped up in some obvious ways -- putting everything in the same building, for example. And given how quickly DNA sequencing is advancing, Ceze believes 100Gbit/s read speeds might eventually be possible.

It helps that the data doesn't need to be retrieved perfectly (more on that in a moment).

The primary motivation for this research is to save power at web scale. But there are other factors, too. Every other storage mechanism is reaching its limits and will inevitably deteriorate. DNA offers the promise of more efficient storage that can last hundreds of thousands of years.

(By the way, all the DNA here is synthetic. It's not as if they're injecting mice with data.)

DNA is made up of combinations of four nucleotides: adenine, cytosine, guanine and thymine. So the translation from bits into nucleotides seems straightforward -- 00 could be "A," 01 could be "C," and so on.
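As a rough illustration of that naive mapping, here is a minimal Python sketch. Only the first two assignments (00 to "A", 01 to "C") come from the example above; the other two, and the function names `encode` and `decode`, are assumptions made for illustration and not Microsoft's actual scheme.

```python
# Naive two-bits-per-nucleotide mapping.
# "00" -> A and "01" -> C follow the article's example; "10" -> G and
# "11" -> T are assumed here to complete the table.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): four nucleotides back into each byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

With this table, `encode(b"Hi")` yields `"CAGACGGC"`, and `decode` turns that string back into `b"Hi"`. Note that nothing here stops runs such as C-C-C-C from appearing.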

Of course, it's not that easy. Repeating one nucleotide too many times -- such as C-C-C-C -- makes the sequence more fragile and harder to read, so Microsoft adds coding tricks to prevent such combinations. Long chains are prone to instability, so Microsoft puts only 150 nucleotides in a chain -- but that means adding codes to preserve the proper order of these chains.
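To make those two constraints concrete, the sketch below uses one known trick for avoiding repeats: write the data in base 3 and let each digit pick one of the three nucleotides that differ from the previous one. It also prefixes each chunk with a two-byte index so that 150-nucleotide strands can be put back in order. This is an illustrative assumption, not necessarily the coding Microsoft uses; every function name, the 23-byte payload size and the two-byte index are hypothetical.

```python
BASES = "ACGT"


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as 6 base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Invert bytes_to_trits()."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def trits_to_strand(trits: list[int], prev: str = "A") -> str:
    """Each digit selects one of the 3 bases that differ from the last one,
    so the same nucleotide never appears twice in a row."""
    out = []
    for trit in trits:
        prev = [b for b in BASES if b != prev][trit]
        out.append(prev)
    return "".join(out)


def strand_to_trits(strand: str, prev: str = "A") -> list[int]:
    """Invert trits_to_strand()."""
    trits = []
    for base in strand:
        trits.append([b for b in BASES if b != prev].index(base))
        prev = base
    return trits


def split_into_strands(data: bytes, payload_bytes: int = 23) -> list[str]:
    """2 index bytes + 23 payload bytes = 25 bytes = 150 nucleotides."""
    strands = []
    for index, start in enumerate(range(0, len(data), payload_bytes)):
        chunk = index.to_bytes(2, "big") + data[start:start + payload_bytes]
        strands.append(trits_to_strand(bytes_to_trits(chunk)))
    return strands


def reassemble(strands: list[str]) -> bytes:
    """Decode strands in any order and sort them by their index prefix."""
    chunks = {}
    for strand in strands:
        raw = trits_to_bytes(strand_to_trits(strand))
        chunks[int.from_bytes(raw[:2], "big")] = raw[2:]
    return b"".join(chunks[i] for i in sorted(chunks))
```

The cost of this design is density: each nucleotide carries one base-3 digit instead of two full bits, and the index takes up part of every strand.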

And finally, DNA replication is inherently imperfect, so error-correcting codes go in there as well.

Reading the data involves DNA sequencing, but you can't exactly grab a nucleotide chain with tweezers. The approach is to use polymerase -- an enzyme that synthesizes DNA and RNA molecules -- to make lots of copies of a strand of interest (that's another benefit to DNA storage: unlimited copies nearly for free) and sequence a bunch of them, finding a consensus about what the chain was supposed to be.
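The consensus step can be pictured as a vote. The sketch below is a simplified assumption rather than the researchers' pipeline: it supposes every read of a strand has the same length and is already aligned, so errors are substitutions only, and it takes the most common nucleotide at each position. The function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote per position across equal-length, pre-aligned reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch assumes all reads have the same length")
    # zip(*reads) walks the reads column by column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Five noisy reads of the same strand; two contain a single wrong base.
reads = ["CAGACGGC", "CAGACGGC", "CATACGGC", "CAGACGGA", "CAGACGGC"]
recovered = consensus(reads)
```

Here `recovered` is `"CAGACGGC"`, because each wrong base is outvoted by the four correct reads at that position. Real sequencing also produces insertions and deletions, which this sketch does not handle.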

That brings up a point: This type of storage smashes the expectations of precision that we used to have with tapes and hard disks. That's OK, though, because software itself might give up some of that precision for the sake of saving energy.

Ceze referred to an area of study called approximate computing, where a processor's "thinking" can be made less thorough in exchange for consuming less power. It's the same way our brains work, he said; full attention takes more energy. This approach of accepting good-enough rather than perfect might be practical in some cases, "because most applications do not require perfect communication and storage accuracy," he said.

— Craig Matsumoto, Editor-in-Chief, Light Reading

mhhfive 9/6/2017 | 3:14:12 PM
Re: Had a chance The idea is not new, but perhaps the engineering challenge is becoming more practical by the minute. It's interesting that DNA is the choice of material but it might be problematic, too. What if we encode information and we unknowingly create a supervirus by accident among the gazillions of sequences that we're just storing...? :P

We might have to keep all this info in biological containment facilities -- which might increase the costs a bit.

Horizontal gene transfer is a thing, actually.
Ariella 9/6/2017 | 1:53:07 PM
Re: Had a chance I recall reading about the data storage potential in DNA years ago. I suppose it is now becoming more feasible. The possibility really just cries out for a movie plot in which crucial data is planted in a person who is abducted by the bad guys and can only be rescued by a farfetched plot to break into the building from the top and put on masks that allow you to pass for the guys on the other side.
JohnMason 9/3/2017 | 3:25:04 PM
Re: Had a chance Unless, of course, this isn't really basic research anymore, but an engineering problem of how to best use existing knowledge.
JohnMason 9/3/2017 | 3:19:52 PM
Had a chance I had a chance to look at Dr. Ceze's 2016 paper on the subject. The breadth and potential of DNA data storage remind me of the kind of basic research that was done at the old Bell Labs. I wonder how long Microsoft will commit to this line of research.