DNA data storage holds the promise of putting huge amounts of information into a test tube — but who wants to carry test tubes around a data center all day?
Researchers from Microsoft and the University of Washington are working on a better way: a completely automated system that can turn digital bits into coded DNA molecules for storage, and turn those molecules back into bits when needed.
They used their proof-of-concept system, described in a paper published today in Nature Scientific Reports, to encode the word “hello” in strands of DNA and then read it out. That may sound like a ridiculously simple task, but it served to show that the system works.
“We have conviction that DNA molecules are good candidates for data storage. But we are, at heart, computer architects. We really want to figure out what a future computer could look like,” Luis Ceze, a professor at UW’s Paul G. Allen School of Computer Science and Engineering, told GeekWire. “What’s exciting for us here is that it’s one step toward showing a computer system that has a molecular component and an electronic component.”
The mechanism for DNA data storage is similar to the way the DNA in our cells encodes genetic information: Instead of using electronic ones and zeros, the encoding system translates data into DNA base pairs, using the chemical “letters” for adenine, cytosine, guanine and thymine (A, C, G, T). “Hello,” for example, could be coded into the chemical string TCAACATGATGAGTA.
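To make the idea concrete, here is a minimal sketch in Python that packs two bits into each base. The mapping table and the function name encode_to_dna are assumptions chosen for illustration. This is not the code the Microsoft-UW team uses: the simple scheme below needs 20 bases for the five bytes of "hello," while the string quoted above has 15, so the real encoding clearly differs.

```python
# Hypothetical mapping: two bits per base. The article does not say
# which code the researchers actually use.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_to_dna(data: bytes) -> str:
    """Translate bytes into a string of A/C/G/T, two bits per base."""
    # Write every byte as eight binary digits, then join them all.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Walk the bit string two digits at a time and look up each base.
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


strand = encode_to_dna(b"hello")
print(strand, len(strand))
```

A production scheme would likely also avoid long runs of the same base and add redundancy for error correction, which this sketch leaves out.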
It’s important to note that the custom-made molecule doesn’t do anything genetically. Rather, the system merely uses the chemicals in DNA as code.
“There are no cells, no organisms,” said Microsoft principal researcher Karin Strauss.
The method dramatically increases the density of data storage. Theoretically, you could store a billion billion bytes of data (known as an exabyte) in a cubic inch of DNA, Strauss says.
In past experiments, the Microsoft-UW team has used DNA to encode files ranging from historical texts to cat pictures to a high-definition OK Go music video. UW’s Molecular Information Systems Laboratory even has a “Memories in DNA” website where you can upload your own files for DNA storage.
But that work involved a lot of manual steps to figure out the code, send an order to get the molecules synthesized, wait for the DNA to come back in the mail and then run the experiments. Because so much handling was involved, there were lots of opportunities to make mistakes. That would never fly in a commercial setting.
“You can’t have a bunch of people running around a data center with pipettes — it’s too prone to human error, it’s too costly and the footprint would be too large,” study lead author Chris Takahashi, senior research scientist at the Allen School, explained in a Microsoft blog posting. That’s why an automated system is a big deal.
The system’s software translates digital code into DNA code. That code is then automatically sent to a synthesizer that combines the required chemicals and liquids, in just the right order and proportions, and then spits out the custom-made DNA molecules into a storage vessel.
To read out the data, the DNA is drawn into an apparatus that adds chemicals and pushes them through a nanopore DNA sequencing machine. The sequence is automatically converted into the ones and zeros of digital data.
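The final conversion step can be sketched in Python as the reverse lookup. This assumes a simple two-bits-per-base mapping (A=00, C=01, G=10, T=11) picked purely for illustration, and a hypothetical function name decode_from_dna. The article does not describe the team's actual decoding software, which would also have to cope with sequencing errors.

```python
# Hypothetical mapping: each base stands for two bits. This is an
# illustrative assumption, not the researchers' real code.
BITS_FOR_BASE = {"A": "00", "C": "01", "G": "10", "T": "11"}


def decode_from_dna(strand: str) -> bytes:
    """Translate a string of A/C/G/T back into bytes."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    # Four bases make one byte, so anything else is a damaged read.
    if len(bits) % 8 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Under the assumed mapping, this 20-base strand spells "hello".
print(decode_from_dna("CGGACGCCCGTACGTACGTT"))
```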
Ceze said the procedure still took 12 to 16 hours, but the elapsed time wasn’t the point of this experiment. Rather, the point was to show that an automated system could do the work reliably from start to finish.
The Microsoft-UW team has also created a programmable system that can move droplets of fluid around on a digital microfluidic device dubbed PurpleDrop. The operating system, known as Puddle, can be used to issue commands for a microfluidic system, much as a more conventional operating system like Linux can issue commands for an electronic computing system.
Here’s a sample of Puddle code:
a = input(substance_A)
b = input(substance_B)
ab = mix(a, b)
while get_pH(ab) > 7:
    heat(ab)
    acidify(ab)
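For readers who want to run something similar on an ordinary computer, here is a toy simulation in Python of the same control loop. Everything in it is a stand-in invented for illustration: the Droplet class, the pH-averaging rule for mixing, the step sizes and the neutralize helper are not part of Puddle, and the sketch assumes the loop is meant to continue while the pH is above 7.

```python
from dataclasses import dataclass


@dataclass
class Droplet:
    """Toy droplet; a real device would measure these values with sensors."""
    name: str
    ph: float
    temperature_c: float = 25.0


def input_droplet(name: str, ph: float) -> Droplet:
    return Droplet(name, ph)


def mix(a: Droplet, b: Droplet) -> Droplet:
    # Averaging pH is a deliberate simplification, not real chemistry.
    return Droplet(f"{a.name}+{b.name}", (a.ph + b.ph) / 2)


def get_ph(droplet: Droplet) -> float:
    return droplet.ph


def heat(droplet: Droplet, delta_c: float = 5.0) -> None:
    droplet.temperature_c += delta_c


def acidify(droplet: Droplet, step: float = 0.5) -> None:
    droplet.ph -= step


def neutralize(droplet: Droplet, limit: float = 7.0, max_steps: int = 100) -> int:
    """Heat and acidify until pH is no longer above the limit; return step count."""
    steps = 0
    # max_steps guards against an endless loop if the pH never drops.
    while get_ph(droplet) > limit and steps < max_steps:
        heat(droplet)
        acidify(droplet)
        steps += 1
    return steps


a = input_droplet("substance_A", ph=9.0)
b = input_droplet("substance_B", ph=8.0)
ab = mix(a, b)
steps = neutralize(ab)
print(ab, steps)
```

The point of the sketch is the feedback loop: the program reads a measurement from the droplet, decides what to do next, and repeats, which is the kind of control an operating system for fluids would need to provide.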
“What’s great about this system is that if we wanted to replace one of the parts with something new or better or faster, we can just plug that in,” Microsoft researcher Bichlien Nguyen said.
Eventually, a next-generation DNA data storage system could be combined with devices like PurpleDrop and software like Puddle to create a computer environment based on microfluidics instead of electronics. Ceze said that would probably lead to hybrid computer systems that blend the processing power of electronic computing with the data storage density of DNA.
“Our vision for using molecules is for applications that have a very large [amount] of data,” he said. “The kind of computing that we are exploring is pattern-matching and approximate search. If you have a large collection of images and video, how do you find similar images, how do you find similar videos?”
Ceze and his colleagues already have demonstrated how DNA-based computing can “fish” through huge databases for images that match a given query. That kind of capability is something that the Pentagon’s Defense Advanced Research Projects Agency, or DARPA, is very interested in developing.
Also this week, researchers at Caltech and the University of California at Davis published a paper describing a data processing system that uses self-assembling DNA molecules to run algorithms. “It’s super-interesting,” Ceze said. “It allows you to do computation at the molecular scale … but it’s not really about processing large amounts of data, which is our goal.”
DNA-based computer systems aren’t likely to show up at Best Buy anytime soon.
“We’re really imagining this being deployed in the cloud. … The scenario that we see is replacing parts of a larger-scale system that sits in a data center with system components that use molecular data storage and molecular data search,” Ceze said.
Strauss isn’t willing to predict how long it will take to add DNA to Microsoft Azure, but she’s confident that Microsoft and UW will do what it takes to turn the experiment into a product.
“We have a very special team here,” she said. “We’re very lucky to be in an environment where people are willing to make bets and innovate.”
Takahashi, Nguyen, Strauss and Ceze are co-authors of the open-access study in Nature Scientific Reports, “Demonstration of End-to-End Automation of DNA Data Storage.”
Update for 11:21 a.m. PT March 21: We’ve tweaked this report to reflect the theoretical data density estimate for DNA storage more accurately, and also revised a reference to the software used for the experiment described in the research paper.