DNA – the next generation of data storage?

We generate a tremendous amount of data from an exponentially growing number of connected devices. The entire internet (and all the data we host in the cloud) is forecast to reach 16,000 exabytes by 2017 (an exabyte is 1 billion gigabytes). That’s a lot of warehouses and data centers [whether all of it should be stored is the topic of a very different post… and yes, I DO need ALL of those puppy videos, thank you for asking!].

How we store our data has evolved significantly over the last several decades, from magnetic tapes and disks to solid-state flash drives, and perhaps eventually to DNA. Moore’s Law has enabled huge leaps in the cost-performance ratio of both computing power and storage, but have we reached the limits of conventional storage media, especially for applications that call for very long-term storage and only occasional access?

The idea that genetic material, or synthetic DNA, could be used to store information other than genetic information is moving from the realm of sci-fi into reality. Welcome to a future where we can go from binary to biological code, converting zeros and ones to C-G-A-T (cytosine, guanine, adenine, and thymine) – the building blocks of DNA.
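
To make the binary-to-biology idea concrete, here is a minimal Python sketch that turns bytes into a string of bases. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function name `bytes_to_dna` are assumptions picked purely for illustration; they are not taken from any particular research system, and real encodings are more elaborate.

```python
# Illustrative mapping only: two bits per base. This is an assumption,
# not the scheme used by any specific research group.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bytes_to_dna(data: bytes) -> str:
    """Turn raw bytes into a string of A/C/G/T, two bits per base."""
    # Write every byte as 8 binary digits, then read the digits off in pairs.
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(bytes_to_dna(b"Hi"))  # prints CAGACGGC
```

With this mapping every byte becomes four bases, so the two-character text "Hi" turns into an eight-base sequence.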

But wait, how new is this idea really? We already know that DNA is suitable for long-term storage as long as it’s kept in a dark, cool, and dry place. Scientists have been able to sequence the preserved DNA of long-extinct animals such as the woolly mammoth. There have also been attempts to extract ancient DNA from fossils and other preserved remains (dinosaurs, mummies, etc.), though that material is often of very low quality and/or prone to contamination. Still, the look-back potential is on the order of tens of thousands of years, as opposed to decades. Further, the idea that digital information could be stored at the molecular level, potentially in DNA, was first conceptualized in the mid-1960s by a Soviet scientist named Mikhail Neiman.

A confluence of factors, including advances in computing power, improvements in genetic sequencing and genomics, and the falling cost of producing synthetic DNA, has moved synthetic DNA as a form of data storage from concept toward reality.

Researchers at the University of Washington, in conjunction with Microsoft, recently published results detailing the development of a complete system to store digital data in DNA. As a storage medium, DNA is far denser than conventional data storage technologies: what would normally reside in servers in a data center the size of a big-box store could fit in a sweet package the size of a sugar cube. As an added bonus, DNA can reliably preserve the encoded information for centuries. Current storage media can degrade over time, and the technology required to retrieve the data can become obsolete. Good luck finding a device to read that 3.5-inch floppy disk you found lying around!

What’s really interesting is that we aren’t just talking about lines of text being encoded in the DNA building blocks A, C, G, and T. The researchers successfully encoded the digital data from image files into the nucleotide sequences of snippets of synthetic DNA. They were also able to retrieve that data and reconstruct the images! Recovering the data is relatively straightforward: established DNA sequencing techniques read the information, which is then converted back to the original format.
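
The round trip described above can be sketched in Python. This is a toy model under stated assumptions, not the researchers' actual method: the two-bits-per-base mapping, the one-byte index placed at the front of each snippet, and the names `encode` and `decode` are all hypothetical, and the sketch ignores error correction and every biochemical constraint. The index is there because this sketch assumes the snippets come back in no particular order.

```python
import random

BASES = "ACGT"  # the base at position k stands for the 2-bit value k (assumed mapping)


def to_bases(data: bytes) -> str:
    """Write each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0))


def from_bases(strand: str) -> bytes:
    """Inverse of to_bases: every four bases become one byte again."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def encode(data: bytes, chunk_size: int = 8) -> list[str]:
    """Split data into short snippets, each prefixed with a 1-byte index."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if len(chunks) > 256:
        raise ValueError("a 1-byte index can only address 256 snippets")
    return [to_bases(bytes([idx]) + chunk) for idx, chunk in enumerate(chunks)]


def decode(strands: list[str]) -> bytes:
    """Read every snippet, put them back in index order, and drop the index."""
    decoded = sorted((from_bases(s) for s in strands), key=lambda d: d[0])
    return b"".join(d[1:] for d in decoded)


payload = b"pretend these bytes are an image file"
strands = encode(payload)
random.shuffle(strands)  # simulate reading the snippets back in arbitrary order
assert decode(strands) == payload
```

Each snippet here is at most 36 bases long (one index byte plus eight data bytes, four bases per byte), which keeps the example easy to inspect by printing `strands`.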

Today this technology is very expensive – the European Bioinformatics Institute (EBI) estimates a cost of $12,400 to encode 1 MB of data. However, exponential decreases in the costs of DNA synthesis and sequencing suggest that it could become cost-competitive in the not-so-distant future.
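
For a sense of scale, here is a quick back-of-the-envelope calculation in Python. It uses only the $12,400 per MB estimate quoted above. The 1 GB example (taken as 1,000 MB), the $1 per MB target, and the function names are assumptions added for illustration, not forecasts.

```python
import math

COST_PER_MB_USD = 12_400  # the EBI estimate quoted above


def encoding_cost_usd(megabytes: float) -> float:
    """Cost to encode the given amount of data at the quoted rate."""
    return megabytes * COST_PER_MB_USD


def halvings_to_reach(target_per_mb: float) -> int:
    """How many times the per-MB cost must halve to fall to the target or below."""
    return math.ceil(math.log2(COST_PER_MB_USD / target_per_mb))


print(encoding_cost_usd(1_000))  # 1 GB, taking 1 GB = 1,000 MB: prints 12400000
print(halvings_to_reach(1.0))    # hypothetical $1 per MB target: prints 14
```

At the quoted rate a single gigabyte would cost about $12.4 million to encode, and the price would need to halve 14 times to drop below a dollar per megabyte. How long that takes depends entirely on how fast synthesis costs actually fall.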

While your smartphone won’t be encoding your latest selfie in DNA anytime soon, there are real applications for large data sets where long-term storage capability is critical. Think large archives that are rarely accessed, such as preserved historical texts, backup copies of data sets, government archives, and more.

Associate, Pangaea Ventures Ltd. Sarah Applebaum is an environmental scientist and MBA who has been active in sustainability efforts for York and Dalhousie universities, as well as the City of Toronto's Environment Office.

