How DNA Can Be Used to Store Digital Data

DNA data storage encodes binary data into synthetic DNA strands and decodes it back when the data is needed. This approach uses DNA molecules to store digital information, offering an alternative to conventional media like hard drives and magnetic tapes. It leverages DNA’s inherent ability to carry vast amounts of genetic information. Researchers are exploring this technology as a response to the increasing global demand for data storage.

Why Use DNA for Data Storage?

DNA offers two main advantages as a data storage medium: high density and longevity. One gram of DNA can theoretically store up to 215 petabytes (215 million gigabytes) of data, a density that has prompted the claim that all the world’s data could fit into a container smaller than a sugar cube. This density far surpasses traditional storage: flash memory stores 1 bit in approximately 10 nanometers, while DNA stores 2 bits per base, with bases spaced 0.34 nanometers apart along the strand.
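The density figures above can be checked with a few lines of arithmetic. The sketch below only restates the numbers quoted in this section; the variable names are illustrative, the comparison is along a single dimension (as the figures are given), and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
import math

# Figures quoted in the paragraph above (spacing along one dimension).
FLASH_NM_PER_BIT = 10.0   # flash: about 1 bit per 10 nm
DNA_NM_PER_BASE = 0.34    # DNA: one base every 0.34 nm along the strand

# Four possible bases (A, C, G, T) carry log2(4) = 2 bits each.
bits_per_base = math.log2(4)

flash_bits_per_nm = 1 / FLASH_NM_PER_BIT
dna_bits_per_nm = bits_per_base / DNA_NM_PER_BASE
linear_ratio = dna_bits_per_nm / flash_bits_per_nm

# 215 petabytes per gram, assuming decimal units (1 PB = 1,000,000 GB).
gigabytes_per_gram = 215 * 1_000_000

print(f"DNA:   {dna_bits_per_nm:.2f} bits per nm")
print(f"Flash: {flash_bits_per_nm:.2f} bits per nm")
print(f"DNA is roughly {linear_ratio:.0f}x denser along one dimension")
print(f"{gigabytes_per_gram:,} GB per gram (theoretical)")
```

Under these figures DNA comes out at about 59 times the linear density of flash. The gap in volumetric density would be larger still, since DNA strands also pack in three dimensions.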

Beyond density, DNA is highly stable, with a half-life estimated to be over 500 years under proper storage conditions. This allows data to be preserved for thousands of years without significant degradation, making it suitable for long-term archival purposes. In contrast, magnetic tape, a common archival medium, has a lifespan of around 30 years, even in optimal conditions. Once data is encoded into DNA, it can be stored at room temperature without requiring continuous energy or cooling, which makes it far more energy-efficient than traditional data centers.

The Process of DNA Data Storage

Storing digital information in DNA involves several steps, beginning with encoding the data. Digital data, in binary (0s and 1s), is translated into the four nucleotide bases of DNA: adenine (A), guanine (G), cytosine (C), and thymine (T). The simplest schemes map each pair of bits directly to one base. Others first convert the binary data to ternary (base 3) so that the same base never appears twice in a row, because runs of repeated bases can cause errors during synthesis.
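The two encoding ideas described above can be sketched in Python. This is a minimal illustration, not any published scheme: the bit-to-base assignment, the fixed six ternary digits per byte, the starting base, and all function names are assumptions made for the example.

```python
BASES = "ACGT"

def encode_two_bit(data: bytes) -> str:
    """Direct mapping: 00->A, 01->C, 10->G, 11->T (an arbitrary assignment)."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def bytes_to_trits(data: bytes) -> list:
    """Write each byte as 6 base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits

def encode_rotating(data: bytes, start: str = "A") -> str:
    """Each ternary digit selects one of the three bases that differ
    from the previous base, so no base is ever repeated back to back."""
    prev = start  # hypothetical reference base; not part of the output
    out = []
    for trit in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]
        prev = choices[trit]
        out.append(prev)
    return "".join(out)

def has_repeat(strand: str) -> bool:
    """True if any two neighbouring bases are identical."""
    return any(a == b for a, b in zip(strand, strand[1:]))

payload = b"\x00\xff"
print(encode_two_bit(payload), has_repeat(encode_two_bit(payload)))
print(encode_rotating(payload), has_repeat(encode_rotating(payload)))
```

The direct mapping turns the byte 0x00 into AAAA, exactly the kind of run the ternary step is meant to avoid. The rotating version costs more bases per byte (6 instead of 4 in this sketch) in exchange for that guarantee.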

Once encoded, DNA strands are artificially created through DNA synthesis or “writing.” This involves chemically assembling nucleotides in the precise order determined by the encoding algorithm. Companies like Twist Bioscience synthesize these encoded DNA sequences. The synthesized DNA can then be stored in several stable forms, such as frozen in solution, held as droplets, or deposited on silicon chips, ideally in cool and dry conditions to prevent degradation.

Retrieving stored data involves “reading” the DNA through a process called sequencing, which determines the exact order of the nucleotide bases. Technologies like Illumina’s iSeq 100 or nanopore sequencing are used to read the DNA strands. Finally, the sequenced DNA information is fed into a decoding algorithm, which translates the nucleotide sequence back into its original digital binary format, recovering the stored file. Error-correcting codes, such as Reed-Solomon, are often incorporated during encoding and decoding to ensure data integrity, addressing potential errors that can occur during synthesis or sequencing.
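The reading side can be sketched in the same spirit. The example below is deliberately simpler than Reed-Solomon: it assumes several equal-length reads of the same strand containing only substitution errors, takes a per-position majority vote, and then decodes with a direct two-bits-per-base mapping. The mapping and the function names are assumptions made for illustration.

```python
from collections import Counter

# Assumed mapping for this sketch: A=00, C=01, G=10, T=11.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}

def consensus(reads):
    """Majority vote at each position across equal-length reads.

    A stand-in for real error correction: it handles substitution errors
    only, and only when most reads agree at a given position.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def decode_two_bit(strand):
    """Turn every group of four bases back into one byte.

    Assumes the strand length is a multiple of four.
    """
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)

# Three reads of the same strand; the second has one substitution error.
reads = ["CAGACGGC", "CAGTCGGC", "CAGACGGC"]
strand = consensus(reads)
print(strand)                  # CAGACGGC
print(decode_two_bit(strand))  # b'Hi'
```

A code such as Reed-Solomon adds structured redundancy to the data itself, so errors can be corrected even when repeated reads are not enough to outvote them.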

Current Research and Applications

Recent demonstrations show how far DNA data storage has progressed. Scientists have successfully encoded the entire 16 GB text of the English Wikipedia into synthetic DNA. Researchers have also developed custom DNA data writers capable of writing data at speeds of 1 megabit per second (Mbps).

Researchers at North Carolina State University and Johns Hopkins University recently developed a DNA-based system that integrates data storage and computing functions. The system uses a soft polymer material called a dendricolloid, which provides a high surface area for DNA deposition without sacrificing storage density. This allows operations like copying, erasing, and rewriting data directly on the DNA, and the system has been tested on computational problems such as Sudoku puzzles. The technology shows promise for applications requiring secure, long-term data storage in sectors like healthcare, finance, and government.

The Road Ahead for DNA Storage

Despite its potential, DNA data storage faces several obstacles to widespread adoption. A primary challenge is the high cost of DNA synthesis and sequencing. In 2020, the cost of DNA storage was estimated at about $3,500 per gigabyte (GB) for writing and $800 per GB for reading, far more than the roughly $0.02 per GB of traditional hard disk drives. Researchers anticipate these costs will decrease as the technology matures and scales up.
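To put the cost gap in perspective, the figures above can be compared directly. The sketch uses only the numbers quoted in this section; the constant and function names are illustrative.

```python
# 2020 estimates quoted above, in US dollars per gigabyte.
WRITE_USD_PER_GB = 3500
READ_USD_PER_GB = 800
HDD_USD_PER_GB = 0.02

def dna_round_trip_cost(gigabytes):
    """Cost of writing the data once and reading it back once."""
    return gigabytes * (WRITE_USD_PER_GB + READ_USD_PER_GB)

write_ratio = WRITE_USD_PER_GB / HDD_USD_PER_GB
read_ratio = READ_USD_PER_GB / HDD_USD_PER_GB

print(f"Writing costs about {write_ratio:,.0f}x the hard disk price per GB")
print(f"Reading costs about {read_ratio:,.0f}x the hard disk price per GB")
print(f"Write and read 1 GB once: ${dna_round_trip_cost(1):,}")
```

On these estimates, writing a gigabyte to DNA costs roughly 175,000 times as much as storing it on a hard disk, which is why current interest centres on archival data that is written once and rarely read.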

Another limitation is the slow speed of writing and reading data compared to electronic storage. Writing one GB of data to DNA can take several hours, and reading it back might take approximately 90 seconds per GB, whereas hard disk drives respond to requests in milliseconds. Newer approaches, such as light-based encoding and decoding, are being explored to increase speed, but this remains an area of active development. Additionally, DNA is susceptible to errors during synthesis and sequencing, with an estimated error rate of about 0.03% per GB, necessitating robust error correction mechanisms to maintain data integrity. Scaling from laboratory proofs of concept to industrial-scale implementation also presents a challenge, requiring rigorous system design for larger data volumes.
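The speed figures can be converted into more familiar units. The sketch below assumes decimal units (1 GB = 1,000 MB = 8,000 megabits) and uses the 1 Mbps writer speed and 90 seconds per GB read time quoted in this article; the names are illustrative.

```python
# Figures quoted in this article.
READ_SECONDS_PER_GB = 90
WRITE_MEGABITS_PER_SECOND = 1

# Decimal units are assumed: 1 GB = 1,000 MB = 8,000 megabits.
MEGABYTES_PER_GB = 1_000
MEGABITS_PER_GB = 8_000

read_megabytes_per_second = MEGABYTES_PER_GB / READ_SECONDS_PER_GB
write_seconds_per_gb = MEGABITS_PER_GB / WRITE_MEGABITS_PER_SECOND
write_hours_per_gb = write_seconds_per_gb / 3600

print(f"Reading: about {read_megabytes_per_second:.1f} MB per second")
print(f"Writing: about {write_hours_per_gb:.1f} hours per GB")
```

At 1 Mbps, writing a gigabyte takes a little over two hours, which is consistent with the "several hours" figure above, while the quoted read time works out to roughly 11 MB per second.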