DNA Sequencing at 40: Past, Present and Future

“DNA is like a computer program but far, far more advanced than any software ever created.” - Bill Gates 

Today, DNA sequencing is one of the most commonly used techniques in molecular biology and has become an integral part of many experiments and procedures. It has come a long way in the past 40 years and now addresses a broad range of problems for which it has proven very effective. To understand its history and applications better, it helps to start with the discovery of the biopolymer itself: DNA.

Discovery of Double-stranded DNA 

Figure 1: (Left to Right) A caricature of James Watson and Francis Crick holding the DNA, the double-stranded DNA, a silhouette of Rosalind Franklin

In the early 1950s, Rosalind Franklin performed X-ray diffraction experiments to probe the structure of DNA. Franklin's photographs were described as "the most beautiful X-ray photographs of any substance ever taken" by J. D. Bernal. Around the same time, James Watson and Francis Crick combined the available information from various sources, including X-ray images and model-building techniques, and in 1953 proposed the double-helix structure of DNA, answering a central question in molecular biology that had baffled other scientists for decades.

But when did Sequencing start? 

Fred Sanger, Walter Gilbert and Allan Maxam laid the foundations of DNA sequencing. Sanger established the amino-acid sequence of insulin, making it the first protein ever to be sequenced; later, Gilbert and Maxam deciphered the DNA sequence of the lactose-repressor binding site. However, sequencing DNA was still a difficult task. The two groups then independently developed different methods of DNA sequencing, for which Sanger and Gilbert shared the Nobel Prize in Chemistry in 1980. Sanger's protocol involves four extension reactions of a DNA primer, each containing a different chain-terminating nucleotide. In the Maxam-Gilbert method, on the other hand, four separate chemical reactions are set up to create base-specific partial cleavage. Both methods produce fragments whose lengths reveal the relative positions of the nucleotides in the strand, from which the entire sequence can be read. Then came another technique, called shotgun sequencing, which involves chopping the DNA into many pieces, sequencing each fragment individually, and then assembling the fragments based on their overlaps.
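To make the "align them based on their overlap" idea concrete, here is a minimal sketch of a greedy overlap merge in Python. It is only an illustration: the function names, the `min_len` threshold and the example fragments are made up for this post, and real assemblers have to cope with sequencing errors, repeats and reverse-complement reads, all of which are ignored here.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    reads = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; the pieces stay separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return max(reads, key=len)  # longest assembled piece


# Three made-up fragments of the sequence ATGCGTACGTTAGC
print(greedy_assemble(["GTACGTTA", "ATGCGTAC", "GTTAGC"]))
```

The greedy strategy is chosen here purely because it is short and easy to follow; it is not guaranteed to find the best assembly in general.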

Figure 2: (Left to Right) Sanger Sequencing protocol, Gilbert and Maxam method for Sequencing Reference - https://www.onlinebiologynotes.com/sangers-method-gene-sequencing/
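The read-out step of the chain-termination method can be sketched in a few lines of Python. This is an idealised toy, not a model of the chemistry: it assumes every possible terminated fragment is produced exactly once, and the function names are invented for this illustration. Each of the four reactions yields fragments ending at one particular base, and ordering all fragments by length spells out the sequence.

```python
def terminated_lengths(synthesised_strand: str) -> dict[str, list[int]]:
    """For each base, list the fragment lengths obtained when extension
    stops at every position where that base occurs (idealised)."""
    lengths: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesised_strand, start=1):
        lengths[base].append(position)
    return lengths


def read_gel(lengths: dict[str, list[int]]) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    bands = [(length, base) for base, ls in lengths.items() for length in ls]
    return "".join(base for _, base in sorted(bands))


lanes = terminated_lengths("GATTACA")
print(lanes["A"])       # fragment lengths seen in the 'A' reaction
print(read_gel(lanes))  # sequence recovered from all four reactions
```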

Ever since these methods came into existence, the amount of sequence data generated has grown exponentially! Hence, central data repositories such as GenBank (hosted by NCBI) and DDBJ were set up, along with additional tools. BLAST, a search tool that lets one align a DNA sequence against previously annotated ones, has proved extremely beneficial to the entire scientific community. Automated Sanger sequencing machines, which run the experiments for us, were also developed. How amazing, right?
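To give a flavour of how a query can be matched against a stored sequence, here is a small seed-and-extend sketch in Python. It is not BLAST itself: it only looks for exact, ungapped matches, has no scoring matrix or statistics, and the function names, the seed length `k` and the example sequences are assumptions made for this illustration.

```python
from collections import defaultdict


def build_kmer_index(subject: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in `subject` to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def best_ungapped_hit(query: str, subject: str, k: int = 4) -> tuple[int, int, int]:
    """Return (query_start, subject_start, length) of the longest exact
    match found by extending shared k-letter seeds in both directions."""
    index = build_kmer_index(subject, k)
    best = (0, 0, 0)
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            qs, ss = q, s
            # extend the seed to the left while the letters agree
            while qs > 0 and ss > 0 and query[qs - 1] == subject[ss - 1]:
                qs -= 1
                ss -= 1
            qe, se = q + k, s + k
            # extend the seed to the right while the letters agree
            while qe < len(query) and se < len(subject) and query[qe] == subject[se]:
                qe += 1
                se += 1
            if qe - qs > best[2]:
                best = (qs, ss, qe - qs)
    return best


q_start, s_start, length = best_ungapped_hit("CCTGACCGTAGG", "GGCATTGACCGTAACC")
print("CCTGACCGTAGG"[q_start:q_start + length])  # the shared stretch
```

Indexing short words first keeps the search fast, because only positions that already share a seed are examined further.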

Sequencing the entire human genome? Possible? 

Figure 3: (Left to Right) A figurative model depicting synthesis of Human Genome, Shotgun sequencing applied for the HGP. Reference - http://sitn.hms.harvard.edu/flash/2019/lessons-from-the-human-genome-project/

With all the available data and documentation, scientists next set out to sequence the entire human genome. The work was split into multiple sub-projects distributed across sequencing centres in several countries. The Human Genome Project (HGP), one of the most ambitious scientific projects ever undertaken, took 13 years to sequence the human genome, accomplishing a monumental goal at the end of it. But wait! Every task needs competition, doesn't it? In parallel with the HGP, the private company Celera promised to sequence the genome effectively and efficiently, but in just three years. At the end of the game, the results were analysed, and it was concluded that the HGP and Celera had produced results of comparable quality. Hence, the competition ended in a tie.

From then on, the effort required for sequencing dropped sharply, as the above techniques and the HGP had laid the groundwork for thousands of scientific studies linking genomes to other biological problems.

The Next-Generation is always smarter! 

Figure 4: (Left to Right) Depiction of Sequence by Synthesis, (right-top) depictive model of how DNA is bound to the flow cell, (right-bottom) fluorescence observed on the flow cell. Reference - https://binf.snipcademy.com/lessons/ngs-techniques/illumina-solexa

The invention of ‘Next-generation DNA Sequencing’ (NGS) marked a new era for the field of genomic analysis and documentation. NGS technologies provide higher-throughput data at lower cost and enable population-scale genome research. Instead of carrying out one reaction per tube, the DNA fragments are immobilised on a flow cell, where all the templates share a single reagent volume. As the complementary strands are synthesised, a polymerase incorporates fluorescently labelled nucleotides, with each of the four nucleotide types tagged with a different fluorescent dye. As each nucleotide is added, its tag fluoresces and the signal is captured by the machine, revealing the sequence of the DNA strand one base at a time. This fascinating technique is known as ‘Sequencing by Synthesis’ (SBS) and underlies the most widely used NGS platforms today.
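The read-out described above, one fluorescent signal per added nucleotide, can be sketched as a tiny base caller in Python. This is a simplified illustration under stated assumptions: the channel order in `CHANNELS`, the function name and the intensity values are all made up, and real instruments apply extensive signal correction and quality scoring before calling a base.

```python
CHANNELS = "ACGT"  # assumed order of the four dye channels


def call_bases(cycles: list[tuple[float, float, float, float]]) -> str:
    """For each synthesis cycle, call the base whose dye channel is brightest."""
    calls = []
    for intensities in cycles:
        brightest = max(range(4), key=lambda i: intensities[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)


# Made-up intensities for one spot on the flow cell over three cycles
spot = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.2, 0.8, 0.0),
    (0.0, 0.1, 0.1, 0.7),
]
print(call_bases(spot))
```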

Fancier than NGS? You've got to be kidding!

Around the time when NGS was ruling genomic studies, yet another revolutionary family of techniques came up, able to read much longer stretches of DNA from single molecules in real time! The first of this kind is PacBio's Single-Molecule Real-Time (SMRT) sequencing, which reads individual DNA molecules directly, without the amplification step that earlier methods rely on. The DNA polymerase enzyme is held in a fixed position, and the DNA strand is threaded through it and sequenced while its complement is being synthesised. Now, if you had thought that any of the above techniques are groundbreaking and exhilarating, then wait! The Lord of the sequencers is yet to be revealed. The second of this kind is the most recent one: Nanopore Sequencing. Here’s how it works. A single-stranded DNA fragment is passed through a nanopore and, based on how the nucleotides inside the pore alter the electrical current flowing through it, the sequencer infers which nucleotide was most likely encountered. The machine looks like a regular USB pen drive, weighs only 70 grams, and can sequence the entire human genome in less than a day!
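As a toy illustration of turning a per-nucleotide signal into a "plausible nucleotide", the Python sketch below assigns each measurement to the nearest of four reference levels. Everything in it is hypothetical: the `LEVELS` table, its values and the function name are invented, and real nanopore base callers do not use a fixed per-base table like this, since the signal depends on several neighbouring nucleotides at once.

```python
# Hypothetical reference signal levels (arbitrary units), one per base.
LEVELS = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}


def call_from_signal(samples: list[float]) -> str:
    """Assign each measurement to the base with the closest reference level."""
    return "".join(
        min(LEVELS, key=lambda base: abs(LEVELS[base] - value))
        for value in samples
    )


# Made-up noisy measurements, one per nucleotide passing through the pore
print(call_from_signal([51.2, 78.9, 69.0, 61.5]))
```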

How cool!

Applications of Sequencing 

For its first 25 years, the primary purpose of DNA sequencing was the partial or complete sequencing of genomes. However, with advancements in the field, the range and scope of its applications now go from Planck to parsec. Today, DNA sequencing is applied in many key areas, marking itself as one of the most popular techniques in synthetic and molecular biology. De novo genome assembly depends on DNA sequencing at a fundamental level: unknown stretches of DNA are sequenced first and then assembled, or matched against an already annotated sequence. Sometimes, when we lack confidence in previously available sequences, or when we want to study the evolutionary history of a gene, genome resequencing proves highly useful. It helps one identify mutations in particular fragments, or reduce the error rate of an available sequence by resequencing it and comparing the result with the versions in the repositories. Yet another important application of DNA sequencing is in clinical practice. Medications, diagnostics and treatments could be much improved if the sequence of the defective gene were available in diseases such as cancer.
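The resequencing idea, comparing a fresh read-out with the version in a repository to spot mutations, can be sketched as follows in Python. This is a minimal illustration that assumes the two sequences are already aligned and of equal length, and that only single-base substitutions matter; the function name and example sequences are made up.

```python
def find_substitutions(reference: str, resequenced: str) -> list[tuple[int, str, str]]:
    """Report (1-based position, reference base, observed base) per mismatch.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(reference) != len(resequenced):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, new_base)
        for position, (ref_base, new_base) in enumerate(zip(reference, resequenced), start=1)
        if ref_base != new_base
    ]


# Made-up example with a single C-to-A change at position 4
print(find_substitutions("ATGCGTAC", "ATGAGTAC"))
```

Insertions and deletions would need a proper alignment step first, which this sketch deliberately leaves out.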

What future awaits us? 

“There are millions of species on earth (and far more extinct species), each with a genome waiting to be sequenced, as well as countless microbiomes and metagenomes.” With such genetic diversity among species, it is important to analyse patterns of evolution by sequencing each genome and performing comparative studies. The nanopore sequencer could act as a real-time portable sensor that reads out the sequence of a genome, just as there are devices to measure the pH or temperature of a solution. A more unconventional application is to use DNA as a storage medium, treating it as an alternative way of representing information. Reading that information back would, again, require DNA sequencing.
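The idea of DNA as "an alternate way of representing information" can be sketched in Python by mapping every two bits of data to one nucleotide. The particular mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for this illustration; practical DNA storage schemes add error correction and avoid sequences that are hard to synthesise or read.

```python
# Assumed 2-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Turn a DNA string produced by `encode` back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # the text written as nucleotides
print(decode(strand))  # reading it back, i.e. the "sequencing" step
```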

In conclusion, DNA sequencing is as essential to a molecular biologist as a microscope is to a microbiologist. The future certainly holds many more improvements and innovations, with surprising new applications of this technique. So, stay tuned!

Written by

Sahana Gangadharan

Reference

Shendure, J., Balasubramanian, S., Church, G. et al. DNA sequencing at 40: past, present and future. Nature 550, 345–353 (2017). https://doi.org/10.1038/nature24286