Assembling vertebrate genomes from short reads

Next generation sequencing (NGS) has heralded a new era in genomics: sequence data is now readily available at a cost five orders of magnitude lower than when the first human genome sequence was released. But one limitation of the data produced by the new sequencing technologies is that it is harder to assemble into a single contiguous sequence representing an entire chromosome, due to the short lengths of the reads and the relatively high error rates in identifying each base.

In a new article published in the March issue of Genome Biology, researchers from The Genome Center at the Washington University School of Medicine asked whether the shorter contiguous sequences that can be assembled from NGS reads could still be used for meaningful analysis of higher eukaryote genomes. To answer this question, they used two NGS technologies (Illumina and 454) to sequence the same sample of chicken DNA previously sequenced by the traditional Sanger method.

As they describe in their Genome Biology article, they found that although Sanger sequencing, as expected, produced far longer contiguous sequences than Illumina or 454, the NGS reads were able to produce assemblies with gene coverage as high as 93%. Further, the accuracy of the assembled sequence equaled that obtained with the Sanger method. Between them, the Illumina and 454 assemblies also included over 30 million base pairs absent from the Sanger assembly.

This chicken case study is timely in light of the recent PNAS article on ALLPATHS-LG, a software tool that its developers have shown to generate NGS read assemblies of unprecedented quality. Together, these articles may prompt the genomics field to re-examine the maxim that NGS is not well suited to large genome assembly.
