Beijing, China
July 26, 2011
The world's largest DNA-sequencing outfit is looking to the clouds. The BGI (formerly the Beijing Genomics Institute) this month announced plans to roll out its cloud computing capabilities, which it hopes will help it to dominate bioinformatics in the same way it does the world of sequencing (see Nature 464, 22–24; 2010).
Research labs often lack the storage, computing power and technical know-how to cope with the current deluge of genomic data. Services such as the BGI's offer a solution, says Cliff Reid, chief executive of Complete Genomics in Mountain View, California, a competitor of the BGI that sequences and analyses human genomes. "The cloud is going to be central in the entire world of DNA sequencing."
Cloud computing marshals the power of a network of computers that can be accessed remotely to store and analyse data. Creating a bespoke cloud network to boost the data-crunching power of the Shenzhen-based BGI was a logical move, says Magic Fang, director of the institute's bioinformatics centre. "Competition in sequencing service is becoming more and more intense," he notes. Meanwhile, the flood of genome data is outstripping scientists' capacity to handle it, as next-generation sequencing techniques drive down the price of reading genomes even faster than the cost of data storage is falling (see 'DNA and chips').
Increasingly, Fang says, scientists who turn to the BGI for its sequencing muscle are also asking for help processing and analysing the data that its next-generation sequencers — which number more than 150 — churn out. By developing its own cloud service, the institute hopes to lure high-profile projects and collaborators, even if they never set foot in China.
Xueping Quan, a bioinformatician at Imperial College London, made the trip to Shenzhen with raw sequencing data from a plant species whose DNA sequence was too large and complex for her team to assemble into a complete genome. The BGI used its cloud to stitch the data together within a month, and now Quan plans to work with the institute to produce additional genomic data and create a more thorough genome sequence.
At present, the principal use of the BGI's cloud is in the assembly of genomes such as Quan's plant. However, Sifei He, who manages its cloud service, says it will be able to run other bioinformatics software, such as programs to scour genomes for single-nucleotide variants, or to find places where large chunks of the genome are duplicated or missing.
The BGI is not the only sequencing centre pursuing this strategy. Complete Genomics has been storing data for researchers on Amazon's well-established Elastic Compute Cloud for nearly a year, says Reid. He expects that scientists will eventually analyse and store data entirely in such clouds, and only download the final results. Meanwhile, companies such as GenomeQuest of Westborough, Massachusetts (which inked a deal last year to provide bioinformatics to biotechnology giant Syngenta of Basel, Switzerland) and DNAnexus in Palo Alto, California, already run genome-analysis software on clouds.
But it is the BGI's combination of sheer sequencing muscle and in-house cloud computing that makes it stand out as a 'one-stop shop', says David Dooling, a bioinformatician at the Genome Institute at Washington University in St Louis, Missouri. "It certainly makes sense," he says. "As more and more people are sequencing and those people are less and less experienced with sequence analysis, you're going to see the proliferation of these vertically integrated solutions."
Paul Flicek, a bioinformatician at the European Bioinformatics Institute near Cambridge, UK, says that it's too early to tell whether the BGI's cloud will address the bioinformatics needs of naive users. But "if somebody can produce a cloud service that's ideal for bioinformatics and costs less than Amazon, there's a niche market there they could really capture", he says.