Thursday, May 06, 2010

Better the metagenome you know than the metagenome you don't...

Morgan, J., Darling, A., & Eisen, J. (2010). Metagenomic Sequencing of an In Vitro-Simulated Microbial Community. PLoS ONE, 5 (4). DOI: 10.1371/journal.pone.0010209

A new era in the design of metagenomic controls begins! Morgan et al. benchmark metagenomic methods using artificial "microbial communities" mixed up in the lab.

The Hook...
Metagenomics is a fancy name for what is actually a large and obscure toolbox of molecular biology procedures and computational algorithms that promises to help us understand whole, natural microbial communities. It is so exciting because it allows us to study organisms (bacteria and archaea specifically) that would otherwise remain unknown because we cannot grow them in the lab. It also provides, for the first time, the opportunity to analyse whole natural communities, and not only sectors of them (like the "granivorous community" or the "photosynthetic guild"). Comparing natural functional communities would help us understand a lot about how communities are assembled, how they evolve and change over time, and how they are affected by external disturbances.

Having said that, we still lack the tools to analyse such large datasets and the quality standards to produce and compare metagenomes. This happens each time a new technology appears, because there has not been enough time to experiment with it and learn its flaws. It is even worse with metagenomics: since no whole community has ever been studied, we don't really know, or even have a good guess about, what our data should look like. Here's where Morgan et al. come to the rescue with a very neat approach.

The Setting...
Their logic is simple and clear: since we do not have any community whose composition is completely known, let's make one. So they retrieved ten different microorganisms from culture collections, all with genomes that had already been sequenced, and prepared aliquots containing the same number of cells of each organism. Then they mixed them up, extracted the whole community DNA with three different DNA-extraction protocols, and sequenced four metagenomic datasets (one extraction was sequenced again with an alternative sequencing method).

The Bad...
Surprisingly, none of the sequenced metagenomes reflected the original composition of the community mix. This can have a number of causes: the size of a genome and the number of genome copies per cell affect the probability of sequencing; differences in cell wall and matrix thickness and composition could prevent efficient DNA extraction; specific DNA segments might be harder to clone and/or sequence... When they compared the metagenomes with each other, they found that most differences were due to the type of DNA extraction used. That is, the same community will result in different metagenomes when different DNA extraction methods are used. This also means that metagenomes obtained with different DNA extraction protocols should not be compared. Ever.
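
To make the first of those causes concrete, here is a minimal sketch of how equal cell counts can still give unequal read fractions. It assumes that extraction, cloning and sequencing are perfectly unbiased, so that reads are sampled in proportion to the amount of DNA each organism contributes. The organism names, genome sizes and copy numbers are made up for illustration and are not taken from Morgan et al.

```python
def expected_read_fractions(organisms):
    """Return the expected fraction of reads per organism.

    `organisms` maps a name to (genome_size_bp, genome_copies_per_cell, cells).
    Assumption: reads are drawn in proportion to total DNA, with no
    extraction, cloning or sequencing bias.
    """
    dna = {
        name: size * copies * cells
        for name, (size, copies, cells) in organisms.items()
    }
    total = sum(dna.values())
    return {name: amount / total for name, amount in dna.items()}


# Hypothetical community: the same number of cells of each organism.
community = {
    "organism_A": (2_000_000, 1, 1000),  # small genome, one copy per cell
    "organism_B": (6_000_000, 1, 1000),  # large genome, one copy per cell
    "organism_C": (2_000_000, 4, 1000),  # small genome, four copies per cell
}

fractions = expected_read_fractions(community)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.3f}")
```

Even in this idealized case, each organism makes up one third of the cells but the expected read fractions are 0.125, 0.375 and 0.5, so a metagenome would not mirror cell abundance even before any extraction bias comes into play.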

One thing still puzzles me: the love for BLAST. Even though they assigned each sequence to a specific organism by "blasting" each read from the metagenomes against the ten complete genomes of the organisms in the mix, a large number of sequences could not be mapped back to the source organism. Sure, there seems to be a phage infecting some cultures that was not in the sequenced genome. But it is surprising that a large number of reads actually hit a Bacillus, when there were five Lactobacillus strains in the mix. My point is that BLAST is a very poor algorithm for recovering precise hits, and the short length of the sequences reduces the taxonomic resolution it can attain, which distorts the results. If we add a really biased and incomplete reference database, it ends up being almost impossible to accurately define the genomic composition of a natural community. This also calls for better and more precise methods for assigning or binning metagenomic sequences.

The Good...
Since "all different" is not a very hopeful result, they prepared three replicates of each DNA extraction method to determine which of them showed the lowest variability and hence would be the most reliable. It turned out that the kit-based DNA extraction protocol has the highest repeatability, most likely because there is less variation in reagent concentrations.
And then again, although there is large variability both between and within protocols, there are no radical changes in the relative abundance of each organism. That is, dominance never shifts from one organism to another, even though the metagenomes still do not reflect the "true" abundances.
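
One simple way to put a number on that kind of repeatability is the coefficient of variation (standard deviation divided by the mean) across replicates. The sketch below is only an illustration of the idea: the protocol names and abundance values are invented, and the coefficient of variation is my choice of statistic here, not necessarily the one used in the paper.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of the replicates."""
    return stdev(values) / mean(values)


# Hypothetical relative abundance of one organism in three replicate
# extractions per protocol (made-up numbers, for illustration only).
replicates = {
    "kit": [0.20, 0.21, 0.19],
    "protocol_B": [0.10, 0.16, 0.13],
    "protocol_C": [0.30, 0.22, 0.26],
}

cv = {
    method: coefficient_of_variation(values)
    for method, values in replicates.items()
}
most_repeatable = min(cv, key=cv.get)

for method, value in cv.items():
    print(f"{method}: CV = {value:.3f}")
print("Most repeatable:", most_repeatable)
```

A lower coefficient of variation means the replicates agree more closely with each other, which says nothing about whether they agree with the true composition of the mix.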

The Ugly...
One of the samples was sequenced twice, once with classic Sanger capillary sequencing and once with pyrosequencing. This helped them show that differences between extraction methods are far greater than differences between sequencing platforms. Still, I sensed a bit of anti-pyrosequencing bias in it. Sure, pyrosequencing gives shorter reads, and so a larger number of reads will be unassignable to reference organisms (at least by BLAST standards). But I'm not sure that these results actually demonstrate that cloning bias is not so important. It would be necessary to repeat each sample with pyrosequencing to demonstrate this. And it would also be great to replicate the same sample, as they did with Sanger. This would actually show how much of this variability is really attributable to DNA extraction and how much of it is attributable to cloning bias.

The Finale...
We desperately need more research like this, which would help us not only to standardize the technology behind metagenomics but also to build the robust theoretical framework that metagenomics (and community ecology in general) so badly needs. This kind of work should be complemented with in-silico simulations of metagenomes (like that in Mavromatis et al. 2007), and also with the development of better algorithms to cluster and assign taxonomy to sequenced reads.
After all the metagenomic hype, we still do not know the true structure and composition of sequenced microbial communities. But we do know a lot more than before.

2 Comments:

Luis Eguiarte said...

WELL, THIS IS GATHERING MORE HITS THAN OTHER THINGS!

Daemios said...

Luis! And how do you know!?!?!?