Metagenome Sequence Simulators

An article from Huson's group at Tübingen University has just come out in the Open Access (and scientific publishing innovator) journal PLoS ONE, describing MetaSim, software that produces artificial (synthetic, or in silico) metagenomes from a selection of completely sequenced genomes.

This is just "heaven-sent" for me, since I've been working on a set of synthetic metagenomes for the past two months, and I will be happy to use this software first hand like... today. It seems that the software not only lets you choose the source genomes from a phylogenetic tree (figures reproduced here from the original article at PLoS ONE thanks to the Creative Commons License), but also lets you choose among three different types of sequencing technology output (Sanger, 454 and Illumina) to generate a theoretical metagenome.
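To make the idea concrete, here's a minimal sketch (mine, not MetaSim's code) of what simulating a metagenome boils down to: pick a source genome in proportion to its abundance and length, then cut a read of a technology-typical length from a random position. The function name `simulate_metagenome`, the `READ_LENGTH` table and its values, and the toy genomes are all assumptions for illustration; sequencing errors are ignored entirely.

```python
import random

# Illustrative read lengths only (assumed values, not MetaSim's models).
READ_LENGTH = {"sanger": 800, "454": 250, "illumina": 36}


def simulate_metagenome(genomes, abundance, n_reads, technology, seed=0):
    """Draw error-free reads from a dict of {name: sequence} genomes.

    A genome is picked with probability proportional to its relative
    abundance times its length, then a read is cut at a random position.
    """
    rng = random.Random(seed)
    read_len = READ_LENGTH[technology]
    names = list(genomes)
    weights = [abundance[name] * len(genomes[name]) for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        length = min(read_len, len(seq))  # never read past a short genome
        start = rng.randrange(len(seq) - length + 1)
        reads.append((name, seq[start:start + length]))
    return reads


# Toy "genomes": random DNA strings standing in for sequenced genomes.
toy_rng = random.Random(1)
genomes = {
    "genome_A": "".join(toy_rng.choice("ACGT") for _ in range(5000)),
    "genome_B": "".join(toy_rng.choice("ACGT") for _ in range(2000)),
}
reads = simulate_metagenome(
    genomes, {"genome_A": 1.0, "genome_B": 3.0}, n_reads=100, technology="454"
)
```

Since every read keeps the name of the genome it came from, you always know the "right answer" when you later test an assembler or a binning tool on the simulated data.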

This is the continuation of a very important change in the genomic sciences: moving from experiments too expensive or too long to be replicated, and hence beyond hard statistical comparison, to null-model-based in silico genomic analysis.

The first effort to analyze the true scope of metagenomic analysis was presented by Kostas Mavrommatis and others from the Genome Biology group at JGI (unfortunately published in a non-OA journal), where they produced three simulated metagenomes of contrasting complexity to assess assembly, gene prediction and annotation (SPOILER: the best combination assessed was the Arachne assembler with the Fgenesb predictor and PhyloPythia for binning, and BLAST "performed poorly" as usual). This work also produced a database for the Fidelity of Analysis of Metagenomic Samples (FAMeS), a great effort to standardize metagenomic analysis software. A great alternative is ProxyGene annotation, as reported by the Markowitz group.

I'll play a little with the software and post some of my impressions here... and maybe on the original PLoS ONE webpage, since it is totally open to post-publication review!!!!

You can download MetaSim at Huson's Labpage!!!

Novel Justice on HIV from Nobel Prize

Well, I might be a bit late for this post but... The 2008 Nobel Prize in Physiology or Medicine went viral and was granted this Monday to three scientists working on viruses: half of the prize went to Harald zur Hausen, for discovering that human papillomavirus causes cervical cancer, and the other half (that is, 1/4 each) to Françoise Barré-Sinoussi and Luc Montagnier, for the discovery of HIV.

Now, it's great that the Nobel recognizes the outstanding labor of researchers around the world. And I won't go into how this turns into something court-ish or royalty-like in science. The really GREAT thing is that this could be read as the final word on the "controversy" around who really deserved the credit for the discovery.

That is, Montagnier's lab had already reported in Science (with Barré-Sinoussi as first author) the isolation and initial characterization of a T-lymphotropic retrovirus from a single AIDS patient. They also linked it (or the family it belonged to) to AIDS.

Robert Gallo (the discoverer of the first human retrovirus in 1980) had been searching for the viral causative agent of AIDS for a while, and wanted to link it to his already discovered HTLV virus. He had already assigned multiple functions to this virus and wanted it to be the cause of AIDS as well. He actually reported the discovery of his so-called HTLV-III virus as the causative agent of AIDS in a Science special issue. He basically reported the methodology to study HTLV-III viruses, the presence of the virus in AIDS patients, and the presence of antibodies against the virus in AIDS patients.

So, what's the problem with this? Well, it started when Gallo asked Montagnier for some samples and data. Then, during his press conference, Gallo used an image of Montagnier's virus. Just after the press conference, he filed a US patent on the detection tests. The French complained (!), which led to a long and very public dispute that was partially settled when Reagan and Chirac agreed on sharing the credit AND the money...

In 1993, an article by Sheng-Yung Chang and collaborators analyzed the archived samples from both l'Institut Pasteur and the Laboratory for Tumor Cell Biology, and found that out of six HIV variants found in Gallo's lab, none was similar to the one he isolated. In contrast, his isolate was identical to the French one. And though they politely described it as "a contamination", it seemed quite clear (at least to me) that Gallo couldn't find the virus he wanted to find in his own samples, and so took the French virus, which turned out to be the one he was looking for.

More recently, in 2002, they tried to settle things (and clean up a "mine's bigger" stain on the history of science) with a collaborative publication by both Gallo and Montagnier. I don't think the peace-making settlement received as much attention as the dispute, as usual.

So, 25 years and an international agreement later, the French are acknowledged by the elite of the scientific community as the discoverers. In interviews, Montagnier and Barré-Sinoussi credit Gallo. Diplomatically, Gallo congratulated the French.

Hope all the people in Africa benefit from this tangled web of credits and money.

(And before anyone attacks me on this... 1) Without Dr. Gallo's research, no HIV virus could have been linked to AIDS; 2) The credit for the isolation and discovery of HIV goes without doubt to Montagnier's lab; 3) As persons, I equally dislike both scientists, especially after their quarreling about who would earn the most out of people's diseases).

Atheists of the world, come out of the closet!!!!

Here's a talk that Richard Dawkins gave a while ago (2002) during a TED conference. It is mainly about defending atheism. You can download it at the TED talks site. A wonderful quote:

"...teach your children evolution in biology class and they'll soon move on to drugs ... and sexual perversion..."

I would also love to recommend his new book, The God Delusion (check the wiki), where he makes his point on not being ashamed about being an atheist while also pointing out all the horrors fueled by religions (YES, with an "s", as in plural). I might not totally agree with the rationale, but I definitely agree with the general idea. And... you gotta love the guy!

Hope there's some discussion on this...

Looking for the Maras Salterns in Peru, I came across this... Any idea what the hell these numbers are? (you might need to zoom in a little)
Instructions: Zoom in with the "+" button three times, then move up with the button 2 times. I don't understand why I cannot zoom into the specific region.

And any idea where the hell the Maras Salterns are?

On the Road...

I'm reading "On the Road" by Jack Kerouac (you can find the Penguin Classics' link in google here), the novel that defined the beat generation 50 years ago and, along with Howl and Naked Lunch, helped Americans begin to question themselves and liberalize a bit. Kerouac and Cassady are Sal Paradise and Dean Moriarty, while Carlo Marx and Old Bull Lee represent Allen Ginsberg and (my very favorite) William Burroughs. It's based on a (soul-searching) trip Jack Kerouac took with Neal Cassady across the U.S., and it defined the basic plot for most road movies. It was written on a single, continuous scroll of paper, single-spaced and without paragraph breaks. To me, it still has some points worth pointing out:

I've just returned from San Francisco (I actually bought the book at Bound Together, an anarchist collective bookstore on Haight St, recommended!)

"Over Oakland Bay Bridge I slept soundly for the first time since Denver; so I was rudely jolted in the bus station at Market and Fourth..."; "Weird bums (Mission and Third) asked me for dimes in the dawn"... I never got to ride the bridge, since the BART goes under the Bay. But after 50 years, there's lots of bums in Market St as well!!!

"And oh, that pan-fried chow mein flavored air that blew into my room from Chinatown, vying with the spaghetti sauces of North Beach, the soft-shelled crab of Fisherman's Wharf- nay, the ribs of Fillmore turning on spits! Throw in the Market Street chili beans, redhot, and french-fried potatoes of the Embarcadero wino night, and steamed clams from Sausalito across the bay, and that's my ah-dream of San Francisco." --- yum... it certainly is, but a nice burrito from The Mission is missing!

"Dean had a sweater wrapped around his ears to keep warm. He said we were a band of Arabs coming to blow New York." I know. Not very politically correct, but it gave me the creeps!

"Sure, baby, mañana." It was always mañana. For the next week that was all I heard - mañana, a lovely word that probably means heaven." great wisdom in this phrase. If you ever go to Mexico, be sure to learn the meaning of "ahorita"!

"They thought I was a Mexican, of course; an in a way I am" yeah... after picking cotton barehanded, falling in love with a single-mother chicana and dining a frijoles-only meal, yeah... he might have been... I just loved the Terry-the-Mexican-girl chapter.

Finally, here's a link to a google-map with Paradise's itinerary depicted on it. So... go get the book! (and read it, obviously!)

In some ways, you know, people that don't exist are much nicer than people that do.

-- Lewis Carroll

Bacillus coahuilensis: the genomic TexMex

After a long publication struggle, two articles from two close friends have finally been published. The first is the description of the novel species Bacillus coahuilensis in the IJSEM journal by René Cerritos (a.k.a. Dr. Chapultepec), my former bacteriology teacher, my former "owner" during my Social Service and actually the one to blame for my joining the lab I work in now. The other is the publication of the complete genome sequence of the very same strain in PNAS by Luis Alcaraz (a.k.a. The Dude), my former schoolmate, my former Representative on the University Council and beermate. Both are the product of a weird collaboration between CINVESTAV and LANGEBIO in Irapuato, the Texan universities of Rice and Houston, and the Institutes of Biotechnology and Ecology at UNAM, where I'm at.

In short:

Cuatrociénegas Valley is a basin 750 m above sea level in northern Mexico, deep in the Chihuahuan Desert, and was a coastal region during the Jurassic. It is characterized by the presence of many oligotrophic ponds in the middle of the desert supporting large bacterial communities, apparently of marine origin (as shown by Souza and Desnues), that have been studied by my lab group, led by Valeria Souza and Luis Eguiarte (the very same place where I'm working on my thesis). Cerritos isolated many bacilli strains from one of the widest and shallowest ponds (Churince's Laguna Grande) and found many moderately halophilic species (that tolerate slightly salty environments). A novel aerobic strain (m44) belonged to a group of aquatic, moderately halophilic species (B. marisflavi, B. aquimaris, B. vietnamensis), and could not grow on most sugar-containing media (uncommon for the bacilli). The team at CINVESTAV (led by Gaby Olmedo and Luis Herrera-Estrella) sequenced the genome and Alcaraz annotated it. He also conducted most of the sequence analyses, with some help from Siefert from Rice University, Putonti from the UofH and me, during our stay in Houston a year ago. At 3.35 Mbp, the genome turned out to be the smallest among the bacilli, with many mobile elements.

The most important feature of B. coahuilensis is that, besides being the second Mexican microbial genome sequenced to date (the two bacterial genomes sequenced in Mexico are Rhizobium etli by the CCG and this one), its sequence has been analyzed in the light of ecology and evolution (remember Dobzhansky's maxim?), that is, as the adaptations of a formerly marine lineage to an oligotrophic lentic environment.

That is, the sequence pointed towards an adaptation to growth in a low-phosphorus environment: namely, the presence of sulfoquinovose synthase (sqd1), which synthesizes sulfolipids to replace membrane phospholipids (which constitute around 30% of the total phosphorus), never reported before outside chloroplasts and unicellular cyanobacteria. The CINVESTAV team looked into the membrane and corroborated its sulfolipid composition.

The genome also codes for a sensory bacteriorhodopsin gene, reminiscent of its marine origin, where such genes are very abundant (see work by Venter and Rusch). The expression analyses showed it to be constitutive and not light-dependent, suggesting it is an adaptation to shallow-water irradiance exposure.

Analysing the encoded transmembrane importers is a good way to find out what the organism is taking up from the environment, that is, its "feeding habits". The family of iron-siderophore importers is overrepresented in B. coahuilensis, a feature shared with other aquatic bacilli, suggesting that marine bacilli actively scavenge for iron. It also shows a preference for the uptake of single amino acids rather than large polypeptides, with an absolute requirement for 8 amino acids and a partial requirement for another 5, a feature shared with the aquatic, small-genome organism Minibacterium massiliensis.

This, taken together with the fact that it has the lowest number of genes involved in the nitrogen cycle and with the experimental evidence of its being incapable of utilizing a wide variety of sugars, suggests that this organism is totally dependent on the rest of the community to live, and has evolved from a primitive bacterial component of that community with specific adaptations to a novel environment.

I'm very proud of the product of this collaboration and hope it continues this way. And I'm also very happy because, from the moment of this publication on, The Dude is able to obtain his PhD!!!

My Geek Pride is hurt: BLOSUM matrices

BLOSUM (BLOcks SUbstitution Matrix) matrices are the canonical substitution matrices used for scoring protein sequence alignments. In essence, the method counts how often each pair of amino acids is aligned in the columns of a set of ungapped alignment blocks and assigns each substitution a log-odds score: how much more (or less) often it is observed than expected by chance. The number in the name (as in BLOSUM80) is the percentage identity above which sequences are clustered together and counted as one when building the matrix, so high-numbered matrices are built with more closely related sequences and are more stringent.
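For the geeks, here's a minimal sketch of that log-odds idea on a toy block. It is my own simplification, not the Henikoff & Henikoff program: it skips the clustering step completely, and the function name `blosum_like_scores` and the four-sequence block are made up for illustration. Scores are in half-bit units, rounded to integers.

```python
from collections import Counter
from itertools import combinations
from math import log2


def blosum_like_scores(block):
    """Log-odds scores from one ungapped block of equal-length sequences.

    Assumption: no identity clustering or sequence weighting is done here.
    Returns {(a, b): score} with each residue pair stored in sorted order.
    """
    # Count every residue pair seen within each alignment column.
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue, derived from the pair frequencies.
    background = Counter()
    for (a, b), freq in observed.items():
        if a == b:
            background[a] += freq
        else:
            background[a] += freq / 2
            background[b] += freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        if a == b:
            expected = background[a] ** 2
        else:
            expected = 2 * background[a] * background[b]
        scores[(a, b)] = round(2 * log2(freq / expected))
    return scores


toy_block = ["AVLK", "AVLR", "AILK", "GVLK"]
scores = blosum_like_scores(toy_block)
```

Pairs that never show up in the block simply get no score here, and the numbers from four sequences mean nothing biologically; the point is only to show where the ratio of observed to expected pair frequencies enters.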

BLOSUM matrices were developed in 1992 by Henikoff and Henikoff and since then have been extensively used in all analyses involving protein sequences...

and then, here comes the "AAARRGHHHH!!!"

Styczynski et al (2008) were killing time looking at the evolution of the BLOCKS database and found the unthinkable... an error in the source code of the program that calculates the BLOSUM matrices!!!! That means... the available BLOSUM matrices differ significantly from what the algorithm described by Henikoff & Henikoff should produce... merde!

Weirdest thing of all... when the code was corrected and the matrices were tested in database sequence searches, it turned out that the "wrong" matrices performed much better in retrieving protein homologs than the "corrected" matrices.

Fortunately, it seems that though the difference is statistically significant, it is not big. That means we haven't fucked it up so bad.

Epilogue to the blosum...

1) 16 years of extensive usage doesn't mean it is RIGHT.

2) how come no one, ever, in 16 years, noticed this difference!!! THAT is what happens with dogma... when you take anything for granted

3) messing things up is not always THAT bad...

4) I didn't understand from the article whether they proposed that the matrices should be corrected even if they performed worse...

5) I would expect to see a huge ocean of errata everywhere because "when using the revised blosum matrices... our results from the past ten years have completely changed!!"

Bound to...

Diving into my ipod, I rediscovered what I think is one of the best break-up songs ever. I'm not in that mood now, but I keep recognizing it. So, this post is dedicated to all those girls who are actually brave enough to say 'I'm through'... I mean, there must be some!

Lotharingie... the brief

"One thinks it has always been this way, but no... not everything is everything. It was vast, and although not everyone saw themselves as Franks, Carolingian dominance was indisputable. It had always happened this way: great empires cracked apart into minuscule territories. So it went with Carthage, the Senatus Populusque Romanus of Trajan, the Khanate of Ögedei and the Umayyad Caliphate... so it would later go with the Qing dynasty, the Totius Hispaniae of the Habsburgs, the Napoleonic Empire, the Brazil of the Braganzas and the Soyuz Sovetskikh... so it will go with the transcontinental Rossiyskaya, Overseas France, the neocolonialist paternalism of interventionist North America and the greatest empire in time and extent: the United Kingdom of the British Empire.

The then-Frankish empire of Charlemagne had first been crudely divided by Louis I (the first of some twenty Louis so far) into empires of the East and the West. Lothair had already spent ten years fiercely defending his legitimate right over the Frankish territories. The last civil war (which back then amounted to a quarrel between brothers) culminated that night of 843 with the Treaty of Verodunum. That night on the bank of the Meuse, Lothair recalled the figure of his grandfather, still downcast, at the bottom of the valley, over the Oath of Strazburg. Thus was born Lotharingia, Lotharii regnum, under the Carolingian tradition of hereditary partition.

Lotharingie would be remembered as the kingdom between the Rhine and the Rhône, although it would be more fitting to measure it from the Frisian coast of the North Sea to Rome. Thus, with that division between the Frankish empires of the East and the West, Lotharingie would start a long dispute between Franks and Germans that seems to have ended only in the second half of the 20th century. It also turned out to be the first kingdom not named after the language spoken by its inhabitants (Francs, Alemanii), since it comprised a cultural and linguistic amalgam.

The great Lotharingian kingdom was ridiculously short-lived. In 855 we witness how the great Lothair has abdicated the Lotharingian crown to retire to the abbey of Prüm. Shortly before taking his vows, he divided the kingdom among his three sons. Lotharingie is reduced to the stretch from Frisia to Switzerland, under the reign of Lothair II. This division would not last long either. Lothair would die that same year; Lothair II would die in 869 without legitimate heirs, which would lead to the partition of the territory among the brothers of Lothair I and the consequent disappearance of the kingdom of Lotharingie with the Treaty of Meerssen... although in reality the northern territory was already under Danish Viking control."

The region is interesting...

more than anything, the appearance and disappearance of The Very Important Things also reminds me of the story of Plasmogenia... and, not without some dread, I think genomics bears a certain resemblance to it...

I insist, not everything is everything

La tektonik

One of the weirdest things I saw in France... a "new" dance style à la techno-pop that reminds me a LOT of the early 90s... the rediscovery of Michael Jackson's Moonwalker maybe? If fashion has already come full circle and rap/dance/moonwalker is cool again... maybe we have a tiny hope of seeing a grunge retro-moment soon?

I find it good for gymnastics... and I think it's cool to see such self-assurance in teens!