castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.

5/26/2009

Ubuntu trick -- how to reset evolution email

gconftool-2 --recursive-unset /apps/evolution

Labels: ramblings

posted by avilella # 8:54 AM 0 Comments

5/21/2009

Gene by gene turns genome-by-genome

This is a good example of the kind of paper we are probably going to see more and more often in the future:

Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis
http://www.ncbi.nlm.nih.gov/pubmed/19454017

Get (a) a certain amount of RNAseq reads for your species "X", (b) build as many full-length cDNAs as possible from the fragments and (c) compare against close species in terms of:

New interesting cDNAs that don't have hits against existing public cDNAs -- What do they do?
Expression patterns -- Are these different to the patterns in other close species?
Protein coding evolution -- run pairwise dN/dS against the closest genome, or tree-based dN/dS against an existing phylogeny [1,2] -- Does anything show up in a Gene Ontology enrichment analysis?
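As a rough illustration of the first comparison (new cDNAs with no hits against public cDNAs), here is a minimal Python sketch. It assumes you have already run BLAST yourself and saved the standard 12-column tabular output (query ID in column 1, e-value in column 11). The function name, the contig IDs and the 1e-5 e-value cutoff are made up for the example, not something taken from the paper.

```python
def ids_without_hits(all_ids, blast_tab_lines, max_evalue=1e-5):
    """Return the query IDs with no hit at or below max_evalue.

    blast_tab_lines: lines of BLAST tabular output (12 tab-separated columns).
    max_evalue: hypothetical cutoff, pick your own.
    """
    hit_ids = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            hit_ids.add(query_id)
    return sorted(set(all_ids) - hit_ids)


# Toy data: contig1 has a strong hit, contig2 only a weak one, contig3 none.
assembled = ["contig1", "contig2", "contig3"]
blast_lines = [
    "contig1\tpublicA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t350",
    "contig2\tpublicB\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30",
]
print(ids_without_hits(assembled, blast_lines))
```

The IDs that come back are the candidates for the "What do they do?" question. Whether a weak hit should count as a hit is a judgment call that the cutoff only approximates.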

Then the data is published and stored in a publicly available database, and can be added to the pool to compare against for the next project. Iterate :-)

It used to be gene-by-gene sequencing, and now it is transcriptome-by-transcriptome sequencing. There are still sequence error and sequence coverage issues, but this is nothing new. One of my first scientific mentors, Prof. Montserrat Aguade, was one of the first to do gene sequencing on the Adh gene in Drosophila when doing her PhD at Harvard. People then extended Adh sequencing and analysis to other Drosophila species, then other clades, then other genes, then some gene families like odorant binding proteins for a bunch of Drosophila species or populations, or gene pathways like the insulin pathway, etc.

But now we have a much broader picture with a rather complete transcriptome. And most of the sequencing issues are going to be corrected across the phylogeny in pretty much the same way as allele imputation is filling the gaps at the population genomics level (e.g. 1000 Genomes Project).

I am very excited about all this!

Labels: genomics, nextgen sequencing, ramblings

posted by avilella # 4:02 PM 0 Comments

5/11/2009

Anolis carolinensis: First reptile in Ensembl -- Amonida Sadissa

Very few anolis proteins and cDNAs, many more ESTs.
Used Uniprot PE Evidence ranking to generate transcript models with genewise
Different parameters for Exonerate, including exhaustive option for cDNAs, including 31000 chicken set (which wasn't very useful in the end)
Chris Ponting group provided extra models and kill list to rename some retrotransposons and pseudogenes
Manually looked at some of the EST genes in Chris' list to reincorporate them in the main db

Word of advice to all Genome Sequencing Centers out there: now that RNAseq is cheap and powerful, please allocate some of your budget for that instead of spending it all on genomic sequencing. Contact Sanger people for PCR-free sample preparation protocols, which make a huge difference in terms of avoiding duplicate reads.

Labels: ensembl, gene prediction, scientific talk

posted by avilella # 10:33 AM 0 Comments

5/07/2009

FC Barcelona 1991 and 2009 -- find the similarities

http://www.youtube.com/watch?v=6OHYAMG5RTk (jump to 1:00)
http://www.youtube.com/watch?v=1-4NpWO4ObU

The guy in the wet coat who jumps to celebrate was a very young Guardiola as a player. Yesterday he jumped to celebrate at the same spot as a coach... now in a suit and with much less hair...

Labels: ramblings

posted by avilella # 10:58 AM 0 Comments

5/05/2009

Drug combinations, gene combinations and cancer -- Sven Nelander

First seminar of the Systems Biology series at the EBI. This series
starts with a strong focus on the modeling side of Systems Biology,
but with the idea of extending it to other subfields.

Title: Drug combinations, gene combinations and cancer
Speaker: Dr Sven Nelander
Affiliation: Gothenburg University
Date & Time: Tuesday 5th May 2009; 14.00-15.00
Location: C202-3, Shared facilities
Host: Mikhail Spivakov

There is no rational combination theory for different anticancer drugs
so far. One anticancer drug for one step in the pathway, but no
interrelations described.

Increasing number of genotype to phenotype pairs of data sets: what is
the system in the middle?

TCGA with 200 ovarian tumors, first data released last week. Amazing
data production and integrative bioinformatics, but space for more
modeling.

Example: CoPIA

CNV profiles -- transcriptional network -- final mRNA profiles

Now we have 10 million datapoints that, with a fully automated
procedure, give testable hypotheses: 3 of the hub (pleiotropic) genes
were not previously implicated in glioma. The GO enrichment analysis
makes sense.

PhD student - Theresia Dahl
Peter Gennemark -- mathematical models
Ulrike Nuber
Chris Sander -- old boss
Linda Karlsson-Lindahl
Debora Marks
Niki Schultz
Bodil Nordlander -- now testing one of the new hub genes

Labels: EBI Systems Biology series, scientific talk

posted by avilella # 2:46 PM 0 Comments

5/01/2009

NCBI SRA blastn service

It has never been easier to check your sequence against the NCBI Short Read Archive database:

NCBI SRA BLAST

First thoughts:

Transcriptome coverage is hugely biased to the 3' end (or 5' depending on library preparation). A lot more than I suspected.
Would be great to do queries for phylogenetic subclades: e.g. my human sequence against all SRA data for primates.
A lot of the 454 data has homopolymer issues, mostly TTT[...]TTTs but also some others:

Query  465  GGGCCTTGACAAAGTGTAAACCGCATGGATGGGCTTCCCC-AAGGATTTATTGACATTGC  523
Sbjct  249  ........................................C...................  190

Some of these (unless they are real variations) get picked up as mismatches, some as indels:

Query  1    CGGCAAGGTATGTGCGTGATTTTGGGCCCACGTGTATTTCCATTAATTTT-AAGCCGTAA  59
Sbjct  224  ..................................................T.........  165

Query  60   TTGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTG  119
Sbjct  164  ..........C.................................................  105

Query  61   TGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTGC  120
Sbjct  118  .....C......................................................  177

Query  61   TGTCG-TTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTG  119
Sbjct  160  .....T...............................-......................  102

Query  61   TTAATTTTAAGCCGTAATTGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTG  120
Sbjct  181  ...........................C................................  122
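A quick way to check whether a difference in one of these alignments sits in or next to a homopolymer run is to list the runs in the query and test the position. This is only a minimal Python sketch: the function names and the minimum run length of 3 are my own choices for illustration, and it does not tell you whether the difference is a sequencing error or a real variant.

```python
import re

RUN = re.compile(r"A+|C+|G+|T+")


def homopolymer_runs(seq, min_len=3):
    """Return (start, end, base) for each single-base run of at least min_len.

    Coordinates are 0-based and end-exclusive.
    """
    return [
        (m.start(), m.end(), m.group()[0])
        for m in RUN.finditer(seq.upper())
        if m.end() - m.start() >= min_len
    ]


def near_homopolymer(seq, pos, min_len=3):
    """True if 0-based position pos is inside a run or directly borders one."""
    return any(
        start - 1 <= pos <= end
        for start, end, _ in homopolymer_runs(seq, min_len)
    )


# Query from the first alignment, with the gap character removed.
query = "GGGCCTTGACAAAGTGTAAACCGCATGGATGGGCTTCCCCAAGGATTTATTGACATTGC"
runs = homopolymer_runs(query)
print(runs)
# The extra C in the subject falls between positions 39 and 40 of the query,
# i.e. right at the end of the CCCC run reported as (36, 40, 'C').
print(near_homopolymer(query, 40))
```

Differences flagged this way are the ones I would be most suspicious of in 454 data. Anything far from a run deserves a closer look as a possible real variation.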

Labels: genomics, nextgen sequencing

posted by avilella # 4:46 PM 0 Comments

A customized and versatile high-density genotyping array for the mouse -- Gary Churchill, The Jackson Laboratory, USA

Microarray based genotyping is an inexpensive and powerful tool to characterize genetic variation. High-density genotyping microarrays are commercially available for humans, economically important livestock and model organisms. However, they have not been available previously for the laboratory mouse, the premier mammalian model organism for biomedical research. Here we describe a custom high-density mouse genotyping array. The Mouse Diversity array was designed to capture the known genetic variation present in the laboratory mouse. It contains 623,124 SNPs distributed across the 19 mouse autosomes, the sex chromosomes, and the mitochondria with a median spacing of one SNP every 1,411 bp in the nuclear genome. The array also contains 916,269 invariant probes that are targeted to functional elements and regions of the genome known to harbor segmental duplications. The nature of the probes opens the door to a variety of novel applications including the characterization of copy number variation, allele specific gene expression and DNA methylation. Performance of the array based on call rate, replication and concordance with previously known genotypes is exceptional. The content-rich Mouse Diversity array provides a critical new tool for mouse genetics including the possibility of extending the successes of genome-wide association studies in humans to the mouse.
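For reference, a figure like "a median spacing of one SNP every 1,411 bp" can be computed from per-chromosome SNP coordinates roughly as below. This is a minimal Python sketch with toy positions, not the array's actual probe coordinates, and the function name is hypothetical.

```python
from statistics import median


def median_spacing(positions_by_chrom):
    """Median distance in bp between neighbouring SNPs on the same chromosome.

    positions_by_chrom: dict mapping chromosome name to a list of positions.
    Gaps are never measured across chromosomes.
    """
    gaps = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return median(gaps)


# Toy coordinates, invented for the example.
toy = {"chr1": [100, 1500, 3000, 4400], "chrX": [200, 1700]}
print(median_spacing(toy))
```

A median hides the tails, so regions with very sparse or very dense probe coverage would need the full distribution of gaps rather than this single number.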

Funny comment -- This may be the last chip we do. The economics tell us that chips are still the cheaper option, but sequencing is getting cheaper.

History: people catching mice, trading them, etc. Bottlenecks and all sorts of artificial effects.

Diversity in 11 classical inbred strains: some chromosomal regions have extremely low diversity. Is this petness? Longevity/Fecundity?

Problem with ascertainment bias: 623124 phylogenetically informative SNPs with known ascertainment

Collaborative Cross -- 8 M. musculus lines -- each contributing equally to the final "line". All inbred by now.

Phenotypes of intermediate CCs: e.g. voluntary exercise goes from 0 miles per day to 18 miles per day.

With 2 different inbred parents, we get complex children but with theoretically predictable phenotype, which means doing GWAS with phenotypes "a la carte".

Also, just by mixing genomes, creating diversity that was not in the parents: very useful novel phenotypic diversity.

Resolution is 7x better. CC will not be GWAS-level, but on the order of 10 genes or MB level resolution. Possibly gene level resolution in 10 generations. Always a mapping resolution panel and a validation panel, going back to the inbred lines.

Selection is strictly by random number. Maintaining the diversity is good, and we are lucky that natural selection already took its toll on the original strains.

Done in a way to maximise diversity, not to mimic human population structure. Deep reservoir of diversity for studying phenotypes.

A-Male/B-female crosses and compare to A-Female/B-Male in terms of sex-related epigenetics and other studies.

Some strains will die out along the way, but in 5-10 years should get a lot of info out of it.

Hyuna Yang -- a lot of the array work.
David Aylor -- population structure.

Published mouse divergence times (MYA = million years ago):

Mus cervicolor - Mus crociduroides = 7.60 MYA
Mus cervicolor - Mus haussa = 6.60 MYA
Mus cervicolor - Mus indutus = 6.60 MYA
Mus cervicolor - Mus mattheyi = 6.60 MYA
Mus cervicolor - Mus minutoides = 6.60 MYA
Mus cervicolor - Mus musculoides = 6.60 MYA
Mus cervicolor - Mus musculus = 4.80 MYA
Mus cervicolor - Mus pahari = 7.60 MYA
Mus cervicolor - Mus platythrix = 7.10 MYA
Mus cervicolor - Mus setulosus = 6.60 MYA
Mus cervicolor - Mus spretus = 4.80 MYA
Mus crociduroides - Mus haussa = 7.60 MYA
Mus crociduroides - Mus indutus = 7.60 MYA
Mus crociduroides - Mus mattheyi = 7.60 MYA
Mus crociduroides - Mus minutoides = 7.60 MYA
Mus crociduroides - Mus musculoides = 7.60 MYA
Mus crociduroides - Mus musculus = 7.60 MYA
Mus crociduroides - Mus pahari = 3.40 MYA
Mus crociduroides - Mus platythrix = 7.60 MYA
Mus crociduroides - Mus setulosus = 7.60 MYA
Mus crociduroides - Mus spretus = 7.60 MYA
Mus haussa - Mus indutus = 3.20 MYA
Mus haussa - Mus mattheyi = 2.60 MYA
Mus haussa - Mus minutoides = 3.20 MYA
Mus haussa - Mus musculoides = 3.20 MYA
Mus haussa - Mus musculus = 6.60 MYA
Mus haussa - Mus pahari = 7.60 MYA
Mus haussa - Mus platythrix = 7.10 MYA
Mus haussa - Mus setulosus = 4.00 MYA
Mus haussa - Mus spretus = 6.60 MYA
Mus indutus - Mus mattheyi = 3.20 MYA
Mus indutus - Mus minutoides = 2.50 MYA
Mus indutus - Mus musculoides = 2.50 MYA
Mus indutus - Mus musculus = 6.60 MYA
Mus indutus - Mus pahari = 7.60 MYA
Mus indutus - Mus platythrix = 7.10 MYA
Mus indutus - Mus setulosus = 4.00 MYA
Mus indutus - Mus spretus = 6.60 MYA
Mus mattheyi - Mus minutoides = 3.20 MYA
Mus mattheyi - Mus musculoides = 3.20 MYA
Mus mattheyi - Mus musculus = 6.60 MYA
Mus mattheyi - Mus pahari = 7.60 MYA
Mus mattheyi - Mus platythrix = 7.10 MYA
Mus mattheyi - Mus setulosus = 4.00 MYA
Mus mattheyi - Mus spretus = 6.60 MYA
Mus minutoides - Mus musculoides = 1.60 MYA
Mus minutoides - Mus musculus = 6.60 MYA
Mus minutoides - Mus pahari = 7.60 MYA
Mus minutoides - Mus platythrix = 7.10 MYA
Mus minutoides - Mus setulosus = 4.00 MYA
Mus minutoides - Mus spretus = 6.60 MYA
Mus musculoides - Mus musculus = 6.60 MYA
Mus musculoides - Mus pahari = 7.60 MYA
Mus musculoides - Mus platythrix = 7.10 MYA
Mus musculoides - Mus setulosus = 4.00 MYA
Mus musculoides - Mus spretus = 6.60 MYA
Mus musculus - Mus pahari = 7.60 MYA
Mus musculus - Mus platythrix = 7.10 MYA
Mus musculus - Mus setulosus = 6.60 MYA
Mus musculus - Mus spretus = 2.30 MYA
Mus pahari - Mus platythrix = 7.60 MYA
Mus pahari - Mus setulosus = 7.60 MYA
Mus pahari - Mus spretus = 7.60 MYA
Mus platythrix - Mus setulosus = 7.10 MYA
Mus platythrix - Mus spretus = 7.10 MYA
Mus setulosus - Mus spretus = 6.60 MYA
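If you want to query the list above instead of eyeballing it, the lines are regular enough to parse. A minimal Python sketch, assuming every line follows the "Mus a - Mus b = N MYA" pattern. The function names are made up for the example, and only three of the lines are pasted in.

```python
import re

LINE = re.compile(r"^(Mus \w+) - (Mus \w+) = ([\d.]+) MYA$")


def parse_divergence(lines):
    """Map each unordered species pair (a frozenset) to its time in MYA."""
    times = {}
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            times[frozenset((m.group(1), m.group(2)))] = float(m.group(3))
    return times


def closest_relatives(times, species):
    """Return (names, time) for the species with the most recent split."""
    candidates = {
        next(iter(pair - {species})): t
        for pair, t in times.items()
        if species in pair
    }
    best = min(candidates.values())  # raises ValueError for unknown species
    return sorted(n for n, t in candidates.items() if t == best), best


sample = [
    "Mus cervicolor - Mus musculus = 4.80 MYA",
    "Mus musculus - Mus pahari = 7.60 MYA",
    "Mus musculus - Mus spretus = 2.30 MYA",
]
times = parse_divergence(sample)
print(closest_relatives(times, "Mus musculus"))
```

Storing each pair as a frozenset means the lookup does not depend on which species was written first on the line.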

Labels: genomics, nextgen sequencing, scientific talk

posted by avilella # 2:43 PM 0 Comments