This is a good example of the kind of paper we are probably going to see more and more often in the future:
Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis
http://www.ncbi.nlm.nih.gov/pubmed/19454017
The recipe: (a) get a certain number of RNA-seq reads for your species "X", (b) build as many full-length cDNAs as possible from the fragments, and (c) compare against closely related species in terms of:
- New interesting cDNAs that don't have hits against existing public cDNAs -- What do they do?
- Expression patterns -- Are these different from the patterns in other closely related species?
- Protein-coding evolution -- run pairwise dN/dS against the closest genome, or tree-based dN/dS against an existing phylogeny [1,2] -- Does anything show up in a Gene Ontology enrichment analysis?
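To make the last question a bit more concrete, here is a minimal sketch of what a Gene Ontology enrichment test boils down to: a one-sided hypergeometric test per GO term, asking whether a term shows up in your "interesting" gene set (say, the genes with high dN/dS) more often than expected from the whole annotated set. Everything named here (`hypergeom_enrichment_p`, `go_enrichment`, the toy gene IDs and GO labels) is made up for illustration and is not taken from the paper. Real tools also correct for multiple testing and use the GO hierarchy, which this sketch does not.

```python
from collections import Counter
from math import comb


def hypergeom_enrichment_p(study_hits, study_size, pop_hits, pop_size):
    """One-sided p-value P(X >= study_hits) for drawing study_size genes
    without replacement from pop_size genes, pop_hits of which carry the term."""
    max_hits = min(study_size, pop_hits)
    total = comb(pop_size, study_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return tail / total


def go_enrichment(study_genes, gene_to_terms):
    """Return (term, study_count, p_value) tuples sorted by p-value.

    gene_to_terms maps every annotated gene (the background population)
    to an iterable of GO term labels. No multiple-testing correction.
    """
    population = set(gene_to_terms)
    study = set(study_genes) & population  # ignore unannotated genes
    pop_counts = Counter(t for g in population for t in set(gene_to_terms[g]))
    study_counts = Counter(t for g in study for t in set(gene_to_terms[g]))
    results = [
        (term, k, hypergeom_enrichment_p(k, len(study), pop_counts[term], len(population)))
        for term, k in study_counts.items()
    ]
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Hypothetical toy annotation: 10 genes, 3 of them tagged "immune".
    annotation = {f"gene{i}": {"metabolism"} for i in range(7)}
    annotation.update({f"gene{i}": {"immune"} for i in range(7, 10)})
    fast_evolving = ["gene7", "gene8", "gene9"]  # e.g. high dN/dS genes
    for term, count, p in go_enrichment(fast_evolving, annotation):
        print(term, count, p)
```

In practice the study set would come from the dN/dS step and the background from every cDNA that got a GO annotation, and the choice of background matters a lot for what ends up looking "enriched".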
Then the data is published and stored in a publicly available database, and can be added to the pool to compare against for the next project. Iterate :-)
It used to be gene-by-gene sequencing, and now it's transcriptome-by-transcriptome sequencing. There are still sequencing error and coverage issues. One of my first scientific mentors, Prof. Montserrat Aguade, was one of the first to do gene sequencing on the Adh gene in Drosophila while doing her PhD at Harvard. People then extended Adh sequencing and analysis to other Drosophila species, then other clades, then other genes, then gene families such as odorant-binding proteins for a bunch of Drosophila species or populations, or gene pathways like the insulin pathway, etc.
But now we have a much broader picture with a rather complete transcriptome. And most of the sequencing issues are going to be corrected across the phylogeny in pretty much the same way as allele imputation is filling the gaps at the population genomics level (e.g. 1000 Genomes Project).
I am very excited about all this!
Labels: genomics, nextgen sequencing, ramblings