castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.



RNAseq in the worm -- Gary Williams

Illumina short-read transcriptome data has the potential to help solve many problems with curating gene models and the genomic sequence in C. elegans. This is an initial look at the data and some examples of how it can be used.

So far, C. elegans gene predictions:

36% - fully confirmed by ESTs
48% - partially confirmed
14% - no transcript confirmation

RNAseq data come from different worms than the genome, so some polymorphisms are expected; 200 bp inserts, 36 bp paired-end reads.
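A quick worked example of what the pair geometry means, assuming the 200 bp figure is the full fragment length (tools disagree on whether "insert size" includes the reads): each pair leaves roughly 128 bp of unread sequence between its two 36 bp mates, which is what lets a pair link sequence that no single 36 bp read could span. The function name is hypothetical.

```python
def unsequenced_gap(fragment_len: int, read_len: int) -> int:
    """Bases between the two mates of a pair that neither read covers.

    Assumes fragment_len is the whole sequenced fragment (one common
    reading of "insert size"); a negative result means the mates overlap.
    """
    return fragment_len - 2 * read_len

print(unsequenced_gap(200, 36))  # 128
```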

Reads mapped with MAQ or cross_match against genomic or transcript sequences.

6137 new splice junctions (6% increase)

Jumped from 70000 to 98000 splice junctions.
3x as many polyA sites
80 possible new coding genes

V-shaped coverage dips were validated against the traces, and then:
  • Detected a sequencing error, so the reference needs correcting
  • Detected an alternative haplotype
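One plausible reason for a V shape is that short reads overlapping a mismatch against the reference (a reference error or a different haplotype) fail to map, so depth falls off towards that base. A minimal sketch of flagging such dips, assuming reads are already aligned and given as half-open (start, end) intervals; the flank size and depth ratio are arbitrary, hypothetical thresholds, not the ones used in the talk.

```python
from typing import List, Tuple

def per_base_coverage(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Read depth at each position, from half-open [start, end) alignments."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    coverage, depth = [], 0
    for i in range(length):
        depth += diff[i]
        coverage.append(depth)
    return coverage

def v_dips(coverage: List[int], flank: int = 5, ratio: float = 0.5) -> List[int]:
    """Positions whose depth is below `ratio` times the mean depth of the
    `flank` bases on either side (hypothetical thresholds)."""
    hits = []
    for i in range(flank, len(coverage) - flank):
        flanks = coverage[i - flank:i] + coverage[i + 1:i + 1 + flank]
        flank_mean = sum(flanks) / len(flanks)
        if flank_mean > 0 and coverage[i] < ratio * flank_mean:
            hits.append(i)
    return hits

# Toy example: 10 bp reads tiled at every offset, except that the reads
# overlapping position 20 (say, a reference error) failed to map.
reads = [(s, s + 10) for s in range(31) if not s <= 20 < s + 10]
print(v_dips(per_base_coverage(reads, 41)))
```

Any flagged dip would still need checking against the traces, as described above, to tell a reference error from a genuine haplotype difference.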
Moving towards single-cell sequencing: not sequencing in tiny cells, but sequencing each cell in each developmental state of the worm. Also moving towards RNA sequencing of C. briggsae and C. remanei.

Updated gene builds will be given to other projects. Next Ensembl Metazoa comparative genomics build may already have the modENCODE-updated C. elegans and D. melanogaster builds.




Peston on making banks safe

BBC - Peston's Picks: Making banks safe
For what it's worth, there are two reasons why it might make sense to force our biggest and most complex banks to hold more capital than their smaller, simpler peers: if big super-banks have the privilege of knowing that we as taxpayers would always bail them out in a crisis, surely they've got to put in place treble protection against the risk that they'd call on us for such help; also the costs of holding the extra capital might encourage them to slim down and simplify their operations.



The Molecules and Mechanisms of Instinctive Behaviour in Mammals -- Darren Logan, The Scripps Research Institute, La Jolla USA


Social or behavioural disorders affect a quarter of individuals at some time during their lives; however, the molecules and mechanisms that mediate social cues, process their meaning, and initiate the corresponding behaviour are unknown. Instinctive social behaviours in mammals are thought to be largely promoted by pheromones: specialized olfactory cues secreted by one animal that directly influence the behaviour of another. Here I will describe studies into two instinctive, olfactory-mediated behaviours in mice: aggression and pup suckling.

Our studies found that aggression is promoted by specific protein pheromones excreted in male urine. These activate specialized, finely tuned sensory neurons in the noses of other males, resulting in a robust aggressive behaviour in the recipient. Our genomic and functional characterization of the gene family encoding these pheromones reveals an extraordinary scope for information-coding. I will describe our recent efforts to elucidate their social significance using cellular and behavioural techniques.

Pup suckling is a behaviour that is found in all mammals and is thought to be promoted by pheromones emitted by the mother and detected by the infant. We found that newborn mice do use maternal odour cues to promote suckling but, in contrast to the aggression pheromones, these cues are not genetically predetermined to elicit behaviour. Instead, the cues are complex, variable and learned by pups around birth. Suckling is subsequently initiated when the pup recognizes the same odour pattern in the context of its mother's nipple. The sensory neurons that mediate this are not specialized and are found in the noses of all mammals, including humans.

Together these studies demonstrate a diversity of mechanisms and molecules that underlie instinctive behaviours, and are a first step towards understanding the neural circuitry of social interaction.




Giving functional genomics a REST -- Alex Bruce
(Ensembl gene ENSG00000084093, transcript ENST00000309042, region 4:57468799-57493097)

An interesting case where a small, alternatively skipped exon generates an extra copy of the Znf-C2H2 domain.

REST is an essential vertebrate transcription factor with very diverse roles. It has an important role in the regulated secretory pathway, which has been independently confirmed by two other groups.

RE1 array used to identify misregulated REST target genes in diseases like Huntington's.

RE1 "half" sites. Canonical/Transfac/Discovered motifs.

Different evolutionary pressures on RE1 sites seem to be associated with different functional subsets: common sites are less well conserved than unique sites. Unique sites need to be tissue-specific, so they are bound to keep a general binding weakness that turns binding off in non-specific tissues (if I understand correctly?).

Solexa sequencing is quite good at identifying high-affinity motifs, but poor at low-affinity ones.

There is an in vivo hierarchy between RE1 for REST binding and it can be discriminated at the DNA sequence level.
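To illustrate what discriminating sites "at the DNA sequence level" can look like, here is a minimal position-weight-matrix scan: every candidate site gets a log-odds score against a uniform background, so strong, canonical-like sites rank above weak, degenerate ones. The 4-column matrix is a made-up toy, not the RE1 motif from the talk, and a real scan would also cover the reverse strand and non-ACGT characters.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy matrix: base probabilities for each motif column.
# This is NOT the RE1 matrix; it only illustrates the scoring scheme.
PWM: List[Dict[str, float]] = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # uniform base composition assumed

def log_odds(site: str) -> float:
    """Sum of log2(motif probability / background) over the site's bases."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, site))

def scan(seq: str) -> List[Tuple[int, float]]:
    """Score every window on the forward strand, best-scoring first."""
    width = len(PWM)
    hits = [(i, log_odds(seq[i:i + width])) for i in range(len(seq) - width + 1)]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

print(scan("GGCAGTGG")[:2])
```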



PLoS Computational Biology: Polymorphism Data Can Reveal the Origin of Species Abundance Statistics



Saint George was an ecocriminal

Here are different images of Saint George caught in the act of killing an endangered species, a mythical dragon, that has been extinct since then...


Phenotype data in Ensembl -- Fiona Cunningham

European Genotype Archive: Genome Wide Association Studies (GWAS) such as the WTCCC data and others. Only public information is publicly available; everything else is released under very strict rules.

The NHGRI GWAS Catalog will be imported into Ensembl: it contains high-quality, manually curated data.

Links in the Variation view: a "Phenotype Data (n)" link, e.g. for rs420259.

Diagnostic testing: the situation will now improve in Ensembl.

Locus-specific databases (LSDBs): p53, ABO, collagen, albinism, cystic fibrosis, Alzheimer's disease, ... >700

The main aim is to be able to link the reference CDS sequences used by the biomedical community to the most up-to-date reference sequence used by the genomics community. This mapping will allow clinicians to link all the phenotype data on their end to the genomic data held by the genomics community.
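At its core, such a mapping is a coordinate transform: a position counted along a reference CDS (the way clinicians number variants, e.g. c.76) has to become a position on the genome. A minimal sketch, assuming the coding exons are already known as inclusive genomic intervals listed 5' to 3'; it ignores intronic offsets (c.76+1), UTRs and any sequence differences between the CDS reference and the genome, all of which a real LRG-based mapping would have to handle. The function name is hypothetical.

```python
from typing import List, Tuple

def cds_to_genome(cds_pos: int, exons: List[Tuple[int, int]], strand: int) -> int:
    """Map a 1-based CDS coordinate to a 1-based genomic coordinate.

    exons: coding parts of the exons as inclusive (start, end) genomic
    pairs, listed in transcription order (5' to 3').
    strand: +1 for the forward strand, -1 for the reverse strand.
    """
    if cds_pos < 1:
        raise ValueError("CDS positions start at 1")
    offset = cds_pos
    for start, end in exons:
        length = end - start + 1
        if offset <= length:
            # Forward strand counts up from the exon start,
            # reverse strand counts down from the exon end.
            return start + offset - 1 if strand == 1 else end - offset + 1
        offset -= length
    raise ValueError("position lies beyond the annotated CDS")

print(cds_to_genome(11, [(100, 109), (200, 219)], 1))  # 200
```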

Political pros and cons have to be carefully handled and continuously explained. Ensembl's openness, existing infrastructure and visibility are the biggest selling points for having these databases linked in a common LSDB resource.

The LRG (Locus Reference Genomic) website will soon have LRG XML files and prettified HTML reports.



Ensembl Quality Checking -- Michael Schuster

All Ensembl gene predictions for all vertebrate species are based on experimental evidence:

  1. UniProtKB/Swiss-Prot
  2. NCBI RefSeq proteins and mRNAs
  3. UniProtKB/TrEMBL
  4. EMBL Nucleotide Sequence Archive

The evidence is aligned back to the gene predictions with Exonerate. Types of alignment result:

  • perfect
  • added start
  • longer region
  • missing start
  • non-matching start
  • non-matching region
  • shorter region
  • ...
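A minimal sketch of how boundary-based labels like these could be assigned, assuming both the prediction and the aligned evidence are reduced to (start, end) coordinates on the same strand. How each case maps to a category name is a hypothetical reading of the list above, not Ensembl's actual definitions, and the "non-matching" categories (which need the sequence itself) are left out.

```python
from typing import Tuple

def classify(prediction: Tuple[int, int], evidence: Tuple[int, int]) -> str:
    """Coarse label for how an evidence alignment relates to a prediction.

    Both arguments are (start, end) on the same strand, oriented 5' to 3'
    so that start < end. Label meanings are hypothetical.
    """
    p_start, p_end = prediction
    e_start, e_end = evidence
    if (p_start, p_end) == (e_start, e_end):
        return "perfect"
    if p_start > e_start:
        return "missing start"   # evidence extends upstream of the prediction
    if p_start < e_start:
        return "added start"     # prediction extends upstream of the evidence
    # Same start, different end.
    return "longer region" if p_end > e_end else "shorter region"

print(classify((20, 100), (10, 100)))  # missing start
```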

Exonerate has an exhaustive mode that takes much longer but fixes some of the mini-intron and mini-exon issues that sometimes occur. Exonerate's cdna2genome model is very useful for quality checking.

The genebuild now uses head-to-head alignments with GeneWise and Exonerate, and takes the best in each case.
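The "take the best in each case" step can be sketched as a per-evidence selection. This assumes each alignment has already been summarised by percent identity and percent coverage of the evidence; the identity-times-coverage score and the `Alignment` record are hypothetical, not the genebuild's real scoring.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Alignment:
    evidence_id: str
    method: str       # e.g. "genewise" or "exonerate"
    identity: float   # percent identity, 0-100
    coverage: float   # percent of the evidence sequence aligned, 0-100

def best_per_evidence(alignments: List[Alignment]) -> Dict[str, Alignment]:
    """For each evidence sequence, keep the alignment with the highest
    identity * coverage product (a hypothetical scoring rule)."""
    best: Dict[str, Alignment] = {}
    for aln in alignments:
        current = best.get(aln.evidence_id)
        if current is None or aln.identity * aln.coverage > current.identity * current.coverage:
            best[aln.evidence_id] = aln
    return best

picked = best_per_evidence([
    Alignment("P1", "genewise", 98.0, 90.0),
    Alignment("P1", "exonerate", 99.0, 100.0),
    Alignment("P2", "genewise", 95.0, 100.0),
    Alignment("P2", "exonerate", 97.0, 80.0),
])
print({k: v.method for k, v in picked.items()})
```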

Some cases are still difficult to get right with algorithmic solutions: this is where the curators are needed.




Alzheimer -- BBC

BBC NEWS | Health | Drug offers hope on Alzheimer's

A new drug which shows promise as a treatment for Alzheimer's disease has been developed by UK scientists.

The Proceedings of the National Academy of Sciences reports the drug, CPHPC, removes a protein thought to play a key role in Alzheimer's from the blood.

Tests at the University College London found the protein also disappeared from the brains of five Alzheimer's patients given the drug for three months.

Longer and larger scale clinical studies are now being planned.




Peston on breaking up banks

The issue here is how to handle financial globalization: bigger UK banks are better positioned internationally, which one could argue is good for the UK financial system, but smaller UK banks are better for the retail consumer, as pointed out in the blog post. What is best? Ensure that big UK banks can compete internationally in making big deals and acquisitions? Or make sure that smaller UK banks compete against each other within the UK and provide the best retail value?

The same happened with the wave of water, energy and IT privatizations in the last 10 years in Europe. Every country was playing a double game: trying to promote internal competition and avoid national monopolistic practices, but also trying to beef up its national water/energy/IT company so that it would be well positioned to acquire other European, South American and Asian companies...

BBC - Peston's Picks: Tories to break up banks?

Robert Peston | 12:40 UK time, Wednesday, 8 April 2009

Royal Bank of Scotland and Lloyds TSB could be dismantled after the next election, if the Tories form the government.

Here's why I say that, in the form of excerpts from a speech that's just been delivered by George Osborne, the Shadow chancellor.

"We cannot allow one part of our economy to behave in a way that puts the rest of the economy at risk when it fails. We need to think deeply about whether we can sustain banks that are not only too big to fail, but potentially too big to bail.

By dint of its substantial shareholdings the government has a powerful influence over the future structure of the UK banking industry, whether it likes it or not.

When the time comes to sell off those shareholdings we need to think very carefully before simply selling them to the highest bidder without thinking through the consequences for the wider economy.

We should look at whether Britain in fact needs smaller banks.

For it would be a bitter irony if we came out of this crisis with a banking system that was even more concentrated and even riskier than the one we had before it."

The background to these remarks is that Royal Bank's balance sheet is considerably bigger than the total output of the British economy and its liabilities are considerably greater than the entire public-sector debt of the UK.

Hence Osborne's allusion to banks that are "too big to bail". Or to put it another way, in rescuing RBS, the government has mortgaged all our economic futures to the rehabilitation of this giant bank.

As for Lloyds, it became far and away the biggest retail bank in the UK when it was permitted to buy HBOS last autumn.

In fact, it only rescued battered HBOS because the deal offered a once-in-a-generation opportunity to become the unchallenged market leader in British retail banking.

So if the next government were to dismantle Lloyds, depriving it of its enormous share of the current-account, savings and mortgage markets, that would be a reputational disaster for Lloyds' management.

There are two further implications of Osborne's remarks: first, that he would privatise Northern Rock as an independent bank, rather than flogging it to another bank; second, that he would ask the City watchdog, the FSA, and the competition authorities to consider whether other big British banks should be broken up. Even those where taxpayers don't have a big stake.

In the City, where I am tapping out this blog, this is big stuff.

To do my normal thing of ramming home the bleedin' obvious, the opinion polls are currently saying Osborne will be the next chancellor. Which means that his ambitions for what our banks should look like after the spring of next year are at least as significant as the future plans of the current chancellor.




EnsemblCompara "Back to the future": using phylogenetic information to help gene annotation

Here is an example success story of using phylogenetic information to improve human gene annotation. What do you see wrong in this EnsemblCompara GeneTree?

The human gene prediction has been split into one third to the left and two thirds to the right. Some of the other species have the full-length prediction, but some of the 2x and projected genomes also have this issue. This case was reported to the Havana team at the Sanger Institute, and they have now built a full-length prediction of the gene for human and mouse (notice the Havana_genes blue and dark green model):
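The split above was spotted by eye in the GeneTree, but the same phylogenetic signal could be screened for automatically: fragments of a split prediction tend to be much shorter than their orthologs in the same tree. A minimal sketch, assuming protein lengths per tree member are at hand; the 0.75 threshold and the member names are hypothetical, and a real screen would also check that the short pieces sit next to each other on the genome.

```python
from statistics import median
from typing import Dict, List

def flag_fragments(lengths: Dict[str, int], min_fraction: float = 0.75) -> List[str]:
    """Return tree members whose protein length is below `min_fraction`
    of the median length of all members (hypothetical threshold)."""
    reference = median(lengths.values())
    return sorted(name for name, n in lengths.items() if n < min_fraction * reference)

# Hypothetical tree: the human prediction is split one third / two thirds.
tree = {
    "human_left": 150, "human_right": 300,
    "mouse": 450, "dog": 445, "chicken": 455, "zebrafish": 440,
}
print(flag_fragments(tree))
```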

The next Ensembl/Havana merge will hopefully reflect this change but, right now, you need to activate the Havana_genes DAS track to see the most up-to-date Havana annotation. There is a good number of these fixed now in the highly loved genomes, aka human, mouse and zebrafish. But there is a second level of genomes that are not getting any manual annotation here but may be annotated somewhere else...



Promoting Open Source Bioinformatics

Interesting to see all the buzz that the Benjamin Franklin Award has generated in the blogosphere, twittersphere, facebooksphere and any of the other spheres out there... I still think there is something we should try and resolve in open source bioinformatics, which is promoting Open Source software to create more awareness in the scientific community. We need to reconcile the promotion of modularity and generality with the fact that giving credit to the scientists who contribute to Open Source Bioinformatics software is still important. Projects that have built very modular and generic components may be doing a lot for the bioinformatics community at large but, at the same time, the less atomic and single-purpose your software is, the more difficult it is to publish it in a prominent scientific journal. The same goes for citing it in the downstream publications: very atomic programs are very successful in citation metrics, but infrastructure code is not. This means that well-designed, well-implemented and well-tested software is often not prominent enough for new people to notice, and too many bioinformaticians resort to their own glue code for building their bioinformatics infrastructure.

There wouldn't be anything wrong with rewriting your own code over and over again were it not that: (a) people spend too much time writing scaffolding code that lets them get to what is really new and interesting in their project, and (b) that code tends to be used and tested only internally and is almost never reused by any other party unless it has been very well designed and documented, hence the name scaffolding code.

There is a really good chance now to build an infrastructure that brings up a terminal next to the next-generation, petabyte-size data sources, using emerging "cloud" technologies. These technologies are already advanced in fields other than bioinformatics, so we can leverage what has already been done for us and make extensive use of it. These terminals don't need to be dumb terminals, and the community should provide as much prebuilt code in them as possible so that the new breed of bioinformaticians get used to having this software at their fingertips.

A few years ago all the effort was in building packages for different Linux distributions, so that people could easily install Open Source software on their in-house CPU clusters. I think we now need to shift gears towards cloud software accessibility. The good news is that everybody seems happy with the common Ubuntu system as a start. I fear a proliferation of iPhone-like SDKs that would make the existing bioinformatics software useless. In an era where everybody is acutely aware of governments having to pour our taxes into infrastructure that was already paid for, no one will like to see all existing bioinformatics software become a "toxic" or "legacy asset"!




Genomes to come: wallaby

Tammar Wallaby - Wikipedia, the free encyclopedia

The Tammar Wallaby (Macropus eugenii), also known as the Dama Wallaby or Darma Wallaby, is a small member of the kangaroo family and is the type species for research on kangaroos and marsupials.

It is found on offshore islands on the South Australian and Western Australian coast. It is classified as vermin on Kangaroo Island, where it seasonally breeds in large numbers and damages the echidna habitat on the island.




Quarterly Activity Report -- Hinxton Sequence Forum Wellcome Trust Genome Campus

A quarterly activity report for the different activities that take place under what can be considered "sequence" at the WTGC in Hinxton, Cambridge, UK:

  • HGNC has made great progress in resolving nomenclature for 130 cases where the community has diverse opinions and it is difficult to reach agreement. A very good point made: in the Internet era it's better to give a gene a name that is distinct from common words that would otherwise clobber your Google search results. A forum has now been set up for different communities to discuss gene family names.
  • Havana has now started using RNAseq data to confirm new genes in human and zebrafish that had no evidence before. One new feature is a "confirmed intron", used when paired Solexa reads bridge two exons, with an associated score for read depth. They are confident this type of data will bring out many interesting new annotations that couldn't be found before, e.g. genes expressed in a given tissue during a window of a few hours in development. There are already a few examples in zebrafish.
  • WormBase has been working hard on compiling more data from modENCODE. A small but cool infrastructure achievement: VMware images running old releases, which investigators can just pull and bring up on demand.
  • Ensembl Genomes has been successfully testing the beta sites for Bacteria, Protists and the first Metazoa build. Another Metazoa build is in progress, with all the phylogenetic goodness of the 12 Drosophila genomes plus the vectors plus C. elegans and a few other outgroups. The modENCODE project is about to complete the re-annotation of gene models using CAGE data, which will bring more precise gene starts for D. melanogaster and C. elegans. Ensembl Genomes is still working with Manchester, and now the US, to put together an Aspergillus resource that provides the best value for money to researchers. PomBase is also being pursued; lots of labs are interested in having it Ensemblified and ready to use.
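The "confirmed intron" idea from the Havana item can be sketched as counting read pairs whose two mates land in consecutive exons. This is a minimal illustration, assuming each mate is reduced to a single genomic position and the transcript's exons are known; it ignores split reads and strand, and the pair count simply stands in for the read-depth score. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int]  # inclusive genomic (start, end)

def exon_index(pos: int, exons: List[Exon]) -> Optional[int]:
    """Index of the exon containing pos, or None if it falls outside all exons."""
    for i, (start, end) in enumerate(exons):
        if start <= pos <= end:
            return i
    return None

def confirmed_introns(pairs: List[Tuple[int, int]], exons: List[Exon]) -> Dict[Tuple[int, int], int]:
    """Count read pairs whose mates fall in two consecutive exons.

    pairs: (mate1_pos, mate2_pos) genomic positions of each pair.
    exons: sorted, non-overlapping exons of one transcript.
    Returns {(exon_i, exon_i_plus_1): number_of_supporting_pairs}.
    """
    support: Counter = Counter()
    for p1, p2 in pairs:
        i, j = exon_index(p1, exons), exon_index(p2, exons)
        if i is None or j is None:
            continue  # a mate falls in an intron or outside the transcript
        lo, hi = min(i, j), max(i, j)
        if hi == lo + 1:  # non-adjacent exons would instead hint at exon skipping
            support[(lo, hi)] += 1
    return dict(support)

exons = [(100, 200), (300, 400), (500, 600)]
pairs = [(150, 350), (160, 360), (350, 550), (150, 550), (150, 180), (250, 350)]
print(confirmed_introns(pairs, exons))
```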
* This is a personal blog. Things said here are not to be taken as official reports.


