castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.

2/13/2011

Movies

Role models: great potential, but not fully developed.

Gulliver's Travels: bang-on Jack Black comedy. Not enough Amanda Peet in the movie.

Life as we know it: pseeee...

Miracle at St. Anna: great surprise. Terrible ending but enjoyable otherwise.

Inside Man: great script and well executed, it could have been a great hit.

The Break-Up: seen it for the fifth time, it's a very special movie.

Tropic Thunder: Jack Black is great in it, but altogether it's not a well-rounded film.

Red: great cast, classic action movie.

Keeping the Faith: who would have thought that a master of comedy would also be able to play in a romantic setting?

The American: on the sad side, I like what G.C. has turned into with age. Very sexy co-star, although the film has an abrupt finish that doesn't help in getting close to the characters.

Forgetting Sarah Marshall: very enjoyable, unpretentious comedy.

Hot Tub Time Machine: too much in-your-face at times, but laughable comedy.

What Happens in Vegas: does what it says on the tin.

The Switch: there is something I can't digest about this movie, although I understand why people want to like it.

Rumor Has It: good comedy which wants to be a bit too dramatic at times.

posted by avilella # 7:33 PM 0 Comments

11/15/2010

EMBL Postdoc retreat 2010 -- some notes

The EMBL Postdoc Retreat 2010 took place in Lubeck. Here are some notes and keywords for some of the seminars:

Timothy Saunders, EMBL-Heidelberg
Dissecting a noisy subcellular gradient in fission yeast

EIPOD Timothy Saunders, collaborators Martin Howard (JIC), Eileen Furlong. Image analysis project. POM1. Aggregates. FRAP.

Kevin Knoops, EMBL-Grenoble
Ultrastructural analysis of the nidovirus replication complex
reveals a unique reticulovesicular network of modified
endoplasmic reticulum

New postdoc, presented previous project at U.Leiden (NL). EM tomography techniques. 300nM coronavirus. Vesicles connected to the rough ER.

Ciaran Carolan, EMBL-Hamburg
Combination of advanced shape description methodologies for the
identification of ligands and for drug design

Victor Lamzin, Hamburg. Gerrit Langer, Janet Thornton, Abdullah Kahraman, Roman Laskowski. Institutions: EBI, St.Jude's Hospital, Malaria DB.

Lead generation -- bottleneck in drug design

Protein pocket -> shape -> shape features -> feature match

Surfnet program. Math descriptors, like Zernike moments, 3rd order moments. Matrix-based shape measures. Electrostatics. ATOLL database.

Daniel Fernández, EMBL-Grenoble
Structural analysis of stress tolerance proteins in plants

Jose Antonio Marquez Group - Grenoble, IBMCP - Valencia, Regina Antoni, Pedro Rodriguez

C2 Domain proteins. ABA receptor: PYR, PYL, RCARs. Humidity control device

Virginia van Delinder, EMBL-Heidelberg
Single-molecule TIRF without immobilization

Lemke Lab.

FRET distance between molecules. Microfluidics. Single-molecule FRET is tough. CFP-YFP pairs: cyan+yellow = green

PDMS device (channels). Droplet generation: 8 micron deep channels. Gave up on droplets, continuous channels.

Joseph Barry, EMBL-Heidelberg
Mathematical modeling of protein aging and turnover in live
yeast cells

EIPOD, Huber Group Heidelberg, Knop Group, Andreas Kaufmann

Computational physics

Snapshot Analysis Protein Stability SAPS. Protein-mCherry-sfGFP (43min, 5min)

Colour 1 -> snapshots -> colour 2

Genome-wide library of yeast strains. One protein tagged each time. MCD1 protein-cell cycle.

R package - deSolve
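
A back-of-the-envelope sketch of why a two-colour tag can report on protein age. It assumes constant production, first-order maturation and first-order degradation at steady state, and it reads the "43min, 5min" above as maturation half-times for mCherry and sfGFP; both the model and that reading are my assumptions, not necessarily what was presented (the talk mentioned deSolve in R; this is plain Python with hypothetical function names).

```python
import math


def mature_fraction(maturation_half_time, protein_half_life):
    """Steady-state fraction of fluorophores that have matured.

    With maturation rate m and degradation rate k (both first order),
    immature protein sits at p/(m+k) and total protein at p/k, so the
    mature fraction is m / (m + k). Times are in minutes.
    """
    m = math.log(2) / maturation_half_time
    k = math.log(2) / protein_half_life
    return m / (m + k)


def timer_ratio(protein_half_life, slow_maturation=43.0, fast_maturation=5.0):
    """Slow-colour / fast-colour intensity ratio for a tagged protein.

    The defaults are the two times from the notes, interpreted here
    (an assumption) as maturation half-times.
    """
    slow = mature_fraction(slow_maturation, protein_half_life)
    fast = mature_fraction(fast_maturation, protein_half_life)
    return slow / fast


if __name__ == "__main__":
    # Hypothetical protein half-lives, in minutes.
    for half_life in (10, 43, 100, 1000):
        print(half_life, round(timer_ratio(half_life), 3))
```

Under these assumptions, short-lived proteins are degraded before the slow colour matures, so the ratio rises with protein stability; a single snapshot per strain would then be enough to rank proteins by turnover.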

Jan Medenbach, EMBL-Heidelberg
A novel and general concept for the regulation of translation by
protein-controlled upstream open reading frames (pc-uORFs)

M.Hentze lab Heidelberg

msl-2. SXL

Chris Williams, EMBL-Hamburg
Insights into the regulation of an E2 enzyme by a non-
canonical binding partner

Structural Biology Unit, Matthias Wilmanns, Hamburg

Pex22 novel fold. No homologues in PDB. Y172A mutation blocks PEX4P-PEX22 binding

Pierre Khoueiry, EMBL-Heidelberg
Defining functionality and essentiality in the Drosophila
mesoderm network using inter- and intra-species comparisons

Eileen Furlong. Doing both wetlab and bioinformatics analysis

8008 mesoderm CRMs defined, with ~5 per gene -> need to purify this list

Mesoderm CRMs should be evolutionarily conserved

D.virilis ~50MYA from D.melanogaster

Compare network mel vs vir -> missing genes / new genes

Data from indels -- Jan Korbel lab

SVs/CNVs -> create regulatory changes

When we find an indel in a CRM -> might be affected by it -> redundancy -> property of CRMs

Hypothesis: developmental CRM deletions are lethal, they are supposed to avoid SVs

If one deleted, but others next to it -> still functional, redundancy in action, or not functional anymore

Andres Palencia, EMBL-Grenoble
Role of aminoacyl-tRNA synthetases in protein synthesis.
Case study: structural dynamics of the aminoacylation and
proofreading cycle of leucyl-tRNA synthetase

Bacterial LeuRS as a model

Sebastian Glatt, EMBL-Heidelberg
"Ring"ulation of tRNA modification

Elp 4-5-6 complex. Elp 1-2-3

Rho-ssRNA, bind ssRNA, same structure as elongator. Asymmetric central cavity using heterohexameric assembly.

posted by avilella # 9:24 AM 0 Comments

9/05/2010

Alcohol consumption in Europe

BBC News - Alcohol consumption 'continues to fall'

2008 alcohol consumption in Europe (litres per head)

* Czech Republic - 12.3
* Austria - 10.4
* Lithuania - 10.1
* Germany - 10.0
* Spain/Hungary - 9.8
* Portugal/Slovakia/Denmark - 9.3
* Poland - 9.8
* Belgium/Luxembourg - 8.5
* UK - 8.4
* Finland/Greece - 7.6

Source: BBPA Statistical Handbook 2010

posted by avilella # 8:50 PM 0 Comments

7/22/2010

Lab life and ecology

Prof-like Substance: Job data in ecology and evolution fields

Euphemisms called “labs” coexist in structured universal aggregations where they compete with one another for scarce resources. Labs cooperate to produce copious numbers of zygotes, most of which disperse synchronously each year. The strongest find their way into the protective brood pouches of crusty adults who shed soft-shelled offspring at regular intervals (slowly developing zygotes die by the incompletely understood process of academic apoptosis). Juveniles develop a hard external carapace by intermittently joining and extracting themselves from other labs. The hardened but vulnerable sub-adults then join a common pool where they compete for space and position on rapidly eroding substrate in the universal aggregation. Many become dormant and fail to contribute to the gene (meme) pool. Some return to the lab as brood-rearing helpers. Few survive the rampant competition and frenzied cannibalism in the pool. Not all of the survivors are safe on the fragile substrate. A second apoptosis-like event eliminates the weak and meek. Only the most persistent or aggressive remain.

posted by avilella # 9:18 AM 0 Comments

4/24/2010

Pay per view, football and statistics

A very interesting web application that studies the score distributions for different football leagues:
http://understandinguncertainty.org/node/228
Something that people already know is that Pay-per-view TV had a major impact on how different teams performed in different years. In the early 2000s, many more teams in Spain, Italy and the UK had a chance to sign highly paid players, because they were receiving the monetary influx from Pay-per-view TV. That narrowed the differences between the traditionally best-performing teams and a group of teams that were traditionally in middle-of-the-table positions. Some teams were better at investing this influx of money than others, and this meant they had much better chances of winning the national league. But as Pay-per-view normalized, and the teams with the highest media interest went back to receiving more money than the rest, sharp differences reappeared.
See for example the results for the Spanish League in 2000-2001: the variance due to chance went up to 60%.

Whereas last year Barcelona and Real Madrid quickly dominated the table, and the % variance due to chance ended at 34%.

Italy before the referee scandal was down to 18%:
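
For the curious, here is a rough sketch of how a "% variance due to chance" figure could be estimated. It compares the spread of final points you would expect if every team were equally good with the spread actually observed. The win/draw probabilities, the function names and the example points are all illustrative assumptions of mine, not the web application's actual method or data.

```python
from statistics import pvariance


def chance_variance(games_per_team, p_win=0.375, p_draw=0.25):
    """Variance of a team's final points if every match were pure chance.

    Assumes 3 points for a win, 1 for a draw, 0 for a loss, the same
    probabilities in every match, and independent matches.
    """
    mean = 3 * p_win + 1 * p_draw
    second_moment = 9 * p_win + 1 * p_draw
    per_game_variance = second_moment - mean ** 2
    return games_per_team * per_game_variance


def percent_due_to_chance(points, games_per_team):
    """Share of the observed spread in points that chance alone would give."""
    observed = pvariance(points)
    return min(100.0, 100.0 * chance_variance(games_per_team) / observed)


if __name__ == "__main__":
    # Hypothetical final table for a small league, 38 games each.
    table = [90, 85, 70, 62, 55, 50, 47, 44, 40, 33]
    print(round(percent_due_to_chance(table, 38), 1))
```

A tight league gives an observed variance close to the chance variance (a high percentage); a league dominated by two clubs stretches the table and pushes the percentage down.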

posted by avilella # 8:37 PM 0 Comments

4/13/2010

Financial bonuses

Long time, no post. Here is a financial one: Vince Cable (LibDem UK) on bank bonuses. Many many people would subscribe to these rules. That is not the problem. Politicians will say: if there is multinational agreement on these rules, let's apply them. The very moment when one important country decides not to go for them, then it's war. War many centuries ago was about having a fleet of ships with cannons, then it turned into air forces, then into nuclear weaponry. Nowadays, it's a financial war. For example, reams of paper are being written about the artificial devaluation of the currency in China. This is as close to a Cold War as we were two decades ago.

BBC - Peston's Picks: Lib Dems: Smaller bonuses, smaller City?

These are the highlights:

1) All bonuses over £2,500 would be payable in shares, which couldn't be redeemed, or pledged as security for loans or turned into anything spendable for at least five years.

2) No one on the board of a bank, not even the chief executive, would be eligible for a penny of bonus.

3) Loss-making banks would be banned from paying bonuses.

4) Every employee of a bank earning more than the prime minister - which Mr Clegg defines as circa £200,000 - would be publicly named.

posted by avilella # 1:07 PM 0 Comments

3/26/2010

Google Summer of Code 2010: interesting phyloinformatics projects

Google_Summer_of_Code_2010 – GenMAPP

IDEA 12: Phylogenetic Tree plugin

Most of the software for visualization of phylogenetic tree is command line driven. Cytoscape's enhanced graphical abilities can be used to layout a phylogenetic tree, zoom a region, assign colors to groups of nodes and edges, generates publication-quality images. A Cytoscape plugin (PhyloTreePlugin) could become very useful tool for scientists studying phylogenetic trees. A student (GSoC 2009) already implemented some algorithms to visualize the rooted trees, this year we want to extend the plugin to support the visualization of un-rooted tree to make the plugin complete. The task includes the support of more format of phylogenetic tree and implements more algorithms for visualization of phylogenetic tree.

Language and skills: Java

Idea by: Peng Liang Wang

Potential Mentors: Peng Liang Wang, Scooter Morris

Phyloinformatics Summer of Code 2010 - Phyloinformatics

Phylomovies: interactive animations of gene tree evolution

This project would produce a tool useful for both science and outreach, and which could ultimately become linked with the Ensembl database and be used by thousands of biologists worldwide every day.

Rationale
Evolving DNA is usually represented by static pictures, like trees, that are sometimes difficult to decipher. A movie is a more natural medium for presenting the evolutionary processes that shape the genomes of all living species. Comparative genomics databases such as Ensembl, Pfam, and HomoloGene use large amounts of data and powerful inference methods to construct highly-resolved evolutionary histories of individual gene families, including the characteristic patterns of duplication and speciation leading to the gene content observed today. However, the static nature of image-based visualizations can make the interpretation of such "gene family trees" a bit difficult.

In the same way that the Google-acquired Trendalyzer technology uses animation to clarify and expose trends in economic data, an evolutionary movie would help scientists and the public better understand the meaning and importance of evolutionary data.
Approach
The goal of this project would be to create an online widget that loads gene tree data in standard formats (XML / NHX format trees) and generates an interactive movie, allowing users to temporally navigate through the evolution of a given gene family. This would involve the design, implementation and exhibition of a novel visualization interface.

In terms of implementation, one option would be to use the PhyloWidget codebase as a starting point, although other options (such as Flash) should be considered, depending on the student's expertise. A mock-up animation of the evolution of the Tropomyosin gene can be found here. Animations relating a gene tree can also be found here [1].

Possible extensions to the tool would include the incorporation / correlation of species tree information and divergence time estimates, and exporting animations to non-interactive movie formats.
Challenges

* (medium) Identify the most appropriate libraries and tools to work with
* (hard) Define and implement a system for converting static phylogenetic trees (and associated metadata) into animations
* (medium-to-hard) Create an interface for visualizing / exploring / exporting the animations

Involved toolkits or projects

* Possible languages: Java, Flash, Javascript
* Possible toolkits / libraries: PhyloWidget, Archaeopteryx, ScripTree

Degree of difficulty and needed skills
Medium. Will require creative thinking, interface design, and solid background in a client-side browser language (Java, JavaScript or Flash).
Other topics
Any alternative proposals aimed towards improving the "accessibility" of phylogenetic trees to non-specialist audiences are encouraged. Also, alternative proposals would include implementing new methods for visualizing biological data on trees: e.g., bootstrap values / tree uncertainty, population size estimates, or uncertainty in divergence times.
Mentors
Gregory Jordan (PhyloWidget), Albert Vilella (Ensembl Compara)

Phyloinformatics Summer of Code 2010 - Phyloinformatics

Google maps-like multi-genome browsing in Jalview

Rationale
The amount of genomic data is increasing exponentially since the advent of the next generation sequencing techniques. New visualizations tools are needed to be able to compare genomic data at multiple resolutions in a natural way. One way to do this would be to adopt user interaction principles from successful multi-resolution interfaces, such as Google Maps.
Approach
Most problems with Jalview's interface become obvious when working with large alignments. An initial examination of the UI issues should first be made, and some solutions proposed to improve the user's experience when working with large alignments (and trees). These solutions should then be prototyped using a snapshot of the current Jalview 2.X codebase so their effectiveness can be assessed by expert users.
Challenges

* (easy/medium) adapting Jalview's existing multi-window visualizations implemented in AWT and Swing
* (hard) efficient multiscale rendering of very large multiple sequence alignments
* (medium/hard) solving the memory and rendering issues that arise when interactively visualizing and editing very large alignments.

Degree of difficulty and needed skills
Medium to hard. Java. Experience with either or both AWT and Swing. Some familiarity with low level file handling, or alternately, experience with handling datasets with many thousands of sequences (e.g. with Picard).
Other Topics
There are plenty of other areas to work on in Jalview - topics include AJAX and DAS (extending the JalviewLite for working with DAS annotation servers), Phylogenetic visualization (wider tree format support and better interactive visualization) and extending support for RNA (linked secondary structure visualization, alignment and analysis services). Please contact the Mentors for further information.
Mentors
Jim Procter and Albert Vilella

posted by avilella # 9:04 AM 0 Comments

3/02/2010

Brain evolution @ news.bbc.co.uk

BBC News - Did the discovery of cooking make us human?

Cooking is something we all take for granted but a new theory suggests that if we had not learned to cook food, not only would we still look like chimps but, like them, we would also be compelled to spend most of the day chewing.

posted by avilella # 1:04 PM 0 Comments

2/05/2010

Paul Krugman on Spanish finances

The Spanish Tragedy - Paul Krugman Blog - NYTimes.com

OECD Government debt as % of GDP

As Europe is roiled by sovereign debt fears, it’s important to realize that the crisis in the largest of the PIIGS (Portugal, Ireland, Italy, Greece, Spain) has nothing to do with fiscal irresponsibility. On the eve of the crisis, Spain was running a budget surplus; its debts, as you can see in the figure above, were low relative to GDP.

So what happened? Spain is an object lesson in the problems of having monetary union without fiscal and labor market integration. First, there was a huge boom in Spain, largely driven by a housing bubble — and financed by capital outflows from Germany. This boom pulled up Spanish wages. Then the bubble burst, leaving Spanish labor overpriced relative to Germany and France, and precipitating a surge in unemployment. It also led to large Spanish budget deficits, mainly because of collapsing revenue but also due to efforts to limit the rise in unemployment.

If Spain had its own currency, this would be a good time to devalue; but it doesn’t.

On the other hand, if Spain were like Florida, its problems wouldn’t be as severe. The budget deficit wouldn’t be as large, because social insurance payments would be coming from Brussels, just as Social Security and Medicare come from Washington. And there would be a safety valve for unemployment, as many workers would migrate to regions with better prospects. (Wages wouldn’t have gone up as much in the first place, because of in-migration).

The point is that this has nothing to do with a spendthrift government; what’s happening to Spain reflects the inherent problems with the euro, which now more than ever looks like a monetary union too far.

posted by avilella # 1:55 PM 0 Comments

2/01/2010

UK is not Greece -- Peston's Picks

BBC - Peston's Picks: Tories withdraw support from the gilt market

To be clear, there is no direct read-across (to use the dreadful business cliche) from Greece's acute difficulties in borrowing: the UK's finances are in better shape than Greece's, the UK economy is more flexible than Greece's and there isn't international pressure on the UK to improve the accuracy and reliability of government book-keeping.

posted by avilella # 9:59 AM 0 Comments

1/17/2010

Pull someone's chestnuts out of the fire Definition | Definition of Pull someone's chestnuts out of the fire at Dictionary.com

Idiom
12. pull someone's chestnuts out of the fire, to rescue someone from a difficulty.

posted by avilella # 8:01 PM 0 Comments

12/15/2009

I knew this could be done!!!

For Bicyclists Needing a Boost, This Wheel May Help - NYTimes.com

PS: I want one!

posted by avilella # 11:57 AM 0 Comments

9/27/2009

Paul Krugman explains cap-and-trade

The textbook economics of cap-and-trade - Paul Krugman Blog - NYTimes.com

I realized, after the last post, that it might be useful to write down just what the Econ 101 version of cap and trade looks like; as it happens, this also helps explain the intellectual sins of Glenn Beck and Martin Feldstein.

So here we go. Bear in mind that something like what follows can be found in just about every intro textbook.

Think of the benefits to the private sector from pollution. Yes, benefits — in the sense that it’s cheaper to pollute than not to, or that it’s easier to produce goods if you don’t worry about whatever emissions result as a byproduct. So we can think of drawing a curve representing the private marginal benefit of emissions, as in this figure:

In the absence of government action, the private sector will increase emissions up to the point where there is no further marginal benefit. That is, emissions will rise to whatever level is implied by profit-maximization, paying no attention to the effects on the environment.

A cap-and-trade system puts a limit on overall emissions, so that emitters have to pay a price for emitting. This price will, as shown in the figure above, equal the marginal benefit of the last unit of emissions allowed.

Now, the cost to the economy of this limit is the benefit the private sector would have gotten by emitting more than is allowed under the cap. It’s shown in the figure as the red triangle labeled “deadweight loss”. CBO puts these losses under Waxman-Markey at 0.2-0.7 percent of GDP in 2020, 1.1 to 3.4 percent in 2050. These costs have to be set against the environmental benefits.

In addition to this overall economic cost, there’s a distributional effect. The creation of cap and trade means that emission permits command a market price, and the value of these permits — the blue rectangle — goes to someone. Under Waxman-Markey, some of it (a growing fraction over time) would be captured by the government through auctions, and used to cut or avoid increases in other taxes — in effect, recycled to consumers. The rest would be passed on to industry — but because the biggest recipients would be regulated utilities, much of this would also be passed on to consumers.

OK, now let’s send in Beck and Feldstein.

Beck got his number from someone who learned about a guesstimate of what the auction value of permits might be (way higher than current estimates, by the way), divided by the number of households, and proclaimed this the cost of the bill. In effect, he looked at a guess about the size of the blue rectangle, which does not represent an economic cost, and called that the cost to the economy.

In a way, though, what Martin Feldstein did was worse. He took the CBO’s estimate of “compliance costs”, which was $1600 per household in an early report (it’s now down to $900, but who’s counting?), and implied that this was the economic cost of the legislation. But “compliance costs” are basically the sum of the blue rectangle and the red triangle; the true economic costs are just the triangle, and are much smaller.

Another way to say this is that under the Feldstein method, any time you try to correct an externality, which necessarily means changing relative prices, all of the negative effects of the price change will be counted as a cost — but none of the positive effects will be counted as a benefit.

Bad stuff. And what you should bear in mind is that all I’m doing here is conventional neoclassical economics, quite literally basic textbook material. What does it say when the people who claim to believe in this stuff throw it out the window as soon as it leads to policy conclusions they don’t like?
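
To make the rectangle-versus-triangle distinction concrete, here is a minimal worked sketch. It assumes a straight-line marginal benefit curve, MB(e) = a - b*e, which is my simplification; the numbers and the function name are made up for illustration and have nothing to do with the CBO estimates.

```python
def cap_and_trade(a, b, cap):
    """Permit price, permit value and deadweight loss under a cap.

    Assumes a linear private marginal benefit of emissions,
    MB(e) = a - b*e (an illustrative assumption).
    """
    unregulated = a / b  # emissions where the marginal benefit hits zero
    if not 0 <= cap <= unregulated:
        raise ValueError("cap must lie between 0 and unregulated emissions")
    price = a - b * cap  # marginal benefit of the last unit allowed
    rectangle = price * cap  # value of the permits: a transfer, not a cost
    triangle = 0.5 * price * (unregulated - cap)  # deadweight loss
    return {
        "price": price,
        "permit_value": rectangle,
        "deadweight_loss": triangle,
        "compliance_cost": rectangle + triangle,
    }


if __name__ == "__main__":
    # Hypothetical numbers: MB(e) = 100 - e, cap at 80 units.
    for name, value in cap_and_trade(a=100, b=1, cap=80).items():
        print(name, value)
```

With these made-up numbers the permit value is eight times the deadweight loss, which is the point of the post: quoting the rectangle, or the rectangle plus the triangle, as "the cost" overstates the true economic cost several times over.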

posted by avilella # 5:29 PM 0 Comments

8/06/2009

Caption competition

BBC - Magazine Monitor: Caption Competition

Caption Competition
12. At 12:31pm on 06 Aug 2009, LaurenceLane wrote:
The PGA have been urged to rule on the use of polyurethane suits by spectators, even if rain has been forecast.

12:00 UK time, Thursday, 6 August 2009

Labels: ramblings

posted by avilella # 1:27 PM 0 Comments

7/02/2009

What an NGS IT person should be skilled in

PLoS Computational Biology: Managing and Analyzing Next-Generation Sequence Data

The skills necessary within the Facility include the following.

1. An intimate knowledge of UNIX-based operating systems.
2. Understanding of a scripting language such as Perl.
3. An understanding of parallel computing environments for UNIX clusters.
4. Knowledge of network-based data storage.
5. General knowledge of biology and genome sciences.
6. Ability to derive data analysis and software requirements from investigators who do not have a sophisticated understanding of information technology.
7. Ability to develop software encapsulating new analysis methods.
8. Understanding of relational databases and database architecture.
9. Ability to seek out and test novel bioinformatics software and analysis routines.

Labels: genomics, nextgen sequencing, open source

posted by avilella # 11:09 AM 0 Comments

6/24/2009

Lamprey Genome rearrangements

DNA Jettisoned From Lamprey Genome During Development | GenomeWeb Daily News | Sequencing | GenomeWeb

Amemiya and his co-workers became suspicious that the lamprey's genome structure and composition was changing during development when they heard rumors that lamprey genome sequencing efforts were being complicated by genome fragmentation. They speculated that this might be due to genome rearrangements similar to those described for the hagfish, a chordate and superficially similar organism.

To test this, the researchers compared germ line and somatic tissues from sea lamprey caught in Lake Michigan.

Indeed, they found that the genome was larger in sperm (germ line cells) than in adult blood nuclei (somatic cells), even within the same individual. The sperm cells also contained more DNA than kidney and liver cells, which both had similar DNA content to red blood cells. Overall, the researchers noted, sperm genomes contained some 20 percent more DNA than adult cells such as red blood cells.

Labels: genomics

posted by avilella # 10:51 AM 0 Comments

6/12/2009

Life After GWAS: For Some Researchers, Focus Shifts to Rare Variants, CNVs | GenomeWeb Daily News | Sequencing | GenomeWeb

Over the last several years, genome-wide association studies have become the primary method for identifying variations associated with human disease, but the approach has shortcomings that are leading some in the genomics community to push more aggressively into the post-GWAS era.

At Cambridge Healthtech Institute's Genomic Tools and Technologies Summit held here this week, many speakers noted that even though GWA studies have linked hundreds of common SNPs to disease, these variants account for only a very small portion of disease heritability, which has raised doubts over their clinical value. A number of talks focused on two key alternatives to GWAS: the discovery of rare variants, as opposed to common variants, with a role in disease; and an increasing focus on copy number variants rather than SNPs.

"GWAS was never meant to substitute for fine genomic sequencing," but rather to identify regions of linkage disequilibrium in the genome that warrant further study

Lupski said that efforts like the 1000 Genomes Project will likely produce valuable information that will drive improvements in the use of sequencing for CNV detection. "It's coming along," he said. "I think this will be solved."

posted by avilella # 1:48 PM 0 Comments

5/26/2009

Ubuntu trick -- how to reset Evolution email

gconftool-2 --recursive-unset /apps/evolution

Labels: ramblings

posted by avilella # 8:54 AM 0 Comments

5/21/2009

Gene by gene turns genome-by-genome

This is a good example of the kind of paper we are probably going to see more and more often in the future:

Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis
http://www.ncbi.nlm.nih.gov/pubmed/19454017

Get (a) a certain amount of RNAseq reads for your species "X", (b) build as many full-length cDNAs as possible from the fragments and (c) compare against close species in terms of:

New interesting cDNAs that don't have hits against existing public cDNAs -- What do they do?
Expression patterns -- Are these different to the patterns in other close species?
Protein coding evolution -- run pairwise dNdS against closest genome or tree-based dNdS against an existing phylogeny [1,2] -- Does anything show up in a Gene Ontology enrichment analysis?

Then the data is published and stored in a publicly available database, and can be added to the pool to compare against for the next project. Iterate :-)

It used to be gene-by-gene sequencing and it's now transcriptome-by-transcriptome sequencing. There are still sequence error and sequence coverage issues. One of my first scientific mentors, Prof. Montserrat Aguade, was one of the first to do gene sequencing on the Adh gene in Drosophila when doing her PhD at Harvard. People then extended Adh sequencing and analysis to other Drosophila species, then other clades, then other genes, then some gene families like odorant binding proteins for a bunch of Drosophila species or populations, or gene pathways like the insulin pathway, etc.

But now we have a much broader picture with a rather complete transcriptome. And most of the sequencing issues are going to be corrected across the phylogeny in pretty much the same way as allele imputation is filling the gaps at the population genomics level (e.g. 1000 Genomes Project).

I am very excited about all this!

Labels: genomics, nextgen sequencing, ramblings

posted by avilella # 4:02 PM 0 Comments

5/11/2009

Anolis carolinensis: First reptile in Ensembl -- Amonida Zadissa

Very few anolis proteins and cDNAs, many more ESTs.
Used Uniprot PE Evidence ranking to generate transcript models with genewise
Different parameters for Exonerate, including exhaustive option for cDNAs, including 31000 chicken set (which wasn't very useful in the end)
Chris Ponting group provided extra models and kill list to rename some retrotransposons and pseudogenes
Manually looked at some of the EST genes in Chris' list to reincorporate them in the main db

Word of advice to all Genome Sequencing Centers out there: now that RNAseq is cheap and powerful, please allocate some of your budget to it instead of spending it all on genomic sequencing. Contact Sanger people for PCR-free sample preparation protocols, which make a huge difference in terms of avoiding duplicates.

Labels: ensembl, gene prediction, scientific talk

posted by avilella # 10:33 AM 0 Comments

5/07/2009

FC Barcelona 1991 and 2009 -- find the similarities

http://www.youtube.com/watch?v=6OHYAMG5RTk (jump to 1:00)
http://www.youtube.com/watch?v=1-4NpWO4ObU

The guy who jumps to celebrate in the wet coat was a very young Guardiola as a player; yesterday he jumped to celebrate at the same spot as a coach... now in a suit and with much less hair...

Labels: ramblings

posted by avilella # 10:58 AM 0 Comments

5/05/2009

Drug combinations, gene combinations and cancer -- Sven Nelander

First seminar of the Systems Biology series at the EBI. This series
starts with a strong focus on the modeling side of Systems Biology,
but with the idea of extending it to other subfields.

Title: Drug combinations, gene combinations and cancer
Speaker: Dr Sven Nelander
Affiliation: Gothenburg University
Date & Time: Tuesday 5th May 2009; 14.00-15.00
Location: C202-3, Shared facilities
Host: Mikhail Spivakov

There is no rational combination theory for different anticancer drugs
so far. One anticancer drug for one step in the pathway, but no
interrelations described.

Increasing number of genotype to phenotype pairs of data sets: what is
the system in the middle?

TCGA with 200 ovarian tumors, first data released last week. Amazing
data production and integrative bioinformatics, but space for more
modeling.

Example: CoPIA

CNV profiles -- transcriptional network -- final mRNA profiles

Now we have 10 million datapoints that, with a fully automated
procedure, give testable hypotheses: 3 of the hub (pleiotropic) genes
were not previously implicated in glioma. GO enrichment analysis makes
sense.

PhD student - Theresia Dahl
Peter Gennemark -- mathematical models
Ulrike Nuber
Chris Sander -- old boss
Linda Karlsson-Lindahl
Debora Marks
Niki Schultz
Bodil Nordlander -- now testing one of the new hub genes

Labels: EBI Systems Biology series, scientific talk

posted by avilella # 2:46 PM 0 Comments

5/01/2009

NCBI SRA blastn service

It has never been easier to check your sequence against the NCBI Short Read Archive database:

NCBI SRA BLAST

First thoughts:

Transcriptome coverage is hugely biased to the 3' end (or 5' depending on library preparation). A lot more than I suspected.
Would be great to do queries for phylogenetic subclades: e.g. my human sequence against all SRA data for primates.
A lot of the 454 data has homopolymer issues, mostly TTT[...]TTTs but also some others:

Query  465  GGGCCTTGACAAAGTGTAAACCGCATGGATGGGCTTCCCC-AAGGATTTATTGACATTGC  523
Sbjct  249  ........................................C...................  190

Some of these (unless they are real variations) get picked up as mismatches, some as indels:

Query  1    CGGCAAGGTATGTGCGTGATTTTGGGCCCACGTGTATTTCCATTAATTTT-AAGCCGTAA  59
Sbjct  224  ..................................................T.........  165

Query  60   TTGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTG  119
Sbjct  164  ..........C.................................................  105

Query  61   TGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTGC  120
Sbjct  118  .....C......................................................  177

Query  61   TGTCG-TTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTG  119
Sbjct  160  .....T...............................-......................  102

Query  61   TTAATTTTAAGCCGTAATTGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTG  120
Sbjct  181  ...........................C................................  122
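To make the mismatch-versus-indel distinction concrete, here is a minimal Python sketch that reads one Query/Sbjct pair in the dot notation used above (a dot means "same as the query", a dash means a gap) and labels every column that differs. The function name, the labels and the toy sequences are my own assumptions for illustration; this is not something the NCBI service provides.

```python
def classify_differences(query, sbjct, query_start=1):
    """List the columns where a dot-notation subject row differs from the query.

    Returns tuples of (query_position, kind, query_char, sbjct_char).
    For an insertion in the read, query_position is the last query base
    before the gap.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")

    differences = []
    pos = query_start - 1  # coordinate of the last query base consumed
    for q, s in zip(query, sbjct):
        if q != "-":
            pos += 1
        if s == ".":
            continue  # identical to the query at this column
        if q == "-":
            kind = "insertion_in_read"  # read has a base the query lacks
        elif s == "-":
            kind = "deletion_in_read"   # read lacks a base the query has
        else:
            kind = "mismatch"
        differences.append((pos, kind, q, s))
    return differences


if __name__ == "__main__":
    # Toy rows (hypothetical, much shorter than real BLAST output).
    toy_query = "ACG-TTTGCA"
    toy_sbjct = "...T...A.-"
    for diff in classify_differences(toy_query, toy_sbjct):
        print(diff)
```

Counting how many of the reported differences fall inside or next to a homopolymer run would be the natural next step for judging whether they are 454 artefacts rather than real variation.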

Labels: genomics, nextgen sequencing

posted by avilella # 4:46 PM 0 Comments

A customized and versatile high-density genotyping array for the mouse -- Gary Churchill, The Jackson Laboratory, USA

Microarray based genotyping is an inexpensive and powerful tool to characterize genetic variation. High-density genotyping microarrays are commercially available for humans, economically important livestock and model organisms. However, they have not been available previously for the laboratory mouse, the premier mammalian model organism for biomedical research. Here we describe a custom high-density mouse genotyping array. The Mouse Diversity array was designed to capture the known genetic variation present in the laboratory mouse. It contains 623,124 SNPs distributed across the 19 mouse autosomes, the sex chromosomes, and the mitochondria with a median spacing of one SNP every 1,411 bp in the nuclear genome. The array also contains 916,269 invariant probes that are targeted to functional elements and regions of the genome known to harbor segmental duplications. The nature of the probes opens the door to a variety of novel applications including the characterization of copy number variation, allele specific gene expression and DNA methylation. Performance of the array based on call rate, replication and concordance with previously known genotypes is exceptional. The content-rich Mouse Diversity array provides a critical new tool for mouse genetics including the possibility of extending the successes of genome-wide association studies in humans to the mouse.

Funny comment -- This may be the last chip we do. The economics tell us the cost line is still lower for chips, but sequencing is getting cheaper.

History: people catching mice, trading them, etc. Bottlenecks and all sorts of artificial effects.

Diversity in 11 classical inbred strains: some chromosomal regions with extremely low diversity. Is this petness? Longevity/Fecundity?

Problem with ascertainment bias: 623124 phylogenetically informative SNPs with known ascertainment

Collaborative Cross -- 8 M. musculus lines -- each contributing equally to the final "line". All inbred by now.

Phenotypes of intermediate CCs: e.g. voluntary exercise goes from 0 miles per day to 18 miles per day.

With 2 different inbred parents, we get complex children but with theoretically predictable phenotype, which means doing GWAS with phenotypes "a la carte".

Also, just by mixing genomes, creating diversity that was not in the parents: very useful novel phenotypic diversity.

Resolution is 7x better. CC will not be GWAS-level, but on the order of 10 genes or MB level resolution. Possibly gene level resolution in 10 generations. Always a mapping resolution panel and a validation panel, going back to the inbred lines.

Selection strictly by random number. Maintaining the diversity is good, lucky because natural selection already took a toll on the original strains.

Done in a way to maximise diversity, not to mimic human population structure. Deep reservoir of diversity for studying phenotypes.

A-Male/B-female crosses and compare to A-Female/B-Male in terms of sex-related epigenetics and other studies.

Some strains will die out along the way, but in 5-10 years should get a lot of info out of it.

Hyuna Yang a lot of array work.
David Aylor pop.struct.

Published mouse distances:

Mus cervicolor - Mus crociduroides = 7.60 MYA
Mus cervicolor - Mus haussa = 6.60 MYA
Mus cervicolor - Mus indutus = 6.60 MYA
Mus cervicolor - Mus mattheyi = 6.60 MYA
Mus cervicolor - Mus minutoides = 6.60 MYA
Mus cervicolor - Mus musculoides = 6.60 MYA
Mus cervicolor - Mus musculus = 4.80 MYA
Mus cervicolor - Mus pahari = 7.60 MYA
Mus cervicolor - Mus platythrix = 7.10 MYA
Mus cervicolor - Mus setulosus = 6.60 MYA
Mus cervicolor - Mus spretus = 4.80 MYA
Mus crociduroides - Mus haussa = 7.60 MYA
Mus crociduroides - Mus indutus = 7.60 MYA
Mus crociduroides - Mus mattheyi = 7.60 MYA
Mus crociduroides - Mus minutoides = 7.60 MYA
Mus crociduroides - Mus musculoides = 7.60 MYA
Mus crociduroides - Mus musculus = 7.60 MYA
Mus crociduroides - Mus pahari = 3.40 MYA
Mus crociduroides - Mus platythrix = 7.60 MYA
Mus crociduroides - Mus setulosus = 7.60 MYA
Mus crociduroides - Mus spretus = 7.60 MYA
Mus haussa - Mus indutus = 3.20 MYA
Mus haussa - Mus mattheyi = 2.60 MYA
Mus haussa - Mus minutoides = 3.20 MYA
Mus haussa - Mus musculoides = 3.20 MYA
Mus haussa - Mus musculus = 6.60 MYA
Mus haussa - Mus pahari = 7.60 MYA
Mus haussa - Mus platythrix = 7.10 MYA
Mus haussa - Mus setulosus = 4.00 MYA
Mus haussa - Mus spretus = 6.60 MYA
Mus indutus - Mus mattheyi = 3.20 MYA
Mus indutus - Mus minutoides = 2.50 MYA
Mus indutus - Mus musculoides = 2.50 MYA
Mus indutus - Mus musculus = 6.60 MYA
Mus indutus - Mus pahari = 7.60 MYA
Mus indutus - Mus platythrix = 7.10 MYA
Mus indutus - Mus setulosus = 4.00 MYA
Mus indutus - Mus spretus = 6.60 MYA
Mus mattheyi - Mus minutoides = 3.20 MYA
Mus mattheyi - Mus musculoides = 3.20 MYA
Mus mattheyi - Mus musculus = 6.60 MYA
Mus mattheyi - Mus pahari = 7.60 MYA
Mus mattheyi - Mus platythrix = 7.10 MYA
Mus mattheyi - Mus setulosus = 4.00 MYA
Mus mattheyi - Mus spretus = 6.60 MYA
Mus minutoides - Mus musculoides = 1.60 MYA
Mus minutoides - Mus musculus = 6.60 MYA
Mus minutoides - Mus pahari = 7.60 MYA
Mus minutoides - Mus platythrix = 7.10 MYA
Mus minutoides - Mus setulosus = 4.00 MYA
Mus minutoides - Mus spretus = 6.60 MYA
Mus musculoides - Mus musculus = 6.60 MYA
Mus musculoides - Mus pahari = 7.60 MYA
Mus musculoides - Mus platythrix = 7.10 MYA
Mus musculoides - Mus setulosus = 4.00 MYA
Mus musculoides - Mus spretus = 6.60 MYA
Mus musculus - Mus pahari = 7.60 MYA
Mus musculus - Mus platythrix = 7.10 MYA
Mus musculus - Mus setulosus = 6.60 MYA
Mus musculus - Mus spretus = 2.30 MYA
Mus pahari - Mus platythrix = 7.60 MYA
Mus pahari - Mus setulosus = 7.60 MYA
Mus pahari - Mus spretus = 7.60 MYA
Mus platythrix - Mus setulosus = 7.10 MYA
Mus platythrix - Mus spretus = 7.10 MYA
Mus setulosus - Mus spretus = 6.60 MYA
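A list like this is easier to query once it is loaded into a lookup table. Below is a small Python sketch, assuming the "A - B = X MYA" line format used above; the function names are hypothetical and the sample only contains four lines copied from the list.

```python
import re

# One line looks like: "Mus musculus - Mus spretus = 2.30 MYA"
LINE_RE = re.compile(r"^\s*(.+?)\s+-\s+(.+?)\s*=\s*([0-9.]+)\s*MYA\s*$")


def parse_divergences(text):
    """Build {frozenset({species_a, species_b}): divergence_in_MYA}."""
    table = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            a, b, mya = match.group(1), match.group(2), float(match.group(3))
            table[frozenset((a, b))] = mya  # unordered pair as the key
    return table


def divergence(table, a, b):
    """Look up a pair in either order; a species is 0 MYA from itself."""
    if a == b:
        return 0.0
    return table[frozenset((a, b))]


def closest_relative(table, species):
    """Return (other_species, mya) for the smallest listed divergence."""
    candidates = [
        (mya, next(iter(pair - {species})))
        for pair, mya in table.items()
        if species in pair
    ]
    if not candidates:
        raise KeyError(species)
    mya, other = min(candidates)
    return other, mya


SAMPLE = """
Mus cervicolor - Mus musculus = 4.80 MYA
Mus cervicolor - Mus spretus = 4.80 MYA
Mus musculus - Mus pahari = 7.60 MYA
Mus musculus - Mus spretus = 2.30 MYA
"""

if __name__ == "__main__":
    table = parse_divergences(SAMPLE)
    print(divergence(table, "Mus spretus", "Mus musculus"))
    print(closest_relative(table, "Mus musculus"))
```

Keying on an unordered pair means each distance is stored once and can be asked for in either direction.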

Labels: genomics, nextgen sequencing, scientific talk

posted by avilella # 2:43 PM 0 Comments

4/30/2009

RNAseq in the worm -- Gary Williams

Illumina Short-read transcriptome data has the potential to help solve many problems with curating gene models and the genomic sequence in C. elegans. This is an initial look at the data and some examples of how it can be used.

So far, C. elegans gene predictions:

36% - fully confirmed by ESTs
48% - partially confirmed
14% - no transcript confirmation

RNAseq data -- different worms than the genome, so some polymorphisms expected -- 200bp inserts, 36bp paired end reads

MAQ or cross-match to genomic or transcript sequences

6137 new splice junctions (6% increase)

Jumped from 70000 to 98000 splice junctions.
3x as many polyA sites
80 possible new coding genes

V-shaped coverages -- validation against traces, then:

Detected sequencing error, correction needed for the reference
Detected alternative haplotype

Moving towards single-cell sequencing -- not sequencing in tiny cells but sequencing each cell in each developmental state in the worm. Moving towards RNA sequencing C. briggsae and C. remanei.

Updated gene builds will be given to other projects. Next Ensembl Metazoa comparative genomics build may already have the modENCODE-updated C. elegans and D. melanogaster builds.

Labels: ensembl, genomics, nextgen sequencing, scientific talk

posted by avilella # 10:39 AM 0 Comments

4/28/2009

Peston on making banks safe

BBC - Peston's Picks: Making banks safe

For what it's worth, there are two reasons why it might make sense to force our biggest and most complex banks to hold more capital than their smaller, simpler peers: if big super-banks have the privilege of knowing that we as taxpayers would always bail them out in a crisis, surely they've got to put in place treble protection against the risk that they'd call on us for such help; also the costs of holding the extra capital might encourage them to slim down and simplify their operations.

Labels: ramblings

posted by avilella # 12:40 PM 0 Comments

The Molecules and Mechanisms of Instinctive Behaviour in Mammals -- Darren Logan, The Scripps Research Institute, La Jolla USA

Abstract

Social or behavioural disorders affect a quarter of individuals at some time during their lives; however, the molecules and mechanisms that mediate social cues, process their meaning, and initiate the corresponding behaviour are unknown. Instinctive social behaviours in mammals are thought to be largely promoted by pheromones: specialized olfactory cues secreted by one animal that directly influence the behaviour of another. Here I will describe studies into two instinctive, olfactory mediated behaviours in mice, aggression and pup suckling.

Our studies found that aggression is promoted by specific protein pheromones excreted in male urine. These activate specialized, finely tuned sensory neurons in the noses of other males, resulting in a robust aggressive behaviour in the recipient. Our genomic and functional characterization of the gene family encoding these pheromones reveals an extraordinary scope for information-coding. I will describe our recent efforts to elucidate their social significance using cellular and behavioural techniques.

Pup suckling is a behaviour that is found in all mammals and is thought to be promoted by pheromones emitted by the mother and detected by the infant. We found that newborn mice do use maternal odour cues to promote suckling but, in contrast to the aggression pheromones, these cues are not genetically predetermined to elicit behaviour. Instead, the cues are complex, variable and learned by pups around birth. Suckling is subsequently initiated when the pup recognizes the same odour pattern in the context of their mother's nipple. The sensory neurons that mediate this are not specialized and found in the noses of all mammals, including humans.

Together these studies demonstrate a diversity of mechanisms and molecules that underlie instinctive behaviours, and are a first step towards understanding the neural circuitry of social interaction.

Labels: scientific talk

posted by avilella # 9:39 AM 0 Comments

4/27/2009

Giving functional genomics a REST -- Alex Bruce

http://www.ensembl.org/Homo_sapiens/Transcript/ProteinSummary?db=core;g=ENSG00000084093;r=4:57468799-57493097;t=ENST00000309042

Interesting case where a small skipped exon generates an extra copy of the Znf-C2H2 domain.

REST is an essential vertebrate transcription factor with very diverse roles. It has an important role in the regulatory secretory pathway. Independently confirmed by 2 other groups.

RE1 array used to identify misregulated REST target genes in diseases like Huntington's.

RE1 "half" sites. Canonical/Transfac/Discovered motifs.

Different evolutionary pressures over RE1 sites seem to be associated with different functional subsets: common sites are less well conserved than unique sites. Unique sites need to be tissue specific, so they are bound to keep a general binding weakness to turn off binding in non-specific tissue (if I understand correctly).

Solexa sequencing is quite good at identifying high affinity motifs, but poor at low affinity motifs.

There is an in vivo hierarchy between RE1 for REST binding and it can be discriminated at the DNA sequence level.

Labels: genomics, scientific talk

posted by avilella # 1:12 PM 0 Comments

4/24/2009

PLoS Computational Biology: Polymorphism Data Can Reveal the Origin of Species Abundance Statistics

Polymorphism Data Can Reveal the Origin of Species Abundance Statistics

posted by avilella # 9:19 AM 0 Comments

4/23/2009

Saint George was an ecocriminal

Here are different images of Saint George caught in the act of killing an endangered species, a mythical dragon, that has been extinct since then...

posted by avilella # 11:16 AM 0 Comments

Phenotype data in Ensembl -- Fiona Cunningham

European Genotype Archive: Genome Wide Association Studies (GWAS) like WTCCC data and others. Only public information is publicly available; the rest is released under very strict rules.

NHGRI GWAS will be imported in Ensembl: it's got manually curated data of high quality.

Links in Variation view: link "Phenotype Data (n)"

http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?source=dbSNP;v=rs420259

Diagnostic testing: situation will now improve in Ensembl

Locus specific databases (LSDBs): p53, ABO, collagen, albinism, cystic
fibrosis, Alzheimer's disease, ... >700

The main aim is to be able to link the reference CDS sequence used by the biomedical community to the most up-to-date reference sequence in the genomics community. This mapping will allow clinicians to link all phenotype data on their end to the genomic data in the genomic community.

Political pros and cons have to be carefully handled and continuously explained. Ensembl's openness, existing infrastructure and visibility are the biggest selling points for having these dbs linked in a common LSDB resource.

Website here will have LRG XML files and prettified HTML reports soon.

Labels: ensembl, genomics, scientific talk

posted by avilella # 11:08 AM 0 Comments

Ensembl Quality Checking -- Michael Schuster

All Ensembl gene predictions for all vertebrate species are based on experimental evidence:

UniprotKB/Swiss-Prot
NCBI RefSeq proteins and mRNAs
UniprotKB/TrEMBL
EMBL Nucleotide Sequence Archive

Aligning the evidence back to the gene prediction with Exonerate. Types of alignment results:

perfect
added start
longer region
missing start
non-matching start
non-matching region
shorter region
...

Exonerate has an exhaustive mode that takes a lot more time but fixes some of the mini-intron and
mini-exon issues that sometimes occur. Exonerate cdna2genome is very useful for quality checking.

Genebuild now uses head-to-head alignments of genewise and exonerate, and takes the best in each case.

Some cases are still difficult to get right with algorithmic solutions: this is where the curators are needed.

Labels: ensembl, gene prediction, genomics, scientific talk

posted by avilella # 10:55 AM 0 Comments

4/14/2009

Alzheimer -- BBC

BBC NEWS | Health | Drug offers hope on Alzheimer's

A new drug which shows promise as a treatment for Alzheimer's disease has been developed by UK scientists.
The Proceedings of the National Academy of Sciences reports the drug, CPHPC, removes a protein thought to play a key role in Alzheimer's from the blood.

Tests at the University College London found the protein also disappeared from the brains of five Alzheimer's patients given the drug for three months.

Longer and larger scale clinical studies are now being planned.

Labels: light science

posted by avilella # 9:51 AM 0 Comments

4/08/2009

Robert Peston on breaking up banks

The issue here is how to handle financial globalization: bigger UK banks are better positioned internationally, which one could argue is good for the UK financial system, but smaller UK banks are good for the retail consumer, as pointed out in the blog post. What is best? Ensure that big UK banks can compete internationally in making big deals and acquisitions? Or make sure that smaller UK banks are competing against each other within the UK and providing the best retail value?

The same happened with the wave of water, energy and IT privatizations in the last 10 years in Europe. Every country was playing a double game: trying to promote internal competition and avoid national monopolistic practices, but also trying to beef up their national water/energy/IT company so that it could be well-positioned to acquire other European, South American, Asian companies...

BBC - Peston's Picks: Tories to break up banks?

Tories to break up banks?

Robert Peston | 12:40 UK time, Wednesday, 8 April 2009

Royal Bank of Scotland and Lloyds TSB could be dismantled after the next election, if the Tories form the government.

Here's why I say that, in the form of excerpts from a speech that's just been delivered by George Osborne, the Shadow chancellor.

"We cannot allow one part of our economy to behave in a way that puts the rest of the economy at risk when it fails. We need to think deeply about whether we can sustain banks that are not only too big to fail, but potentially too big to bail.

By dint of its substantial shareholdings the government has a powerful influence over the future structure of the UK banking industry, whether it likes it or not.

When the time comes to sell off those shareholdings we need to think very carefully before simply selling them to the highest bidder without thinking through the consequences for the wider economy.

We should look at whether Britain in fact needs smaller banks.

For it would be a bitter irony if we came out of this crisis with a banking system that was even more concentrated and even riskier than the one we had before it."

The background to these remarks is that Royal Bank's balance sheet is considerably bigger than the total output of the British economy and its liabilities are considerably greater than the entire public-sector debt of the UK.

Hence Osborne's allusion to banks that are "too big to bail". Or to put it another way, in rescuing RBS, the government has mortgaged all our economic futures to the rehabilitation of this giant bank.

As for Lloyds, it became far and away the biggest retail bank in the UK when it was permitted to buy HBOS last autumn.

In fact, it only rescued battered HBOS because the deal offered a once-in-a-generation opportunity to become the unchallenged market leader in British retail banking.

So if the next government were to dismantle Lloyds, depriving it of its enormous share of the current-account, savings and mortgage markets, that would be a reputational disaster for Lloyds' management.

There are two further implications of Osborne's remarks: first, that he would privatise Northern Rock as an independent bank, rather than flogging it to another bank; second, that he would ask the City watchdog, the FSA, and the competition authorities to consider whether other big British banks should be broken up. Even those where taxpayers don't have a big stake.

In the City, where I am tapping out this blog, this is big stuff.

To do my normal thing of ramming home the bleedin' obvious, the opinion polls are currently saying Osborne will be the next chancellor. Which means that his ambitions for what our banks should look like after the spring of next year are at least as significant as the future plans of the current chancellor.

Labels: ramblings

posted by avilella # 1:10 PM 0 Comments

4/07/2009

EnsemblCompara "Back to the future": using phylogenetic information to help gene annotation

Here is an example success story of using phylogenetic information to improve human gene annotation. What do you see wrong in this EnsemblCompara GeneTree?

The human gene prediction has been split into one third to the left and two thirds to the right. Some of the other species have the full length prediction, but some of the 2x and projected genomes also have this issue. This case was reported to the Havana team at the Sanger and they have now built a human and mouse full-length prediction for the gene (notice the Havana_genes blue and dark green model):

The next Ensembl/Havana merge will hopefully reflect this change but, right now, you need to activate the Havana_genes DAS track to see the most up-to-date Havana annotation. There is a good number of these fixed now in the highly loved genomes, aka human, mouse and zebrafish. But there is a second level of genomes that are not getting any manual annotation here but may be annotated somewhere else...

Labels: ensembl, genomics

posted by avilella # 11:29 AM 0 Comments

Promoting Open Source Bioinformatics

Interesting to see all the buzz that the Benjamin Franklin Award has generated in the blogosphere, twittersphere, facebooksphere and any of the other spheres out there... I still think there is something we should try and resolve in open source bioinformatics, which is promoting Open Source software to create more awareness in the scientific community. We need to reconcile the promotion of modularity and generality with the fact that giving credit to the scientists who contribute to Open Source Bioinformatics software is still important. Projects that have built very modular and generic components may be doing a lot for the bioinformatics community at large but, at the same time, the less atomic and single-purpose your software is, the more difficult it is to publish it in a prominent scientific journal. The same goes for citing it in the downstream publications: very atomic programs are very successful in citation metrics, but infrastructure code is not. This means that well-designed, well-implemented and well-tested software is often not prominent enough for new people to notice, and too many bioinformaticians resort to their own glue code for building their bioinformatics infrastructure.

There wouldn't be anything wrong with rewriting your own code over and over again if it weren't for two things: (a) people spend too much time writing scaffolding code that will let them access what is really new and interesting in their project and (b) that code tends to be used and tested only internally and is almost never reused by any other party unless it has been very well designed and documented --- hence the name scaffolding code.

There is a really good chance now to build an infrastructure that brings up a terminal next to the next generation Petabyte-size data sources, using emerging "cloud" technologies. These technologies are already advanced in fields other than bioinformatics, so we can leverage what has already been done for us and make extensive use of it. These terminals don't need to be silly, and the community should provide in them as much prebuilt code as possible so that the new breed of bioinformaticians get used to having this software at their fingertips.

A few years ago all the effort was in building packages for different Linux distributions, so that people could easily install Open Source software on their in-house CPU clusters. I think we need to shift gears now to cloud software accessibility. The good news is that it seems everybody is happy with the common Ubuntu system as a start. I fear the proliferation of iPhone-like SDKs around that will make the existing bioinformatics software useless. In an era where everybody is acutely aware of governments having to pour our taxes into infrastructure that was already paid for, no one will like to see all existing bioinformatics software become a "toxic" or "legacy asset"!

Labels: genomics, nextgen sequencing, open source

posted by avilella # 9:40 AM 0 Comments

4/06/2009

Genomes to come: wallaby

Tammar Wallaby - Wikipedia, the free encyclopedia

The Tammar Wallaby (Macropus eugenii), also known as the Dama Wallaby or Darma Wallaby, is a small member of the kangaroo family and is the type species for research on kangaroos and marsupials.

It is found on offshore islands on the South Australian and Western Australian coast. It is classified as vermin on Kangaroo Island, where it seasonally breeds in large numbers and damages the echidna habitat on the island.

Labels: ensembl, genomics

posted by avilella # 8:52 AM 0 Comments

4/02/2009

Quarterly Activity Report -- Hinxton Sequence Forum Wellcome Trust Genome Campus

A quarterly activity report for the different activities that take place under what can be considered "sequence" at the WTGC in Hinxton, Cambridge, UK:

HGNC has made great progress in solving nomenclatures for 130 cases where the community has a diversity of opinions and it's difficult to agree on something. Very good point in saying that in the Internet era it's better to give a gene a name that is distinct from common words that would otherwise clobber your Google search results. There is now a forum set up for different communities to use in discussions for gene family names.
Havana has now started using RNAseq data to confirm new genes found in human and zebrafish that didn't have evidence before. One new feature is a "confirmed intron" for when paired Solexa reads bridge two exons, with an associated score for read depth. Confident this type of data will bring out many interesting new annotations that couldn't be found before, e.g. genes expressed in a given tissue during a window of a few hours in development. There are already a few examples in zebrafish.
Wormbase has been working hard on compiling more data from the modENCODE. Small but cool infrastructure achievement in having VMware images running for old releases that investigators can just pull and bring up on demand.
Ensembl Genomes has been successfully testing the beta sites for Bacteria, Protists and the first Metazoa build. Another Metazoa build is in progress, with all the phylogenetics goodness of the 12 Drosophila genomes plus the vectors plus C.elegans and a few other outgroups. The modENCODE project is about to complete the re-annotation of gene models using CAGE data that will bring more precise gene starts for melano and elegans. Ensembl Genomes is still working together with Manchester and now the US to put together an Aspergillus resource that provides the best value for money to researchers. PomBase is also being pursued: lots of labs are interested in having it Ensemblified and ready to use.
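The "confirmed intron" idea from the Havana item can be pictured with a toy Python sketch. How Havana actually computes the feature and its score was not described, so everything here is an assumption: reads are plain (start, end) intervals, a pair counts as bridging when one mate sits entirely in the upstream exon and the other entirely in the downstream exon, and the score is simply the number of such pairs.

```python
def within(inner, outer):
    """True if interval `inner` lies entirely inside interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def intron_support(upstream_exon, downstream_exon, read_pairs):
    """Count read pairs with one mate in each exon flanking the intron.

    Intervals are (start, end) tuples, 1-based and inclusive (an assumed
    convention). The returned count is a stand-in for the read-depth score.
    """
    support = 0
    for mate1, mate2 in read_pairs:
        left, right = sorted((mate1, mate2))  # order the mates along the genome
        if within(left, upstream_exon) and within(right, downstream_exon):
            support += 1
    return support


if __name__ == "__main__":
    exon_a = (100, 200)
    exon_b = (500, 600)
    pairs = [
        ((150, 185), (510, 545)),  # bridges the intron
        ((510, 545), (150, 185)),  # same, mates listed in the other order
        ((120, 155), (160, 195)),  # both mates in the upstream exon
        ((190, 225), (510, 545)),  # first mate overhangs the exon boundary
    ]
    print(intron_support(exon_a, exon_b, pairs))
```

A real implementation would also have to deal with reads that themselves span the splice junction, which this sketch deliberately ignores.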

* This is a personal blog. Things said here are not to be taken as official reports.

Labels: ensembl, genomics

posted by avilella # 11:25 AM 0 Comments

3/31/2009

ape

dechronization: Hey R Users! Time to Update ape

The ape package written by Emmanuel Paradis is the foundation for phylogenetic analyses in R. Yesterday, Paradis and his coauthors posted a new version (3.2) on the CRAN archive. There don't seem to be too many new functions, but there are some important bug fixes. One of these - preventing calculation of negative state probabilities when reconstructing discrete character states - solves one of the more vexing problems I've had with the ace function. You should definitely get the update if you're doing ancestral reconstruction of discretely coded traits! Now we just need to hope the April 17th upgrade to R 2.9 goes smoothly...

Labels: genomics, open source

posted by avilella # 4:36 PM 0 Comments

3/27/2009

Hooray for those 6 brave souls! :-)

When every student has a laptop, why run computer labs? - Ars Technica

According to the school's Information Technology & Communication department, 3,117 freshmen enrolled in 2007, and 3,113 of them owned their own computer. Nearly all of the machines were laptops, with 72 percent running Windows and 26 percent running Mac OS X (six hardy souls ran Linux).

Labels: open source

posted by avilella # 5:36 PM 0 Comments

3/26/2009

Jalview Google Summer of Code 2009

(Shameless plug of the day)

There is an opportunity for students to propose a Jalview
related software development project as part of the Google Summer of
Code this year, with a stipend of $4500. This project would be
supported by the NESCent mentor organisation, and ideally improve
Jalview's phylogenetic analysis capabilities, enhance the applet's use
as an AJAX web gui component, and/or extend its visualization and
editing capabilities for use as a curation tool.

The application period for student proposals is rapidly approaching, and
it's important to discuss proposals with mentors before submission. If
you or anyone you know are interested, please read and/or forward the
message below, and look at the jalview project section on the following
(huge) URL:

http://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#Extend_Jalview_Alignment_visualization_tool

Regards.
Albert Vilella.
----

PHYLOINFORMATICS SUMMER OF CODE 2009

http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

The Phyloinformatics Summer of Code program provides a unique
opportunity for undergraduate, masters, and PhD students to obtain
hands-on experience writing and extending open-source software for
evolutionary informatics under the mentorship of experienced
developers from around the world. The program is the participation of
the US National Evolutionary Synthesis Center (NESCent) as a
mentoring organization in the Google Summer of Code(tm)
(http://code.google.com/soc/).

Students in the program will receive a stipend from Google (and
possibly more importantly, a T-shirt solely available to successful
participants), and may work from their home, or home institution, for
the duration of the 3 month program. Each student will have at least
one dedicated mentor to show them the ropes and help them complete
their project.

NESCent is particularly targeting students interested in both
evolutionary biology and software development. Initial project ideas
are listed on the website. These range from hardware acceleration for
phylogenetic inference, to tree visualization within a wiki, to
alignment of next-gen sequencing data, to development of a reusable
ontology term markup module for biocuration. All project ideas are
flexible and many can be adjusted in scope to match the skills of the
student. We also welcome novel project ideas that dovetail with
student interests.

TO APPLY: Apply online at the Google Summer of Code website
(http://socghop.appspot.com/), where you will also find GSoC program
rules and eligibility requirements. The 12-day application period
for students opens on Monday March 23rd and runs through Friday,
April 3rd, 2009.

INQUIRIES: phylosoc {at} nescent {dot} org. We strongly encourage all
interested students to get in touch with us with their ideas as early
on as possible.

2009 NESCent Phyloinformatics Summer of Code:
http://hackathon.nescent.net/Phyloinformatics_Summer_of_Code_2009

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2009/faqs

Cyberinfrastructure Traineeships (managed separately from GSoC;
postdocs also eligible):
http://hackathon.nescent.org/Cyberinfrastructure_Summer_Traineeships_2009

To sign up for quarterly NESCent newsletters:
http://www.nescent.org/about/contact.php

---------

Todd Vision and Hilmar Lapp
National Evolutionary Synthesis Center
http://nescent.org

Labels: genomics, nextgen sequencing

posted by avilella # 11:45 AM 0 Comments

3/24/2009

Sugarcane sugar content

Sugarcane genes associated with sucrose content. [BMC Genomics. 2009] - PubMed Result

Sugarcane genes associated with sucrose content.

Papini-Terzi FS, Rocha FR, Vencio RZ, Felix JM, Branco DS, Waclawovsky AJ, Del Bem LE, Lembke CG, Costa MD, Nishiyama MY Jr, Vicentini R, Vincentz MG, Ulian EC, Menossi M, Souza GM.

ABSTRACT: BACKGROUND: Sucrose content is a highly desirable trait in sugarcane as the worldwide demand for cost-effective biofuels surges. Sugarcane cultivars differ in their capacity to accumulate sucrose and breeding programs routinely perform crosses to identify genotypes able to produce more sucrose. Sucrose content in the mature internodes reaches around 20% of the culm's dry weight. Genotypes in the populations reflect their genetic program and may display contrasting growth, development, and physiology, all of which affect carbohydrate metabolism. Few studies have profiled gene expression related to sugarcane's sugar content. The identification of signal transduction components and transcription factors that might regulate sugar accumulation is highly desirable if we are to improve this characteristic of sugarcane plants. RESULTS: We have evaluated thirty genotypes that have different Brix (sugar) levels and identified genes differentially expressed in internodes using cDNA microarrays. These genes were compared to existing gene expression data for sugarcane plants subjected to diverse stress and hormone treatments. The comparisons revealed a strong overlap between the drought and sucrose-content datasets and a limited overlap with ABA signaling. Genes associated with sucrose content were extensively validated by qRT-PCR, which highlighted several protein kinases and transcription factors that are likely to be regulators of sucrose accumulation. The data also indicate that aquaporins, as well as lignin biosynthesis and cell wall metabolism genes, are strongly related to sucrose accumulation. Moreover, sucrose-associated genes were shown to be directly responsive to short term sucrose stimuli, confirming their role in sugar-related pathways.
CONCLUSION: Gene expression analysis of sugarcane populations contrasting for sucrose content indicated a possible overlap with drought and cell wall metabolism processes and suggested signaling and transcriptional regulators to be used as molecular markers in breeding programs. Transgenic research is necessary to further clarify the role of the genes and define targets useful for sugarcane improvement programs based on transgenic plants.

Labels: genomics

posted by avilella # 1:08 PM 0 Comments

3/23/2009

Laura Clarke - 1000 Genomes project

Depositing every week the same amount of data as was in all the public dbs before (past ~20-30 years).

Data coordination (EBI Paul Flicek, Laura+Zam). Keep submissions, run QCs and recalibration, present in mini Ensembl browser. Working on the Resembl (Solexa) public release, need to implement as MySQL 5.1 partitioning instead of commercial db.

Pipeline -- first align to the genome; will move to new assembly soon. 454 with ssaha, Solexa here with MAQ.

Trios data now being churned at dbSNP -- causing dbSNP more churning than usual releases, they are catching up. Low coverage also submitted, but will probably be in dbSNP 131.

Data formats: Fastq / BAM (binary version of SAM, the Sequence Alignment/Map format) / GLF (genotype likelihood format). BAMs/GLFs will be updated as more data gets in and old ones will disappear.
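For anyone who has not looked inside a Fastq file yet, here is a minimal Python sketch of reading one. It is not the 1000 Genomes pipeline code: the function names `read_fastq` and `phred33` are made up for illustration, and it assumes plain four-line records with Sanger-style (Phred+33) quality encoding, which may not hold for every file out there.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) tuples from a FASTQ stream.

    Assumes each record is exactly four lines (no wrapped sequences).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def phred33(qual: str) -> List[int]:
    """Decode a quality string, assuming Phred+33 (Sanger) encoding."""
    return [ord(c) - 33 for c in qual]


if __name__ == "__main__":
    # Hypothetical two-read example, not real project data.
    example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII!I\n")
    for read_id, seq, qual in read_fastq(example):
        print(read_id, seq, phred33(qual))
```

Older Solexa output used a different quality offset, so the decoding step would need adjusting for such files.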

Hope all the sequencing will be done by the end of 2009. Paper about pilot projects soon. Targeted sequencing (pilot 3) took more time, pull-down methods took a bit longer to nail down, now working.

DCC: more automated data delivery systems. Standard QC/Recalibration pipeline. Other high throughput analyses. New staff. May take over the alignment process once the alignment algorithm is agreed upon.

Jim Stalker and Thomas Keane doing a lot of work at the Sanger. Eugene Kulesha and Stephen Keenan on the website work. Fiona and Yuan on calling/storing/presenting SNPs in Ensembl.

Labels: ensembl, nextgen sequencing

posted by avilella # 10:30 AM 0 Comments

3/21/2009

Browse the NCBI Short Read Archive by taxonomy

Here is the link.

Labels: nextgen sequencing

posted by avilella # 10:03 AM 0 Comments

3/20/2009

Next Generation community resources for Next Generation Sequencing

I've been looking at SEQanswers lately to try and discern where people are drawing the "Here be dragons" line of their research. A lot of it is about wet lab protocols, which is great news, because it means the discoveries in different labs are shortcutting publication delays and being adopted as soon as possible. But there are also a lot of data analysis discussions, which are great for identifying the needs for specific bioinformatics tools in the outside world. It is becoming increasingly important in newly emerging IT fields to know what *not* to work on as well as what to work on, and forums like SEQanswers are great for this.

It's great to see people using this forum and not being afraid of showing results --- which is also showing muscle sometimes, but all good and fair. I wonder how much of this is known by lab bosses, and how much the potentially old-generation bosses understand of these open Internet community practices...

Labels: genomics, nextgen sequencing

posted by avilella # 10:30 AM 0 Comments

Use copy+paste

My surname is Vilella, which is kind of the diminutive of Vila, or Ville, or littletown. In the last 7 days I've been given a badge and stickers with the spellings:

Viella
Villela

Given the recent confusion, I recommend that everyone who has to deal with my surname copy+paste it from somewhere else. Here are a few you can use:

Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella 
Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella.

posted by avilella # 9:11 AM 0 Comments