castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.

9/27/2009

 

Paul Krugman explains cap-and-trade

The textbook economics of cap-and-trade - Paul Krugman Blog - NYTimes.com

I realized, after the last post, that it might be useful to write down just what the Econ 101 version of cap and trade looks like; as it happens, this also helps explain the intellectual sins of Glenn Beck and Martin Feldstein.

So here we go. Bear in mind that something like what follows can be found in just about every intro textbook.

Think of the benefits to the private sector from pollution. Yes, benefits — in the sense that it’s cheaper to pollute than not to, or that it’s easier to produce goods if you don’t worry about whatever emissions result as a byproduct. So we can think of drawing a curve representing the private marginal benefit of emissions, as in this figure:

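[Figure: private marginal benefit of emissions, showing the permit price, the blue rectangle (value of permits) and the red triangle (deadweight loss)]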

In the absence of government action, the private sector will increase emissions up to the point where there is no further marginal benefit. That is, emissions will rise to whatever level is implied by profit-maximization, paying no attention to the effects on the environment.

A cap-and-trade system puts a limit on overall emissions, so that emitters have to pay a price for emitting. This price will, as shown in the figure above, equal the marginal benefit of the last unit of emissions allowed.

Now, the cost to the economy of this limit is the benefit the private sector would have gotten by emitting more than is allowed under the cap. It’s shown in the figure as the red triangle labeled “deadweight loss”. CBO puts these losses under Waxman-Markey at 0.2-0.7 percent of GDP in 2020, 1.1 to 3.4 percent in 2050. These costs have to be set against the environmental benefits.

In addition to this overall economic cost, there’s a distributional effect. The creation of cap and trade means that emission permits command a market price, and the value of these permits — the blue rectangle — goes to someone. Under Waxman-Markey, some of it (a growing fraction over time) would be captured by the government through auctions, and used to cut or avoid increases in other taxes — in effect, recycled to consumers. The rest would be passed on to industry — but because the biggest recipients would be regulated utilities, much of this would also be passed on to consumers.

OK, now let’s send in Beck and Feldstein.

Beck got his number from someone who learned about a guesstimate of what the auction value of permits might be (way higher than current estimates, by the way), divided by the number of households, and proclaimed this the cost of the bill. In effect, he looked at a guess about the size of the blue rectangle, which does not represent an economic cost, and called that the cost to the economy.

In a way, though, what Martin Feldstein did was worse. He took the CBO’s estimate of “compliance costs”, which was $1600 per household in an early report (it’s now down to $900, but who’s counting?), and implied that this was the economic cost of the legislation. But “compliance costs” are basically the sum of the blue rectangle and the red triangle; the true economic costs are just the triangle, and are much smaller.
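To make the rectangle-versus-triangle distinction concrete, here is a minimal sketch. It assumes a linear private marginal benefit curve, MB(e) = a - b*e, which the post does not specify; the function name and every number are made up for illustration and are not CBO figures.

```python
def cap_and_trade_areas(a, b, cap):
    """Areas under an assumed linear marginal benefit curve MB(e) = a - b*e.

    a, b and cap are hypothetical illustration values, not CBO estimates.
    """
    unregulated = a / b                  # emissions where MB falls to zero
    if not 0 <= cap <= unregulated:
        raise ValueError("cap must lie between 0 and the unregulated level")
    price = a - b * cap                  # permit price = MB of the last allowed unit
    permit_value = price * cap           # blue rectangle: a transfer, not a cost
    deadweight_loss = 0.5 * price * (unregulated - cap)  # red triangle
    return {
        "price": price,
        "permit_value": permit_value,
        "deadweight_loss": deadweight_loss,
        "compliance_costs": permit_value + deadweight_loss,
    }


areas = cap_and_trade_areas(a=100.0, b=1.0, cap=80.0)
```

With these invented numbers the rectangle is 1600 and the triangle is 200, so reporting the "compliance costs" of 1800 as the economic cost would overstate it ninefold.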

Another way to say this is that under the Feldstein method, any time you try to correct an externality, which necessarily means changing relative prices, all of the negative effects of the price change will be counted as a cost — but none of the positive effects will be counted as a benefit.

Bad stuff. And what you should bear in mind is that all I’m doing here is conventional neoclassical economics, quite literally basic textbook material. What does it say when the people who claim to believe in this stuff throw it out the window as soon as it leads to policy conclusions they don’t like?




8/06/2009

 

Caption competition

BBC - Magazine Monitor: Caption Competition
12. At 12:31pm on 06 Aug 2009, LaurenceLane wrote:

The PGA have been urged to rule on the use of polyurethane suits by spectators, even if rain has been forecast.


12:00 UK time, Thursday, 6 August 2009







7/02/2009

 

What an NGS IT person should be skilled in

PLoS Computational Biology: Managing and Analyzing Next-Generation Sequence Data
The skills necessary within the Facility include the following.

  1. An intimate knowledge of UNIX-based operating systems.
  2. Understanding of a scripting language such as Perl.
  3. An understanding of parallel computing environments for UNIX clusters.
  4. Knowledge of network-based data storage.
  5. General knowledge of biology and genome sciences.
  6. Ability to derive data analysis and software requirements from investigators who do not have a sophisticated understanding of information technology.
  7. Ability to develop software encapsulating new analysis methods.
  8. Understanding of relational databases and database architecture.
  9. Ability to seek out and test novel bioinformatics software and analysis routines.




6/24/2009

 

Lamprey Genome rearrangements

DNA Jettisoned From Lamprey Genome During Development | GenomeWeb Daily News | Sequencing | GenomeWeb
Amemiya and his co-workers became suspicious that the lamprey's genome structure and composition was changing during development when they heard rumors that lamprey genome sequencing efforts were being complicated by genome fragmentation. They speculated that this might be due to genome rearrangements similar to those described for the hagfish, a chordate and superficially similar organism.

To test this, the researchers compared germ line and somatic tissues from sea lamprey caught in Lake Michigan.

Indeed, they found that the genome was larger in sperm (germ line cells) than in adult blood nuclei (somatic cells), even within the same individual. The sperm cells also contained more DNA than kidney and liver cells, which both had similar DNA content to red blood cells. Overall, the researchers noted, sperm genomes contained some 20 percent more DNA than adult cells such as red blood cells.




6/12/2009

 
Life After GWAS: For Some Researchers, Focus Shifts to Rare Variants, CNVs | GenomeWeb Daily News | Sequencing | GenomeWeb
Over the last several years, genome-wide association studies have become the primary method for identifying variations associated with human disease, but the approach has shortcomings that are leading some in the genomics community to push more aggressively into the post-GWAS era.

At Cambridge Healthtech Institute's Genomic Tools and Technologies Summit held here this week, many speakers noted that even though GWA studies have linked hundreds of common SNPs to disease, these variants account for only a very small portion of disease heritability, which has raised doubts over their clinical value. A number of talks focused on two key alternatives to GWAS: the discovery of rare variants, as opposed to common variants, with a role in disease; and an increasing focus on copy number variants rather than SNPs.
"GWAS was never meant to substitute for fine genomic sequencing," but rather to identify regions of linkage disequilibrium in the genome that warrant further study
Lupski said that efforts like the 1000 Genomes Project will likely produce valuable information that will drive improvements in the use of sequencing for CNV detection. "It's coming along," he said. "I think this will be solved."



5/26/2009

 

Ubuntu trick -- how to reset evolution email

gconftool-2 --recursive-unset /apps/evolution



5/21/2009

 

Gene by gene turns genome-by-genome

This is a good example of the kind of paper we are probably going to see more and more often in the future:

Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis
http://www.ncbi.nlm.nih.gov/pubmed/19454017

Get (a) a certain amount of RNAseq reads for your species "X", (b) build as many full-length cDNAs as possible from the fragments and (c) compare against close species in terms of:
  • New interesting cDNAs that don't have hits against existing public cDNAs -- What do they do?
  • Expression patterns -- Are these different to the patterns in other close species?
  • Protein coding evolution -- run pairwise dN/dS against the closest genome or tree-based dN/dS against an existing phylogeny [1,2] -- Does anything show up in a Gene Ontology enrichment analysis?
Then the data is published and stored in a publicly available database, and can be added to the pool to compare against for the next project. Iterate :-)
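The Gene Ontology enrichment question in step (c) is usually framed as an over-representation test. A minimal sketch follows, assuming a one-sided hypergeometric test; the function name and the gene counts are hypothetical, and a real analysis would also correct for testing many GO terms at once.

```python
from math import comb


def enrichment_pvalue(population, annotated, sample, overlap):
    """One-sided hypergeometric P(X >= overlap).

    population: genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the study set (e.g. fast-evolving genes)
    overlap:    study-set genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 12000 background genes, 150 with the term,
# 200 genes in the study set, 9 of which carry the term.
p_value = enrichment_pvalue(12000, 150, 200, 9)
```

Only about 2.5 annotated genes would be expected in the study set by chance here, so an overlap of 9 gives a small p-value.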

It used to be gene-by-gene sequencing and it's now transcriptome-by-transcriptome sequencing. There are still sequence error and sequence coverage issues. One of my first scientific mentors, Prof. Montserrat Aguade, was one of the first to sequence the Adh gene in Drosophila while doing her PhD at Harvard. People then extended Adh sequencing and analysis to other Drosophila species, then other clades, then other genes, then some gene families like odorant binding proteins for a bunch of Drosophila species or populations, or gene pathways like the insulin pathway, etc.

But now we have a much broader picture with a rather complete transcriptome. And most of the sequencing issues are going to be corrected across the phylogeny in pretty much the same way as allele imputation is filling the gaps at the population genomics level (e.g. 1000 Genomes Project).

I am very excited about all this!



5/11/2009

 

Anolis carolinensis: First reptile in Ensembl -- Amonida Sadissa

  • Very few anolis proteins and cDNAs, many more ESTs.
  • Used Uniprot PE Evidence ranking to generate transcript models with genewise
  • Different parameters for Exonerate, including exhaustive option for cDNAs, including 31000 chicken set (which wasn't very useful in the end)
  • Chris Ponting group provided extra models and kill list to rename some retrotransposons and pseudogenes
  • Manually looked at some of the EST genes in Chris' list to reincorporate them in the main db
Word of advice to all Genome Sequencing Centers out there: now that RNAseq is cheap and powerful, please allocate some of your budget to it instead of spending it all on genomic sequencing. Contact Sanger people for PCR-free sample preparation protocols, which make a huge difference in terms of avoiding duplicate reads.




5/07/2009

 

FC Barcelona 1991 and 2009 -- find the similarities

http://www.youtube.com/watch?v=6OHYAMG5RTk (jump to 1:00)
http://www.youtube.com/watch?v=1-4NpWO4ObU




The guy in the wet coat who jumps to celebrate was a very young Guardiola as a player; yesterday he jumped to celebrate at the same spot as a coach... now in a suit and with much less hair...



5/05/2009

 

Drug combinations, gene combinations and cancer -- Sven Nelander

First seminar of the Systems Biology series at the EBI. This series
starts with a strong focus on the modeling side of Systems Biology,
but with the idea of extending it to other subfields.

Title: Drug combinations, gene combinations and cancer
Speaker: Dr Sven Nelander
Affiliation: Gothenburg University
Date & Time: Tuesday 5th May 2009;  14.00-15.00
Location: C202-3, Shared facilities
Host:  Mikhail Spivakov

There is no rational combination theory for different anticancer drugs
so far. One anticancer drug for one step in the pathway, but no
interrelations described.

Increasing number of genotype to phenotype pairs of data sets: what is
the system in the middle?

TCGA with 200 ovarian tumors, first data released last week. Amazing
data production and integrative bioinformatics, but space for more
modeling.

Example: CoPIA

CNV profiles -- transcriptional network -- final mRNA profiles

Now we have 10 million datapoints that, with a fully automated
procedure, give testable hypotheses: 3 of the hub (pleiotropic) genes
were not previously implicated in glioma. GO enrichment analysis makes
sense.

PhD student - Theresia Dahl
Peter Gennemark -- mathematical models
Ulrike Nuber
Chris Sander -- old boss
Linda Karlsson-Lindahl
Debora Marks
Niki Schultz
Bodil Nordlander -- now testing one of the new hub genes




5/01/2009

 

NCBI SRA blastn service

It has never been easier to check your sequence against the NCBI Short Read Archive database:

NCBI SRA BLAST

First thoughts:
  • Transcriptome coverage is hugely biased to the 3' end (or 5' depending on library preparation). A lot more than I suspected.
  • Would be great to do queries for phylogenetic subclades: e.g. my human sequence against all SRA data for primates.
  • A lot of the 454 data has homopolymer issues, mostly TTT[...]TTTs but also some others:
Query  465  GGGCCTTGACAAAGTGTAAACCGCATGGATGGGCTTCCCC-AAGGATTTATTGACATTGC  523
Sbjct  249  ........................................C...................  190

  • Some of these (unless they are real variations) get picked up as mismatches, some as indels:

Query  1    CGGCAAGGTATGTGCGTGATTTTGGGCCCACGTGTATTTCCATTAATTTT-AAGCCGTAA  59
Sbjct  224  ..................................................T.........  165

Query  60   TTGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTG  119
Sbjct  164  ..........C.................................................  105

Query  61   TGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTGC  120
Sbjct  118  .....C......................................................  177

Query  61   TGTCG-TTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTG  119
Sbjct  160  .....T...............................-......................  102

Query  61   TTAATTTTAAGCCGTAATTGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTG  120
Sbjct  181  ...........................C................................  122
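Since the suspicious differences cluster around homopolymer stretches, it can help to flag those stretches in the query before trusting a mismatch or indel. A minimal sketch, assuming plain A/C/G/T input; the function name and the length threshold of 4 are arbitrary illustrative choices, not a 454-specific standard.

```python
import re


def homopolymer_runs(sequence, min_length=4):
    """Return (start, base, length) for each run of a single base.

    start is 0-based; only runs of at least min_length bases are reported.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1), re.IGNORECASE)
    return [
        (m.start(), m.group(1).upper(), m.end() - m.start())
        for m in pattern.finditer(sequence)
    ]


# One of the query lines from the alignments above (positions 60-119).
query = "TTGTCGTTTTTGGCGGTTTCGAGTTGAACTGCGTTAGTCCGTGCGCTGTTCGCAAGTGTG"
runs = homopolymer_runs(query, min_length=4)
```

For this query the only run reported is the five-T stretch starting at offset 6, which is where the alignments above show the T/C mismatches and the gap.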




 

A customized and versatile high-density genotyping array for the mouse -- Gary Churchill, The Jackson Laboratory, USA

Microarray based genotyping is an inexpensive and powerful tool to characterize genetic variation. High-density genotyping microarrays are commercially available for humans, economically important livestock and model organisms. However, they have not been available previously for the laboratory mouse, the premier mammalian model organism for biomedical research. Here we describe a custom high-density mouse genotyping array. The Mouse Diversity array was designed to capture the known genetic variation present in the laboratory mouse. It contains 623,124 SNPs distributed across the 19 mouse autosomes, the sex chromosomes, and the mitochondria with a median spacing of one SNP every 1,411 bp in the nuclear genome. The array also contains 916,269 invariant probes that are targeted to functional elements and regions of the genome known to harbor segmental duplications. The nature of the probes opens the door to a variety of novel applications including the characterization of copy number variation, allele specific gene expression and DNA methylation. Performance of the array based on call rate, replication and concordance with previously known genotypes is exceptional. The content-rich Mouse Diversity array provides a critical new tool for mouse genetics including the possibility of extending the successes of genome-wide association studies in humans to the mouse.

Funny comment -- This may be the last chip we do. The economics tell us the cost line is still lower for chips, but sequencing is getting cheaper.

History: people catching mice, trading them, etc. Bottlenecks and all sorts of artificial effects.

Diversity in 11 classical inbred strains: some chromosomal regions with extremely low diversity. Is this petness? Longevity/Fecundity?

Problem with ascertainment bias: 623124 phylogenetically informative SNPs with known ascertainment

Collaborative Cross -- 8 M. musculus lines -- each contributing equally to the final "line". All inbred by now.

Phenotypes of intermediate CCs: e.g. voluntary exercise goes from 0 miles per day to 18 miles per day.

With 2 different inbred parents, we get complex children but with theoretically predictable phenotype, which means doing GWAS with phenotypes "a la carte".

Also, just by mixing genomes, creating diversity that was not in the parents: very useful novel phenotypic diversity.

Resolution is 7x better. CC will not be GWAS-level, but on the order of 10 genes or Mb-level resolution. Possibly gene-level resolution in 10 generations. Always a mapping resolution panel and a validation panel, going back to the inbred lines.

Selection strictly by random number. Maintaining the diversity is good, lucky because natural selection already took a toll on the original strains.

Done in a way to maximise diversity, not to mimic human population structure. Deep reservoir of diversity for studying phenotypes.

A-Male/B-female crosses and compare to A-Female/B-Male in terms of sex-related epigenetics and other studies.

Some strains will die out along the way, but in 5-10 years should get a lot of info out of it.

Hyuna Yang: a lot of array work.
David Aylor: population structure.

Published mouse distances:

Mus cervicolor - Mus crociduroides = 7.60 MYA
Mus cervicolor - Mus haussa = 6.60 MYA
Mus cervicolor - Mus indutus = 6.60 MYA
Mus cervicolor - Mus mattheyi = 6.60 MYA
Mus cervicolor - Mus minutoides = 6.60 MYA
Mus cervicolor - Mus musculoides = 6.60 MYA
Mus cervicolor - Mus musculus = 4.80 MYA
Mus cervicolor - Mus pahari = 7.60 MYA
Mus cervicolor - Mus platythrix = 7.10 MYA
Mus cervicolor - Mus setulosus = 6.60 MYA
Mus cervicolor - Mus spretus = 4.80 MYA
Mus crociduroides - Mus haussa = 7.60 MYA
Mus crociduroides - Mus indutus = 7.60 MYA
Mus crociduroides - Mus mattheyi = 7.60 MYA
Mus crociduroides - Mus minutoides = 7.60 MYA
Mus crociduroides - Mus musculoides = 7.60 MYA
Mus crociduroides - Mus musculus = 7.60 MYA
Mus crociduroides - Mus pahari = 3.40 MYA
Mus crociduroides - Mus platythrix = 7.60 MYA
Mus crociduroides - Mus setulosus = 7.60 MYA
Mus crociduroides - Mus spretus = 7.60 MYA
Mus haussa - Mus indutus = 3.20 MYA
Mus haussa - Mus mattheyi = 2.60 MYA
Mus haussa - Mus minutoides = 3.20 MYA
Mus haussa - Mus musculoides = 3.20 MYA
Mus haussa - Mus musculus = 6.60 MYA
Mus haussa - Mus pahari = 7.60 MYA
Mus haussa - Mus platythrix = 7.10 MYA
Mus haussa - Mus setulosus = 4.00 MYA
Mus haussa - Mus spretus = 6.60 MYA
Mus indutus - Mus mattheyi = 3.20 MYA
Mus indutus - Mus minutoides = 2.50 MYA
Mus indutus - Mus musculoides = 2.50 MYA
Mus indutus - Mus musculus = 6.60 MYA
Mus indutus - Mus pahari = 7.60 MYA
Mus indutus - Mus platythrix = 7.10 MYA
Mus indutus - Mus setulosus = 4.00 MYA
Mus indutus - Mus spretus = 6.60 MYA
Mus mattheyi - Mus minutoides = 3.20 MYA
Mus mattheyi - Mus musculoides = 3.20 MYA
Mus mattheyi - Mus musculus = 6.60 MYA
Mus mattheyi - Mus pahari = 7.60 MYA
Mus mattheyi - Mus platythrix = 7.10 MYA
Mus mattheyi - Mus setulosus = 4.00 MYA
Mus mattheyi - Mus spretus = 6.60 MYA
Mus minutoides - Mus musculoides = 1.60 MYA
Mus minutoides - Mus musculus = 6.60 MYA
Mus minutoides - Mus pahari = 7.60 MYA
Mus minutoides - Mus platythrix = 7.10 MYA
Mus minutoides - Mus setulosus = 4.00 MYA
Mus minutoides - Mus spretus = 6.60 MYA
Mus musculoides - Mus musculus = 6.60 MYA
Mus musculoides - Mus pahari = 7.60 MYA
Mus musculoides - Mus platythrix = 7.10 MYA
Mus musculoides - Mus setulosus = 4.00 MYA
Mus musculoides - Mus spretus = 6.60 MYA
Mus musculus - Mus pahari = 7.60 MYA
Mus musculus - Mus platythrix = 7.10 MYA
Mus musculus - Mus setulosus = 6.60 MYA
Mus musculus - Mus spretus = 2.30 MYA
Mus pahari - Mus platythrix = 7.60 MYA
Mus pahari - Mus setulosus = 7.60 MYA
Mus pahari - Mus spretus = 7.60 MYA
Mus platythrix - Mus setulosus = 7.10 MYA
Mus platythrix - Mus spretus = 7.10 MYA
Mus setulosus - Mus spretus = 6.60 MYA




4/30/2009

 

RNAseq in the worm -- Gary Williams

Illumina Short-read transcriptome data has the potential to help solve many problems with curating gene models and the genomic sequence in C. elegans. This is an initial look at the data and some examples of how it can be used.

So far, C. elegans gene predictions:

36% - fully confirmed by ESTs
48% - partially confirmed
14% - no transcript confirmation

RNAseq data -- different worms than the genome, so some polymorphisms expected -- 200bp inserts, 36bp paired end reads

MAQ or cross-match to genomic or transcript sequences

6137 new splice junctions (6% increase)

Jumped from 70000 to 98000 splice junctions.
3x as many polyA sites
80 possible new coding genes

V-shaped coverages -- validation against traces, then:
  • Detected sequencing error, correction needed for the reference
  • Detected alternative haplotype
Moving towards single-cell sequencing -- not sequencing in tiny cells but sequencing each cell in each developmental state in the worm. Moving towards RNA sequencing C. briggsae and C. remanei.

Updated gene builds will be given to other projects. Next Ensembl Metazoa comparative genomics build may already have the modENCODE-updated C. elegans and D. melanogaster builds.



4/28/2009

 

Peston on making banks safe

BBC - Peston's Picks: Making banks safe
For what it's worth, there are two reasons why it might make sense to force our biggest and most complex banks to hold more capital than their smaller, simpler peers: if big super-banks have the privilege of knowing that we as taxpayers would always bail them out in a crisis, surely they've got to put in place treble protection against the risk that they'd call on us for such help; also the costs of holding the extra capital might encourage them to slim down and simplify their operations.




 

The Molecules and Mechanisms of Instinctive Behaviour in Mammals -- Darren Logan, The Scripps Research Institute, La Jolla USA

Abstract

Social or behavioural disorders affect a quarter of individuals at some time during their lives; however, the molecules and mechanisms that mediate social cues, process their meaning, and initiate the corresponding behaviour are unknown. Instinctive social behaviours in mammals are thought to be largely promoted by pheromones: specialized olfactory cues secreted by one animal that directly influence the behaviour of another. Here I will describe studies into two instinctive, olfactory-mediated behaviours in mice: aggression and pup suckling.

Our studies found that aggression is promoted by specific protein pheromones excreted in male urine. These activate specialized, finely tuned sensory neurons in the noses of other males, resulting in a robust aggressive behaviour in the recipient. Our genomic and functional characterization of the gene family encoding these pheromones reveals an extraordinary scope for information-coding. I will describe our recent efforts to elucidate their social significance using cellular and behavioural techniques.

Pup suckling is a behaviour that is found in all mammals and is thought to be promoted by pheromones emitted by the mother and detected by the infant. We found that newborn mice do use maternal odour cues to promote suckling but, in contrast to the aggression pheromones, these cues are not genetically predetermined to elicit behaviour. Instead, the cues are complex, variable and learned by pups around birth. Suckling is subsequently initiated when the pup recognizes the same odour pattern in the context of their mother's nipple. The sensory neurons that mediate this are not specialized and are found in the noses of all mammals, including humans.

Together these studies demonstrate a diversity of mechanisms and molecules that underlie instinctive behaviours, and are a first step towards understanding the neural circuitry of social interaction.




4/27/2009

 

Giving functional genomics a REST -- Alex Bruce

http://www.ensembl.org/Homo_sapiens/Transcript/ProteinSummary?db=core;g=ENSG00000084093;r=4:57468799-57493097;t=ENST00000309042

Interesting case where a small skipped exon generates an extra copy of the Znf-C2H2 domain.

REST is an essential vertebrate transcription factor with very diverse roles. It has an important role in regulating the secretory pathway. Independently confirmed by 2 other groups.

RE1 array used to identify misregulated REST target genes in diseases like Huntington's.

RE1 "half" sites. Canonical/Transfac/Discovered motifs.

Different evolutionary pressures over RE1 sites seem to be associated with different functional subsets: common sites are less well conserved than unique sites. Unique sites need to be tissue specific, so they are bound to keep a general binding weakness to turn off binding in non-specific tissue (if I understand correctly?).

Solexa sequencing is quite good at identifying high affinity motifs, but poor at low affinity motifs.

There is an in vivo hierarchy between RE1 for REST binding and it can be discriminated at the DNA sequence level.



4/24/2009

 
PLoS Computational Biology: Polymorphism Data Can Reveal the Origin of Species Abundance Statistics



4/23/2009

 

Saint George was an ecocriminal



Here are different images of Saint George caught in the act of killing an endangered species, a mythical dragon, that has been extinct since then...


 

Phenotype data in Ensembl -- Fiona Cunningham

European Genotype Archive: Genome Wide Association Studies (GWAS) like WTCCC data and others. Only public information is publicly available, under very strict rules.

NHGRI GWAS will be imported in Ensembl: it's got manually curated data of high quality.

Links in Variation view: link "Phenotype Data (n)"

http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?source=dbSNP;v=rs420259

Diagnostic testing: situation will now improve in Ensembl

Locus specific databases (LSDBs): p53, ABO, collagen, albinism, cystic
fibrosis, Alzheimer's disease, ... >700

The main aim is to be able to link the reference CDS sequence used by the biomedical community to the most up-to-date reference sequence in the genomics community. This mapping will allow clinicians to link all phenotype data on their end to the genomic data in the genomic community.

Political pros and cons have to be carefully handled and continuously explained. Ensembl openness, existing infrastructure and visibility is the biggest selling point to have these dbs linked in a common LSDB resource.

Website here will have LRG XML files and prettified HTML reports soon.



 

Ensembl Quality Checking -- Michael Schuster

All Ensembl gene predictions for all vertebrate species are based on experimental evidence:

  1. UniprotKB/Swiss-Prot
  2. NCBI RefSeq proteins and mRNAs
  3. UniprotKB/TrEMBL
  4. EMBL Nucleotide Sequence Archive

Aligning the evidence back to the gene prediction with Exonerate. Types of alignment results:

  • perfect
  • added start
  • longer region
  • missing start
  • non-matching start
  • non-matching region
  • shorter region
  • ...

Exonerate has an exhaustive mode that takes a lot more time but fixes some of the mini-intron and
mini-exon issues that sometimes occur. Exonerate cdna2genome is very useful for quality checking.

Genebuild now uses head-to-head alignments of genewise and exonerate, and takes the best in each case.
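The "take the best in each case" step can be pictured with a small sketch. The scoring rule here (coverage multiplied by identity), the class and function names, and the example values are all assumptions for illustration; the talk did not say how Genebuild actually ranks the two alignments.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    program: str      # e.g. "genewise" or "exonerate"
    coverage: float   # fraction of the evidence sequence aligned (0-1)
    identity: float   # fraction of identical aligned positions (0-1)


def pick_best(candidates):
    """Return the candidate with the highest coverage * identity (assumed score)."""
    return max(candidates, key=lambda aln: aln.coverage * aln.identity)


# Hypothetical head-to-head results for two genes.
models = {
    "gene_A": [Alignment("genewise", 0.92, 0.99), Alignment("exonerate", 0.99, 0.99)],
    "gene_B": [Alignment("genewise", 0.97, 0.98), Alignment("exonerate", 0.80, 0.99)],
}
best = {gene: pick_best(alns).program for gene, alns in models.items()}
```

In this made-up example exonerate wins for gene_A and genewise for gene_B, which is the point of running both head to head rather than committing to one program.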

Some cases are still difficult to get right with algorithmic solutions: this is where the curators are needed.



4/14/2009

 

Alzheimer -- BBC

BBC NEWS | Health | Drug offers hope on Alzheimer's

A new drug which shows promise as a treatment for Alzheimer's disease has been developed by UK scientists.

The Proceedings of the National Academy of Sciences reports the drug, CPHPC, removes a protein thought to play a key role in Alzheimer's from the blood.

Tests at the University College London found the protein also disappeared from the brains of five Alzheimer's patients given the drug for three months.

Longer and larger scale clinical studies are now being planned.





4/08/2009

 

Robert Peston on breaking up banks

The issue here is how to handle financial globalization: bigger UK banks are better positioned internationally, which one could argue is good for the UK financial system, but smaller UK banks are better for the retail consumer, as pointed out in the blog post. What is best? Ensure that big UK banks can compete internationally in making big deals and acquisitions? Or make sure that smaller UK banks are competing against each other within the UK and providing the best retail value?

The same happened with the wave of water, energy and IT privatizations in the last 10 years in Europe. Every country was playing a double game: trying to promote internal competition and avoid national monopolistic practices, but also trying to beef up their national water/energy/IT company so that it could be well-positioned to acquire other European, South American or Asian companies...


BBC - Peston's Picks: Tories to break up banks?

Robert Peston | 12:40 UK time, Wednesday, 8 April 2009

Royal Bank of Scotland and Lloyds TSB could be dismantled after the next election, if the Tories form the government.

Here's why I say that, in the form of excerpts from a speech that's just been delivered by George Osborne, the Shadow chancellor.

"We cannot allow one part of our economy to behave in a way that puts the rest of the economy at risk when it fails. We need to think deeply about whether we can sustain banks that are not only too big to fail, but potentially too big to bail.

By dint of its substantial shareholdings the government has a powerful influence over the future structure of the UK banking industry, whether it likes it or not.

When the time comes to sell off those shareholdings we need to think very carefully before simply selling them to the highest bidder without thinking through the consequences for the wider economy.

We should look at whether Britain in fact needs smaller banks.

For it would be a bitter irony if we came out of this crisis with a banking system that was even more concentrated and even riskier than the one we had before it."

The background to these remarks is that Royal Bank's balance sheet is considerably bigger than the total output of the British economy and its liabilities are considerably greater than the entire public-sector debt of the UK.

Hence Osborne's allusion to banks that are "too big to bail". Or to put it another way, in rescuing RBS, the government has mortgaged all our economic futures to the rehabilitation of this giant bank.

As for Lloyds, it became far and away the biggest retail bank in the UK when it was permitted to buy HBOS last autumn.

In fact, it only rescued battered HBOS because the deal offered a once-in-a-generation opportunity to become the unchallenged market leader in British retail banking.

So if the next government were to dismantle Lloyds, depriving it of its enormous share of the current-account, savings and mortgage markets, that would be a reputational disaster for Lloyds' management.

There are two further implications of Osborne's remarks: first, that he would privatise Northern Rock as an independent bank, rather than flogging it to another bank; second, that he would ask the City watchdog, the FSA, and the competition authorities to consider whether other big British banks should be broken up. Even those where taxpayers don't have a big stake.

In the City, where I am tapping out this blog, this is big stuff.

To do my normal thing of ramming home the bleedin' obvious, the opinion polls are currently saying Osborne will be the next chancellor. Which means that his ambitions for what our banks should look like after the spring of next year are at least as significant as the future plans of the current chancellor.




4/07/2009

 

EnsemblCompara "Back to the future": using phylogenetic information to help gene annotation

Here is an example success story of using phylogenetic information to improve human gene annotation. What do you see wrong in this EnsemblCompara GeneTree?


The human gene prediction has been split into one third to the left and two thirds to the right. Some of the other species have the full length prediction, but some of the 2x and projected genomes also have this issue. This case was reported to the Havana team at the Sanger and they have now built a human and mouse full-length prediction for the gene (notice the Havana_genes blue and dark green model):

The next Ensembl/Havana merge will hopefully reflect this change but, right now, you need to activate the Havana_genes DAS track to see the most up-to-date Havana annotation. There is a good number of these fixed now in the highly loved genomes, aka human, mouse and zebrafish. But there is a second level of genomes that are not getting any manual annotation here but may be annotated somewhere else...





 

Promoting Open Source Bioinformatics

Interesting to see all the buzz that the Benjamin Franklin Award has generated in the blogosphere, twittersphere, facebooksphere and any of the other spheres out there... I still think there is something we should try and resolve in open source bioinformatics, which is promoting Open Source software to create more awareness in the scientific community. We need to reconcile the promotion of modularity and generality with the fact that giving credit to the scientists who contribute to Open Source Bioinformatics software is still important. Projects that have built very modular and generic components may be doing a lot for the bioinformatics community at large but, at the same time, the less atomic and single-purpose your software is, the more difficult it is to publish it in a prominent scientific journal. The same goes for citing it in the downstream publications: very atomic programs are very successful in citation metrics, but infrastructure code is not. This means that well-designed, well-implemented and well-tested software is often not prominent enough for new people to notice, and too many bioinformaticians resort to their own glue code for building their bioinformatics infrastructure.

There wouldn't be anything wrong with rewriting your own code over and over again were it not that: (a) people spend too much time writing scaffolding code just to get at what is really new and interesting in their project and (b) that code tends to be used and tested only internally and is almost never reused by any other party unless it has been very well designed and documented --- hence the name scaffolding code.

There is a really good chance now to build an infrastructure that brings up a terminal next to the next-generation Petabyte-size data sources, using emerging "cloud" technologies. These technologies are already advanced in fields other than bioinformatics, so we can leverage what has already been done for us and make extensive use of it. These terminals don't need to be silly, and the community should provide in them as much prebuilt code as possible so that the new breed of bioinformaticians gets used to having this software at their fingertips.

A few years ago all the effort was in building packages for different Linux distributions, so that people could easily install Open Source software on their in-house CPU clusters. I think we need to shift gears now to cloud software accessibility. The good news is that it seems everybody is happy with the common Ubuntu system as a start. I fear the proliferation of iPhone-like SDKs that will make the existing bioinformatics software useless. In an era where everybody is acutely aware of governments having to pour our taxes into infrastructure that was already paid for, no one will like to see all existing bioinformatics software become a "toxic" or "legacy asset"!



4/06/2009

 

Genomes to come: wallaby

Tammar Wallaby - Wikipedia, the free encyclopedia

The Tammar Wallaby (Macropus eugenii), also known as the Dama Wallaby or Darma Wallaby, is a small member of the kangaroo family and is the type species for research on kangaroos and marsupials.

It is found on offshore islands on the South Australian and Western Australian coast. It is classified as vermin on Kangaroo Island, where it seasonally breeds in large numbers and damages the echidna habitat on the island.





4/02/2009

 

Quarterly Activity Report -- Hinxton Sequence Forum Wellcome Trust Genome Campus

A quarterly activity report for the different activities that take place under what can be considered "sequence" at the WTGC in Hinxton, Cambridge, UK:

  • HGNC has made great progress in resolving nomenclature for 130 cases where the community has a diversity of opinions and it's difficult to agree on something. Very good point in saying that in the Internet era it's better to give a gene a name that is distinct from common words, which would otherwise clobber your Google search results. There is now a forum set up for different communities to use in discussions of gene family names.
  • Havana has now started using RNAseq data to confirm new genes found in human and zebrafish that didn't have evidence before. One new feature is a "confirmed intron" for when paired Solexa reads bridge two exons, with an associated score for read depth. Confident this type of data will bring out many interesting new annotations that couldn't be found before, e.g. genes expressed in a given tissue during a window of a few hours in development. There are already a few examples in zebrafish.
  • Wormbase has been working hard on compiling more data from the modENCODE. Small but cool infrastructure achievement in having VMware images running for old releases that investigators can just pull and bring up on demand.
  • Ensembl Genomes has been successfully testing the beta sites for Bacteria, Protists and the first Metazoa build. Another Metazoa build is in progress, with all the phylogenetics goodness of the 12 Drosophila genomes plus the vectors plus C.elegans and a few other outgroups. The modENCODE project is about to complete the re-annotation of gene models using CAGE data that will bring more precise gene starts for melano and elegans. Ensembl Genomes is still working together with Manchester and now the US to put together an Aspergillus resource that provides the best value for money to researchers. PomBase is also being pursued, with lots of labs interested in having it Ensemblified and ready to use.
* This is a personal blog. Things said here are not to be taken as official reports.
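
To make the "confirmed intron" idea a bit more concrete, here is a minimal Python sketch of how one could count read pairs that bridge two consecutive exons and use that count as the depth score. This is only my guess at the general shape of it, not Havana's actual method: the function names, the coordinate convention and the toy data are all made up for illustration.

```python
from collections import Counter

def exon_index(pos, exons):
    """Return the index of the exon containing pos, or None if it is intronic."""
    for i, (start, end) in enumerate(exons):
        if start <= pos <= end:
            return i
    return None

def confirmed_introns(exons, read_pairs):
    """Count read pairs whose two mates land in consecutive exons.

    exons: sorted list of (start, end) inclusive coordinates (hypothetical).
    read_pairs: list of (mate1_pos, mate2_pos) alignment positions.
    Returns a Counter mapping intron i (between exon i and exon i+1)
    to the number of supporting pairs, i.e. a crude read-depth score.
    """
    depth = Counter()
    for left, right in read_pairs:
        a, b = exon_index(left, exons), exon_index(right, exons)
        if a is None or b is None:
            continue  # at least one mate is outside the annotated exons
        lo, hi = sorted((a, b))
        if hi - lo == 1:  # mates sit in neighbouring exons
            depth[lo] += 1
    return depth

# Toy data: three exons and five read pairs (all invented).
exons = [(100, 200), (500, 600), (900, 1000)]
pairs = [(150, 520), (180, 510), (550, 950), (120, 160), (150, 950)]
support = confirmed_introns(exons, pairs)
```

In the toy data the first intron gets two supporting pairs and the second gets one; the pair that stays within one exon and the pair that skips an exon are ignored.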



3/31/2009

 

ape

dechronization: Hey R Users! Time to Update ape
The ape package written by Emmanuel Paradis is the foundation for phylogenetic analyses in R. Yesterday, Paradis and his coauthors posted a new version (3.2) on the CRAN archive. There don't seem to be too many new functions, but there are some important bug fixes. One of these - preventing calculation of negative state probabilities when reconstructing discrete character states - solves one of the more vexing problems I've had with the ace function. You should definitely get the update if you're doing ancestral reconstruction of discretely coded traits! Now we just need to hope the April 17th upgrade to R 2.9 goes smoothly...




3/27/2009

 

Hooray for those 6 brave souls! :-)

When every student has a laptop, why run computer labs? - Ars Technica
According to the school's Information Technology & Communication department, 3,117 freshmen enrolled in 2007, and 3,113 of them owned their own computer. Nearly all of the machines were laptops, with 72 percent running Windows and 26 percent running Mac OS X (six hardy souls ran Linux).




3/26/2009

 

Jalview Google Summer of Code 2009

(Shameless plug of the day)

There is an opportunity for students to propose a Jalview
related software development project as part of the Google Summer of
Code this year, with a stipend of $4500. This project would be
supported by the NESCent mentor organisation, and ideally improve
Jalview's phylogenetic analysis capabilities, enhance the applet's use
as an AJAX web gui component, and/or extend its visualization and
editing capabilities for use as a curation tool.

The application period for student proposals is rapidly approaching, and
it's important to discuss proposals with mentors before submission. If
you or anyone you know are interested, please read and/or forward the
message below, and look at the jalview project section on the following
(huge) URL:

http://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#Extend_Jalview_Alignment_visualization_tool


Regards.
Albert Vilella.
----

PHYLOINFORMATICS SUMMER OF CODE 2009

http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

The Phyloinformatics Summer of Code program provides a unique
opportunity for undergraduate, masters, and PhD students to obtain
hands-on experience writing and extending open-source software for
evolutionary informatics under the mentorship of experienced
developers from around the world. The program is the participation of
the US National Evolutionary Synthesis Center (NESCent) as a
mentoring organization in the Google Summer of Code(tm)
(http://code.google.com/soc/).

Students in the program will receive a stipend from Google (and
possibly more importantly, a T-shirt solely available to successful
participants), and may work from their home, or home institution, for
the duration of the 3 month program. Each student will have at least
one dedicated mentor to show them the ropes and help them complete
their project.

NESCent is particularly targeting students interested in both
evolutionary biology and software development. Initial project ideas
are listed on the website. These range from hardware acceleration for
phylogenetic inference, to tree visualization within a wiki, to
alignment of next-gen sequencing data, to development of a reusable
ontology term markup module for biocuration. All project ideas are
flexible and many can be adjusted in scope to match the skills of the
student.  We also welcome novel project ideas that dovetail with
student interests.

TO APPLY: Apply online at the Google Summer of Code website
(http://socghop.appspot.com/), where you will also find GSoC program
rules and eligibility requirements.  The 12-day application period
for students opens on Monday March 23rd and runs through Friday,
April 3rd, 2009.

INQUIRIES: phylosoc {at} nescent {dot} org. We strongly encourage all
interested students to get in touch with us with their ideas as early
on as possible.

2009 NESCent Phyloinformatics Summer of Code:
http://hackathon.nescent.net/Phyloinformatics_Summer_of_Code_2009

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2009/faqs

Cyberinfrastructure Traineeships (managed separately from GSoC;
postdocs also eligible):
http://hackathon.nescent.org/Cyberinfrastructure_Summer_Traineeships_2009

To sign up for quarterly NESCent newsletters:
http://www.nescent.org/about/contact.php

---------

Todd Vision and Hilmar Lapp
National Evolutionary Synthesis Center
http://nescent.org



3/24/2009

 

Sugarcane sugar content

Sugarcane genes associated with sucrose content. [BMC Genomics. 2009] - PubMed Result
Sugarcane genes associated with sucrose content.

Papini-Terzi FS, Rocha FR, Vencio RZ, Felix JM, Branco DS, Waclawovsky AJ, Del Bem LE, Lembke CG, Costa MD, Nishiyama MY Jr, Vicentini R, Vincentz MG, Ulian EC, Menossi M, Souza GM.

ABSTRACT: BACKGROUND: Sucrose content is a highly desirable trait in sugarcane as the worldwide demand for cost-effective biofuels surges. Sugarcane cultivars differ in their capacity to accumulate sucrose and breeding programs routinely perform crosses to identify genotypes able to produce more sucrose. Sucrose content in the mature internodes reaches around 20% of the culm's dry weight. Genotypes in the populations reflect their genetic program and may display contrasting growth, development, and physiology, all of which affect carbohydrate metabolism. Few studies have profiled gene expression related to sugarcane's sugar content. The identification of signal transduction components and transcription factors that might regulate sugar accumulation is highly desirable if we are to improve this characteristic of sugarcane plants.
RESULTS: We have evaluated thirty genotypes that have different Brix (sugar) levels and identified genes differentially expressed in internodes using cDNA microarrays. These genes were compared to existing gene expression data for sugarcane plants subjected to diverse stress and hormone treatments. The comparisons revealed a strong overlap between the drought and sucrose-content datasets and a limited overlap with ABA signaling. Genes associated with sucrose content were extensively validated by qRT-PCR, which highlighted several protein kinases and transcription factors that are likely to be regulators of sucrose accumulation. The data also indicate that aquaporins, as well as lignin biosynthesis and cell wall metabolism genes, are strongly related to sucrose accumulation. Moreover, sucrose-associated genes were shown to be directly responsive to short term sucrose stimuli, confirming their role in sugar-related pathways.
CONCLUSION: Gene expression analysis of sugarcane populations contrasting for sucrose content indicated a possible overlap with drought and cell wall metabolism processes and suggested signaling and transcriptional regulators to be used as molecular markers in breeding programs. Transgenic research is necessary to further clarify the role of the genes and define targets useful for sugarcane improvement programs based on transgenic plants.




3/23/2009

 

Laura Clarke - 1000 Genomes project

Depositing every week the same amount of data as all that was in the public dbs before (accumulated over the past ~20-30 years).

Data coordination (EBI Paul Flicek, Laura+Zam). Keep submissions, run QCs and recalibration, present in mini Ensembl browser. Working on the Resembl (Solexa) public release, need to implement as MySQL 5.1 partitioning instead of commercial db.

Pipeline -- first align to the genome; will move to new assembly soon. 454 with ssaha, Solexa here with MAQ.

Trios data now being churned at dbSNP -- causing dbSNP more churning than usual releases, they are catching up. Low coverage also submitted, but will probably be in dbSNP 131.

Data formats: Fastq / BAM (binary SAM alignment map format) / GLF (genotype likelihood format). BAMs/GLFs will be updated as more data gets in and old ones will disappear.
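
For those who haven't met Fastq yet, here is a minimal Python sketch of reading the plain four-lines-per-record flavour. It assumes unwrapped records and Sanger-style Phred+33 quality encoding (Solexa pipelines may use a different offset), and the function names and example reads are invented for illustration, not taken from the 1000 Genomes pipeline.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed record near {header!r}")
        yield header[1:], seq, qual

def phred33(qual):
    """Decode a quality string, assuming the Phred+33 (Sanger) offset."""
    return [ord(c) - 33 for c in qual]

# Two invented reads, just to exercise the parser.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
scores = phred33(records[1][2])
```

A real pipeline would use an established parser instead; this only shows what the records look like once read.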

Hope all the sequencing will be done by the end of 2009. Paper about pilot projects soon. Targeted sequencing (pilot 3) took more time, pull-down methods took a bit longer to nail down, now working.

DCC more automated data delivery systems. Standard QC/Recalibration pipeline. Other high throughput analyses. New staff. May take over the alignment process once there is consensus on the alignment algorithm.

Jim Stalker and Thomas Keane doing a lot of work at the Sanger. Eugene Kulesha and Stephen Keenan on the website work. Fiona and Yuan on calling/storing/presenting SNPs in Ensembl.



3/21/2009

 

Browse the NCBI Short Read Archive by taxonomy

Here is the link.



3/20/2009

 

Next Generation community resources for Next Generation Sequencing

I've been looking at SEQanswers lately to try and discern where people are drawing the "Here be dragons" line in their research. A lot of it is about wet lab protocols, which is great news, because it means the discoveries in different labs are shortcutting publication delays and being adopted as soon as possible. But there are also a lot of data analysis discussions, which are great for identifying the needs for specific bioinformatics tools in the outside world. It is becoming increasingly important in new emerging IT fields to know what *not* to work on as well as what to work on, and forums like SEQanswers are great for this.

Great seeing people using this forum and not being afraid of showing results --- which is also showing muscle sometimes, but all good and fair. I wonder how much of this is known by lab bosses and how much potentially old-generation bosses understand about these Internet open community practices...



 

Use copy+paste

My surname is Vilella, which is kind of the diminutive of Vila, or Ville, or littletown. In the last 7 days I've had a badge and stickers given with the spellings:
  • Viella
  • Villela
Given the recent confusion, I recommend that everyone who has to deal with my surname copy+paste it from somewhere else. Here are a few you can use:

Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella 
Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella Vilella.


3/18/2009

 

The blurring line between protein alignments and genomic alignments

As more and more data is being poured into the public sequence databases, an increasingly detailed map is being drawn that relates sequences from different individuals or different species, mainly into what has been known in the field as protein or genomic (DNA) alignments. This is what one can call twenty-first century molecular cartography.


All references to molecular evolution this year should be accompanied with an analogy to Darwin's work, so here is how it works in this case: Darwin's next generation machine, the Beagle, went on a journey to accumulate an enormous variety of specimens that, when compared all together, allowed Darwin to draw the first phylogenetic tree.


Contrary to what one would think, alignments with more sequences are easier to resolve than ones with fewer sequences, at least when the phylogenetic tree relating the sequences increases in detail, which is almost always. And this is allowing researchers to generate genomic alignments for phylogenetically dense groups of genomes while, in parallel, the protein alignments for the corresponding protein coding genes in these genomes are combined together with more distantly related species. This dense taxon sampling is making the distinction between protein alignments and genomic alignments less and less obvious.


As an example, one can use the highly conserved protein coding exons to anchor the points in the different chromosomes that define stretches of conserved synteny among the genomes, and then align these DNA stretches all together with a genomic aligner. At the same time, one can use the exon boundaries defined in the DNA sequences of the coding genes to help infer the right protein alignment at the amino acid level.


A new opportunity is now arising in exploiting the information that is contained separately in the genomic and protein alignments to combine them into a single object representing both. New methods are being developed that will exploit the landmarks that both genomic and protein alignments have correctly placed to converge into a single intertwined alignment object. This new type of alignment has in a way already been represented in closely related prokaryotic genomes. But prokaryotic genomes are less interesting for some topics, like alternative splicing, repetitive elements or recombination hotspots. Combined genomic and protein alignments will bring together elements of detail that have so far been scattered, and hopefully some new and brilliant mechanistic explanations of the innards of molecular evolution will arise from them, in the same way that they did for Darwin almost two centuries ago.


So a deluge of sequencing data is not really a problem but an opportunity.





3/16/2009

 

How much should you stretch HMM classification?

There are ongoing discussions in our Campus regarding the use of de novo clustering or HMM classification to update the family models in an Orthology database from one release to the other.

One trend that I don't favour is to use the HMM models from the previous build and classify the current protein sets against them. Then re-run the alignment and tree-building steps after that.

The other trend that I favour is to re-run the new blasts/phmmers for the new proteins, re-cluster with the other hits in the updated graph, and then re-run the alignment and tree-building steps in the new set of family models.
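
A toy Python sketch of why I care about re-clustering: if families are taken as connected components of the hit graph (a simplifying assumption, real pipelines use more careful clustering, and all the protein names below are invented), a single new protein with hits to two old families is enough to change the family models.

```python
def cluster_hits(hits):
    """Group proteins into families as connected components of the hit graph.

    hits: iterable of (protein_a, protein_b) pairs, e.g. significant
    blast/phmmer hits. Uses a small union-find; returns a sorted list of
    sorted member lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = {}
    for protein in list(parent):
        families.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in families.values())

# Previous release: two separate families (hypothetical proteins).
old_hits = [("humA", "musA"), ("humB", "musB")]
# New release: a fish protein that hits members of both.
new_hits = [("fishAB", "humA"), ("fishAB", "humB")]

before = cluster_hits(old_hits)
after = cluster_hits(old_hits + new_hits)
```

Here `before` has two families and `after` has one: the new genome has changed what the families are, which is exactly the information an HMM built on the old set cannot see.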

People who argue in favour of the HMM classification procedure and want to convince me of its feasibility should give convincing answers to these questions:

Let's say you only have a few complete genome sequences with provisional gene predictions from your clade but expect to have 20% more finished genomes with better gene predictions every two months. Over the course of a year you will have more than doubled the number of genomes. Do you trust the HMMs you are building today to represent the family models in two, four, six, eight, ten and twelve months' time?
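
The "more than doubled" bit is simple arithmetic, sketched here in Python. Whether the 20% is meant to compound on the current total or to be added as a fixed fraction of the starting set is my own assumption either way, so the sketch computes both readings.

```python
def growth_factor(rate, steps, compound=True):
    """Factor by which the genome count grows after `steps` updates.

    compound=True : each update adds `rate` of the current total.
    compound=False: each update adds `rate` of the starting total.
    """
    if compound:
        return (1 + rate) ** steps
    return 1 + rate * steps

UPDATES_PER_YEAR = 6  # one update every two months

compounded = growth_factor(0.20, UPDATES_PER_YEAR)                # 1.2 ** 6
linear = growth_factor(0.20, UPDATES_PER_YEAR, compound=False)    # 1 + 6 * 0.2
```

The compounded reading gives a factor of about 2.99 and the fixed-increment reading about 2.2, so the number of genomes more than doubles within the year under either one.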

Let's say that you have answered the previous question with a 'yes'. Then where do you draw the line for updating your HMMs by rebuilding them from updated genome sets? Why don't you take Human, S.cerevisiae, A.thaliana, E.coli and P.furiosus and call that the ultimate representation of all family models on Earth? Do you think those families would be as good as the ones obtained using 80 genomes instead?

If so, then it's very easy for you, just use those 5 genomes as your family model set. But I don't think that is the way to go.



3/15/2009

 

Fish Genomes to come

We hosted the International Fish Genomes meeting once more this weekend,
bringing researchers from Europe, America and Asia to give talks about their
current scientific progress and also to discuss possible collaborations.

The amount of work reported by the zebrafish people at the Sanger, led by
Dr. Derek Stemple, was a surprise to me, only because I wasn't closely
following their efforts before (a quick summary at the bottom). They are in
some ways ahead of the human genetics people in developing methods and
protocols to take full advantage of the NextGen sequencing technologies. The
ZFIN consortium has recently released Zv8, the latest version of the Danio
rerio assembly, which will be present, with a brand new gene prediction build,
in the upcoming Ensembl v54 in April.

One recurring theme in most talks was how to deal with heterozygosity and
polyploidization issues when trying to assemble fish genomes. Different
techniques to create artificial individuals with double haplotype genome sets
have been developed that make this task easier. Another recurring theme was
that the NextGen sequencing was focused on assembly, but there is a lot of
low-hanging fruit to be picked for SNP and cDNA analyses that will squeeze
more biological results out of the same data. Squeezing data doesn't seem
very trendy nowadays with the second deluge we are facing, but I don't think
it's something we should take too personally, because in essence it is only a
change in the scale of data set size, met with the same imagination and
ingenuity as before.

One of the most interesting talks to me was Dr. Hugues Roest Crollius's demo of
their Dyogen Synteny Browser, which nicely complements what we have in the
EnsemblCompara GeneTrees in terms of visualization and discovery of prediction
errors. This browser is especially relevant for the fish clade, by bringing
out very clear patterns of gene conservation and loss after the whole genome
duplication in the Teleosts.

The extraterrestrial talk about skate cell biology and development brought
pictures of baby skates resembling very much the alien stuck onto the face of
the guy in Alien, the film. Cool and disturbing in equal amounts.


Another conclusion from the meeting is that we probably need to gather
efforts in writing a strong proposal to NHGRI for a 2x-mammalian-like sequencing
spree in the fish clade. As the mammalian genomics study is now showing, you
get a more complete picture by sequencing lots of ``millions of years'' than
getting stuck in repetitive and haplotypic knots in one single genome. Still,
no one stepped forward on Saturday to coordinate the proposal, and a bit more
haggling and ambushing may be needed :-).

----
Derek Stemple -- Zebrafish Genome Project

Heterozygosity and haplotype issues in the six related individuals
used first

Better DNA to make libraries --> Fertilized eggs with UV-inactivated
sperm -- melt the spindle in the first division -- doubled haploid
fish (DH)

Better genetic maps -- radiation hybrid, heat shock, meiotic map

Gene Annotation using Solexa Sequencing -- how to do it

Q: SNPs from the Solexa data?

A: Not focusing on that now, reference already has 0.6M, 6M with other
line

Q: 3UTR repetitive element problems in Solexa?

A: Use read pairs, filter out in the alignment, it's difficult. Will
probably do bigger library inserts soon.

Q: Depth and false 5prime ends?

A: A third of reads cross a boundary, 5prime ends there is a bias
towards one strand in the sequencing

Matthew Clark

SJD is homozygous but nobody uses it because of reproduction issues in
the lab

Doing CNV analyses, comparing with repeats using a HMM (Jared Simpson)

Read pair aberrations -- lots of deletions found -- badly assembled
bits of the genome? (Klaudia Walter)

Doing a WGSA Affy for SNP chip -- will help BAC mapping

De novo map -- mapmaker, joinmap, RECORD, SMOOTH, Combin, MSTmap

MSTmap works on Turing -- combine genetic map with sequencing

Zebrafish and Danio genus polymorphism rates -- conserved elements





3/13/2009

 

The crisis of credit animated


3/12/2009

 

The wonders of www.gopubmed.com

I stumbled upon gopubmed a while ago and found the idea of providing more meta-information about the searches very useful: things like authors' affiliations, but mainly statistics about the search.

Since then they have improved the statistics and added what they call the "Author collaboration network", which is really cool. I was amazed to see that it also works for combinations of authors, like: "smith j [au] AND jane j [au]".

Really cool stuff that I highly recommend!



3/11/2009

 

G1s selling at 70-80% the rate of iPhones according to T-Mobile UK

Assessing Android - Blogs – ComputerworldUK blogs - The latest technology news & analysis on Outsourcing, HMRC data, Apple iPhone, Global warming, MySQL, Open Enterprise
Not surprisingly, given the nature of this blog, I'm pretty favourably disposed towards Google's Linux-based Android platform, even though I don't possess the only phone currently using it, T-Mobile's G1.

But it's hard to tell just how well it's doing against the iPhone, say. If any one knows, it's T-Mobile, so I was interested to receive this morning some tantalising tidbits from Richard Warmsley, head of Internet and Entertainment at T-Mobile UK.

Despite being pressed by me, he wouldn't get into specifics (now, there's a surprise) about sales, saying only that they had exceeded expectations. But he did reveal that according to their market research, G1s were selling at 70-80% the rate of iPhones. Even allowing for margins of error and any tendency to talk up such numbers, this suggests a healthy uptake.




3/10/2009

 

What does this look like?


It's a protein multiple alignment of 871 G-protein coupled receptors visualized using the Jalview Overview window. Notice the red rectangle that corresponds to the bit you can actually see in detail on a 2000x1200 TFT screen, and the mouse pointer in the bottom right corner as a reference for the size of the thing.

This is one of the biggest GeneTrees in Ensembl v53, and also one of the interesting ones to navigate around.

http://www.ensembl.org/Homo_sapiens/Gene/Compara_Tree?db=core;g=ENSG00000112414



3/09/2009

 

Stroke regeneration research

BBC NEWS | Health | Stem cell 'scaffold' for stroke
Study leader Dr Mike Modo, from the Institute of Psychiatry at King's College London, said: "This works really well because the stem cell-loaded particles can be injected through a very fine needle and then adopt the precise shape of the cavity.

"In this process the cells fill the cavity and can make connections with other cells, which helps to establish the tissue."

He said over a few days they were able to see cells migrating along the scaffold particles and forming a primitive brain tissue that interacts with the host brain.




 

BioMart Slow Query Analysis -- Rhoda Kinsella -- European Bioinformatics Institute - EMBL

BioMart is a query-oriented data management system developed jointly by the Ontario Institute for Cancer Research (OICR) and the European Bioinformatics Institute (EBI).

Queries were crashing the mart servers or taking a long time. Log any queries longer than 60 seconds.

New hardware on v51 dropped the number of slow queries from 95000 to 3300. Probably picked up more users in v52, when it rose to 12000.

Something got wrong on Oct 13th as a lot of slow queries happened that day. Another 2000 on Oct 19th.

One go pull downs of EMBL, MGI, UniprotSWISSPROT. Pull down all GO. Pull down all est and gnf data. Pull all protein_feature PFAM, tfhmm data. A bit of HGNC filtering, but most of the time people want all in one query with no filtering.

Conclusion: there is a fine line between using BioMart and fetching zipped files in an FTP server. Some users seem to prefer BioMart even though it will be much slower than going to the FTP site.

For v52, lots of snp_marts. People sending queries filtering for lots of stuff.

All Xref EMBL, EntrezGene and protein_id: some genes have thousands of EMBL links, multiplied by thousands of protein_ids, makes queries slow.
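
To see why this combination hurts, here is a rough sketch, assuming the result has one row per (EMBL link, protein_id) pair for a gene, which is what "multiplied by" suggests. The counts are invented for illustration and are not taken from the talk:

```python
def joined_row_count(embl_links, protein_ids):
    """Rows returned for one gene if every EMBL link is paired with every protein_id."""
    return embl_links * protein_ids


# Hypothetical counts for a single gene
rows_for_one_gene = joined_row_count(embl_links=2000, protein_ids=3000)
```

Under that assumption a single gene already yields millions of rows, before any other attribute is added to the query.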

SNP52: strain polymorphism table with a lot of strain filters.

SNP52: transcript_variation (variation_feature_ids)

E!52: exon_transcript table + transcript_variation table + filterings.

Solutions:
  • remove duplication and solve all NULL values on row issues (e.g. gnf)
  • limits on external attributes
  • limits on est and gnf attributes
  • indexed expression tables
  • new hardware
Upcoming solutions:
  • increase result batch size?
  • merge 3 GO categories?
  • Remove unused tables?
  • Canned queries?
  • Stop user ability to re-send a query?
  • Keep analysing the logs and make it more automated and informative -- keep an eye on what is happening
  • Maybe limits on formats for "gimme all for X" --> goes to the FTP link
  • Maybe new FTP dumps for "gimme all attributes for Y+Z" --> goes to the FTP link
  • More complicated combinations --> cache them on a "dynamic FTP"
Lots of little things that together improve things a lot.



3/06/2009

 

University of Cambridge and Roche 454 analysis of type-1 diabetes protective alleles

Type 1 Diabetes Resequencing Study Finds Protective Variants | GenomeWeb Daily News | Sequencing | GenomeWeb
By re-sequencing a handful of candidate genes from previous genome-wide association studies, University of Cambridge and Roche 454 researchers have identified four rare variants and one common variant in an antiviral gene called IFIH1 that protect against type 1 diabetes or T1D. Since the variants appear to curb the gene's activity, the team proposes that functional IFIH1 may contribute to T1D.




 

I saw the movie and it didn't end well...

Chilean extremophile bacteria thrive in Mars-like conditions - Ars Technica
The Atacama desert lies on the western edge of South America, covering much of northern Chile and parts of Argentina. It is the closest one can get to Mars while remaining grounded on Earth. High atop the Socompa volcano on the Eastern edge of the Atacama desert, the atmosphere is thin, the ultraviolet radiation is intense, and the climate is dry. Nevertheless, the improbable has been found: life. Near the rim of the 19,850-foot-high Socompa volcano, researchers from the University of Colorado at Boulder's Alpine Microbial Observatory found a thriving, complex microbial community that appears to be supported by gases emanating from volcanic vents around the rim.

The Atacama desert is the driest place on Earth. Weather stations in the Antofagasta region of Chile average one millimeter of precipitation per year, and a number of weather stations in the Atacama have never recorded rainfall throughout their entire operational life. The extreme climate there is often compared to the surface of Mars. It is believed to be so similar that a Science article, published in 2003, used it in an attempt to re-create the experiments that Viking One and Two performed on the Martian surface. It is also a proving ground for equipment that NASA plans to send to Mars one day. Given the geologic similarities, the discovery of life in such a hostile place suggests that life could exist elsewhere as well.




3/04/2009

 

Cambridge UK immune to the economic downturn

I went to the public hearing of the transport commission at the Guildhall in Cambridge and got the impression that everybody was very cautious about saying that transport increases will be affected at all by the economic downturn.

Some interesting numbers were given:
  • The Silicon Fen area of influence generates £100bn per annum. In contrast to that, the national transport investment in the area is disproportionately small.
  • Measures restricting car access to the City centre have halted the growth of private car traffic, and even decreased its volume, in the last 10 years. In contrast, access to the centre has increased steadily, and the Park and Rides are now at 2.2-2.5M per year.
  • The property bust in the economic downturn will probably last through 2009 and 2010, and the number of dwellings and transport needs will spike back after 2010. The property market is already thawing from the late 2008 or early 2009 freeze. Projections of 75000 new homes by 2021 will only need to be shifted by two years, to 2023.
Cambridgeshire's knowledge-based economy is and will be among the least affected by the downturn, and it will even be able to attract ex-City workers now shifting into research and development. The transport estimates for Cambridgeshire County will only need to be shifted by two years with respect to pre-credit crunch estimates.

All very interesting, and I'm looking forward to seeing what comes out of this commission in the near future...


 

Myco-diesel from the fungalgenomes

Fill-er-up with Myco-diesel?
So this is actually old-ish news, but I saw this press release about a paper published last year describing the ability of the fungus Gliocladium roseum to naturally synthesize diesel compounds. The paper from Gary Strobel @Montana State and collaborators describes that G. roseum produces volatile hydrocarbons on cellulose media. Extracts from the host plant (Eucryphia cordifolia) were also able to support growth of the fungus alone. These products have been dubbed “myco-diesel”. G. roseum is an endophyte of E. cordifolia. I wonder what kinds of advantages it might provide for the fungus or the plant to produce these hydrocarbons.

I wonder if it is better to focus on these organisms that have already evolved a way to make these hydrocarbons directly from cellulose rather than the multistep process of making easy-to-process sugars from different starting plant materials and then ethanol or other hydrocarbons from yeast or bacteria growing on that sugar. Growth rates, amenability to growth in bioreactors, etc. certainly are considerations in building production systems, but I wonder whether these kinds of findings represent inroads to solving our problems or if they are peripheral to the current bioengineering approaches that are underway.

Some of the earlier press releases I had missed it seems:

* Rainforest fungus makes diesel
* Diesel Fuel From a Tree Fungus?
* NPR
* Google news

G. A. Strobel, B. Knighton, K. Kluck, Y. Ren, T. Livinghouse, M. Griffin, D. Spakowicz, J. Sears (2008). The production of myco-diesel hydrocarbons and their derivatives by the endophytic fungus Gliocladium roseum (NRRL 50072) Microbiology, 154 (11), 3319-3328 DOI: 10.1099/mic.0.2008/022186-0

My microbiology teacher used to be very pessimistic about this ten years ago. He would say: "It's all a matter of not being able to concentrate the product at the end!". I wasn't so pessimistic back then and I am even more optimistic nowadays, but I still think there is a lot of basic engineering to be done after the biological candidates have been proven useful. Where are the good engineers when one needs them?

Maybe we will need an engineered organism that produces controlled "blooms" that are easy to separate... Who knows...



 

Martin Taylor - EBI

chimp-human different promoter subst rates than human-mouse - CAGE next gen sequencing in Japan for TSSs

drosophila subgroup - E. Sevin - lots of TSS data there as well

indels are a problem in consistently calculating neutral subst rates

relaxation of constraint combined with pos sel?

neutral (mutation) rate higher in promoters than other genomic regions?

is this distinct chromatin environment? transcriptional status?

using transcription coupled repair bias identifies bidirectional promoters and germline expression promoters

Q - should we be doing primates, should we be doing human populations, or both? Both.


 

Caring for the gene names

Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered.
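
As a rough sketch of how one might screen identifiers before they ever reach a spreadsheet: the two patterns below are my own heuristics for the two failure modes the paper describes (month-like names turned into dates, digit-E-digit identifiers turned into floating-point numbers). They are not Excel's actual parsing rules, and the function name is made up for this example.

```python
import re

# Heuristic: a month name or abbreviation followed by one or two digits, e.g. SEPT2, MARCH1, DEC1
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)

# Heuristic: digits, the letter E, digits, e.g. Riken-style 2310009E13
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def excel_risk(identifier):
    """Return 'date', 'float' or None depending on which conversion the identifier resembles."""
    if DATE_LIKE.match(identifier):
        return "date"
    if FLOAT_LIKE.match(identifier):
        return "float"
    return None


# Flag the risky identifiers in a small list
gene_list = ["SEPT2", "BRCA1", "MARCH1", "2310009E13", "TP53", "DEC1"]
at_risk = {name: excel_risk(name) for name in gene_list if excel_risk(name)}
```

Anything that ends up in `at_risk` is worth importing as text rather than letting the spreadsheet guess its type.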

Thanks to JM Duran for pointing me to this paper.



3/03/2009

 

Leo Goodstadt MRC Oxford -- Treasures in the trees

the last 5% in finishing genomes is very important

positive selection on defined main categories -- reproduction, olfaction, immunity, transcription factors

lots of examples for gene expansions in rodents

KAL1 as an example of a gene loss

RP25 retinitis pigmentosa is an interesting case -- how come this can be traced back to drosophila? wasn't this supposed to be convergent evolution? is it a real ortholog? how come so many gene losses in several groups?

MD: don't you think duplication is really gene expression and its regulation?

What should we do now? KRAB Zn-Fingers and other transcription factor families. Primates will give us lots of info that human popgen won't. RD doesn't seem to disagree...







3/02/2009

 

Some people only complain when it's too late

A famous actor and comedian died this weekend after a long and painful lung cancer. The media have covered this extensively and linked it to certain comments asking the Health Minister to toughen up tobacco legislation.

One role of the media is to report and reflect the buzz on the street, and the other is to write knee-deep editorials on a given subject. But are you telling me no one arguing for stricter tobacco regulations deserved media attention when these were being voted on last year? Why was it all about interviews with pub owners complaining that their business was going to be hugely affected by the new regulations? What was the proportion of pro/against opinion in the media coverage then? 1/10? 1/1000?

I dare anyone to predict that laws will be a lot more permissive in 10 or 50 years time than they are today...


2/27/2009

 

Daniel Falush, Department of Microbiology, University College Cork

First third about human population admixture, then E. coli...

Example of admixture -- Barack Obama

Li and Stephens 2003 -- HMM model for genetic variation

HGDP data (Li et al., Science 2008) - 650k ~ 1000 unrelated individuals

Sloppy painting -- African portion more or less remains African. Brahui (Pakistan) have a lot of admixture.

Hazara admixture -- 22 generations ago -- Mongol era admixture

Kalash -- 630 BC -- Outlier population with the oldest time having no events

Aaron Darling in the bacteria side
Didelot did ClonalFrame

phlyopools (not misspelled) -- weak ARGs -- lots of recombination in the E. coli set of 15 strains -- data doesn't tell much at the very deep branches of the tree, but the more external branches have a lot of information

group b1 has high recombination rates -- lots of pathogens in it



 

Curtis Huttenhower, Lewis-Sigler Institute for Integrative Genomics and Department of Computer Science at Princeton University

MEFIT - Complex human diseases analysed at a meta-study level for gene expression datasets

Interests:
Gene     -> Function
Gene     -> Gene
Data     -> Function
Function -> Function e.g. regulatory cross-talk

Prior knowledge on yeast ~80% whereas human is less than 40%

Sleipnir - 8 hours to 1 minute in yeast -- 30 years to 2 months - 18 hours in parallel

Predicted functional relatedness of a gene against any other gene in the genome

What is the data that relates BRCA1 and BRCA2? Mint, BioGRID, etc.

Genes that are related to a known disease, e.g. Alzheimer's disease

Is a gene performing X specific function to another group of genes?




2/26/2009

 

Martin Evison, University of Toronto

Measures used nowadays in court to identify faces -- photogrammetry, superimposition and anthroscopy

Face changes with age and differently for each sex. Nostrils are very good landmarks. Ear lobes are the measure that has the most variance. In any age range, females have more generous lips.

Company in Florida giving the police a face profile based on SNPs tagging facial differences on a sample of 200 people (own employees). Narrows down the profiles the police force needs to look for.

Oragene saliva sample should last for 1 year. Sampling then correlating with faces. Cool 3D model of faces done by interpolating 8 1Mpx pictures, called geometrix facevision.





2/25/2009

 

Open Source push by the UK government

BBC NEWS | Technology | UK Government backs open source
The UK Government has said it will accelerate the use of open source software in public services.

Tom Watson MP, minister for digital engagement, said open source software would be on a level playing field with proprietary software like Windows.

Open source software will be adopted "when it delivers best value for money", the government said.

It added that public services should where possible avoid being "locked into proprietary software".

Licenses for the use of open source software are generally free of charge and embrace open standards, and the code that powers the programs can be modified without fear of trampling on intellectual property or copyright.

Announcing an open source and open standards action plan, the government said it would:

* ensure that the Government adopts open standards and uses these to communicate with the citizens and businesses that have adopted open source solutions

* ensure that open source solutions are considered properly and, where they deliver best value for money are selected for Government business solutions

* strengthen the skills, experience and capabilities within Government and in its suppliers to use open source to greatest advantage

* embed an open source culture of sharing, re-use and collaborative development across Government and its suppliers

* ensure that systems integrators and proprietary software suppliers demonstrate the same flexibility and ability to re-use their solutions and products as is inherent in open source.

Government departments will be required to adopt open source software when "there is no significant overall cost difference between open and non-open source products" because of its "inherent flexibility".




 

Text me if you can

BBC NEWS | Technology | Texting 'improves language skill'
Texting is likely to be an important part of a child's learning development, she thinks.

"The more exposure you have to the written word the more literate you become and we tend to get better at things that we do for fun," she said.

The study found no evidence of a detrimental effect of text speak on conventional spelling.

"What we think of as misspellings, don't really break the rules of language and children have a sophisticated understanding of the appropriate use of words," she said.

Other reports have produced similar results. Research from the University of Toronto into how teenagers use instant messaging found that instant messaging had a positive effect on their command of language.




2/22/2009

 

Gramm-Leach-Bliley act of 1999

Gramm-Leach-Bliley Act - Wikipedia, the free encyclopedia
The Gramm-Leach-Bliley Act (GLBA) allowed commercial and investment banks to consolidate. For example, Citibank merged with Travelers Group, an insurance company, and in 1998 formed the conglomerate Citigroup

Economists Robert Ekelund and Mark Thornton have criticized the Act as contributing to the 2007 subprime mortgage financial crisis, arguing that while "in a world regulated by a gold standard, 100% reserve banking, and no FDIC deposit insurance" the Financial Services Modernization Act would have made "perfect sense" as a legitimate act of deregulation, under the present fiat monetary system it "amounts to corporate welfare for financial institutions and a moral hazard that will make taxpayers pay dearly". [14]



2/20/2009

 

Desensitising for peanut allergy. Cool!

BBC NEWS | Health | Hope over peanut allergy 'cure'
It's not a permanent cure, but as long as they go on taking a daily dose they should maintain their tolerance




2/18/2009

 

Not there yet for complicated genomes

BioMed Central | Full text | Assessing the feasibility of GS FLX Pyrosequencing for sequencing the Atlantic salmon genome
[...] the addition of GS FLX Paired End reads vastly improved the capability of 454 pyrosequencing by enabling the assembly of contigs into large scaffolds. Indeed, in terms of the number of scaffolds produced, the GS FLX assembly that included the combined shotgun and paired end reads was comparable to the Sanger assembly. Moreover, the order of the GS FLX scaffolds could be established from information from BAC-end sequences and the Atlantic salmon physical map. However, numerous gaps remained within the scaffolds, which is undesirable when a complete or reference genome sequence is one of the goals. Currently, if the Atlantic salmon genome is to provide a reference sequence for all salmonids, then a substantial proportion of the sequencing will have to be carried out using Sanger technology.




 

Intimidating?

MacBook Pro 17" Unibody First Look
  • There are three tri-wing screws holding the battery to the Unibody case. (A tri-wing screwdriver is shown in the second photo.)
  • Apple did this to intimidate people out of swapping the battery, but a small flathead screwdriver (2mm or so) works fine to remove the screws.



2/17/2009

 

Second Linux phone by Google

It took a long time for Linux to become mainstream in mobile phones, but Google realised they needed a strong platform to develop on the smartphone market, bought the company behind Android and, after a few months, the G1 was born. Now, copying the pervasiveness strategy that Microsoft took for the OS in personal computers, Google is producing an OS that will be massaged into different smartphones with different features, always making sure they provide as many Google services as possible. Why hasn't Microsoft mobile been successful in doing that? Well, first of all, they have been, as you can see in the purple cheese below:


Notice the Linux market share is only 1/4 the size of Microsoft's share. But Microsoft bases its revenue on selling the OS to the phone manufacturer, and doesn't have much vested interest in what services are provided by the phone. Whereas if you want your services to work on the smartphone, you allocate your resources to make sure they do. Symbian has the same issue Microsoft has: they are not strong in providing services. On the other hand, Apple (red) has been successfully making very good iPhones since before Android was born. They do have a couple of very important services to offer: music and video content. But Apple will always rely on Google's efforts to have maps, mail and docs. So they rely on that in the same funny way they rely on Microsoft's Office, which is a thorn in their drive for OS success. Apple in the smartphone sector is what IBM was in the PC sector before Microsoft and the PC clones...

BBC NEWS | Technology | Second 'Google phone' is unveiled
The touchscreen HTC Magic will feature a 3.2 Megapixel camera, Wi-Fi, and GPS, but no slide-out keyboard.
 





2/16/2009

 

Amazing Perl profiling with perl -d:NYTProf (updated)

Perl has always been considered a lesser computer language when compared against Java or C. There is always this idea that if a language doesn't have good debugging or profiling tools, it's not really a serious language. And to be fair, open source tools for C weren't even that good at profiling before the advent of valgrind. I had always been rather dissatisfied with the profiling tools in Perl, but last summer I discovered Devel::NYTProf:

http://blog.timbunce.org/2008/07/15/nytprof-v2-a-major-advance-in-perl-profilers/

Now, I can say that Perl has leap-frogged a fair number of other languages in terms of profiling capabilities. Tim Bunce, who gave a talk this fall at the London Perl Mongers meeting, has released a new NYTProf 2.08 version, which incorporates even more optimization tracking goodies.



Never again let your Java/C coworkers make fun of your underrated language: the camel has ass-kicking profiling abilities now!

UPDATE: Not that I don't consider C#, VB, PHP or JavaScript good languages; it's only that I know less about their profiling tools






2/06/2009

 

The wonders of the internet

It's like the statement: "I'm sure a teenager in Singapore has already coded this in his/her bedroom tonight"... BBC NEWS | Technology | Nine-year-old writes iPhone code
A nine-year-old Malaysian boy in Singapore has written a painting application for the Apple iPhone.

 

And now with a press release

Investor Relations .::. illumina, inc. .::. News Release
Illumina, Inc. (NASDAQ:ILMN) unveiled a development roadmap for its Genome Analyzer system that charts a path to generate greater than 95 Gigabases of high quality data per run in 2009. This roadmap, which was presented at a user-group meeting at this week’s Advances in Genome Biology and Technology (AGBT) conference, outlined advances in chemistry, algorithms, and hardware which will substantially improve accuracy, read length, data density, and ease of use. These developments chart a clear and demonstrable path for researchers to generate 25x coverage of a human genome for less than $10,000 in 2009.

“The demonstrated pace of innovation on the Genome Analyzer has enabled us and end-users to embark on ambitious, new whole-genome sequencing projects that will have a major impact on human health, especially cancer,” said David Bentley, Vice President and Chief Scientist of DNA Sequencing at Illumina. “Currently we can generate greater than 25x coverage of a human genome in three flow cells; a year ago, more than 40 flow cells were used to complete our first African genome. By year’s end, we anticipate generating the same 25x coverage on a single flow cell bringing the cost of acquiring a human genome sequence to below $10,000.”

The current configuration of the Genome Analyzer has the potential to generate in excess of 15 Gigabases of high quality data per run. From this baseline, the performance of the Genome Analyzer is expected to increase more than six-fold in 2009. The advances to achieve this increase will be commercialized in several phases throughout the year and include the following elements:

* Chemistry advancements including new polymerases for sequencing and cluster generation to enable faster run times and paired reads in excess of 2x100 base pairs each. These advancements also improve sequencing accuracy to greater than 98.5% for 2x100 paired end reads and 99.9% for 2x50 paired end reads.
* Hardware upgrades including improved flow cell holder and larger reagent cooler provide an increase in output and walk-away automation for reads of at least 100 cycles. These hardware components will comprise the Genome AnalyzerIIx Upgrade Kit, which current Genome Analyzer users can order immediately to increase the output and enhance the automation of their system.
* Algorithm improvements including a new approach to cluster detection will increase output up to 80% on high density flow cells and improve basecalling yielding greater accuracy and a larger proportion of perfect reads per run.
* Data density is increased by use of semi-ordered arrays of one micron and subsequently sub-micron features. These ordered arrays, combined with increases in read length, are expected to yield greater than 55 and 95 Gigabases per run respectively.
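
The numbers in the release can be sanity-checked with a back-of-the-envelope sketch. The 3.1 Gigabase human genome size is my own assumption (it is not stated in the release), and the calculation ignores reads that fail quality filters or do not map, so treat it as a rough check rather than Illumina's own accounting:

```python
HUMAN_GENOME_GB = 3.1  # assumption: approximate human genome size in Gigabases


def gigabases_for_coverage(coverage, genome_size_gb=HUMAN_GENOME_GB):
    """Raw Gigabases needed to cover a genome of the given size 'coverage' times."""
    return coverage * genome_size_gb


def fold_increase(new_output_gb, old_output_gb):
    """Ratio between two per-run outputs."""
    return new_output_gb / old_output_gb


needed_gb = gigabases_for_coverage(25)     # roughly 77.5 Gb for 25x coverage
increase = fold_increase(95, 15)           # a bit more than six-fold
fits_in_one_run = needed_gb <= 95          # is a 95 Gb run enough for 25x?
```

Under these assumptions, a 95 Gigabase run would be enough for 25x coverage of one human genome, which is consistent with the "more than six-fold" and "single flow cell" claims.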

The combination of these advances will not only increase the output and decrease the cost of sequencing on a Genome Analyzer, but also expand the menu of applications that researchers can perform on the system. Notably, de novo sequencing and assembly of complex genomes, already possible with the Genome Analyzer, is considerably enhanced by the capability to completely sequence DNA fragments of up to 250 base pairs using the Illumina short-insert libraries and 150 base pair reads. The ability to generate contiguous 250 base pair sequences allows researchers to use a variety of existing long read assemblers for de novo sequencing and metagenomics.

“With the largest installed base of next-generation platforms and over 200 peer-reviewed publications to date, the Illumina Genome Analyzer has enabled a variety of scientists worldwide to conduct groundbreaking research rapidly and cost effectively,” said Joel McComb, Senior Vice President and General Manager of Illumina’s Life Sciences Business unit. “With the planned system enhancements in 2009, we anticipate that the Genome Analyzer will continue to provide a scalable and flexible solution for a broad menu of applications, including large scale whole-genome analysis, de novo sequencing, and metagenomics, and accelerate the rate of discoveries leading to novel insights about human health, biodiversity, and the environment.”

2/05/2009

 

The strength of the British pound

The Bank of England (BoE) has slashed interest rates by another 50 basis points today, leaving the current rate at 1%. This measure is meant to stimulate the economy at a moment when there is more worry about deflation than inflation. Also today, the European Central Bank (ECB) has decided to keep interest rates at 2%. Even though one would expect the British currency to fall spectacularly against the Euro given these two decisions by the central banks, this hasn't happened. In fact, the British pound hit bottom 5 weeks ago, and since then it's been pretty obvious that the BoE was more prone to further cuts than the ECB. How is this explained?

http://finance.yahoo.com/q/bc?s=GBPEUR=X&t=3m

Well, in my opinion, the BoE is trying to be more proactive in stimulating the economy than the ECB. This seemed costly 2 months ago, when the pound seemed to be going down the drain, but now it's recovering. The same can be said for the Federal Reserve in the US, a body that has been even more aggressive than the BoE and the ECB. The US dollar has already recovered *a lot* of ground against the Euro and the British pound. So with the current situation, one can only predict that both the US dollar and the British pound will gain ground against the Euro in the next few months.

If I am wrong, time will prove me wrong :-p

2/04/2009

 

Another great read

Cambridge University - the Unauthorised History

Cambridge University is celebrating its 800th anniversary in 2009. The official history tells the tale of the buildings; but what about the ideas?

Down through the years, Oxford has produced many powerful men and Cambridge many iconoclasts – scientists, philosophers and revolutionaries. The polarisation is by no means total: Oxford's alumni include the reformer John Wyclif and the father of economics Adam Smith, while ours include the Prime Minister Charles Grey, who abolished slavery and passed the Great Reform Bill. But we've long produced more of the rebels; way back in the Civil War, for example, we were parliamentarian while Oxford was royalist. Why should this be?

I can't find anyone else trying to tell the tale, so I'll try. This web page explains how disruption has been in our DNA from the very beginning.

Creative disruption

If you want physical objects destroyed, the army can do that. As for badly-run companies, they get trashed when the economy goes into recession; the economist Joseph Schumpeter taught us that this "creative destruction" is vital for progress as it clears away the deadwood and creates space in which new businesses can grow. And it's just the same in ecosystems: from 1911, the USA put a lot of effort into stopping forest fires, but then discovered that although they saved individual plants and animals they were destroying the environment. A forest with a fire brigade is a sad old forest; a lot of plants from sequoias to proteas reproduce only in the aftermath of a fire.

Just as fire regenerates the forest, so a great university regenerates human culture – our view of the world and our understanding of it. We incinerate the rubbish. And Cambridge has long been the hottest flamethrower; we're the most creatively destructive institution in all of human history. And big new things come from that. The ground we cleared made us the cradle of evangelical Christianity in the sixteenth and seventeenth centuries, of science in the seventeenth and eighteenth, of atheism in the nineteenth, and of all sorts of cool new stuff since – including the emerging sciences of life and information.

In the beginning

I believe it goes right back to the beginning. We were founded, eight hundred years ago, by scholars fleeing persecution during a period of conflict between church and state. In 1209 the burning issue of the day was whether King John or Pope Innocent III should appoint the next Archbishop of Canterbury. Such power struggles were going on all over Europe, and had been for years (John's father Henry II had had St Thomas Becket killed). One of the church's reactions was to organise crusades – against infidels abroad and heretics at home. Robert Moore tells the story of how successive popes in the twelfth and thirteenth centuries incited the mob against lepers, gays, Jews and other undesirables, in the process forming a culture of persecution of outgroups and minorities that has blighted Europe ever since.

It was against this background that our founders fled Oxford in 1209 and settled in the newly-chartered town of Cambridge. The townsfolk of Oxford had hanged two clerks for a murder of which they were apparently innocent; the king backed the townsmen, and the scholars dispersed for five years. Some of the refugees came to Cambridge, and established our university. A generation later, in 1231, both Cambridge and Oxford got charters from Henry III which exempted us from taxes; and two years after that a Bull from Pope Gregory IX gave our graduates the right to teach everywhere. Had these men foreseen the role Cambridge would play in later reformation and revolution, they might have been less generous!

Early years

By the end of the thirteenth century, Cambridge was already making its mark in philosophy, with Duns Scotus producing disruptive ideas in theology (some of which by the 20th century had become Catholic orthodoxy). After the fall of Constantinople, the Renaissance got going and challenged the curriculum. Cambridge, like other medieval universities, had taught grammar, rhetoric and logic for the BA, then arithmetic, music, geometry and astronomy for the MA; much of the course material came from Aristotle. Suddenly this tradition was under fire, and the big debate was whether to teach Terence as well as Aristotle. In 1488, the rebels won: we started offering a four-year BA with two years in "humane letters" followed by two in logic and philosophy. We prospered amidst this tumult; six new colleges were founded between 1430 and 1496. Another fifteenth-century development was that we started hiring salaried professors, rather than leaving all the teaching to the "Regent masters" or young teaching fellows at the colleges and hostels. The professors mainly taught postgraduate subjects like law and medicine. This was an advance for scholarship, but caused some problems for governance: the university was still run by the Regent masters, but their position was weakened by the professors.

As Renaissance moved toward Reformation, there were Cambridge scholars on both sides of the barricades. One of the most influential of our critical theologians was Erasmus, said to have "laid the egg that Luther hatched". His major act of disruptive scholarship, produced at Cambridge, was a New Testament in parallel Greek and Latin texts. Until then, the church had claimed to be the sole custodian of God's word, whose official text was the Vulgate of St Jerome. By producing the first translation from the original manuscripts for over a thousand years, Erasmus undermined the Vatican's monopoly on biblical authority – although that issue would rumble on and on.

Rebellion and reformation

Thus it was that when Henry VIII needed a theologian to justify rebellion against the Pope, he turned to Cambridge and hired Edward Foxe, the Provost of King's. Foxe was soon eclipsed by his colleague Thomas Cranmer, who became the first Protestant Archbishop of Canterbury, wrote the Book of Common Prayer, and was executed by Queen Mary. Another Cambridge martyr was William Tyndale, who translated the Bible into English and, like Cranmer, got burned at the stake for his pains. However, Tyndale had embraced the printing press. He printed 55,000 copies of the Bible before he was burned, and stoked the fires of the Reformation.

The Cambridge Puritan tradition got traction as our internal rebellion against statutes imposed on us by Queen Elizabeth in 1570, which gave college masters power over academics in the hope that they would curtail heresy. Wishful thinking! Our Puritan tradition drove the settlement of America – the Pilgrim leaders Henry Barrowe, John Greenwood and Robert Browne were all Cambridge men – and culminated in the Civil War the following century. The Cambridge MP Oliver Cromwell defeated and executed King Charles I, supported by many more Cambridge men, such as the poet John Milton and the founder of the Royal Society John Wilkins. Others spread dissent farther afield; John Harvard endowed a university in New England and left it all his books. Cambridge men such as Tyndale, Cranmer and Milton gave a huge push to the process of reining in religion – of turning it from an instrument of state power into a matter of conscience (though that wasn't always what they intended).

Physics, chemistry and biology

In 1665-7, Isaac Newton discovered the laws of motion and gravity, and the calculus. This trashed the medieval idea of a God lurking everywhere in the world, forever interfering to keep the planets in their rightful motions and the destinies of men aligned to his will. By showing that God could simply have wound up the universe and set it running, but didn't need to interfere thereafter, Newton greatly enlarged the space for men to wonder whether supernatural powers dictate our fortunes in this world and the next. He himself was religious – but a dissident. Although a Fellow of Trinity College he did not believe in the doctrine of the Trinity; but no matter. He got a special dispensation from Charles II to be a dissenter.

Francis Bacon had already written about the scientific method in the early 17th century; Newton and his Royal Society colleagues such as Wilkins, Flamsteed and Halley made science a reality. (The word "scientist" was coined much later by William Whewell.) Within a few years, people could doubt in public whether there was a next world, and not go to jail (Halley managed to become a professor at Oxford in 1703 despite being an atheist). The eighteenth-century Enlightenment flourished in the space this created. Unfortunately for Cambridge, our authorities restricted the university to Church of England members through the eighteenth and early nineteenth centuries, and even required many academics to be ordained within a set number of years of appointment. There was a long argument, with our mathematicians wanting dissenters admitted without needing the royal dispensation that Newton got, while our theologians dragged their feet. As a result, much of the running in the Enlightenment was made by men from elsewhere, such as Edinburgh's David Hume.

The nineteenth century saw a number of great Cambridge men filling in the gaps in the Newtonian idea of the world as mechanism rather than magic. Charles Babbage came up with the idea of the computer, and although he couldn't really build one with the technology of the time, his idea would eventually challenge the very concept of intelligence at the deepest level. Meanwhile, James Clerk Maxwell explained electromagnetism, and in 1897 JJ Thomson discovered the electron, laying the foundations for modern physics and electronics that would lead to proper computers. We also had great social reformers, such as Henry Mayhew. But the greatest iconoclast of the nineteenth century was undoubtedly another Cambridge scientist, Charles Darwin. By explaining how animals and plants evolve by variation followed by natural selection over long periods of time, he shot down the belief that man had been created specially by God and that we were qualitatively different from other animals.

Twentieth-century troublemakers
The early twentieth century saw not just the full-blown emergence of modern physics, with Cockcroft and Walton splitting the atom and theorists like Dirac showing that reality is stranger than anyone could have imagined. It also saw philosophy flourish as people sought to understand this new and scary world. Cambridge philosophers such as Wittgenstein and Russell taught us that many metaphysical problems of the past simply arose from abuse of language (an idea that our medieval logicians had also explored). Russell also came up with Russell's Paradox, and such work in logic led (via Goedel) to Alan Turing's work on the foundations of computing. After Turing went to Manchester another Cambridge man, Maurice Wilkes, built the world's first proper computer, the EDSAC, and established the lab where I work. Shortly afterwards, Watson and Crick discovered the structure of DNA, which led rapidly to the realisation that living things are also self-replicating computational machines. Bioinformatics is now one of our strong points; Cambridge did about a third of the Human Genome Project (and John Sulston, who ran our genome project, pushed for the genome to become public domain – disrupting the world of "intellectual property"). And biology is only one of many fields to be turned upside down by computing. Starting in the 1950s, sciences as disparate as astronomy and crystallography have been revolutionised by computing; one science after another has started shifting from the theory-intensive model pioneered by Newton to a more data-intensive way of working. And the disruption caused by computing has spread to one industry after another – from telecoms to bookselling.

As for the humanities, Alfred Marshall synthesised what was known about economics, then Maynard Keynes attacked this classical synthesis and finally Peter Bauer undermined the postwar Keynesian consensus on central planning, price controls and foreign aid. We've also produced many creative and disruptive writers, such as Siegfried Sassoon, EM Forster, Sylvia Plath, Douglas Adams and even Salman Rushdie; meanwhile F.R. Leavis set literary criticism on its head. I'm not an expert on literature; I'm an engineer, and this web page is my own perspective – hey, warts and all, as Cromwell put it. But one thing I do know is that many other Cambridge people have helped to pick apart error in just about every imaginable field of human endeavour.

The next 800 years?
Cambridge scientists and scholars have demolished more ancient superstitions than anyone else. We've not just been a bit more productive than other universities – we've been miles better. No other institution even comes close. Our effect on religion, from reformation to atheism, has been profound: if Dawkins is the Devil's chaplain, Cambridge could be called the Devil's flamethrower! But it's much wider than that. Our talent for creative destruction hasn't just led to massive advances in liberty and prosperity. It's completely changed the way people think.

The most profound innovation was science itself, which emerged in the seventeenth century among Bacon, Newton, Wilkins, Halley and their contemporaries. Science is not like religion; it's not about finding true doctrines. It's about demolishing wrong ones. Ideas are two a penny; it's the efficient destruction of error that leads us to truth. And we really need such a method. The truths at the heart of Newtonian mechanics, evolution, electromagnetism, quantum mechanics and bioinformatics are often so counterintuitive and disturbing that we only accept them when absolutely every other possibility has been shot down in flames. To understand the mechanism, you first have to burn away the myth.

So there we have it. That's us. Cambridge has been setting cultural forest fires for the last 800 years, and I sure hope we'll be setting them for the next 800 too.

Lessons for the future

So how do we keep Cambridge at the forefront? I believe that the critical lessons from our history are the importance of academic self-government and intellectual freedom. We were a self-governing community of scholars right from the start, unlike universities such as Bologna which started out as communities of students who banded together to hire teachers. Time has proved our model to be the best. And at various times, either church or state has tried to intervene, to centralise power and control us – as with Queen Elizabeth's statutes of 1570. These interventions have never had the desired effect, but have often hobbled us for a while. The worst was in the eighteenth and early nineteenth centuries when we weren't allowed to admit nonconformists. That simply handed the baton for a while to the Scottish universities, and to new institutions such as UCL. And even in the twentieth century we weren't perfect: although we admitted women from 1869 we didn't give them proper degrees and let them vote until 1947. Yet Oxford enfranchised women in 1920. Nostra maxima culpa!

Intellectual freedom is a more modern and difficult concept. Medieval and reformation academics often sought to suppress colleagues whose views they disliked; there were Dominicans training in Cambridge to hunt heretics from 1238, and when King's College was founded in 1441 all the fellows had to take an oath not to follow the teachings of Wyclif. (And who knows – perhaps these tussles left us with a certain edge, a certain willingness to denounce error!) As for academic freedom, it seems to have put up its first tender shoots in the early 16th century. Erasmus remarked that Cambridge became much more open in the mid-1510s, and when a fellow of John's was accused of heresy in 1527, our chancellor John Fisher changed the statutes so that heretics could be sacked – yet there was great reluctance to do so! The emergence of science in the seventeenth century and the Enlightenment in the eighteenth had a huge effect, but it was the mid-19th century before we broke the stranglehold of the Anglican church. As late as 1813, Charles Babbage's thesis was considered to be blasphemous and he wasn't allowed to graduate! The mid-19th-century liberalisation was not the end of the story, though; Bertrand Russell was sacked by Trinity College in 1916 for being a conscientious objector to World War I, and there were many further tussles until the current wording of our Statute U (which protects academics against arbitrary discipline and dismissal) was drafted in the 1980s by David Williams. Freedom of speech in academia can't be totally separated from the same freedom in the rest of society, of course, but for centuries we academics have led the way.

Academic freedom and institutional self-governance are subtly but deeply linked. They are both still under pressure – from a busybody state and a centralising university bureaucracy. The hot topic in 2009, our octocentenary year, is a proposal to curtail academics' protection by abolishing Statute U and replacing it with a much more malleable Code of Practice. Authority argues that we should treat academic and other staff equally. Fine; let's extend the protections we now enjoy to other university staff too. For more, see the Campaign for Cambridge Freedoms.

Ross Anderson

Acknowledgements: I'm grateful to Elisabeth Leedham-Green, Gillian Evans, Richard Evans, David MacKay and Peter Robinson for comments and corrections. As for the heresies expressed here, I confess! They are mine!

2/03/2009

 

Amazon games to try for free

Amazon moves into casual gaming in a very big way - Ars Technica
For the first week, visitors will even be able to grab three games for free: Jewel Quest 2, Build a Lot, and The Scruffs. That, along with the ability to try every game before you buy, should be enough to entice fans of casual gaming to check out Amazon's offerings. It also looks like this could be Amazon's first step into the world of digital distribution for video games.

 

UK Treasury's best investment in history?

BBC NEWS | The Reporters | Robert Peston
The Bank of England has provided this £185bn in the form of Treasury Bills - which are short-dated government bonds that can easily be turned into cash. And in return it has received £287bn of collateral from the banks, in the form of loans made by those banks.

All of those loans received from the banks have been securitised or turned into tradable securities. And most of them are residential mortgages converted into mortgage-backed securities.

So the best way of seeing all this is as a three-year loan of £185bn to the banks, made by all of us as taxpayers, for which we've received £287bn of assets.

And, what's more, we've received a fee of 1.15% for our trouble.

For British taxpayers, that doesn't look such a terrible deal. The risk of loss to us, given that we've lent £102bn less than the face value of the collateral we've been given, looks pretty small.

But it shows you quite how serious it was that the commercial market for mortgage-backed securities had collapsed and quite how desperate the banks were to raise cash.
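As a quick check of the figures quoted above, here is a minimal sketch using shell integer arithmetic. The variable names are my own, and the percentage is truncated to one decimal place because the shell only does integer maths.

```shell
# Figures quoted in the excerpt above, in billions of pounds.
lent=185
collateral=287

# Cushion: how much the face value of the collateral exceeds the loan.
cushion=$((collateral - lent))

# Shell arithmetic is integer-only, so work in tenths of a percent.
cushion_tenths=$((cushion * 1000 / collateral))

echo "cushion: ${cushion}bn"
echo "cushion as share of collateral: $((cushion_tenths / 10)).$((cushion_tenths % 10))%"
```

So the collateral would have to lose roughly a third of its face value before the loan was under water, which is the sense in which the risk "looks pretty small".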

 

on ack

ack -- better than grep, a power search tool for programmers
Top 10 reasons to use ack instead of grep.

1. It's blazingly fast because it only searches the stuff you want searched.
2. ack is pure Perl, so it runs on Windows just fine.
3. The standalone version uses no non-standard modules, so you can put it in your ~/bin without fear.
4. Searches recursively through directories by default, while ignoring .svn, CVS and other VCS directories.
* Which would you rather type?
$ grep pattern $(find . -type f | grep -v '\.svn')
$ ack pattern
5. ack ignores most of the crap you don't want to search
* VCS directories
* blib, the Perl build directory
* backup files like foo~ and #foo#
* binary files, core dumps, etc
6. Ignoring .svn directories means that ack is faster than grep for searching through trees.
7. Lets you specify file types to search, as in --perl or --nohtml.
* Which would you rather type?
$ grep pattern $(find . -name '*.pl' -or -name '*.pm' -or -name '*.pod' | grep -v .svn)
$ ack --perl pattern
Note that ack's --perl also checks the shebang lines of files without suffixes, which the find command will not.
8. File-filtering capabilities usable without searching with ack -f. This lets you create lists of files of a given type.
$ ack -f --perl > all-perl-files
9. Color highlighting of search results.
10. Uses real Perl regular expressions, not a GNU subset.
11. Allows you to specify output using Perl's special variables
* Example: ack '(Mr|Mr?s)\. (Smith|Jones)' --output='$S'
12. Many command-line switches are the same as in GNU grep:
-w does word-only searching
-c shows counts per file of matches
-l gives the filename instead of matching lines
etc.
13. Command name is 25% fewer characters to type! Save days of free-time! Heck, it's 50% shorter compared to grep -r.
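For comparison, here is a rough sketch of how to get something like ack's default behaviour (recursive search, skipping VCS directories) out of plain grep. It assumes a grep that supports --exclude-dir, such as GNU grep; the temporary directory and file names are made up for the demo, and this is only an approximation of what ack does, not its actual implementation.

```shell
# Build a throwaway tree: one real source file, one .svn metadata file.
demo=$(mktemp -d)
mkdir -p "$demo/src/.svn"
echo 'my $pattern = 1;' > "$demo/src/app.pl"
echo 'pattern in svn metadata' > "$demo/src/.svn/entries"

# Recurse from the current directory, list matching files (-l),
# and skip VCS directories the way ack does by default.
# --exclude-dir is assumed to be available (it is in GNU grep).
matches=$(cd "$demo" && grep -r -l --exclude-dir=.svn --exclude-dir=CVS pattern .)
echo "$matches"

# Clean up the demo tree.
rm -rf "$demo"
```

Only the file under src/ is reported; the match inside .svn is never visited, which is also why skipping those directories saves time on large trees.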

2/02/2009

 

News with a pinch of salt

BBC NEWS | Health | 'Hidden salt' in restaurant meals
Consensus Action on Salt and Health (CASH) found that nearly three quarters of the main course dishes had levels in excess of ideal daily limits.

2/01/2009

 

Ubuntu Netboot Installers

The Ubuntu netboot installers - I Still Know What You Learned Last Summer
As I understand it, there are three ways to do a netboot install. I've previously written about installing Ubuntu by booting from files downloaded to your hard disk. It's also easy to do a netboot install from either a CD or a USB key, and the procedures are very similar.

All three methods are nice because they have small initial downloads (~10 MB); they then download the rest of your OS at install time. It's a waste of time to download a ~700MB CD image if you're going to upgrade half of your packages right after installation. (Software is usually out of date by the time you install it!) The netboot installers are also versatile. They will install, at your request, any (or all!) of the following: Ubuntu desktop, Ubuntu server, Kubuntu desktop, Xubuntu desktop, Edubuntu desktop, and many more.

The hard disk method has the advantage that, of course, you don't need to use any external media. However, in my experience, the CD and USB key installers seem to be less flaky. Unlike a hard disk installation, they also work even if you don't have Grub already installed.

For either the CD or USB key methods, you can find the appropriate files here:

http://archive.ubuntu.com/ubuntu/dists/jaunty/main/installer-i386/current/images/netboot/

(This is for Jaunty/i386. If you want a different release or have a different architecture, adjust the URL accordingly.)
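Adjusting the URL can be sketched as a small shell helper. This is only an illustration: netboot_url is a hypothetical function name, and it assumes that other releases and architectures follow the same directory layout as the Jaunty/i386 URL above.

```shell
# Hypothetical helper: build the netboot directory URL for a release and
# architecture, assuming the same layout as the Jaunty/i386 URL above.
netboot_url() {
    release="$1"   # e.g. jaunty
    arch="$2"      # e.g. i386 or amd64
    printf '%s\n' "http://archive.ubuntu.com/ubuntu/dists/${release}/main/installer-${arch}/current/images/netboot/"
}

# Example: where the CD image (mini.iso) would live for Jaunty/amd64.
printf '%s\n' "$(netboot_url jaunty amd64)mini.iso"
```

Check the resulting URL in a browser before downloading, since not every release/architecture combination is guaranteed to exist at that path.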

For CD installation: download mini.iso and burn it to your CD.

For USB media installation: download boot.img.gz and follow the instructions here. It will boil down to doing something like this:

# zcat boot.img.gz > /dev/sdX1
# aptitude install mbr
# install-mbr /dev/sdX

Then boot from your new media into the installer.


1/29/2009

 

Good reading

Folklore.org: Macintosh Stories: Real Artists Ship
By the fall of 1983, we had committed to announcing and shipping the Macintosh at Apple's next annual shareholders' meeting, to be held on January 24th, 1984. The failure of the Twiggy disk drive almost caused us to be late (see Quick, Hide In This Closet!), but it seemed like the new Sony 3.5 inch drive solved all of our problems, and the rest of the hardware was ready to go. The Macintosh ROM was frozen in early September and sent out for fabrication. All that remained was finishing the System Disk, and our two applications, MacWrite and MacPaint.

The software team worked hard over the Christmas break of 1983. The Finder still wasn't finished, and there were lots of performance problems, especially when copying files between disks, which seemed interminable. There was lots of integration testing to do, like cutting and pasting between applications, or applications interacting with desk accessories. As the New Year rolled around, it was clear that we were running out of time.

By the first week of January, the software team was working around the clock, testing and fixing problems that were found. Every employee in the building was drafted as a tester, and we held a series of dinners where Apple bought catered food for anyone who stayed late to test (see 90 Hours A Week And Loving It).

Finally, the deadline for finishing the software was less than a week away, and it seemed obvious that there were still too many bugs for us to ship it. Late on Friday evening, we convinced ourselves that we needed an extra week or two to fix the remaining problems. Steve Jobs was on the East Coast, along with Bob Belleville and Mike Murray, doing press for the introduction, so we arranged for a conference call early Sunday morning to tell him about the slip.

Jerome Coonen, our software manager, spoke for the team, as we gathered around the speakerphone. We were exhausted, and progress was slow. There were still bugs that we hadn't gotten to the bottom of yet, and it didn't seem possible that we could make it in the time remaining. Jerome proposed that we ship "demo" software to the dealers for the introduction, and update all the customers with final software a few weeks later. We thought Jerome was pretty persuasive as we held our breath waiting for Steve to respond.

"No way, there's no way we're slipping!", Steve responded. The room let out a collective gasp. "You guys have been working on this stuff for months now, another couple weeks isn't going to make that much of a difference. You may as well get it over with. Just make it as good as you can. You better get back to work!"

We did manage to wrangle an extra couple of days, by virtue of working the weekend and moving the deadline to 6am Monday morning, when the factory opened, instead of Friday afternoon. We agreed to go home and rest up, and then come into work on Monday ready for the final push.

The final week was one of the most intense I ever experienced. Steve wanted Bill Atkinson and myself to fly to New York to present a Mac to Mick Jagger, but I decided that I needed to stay in Cupertino to help with the bug fixing. Some of us were pausing work to get photographed for magazines like Newsweek and Rolling Stone, which made others on the team feel terrible that they were being left out. At times, the atmosphere got pretty tense.

Friday finally rolled around and it was clear that there were still too many bugs in both the Finder and MacWrite. Randy Wigginton brought in a gigantic bag of chocolate covered espresso beans, which, along with medicinal quantities of caffeinated beverages, helped us forgo sleep entirely for the last couple of days. We started doing release cycles that were only a few hours apart, re-releasing every time we fixed a significant problem.

When a new release was ready, we would all grab it and start testing again. At one point, around 2am on Sunday night, I stumbled across a bug in the clipboard code. I thought I knew what it might be, but I was so tired that I didn't want to deal with it. I tried to pretend that I didn't see the problem, but Steve Capps was watching my expression and knew there was something wrong. I also was too tired to sustain a pretense; he grilled me about the problem and then helped me craft a fix, since I was too tired to do it on my own.

Around 4am, we had a release where everything seemed to go wrong - even MacPaint was crashing, which was usually rock solid. But our final release, around 5:30am seemed to be much better; the worst problems seemed to have receded and we thought we might actually have a decent release candidate.

We all focused on testing the final release as much as we could until 6am, when Jerome would have to leave to drive it to the factory. It looked pretty good, but soon someone found a potential show stopper - the system seemed to hang when a blank disc was inserted during MacWrite - the disk didn't start formatting like it should. I realized that it was probably hung up waiting for an event, so I reached out and tapped on the space bar, and formatting commenced. Jerome thought the bug was bad enough to hold up the release, but he left to drive it to the factory anyway, figuring they needed to start duplication even if it was just going to be a demo release.

The sun had already risen and the software team finally began to scatter and go home to collapse. We weren't sure if we were finished or not, and it felt really strange to have nothing to do after working so hard for so long. Instead of going home, Donn Denman and I sat on a couch in the lobby in a daze and watched the accounting and marketing people trickling into work around 7:30am or so. We must have been quite a sight; everybody could tell that we had been there all night (actually, I hadn't been home or showered for three days).

Finally, around 8:30 Steve Jobs arrived, and as soon as he saw us he immediately asked if we had made it. I explained the formatting bug to him, and he thought that it wasn't a show stopper, which meant that we were actually finished. I finally drove home to Palo Alto around 9am and collapsed on my bed, thinking that I'd sleep for the next day or two.

 

Brilliant...

BBC NEWS | The Reporters | Robert Peston
And as if to rub their noses in it, the Chinese premier confided that he re-read Adam Smith over the summer (note "re-read") to reassure himself that the founder of modern economics wasn't the dogmatic opponent of government intervention that liberal market ideologues contend.

1/28/2009

 

The list is growing, but we need more!

So far the Linux Macbook Air Killer race is led by the Linux on Sony Vaio Z-series, followed by the Linux on Lenovo X301 and the Linux on Dell Latitude E4200/E4300. If you have any of those MacBook Air Killers and have installed or tried Linux on them, please sign on!

avilella - castanyes blaves
Linux and "MacBook Air killers":

* Sony Vaio Z-series and Linux --- any of the Z series laptops.
* Lenovo ThinkPad X301 and Linux
* Dell Latitude E4200/E4300 and Linux
* Dell Adamo and Linux (soon)
* HP Voodoo Envy 133 and Linux (soon)
* MSI X-Slim Series X320 and Linux (soon)
* Toshiba Portégé R500-11J and Linux (soon)

1/26/2009

 

I support the Oxbridge train line!

BBC NEWS | England | Call for fast east-west rail link
Rail campaigners are backing a 100mph fast train route across the centre of England linking Oxford and Cambridge.

The fast train route would go via Bicester, Aylesbury, Milton Keynes and Bedford with integrated links to connect Ipswich and Norwich to Swindon.

The plan involves a combination of existing, upgraded and reopened lines.

Peter Lawrence, of Railfuture, said: "We want to link fast growing communities without causing road congestion or crossing London."

The group believes the route would encourage inward investment from industry and tourism as well as providing much-needed additional capacity for passengers and freight.

"At a time when environmental concerns are top of the agenda, it's more important than ever to invest in green transport such as rail," Mr Lawrence said.

"In Wales and Scotland miles of new railway are being built. Why not in England?

"We're calling on the East of England Assembly and development agency to work with their colleagues in the South East and continue to press for this strategic rail link".

 

Wikipedia is doing fine, stop overcriticizing the project!

This is only an issue for contentious entries such as those on politicians, not for the bulk of the content on Wikipedia, so why are the media, including the BBC, so quick to put these fatalistic headlines on these stories?!

BBC NEWS | Technology | Editorial row engulfs Wikipedia
It is proposing a review of the rules, that would see revisions being approved before they were added to the site.

 

After inauguration day, release 1.6 day!

Release 1.6 - BioPerl
The first stable release in the 1.6 series is now available for download.

 

Gonçalo on psoriasis GWAS

BBC NEWS | Health | Genetic 'hotspots' for psoriasis
Dr Goncalo Abecasis, one of the researchers from the University of Michigan, thinks that the discovery may allow further progress than this, with proteins produced at the "hotspots" offering possible targets for future treatments.

"This discovery highlights the role of several genes in mediating the immune responses that result in psoriasis," he said.

"Some of the highlighted genes are already targeted by effective psoriasis therapies - others may become targets for the psoriasis treatments of the future."

1/23/2009

 

OpenOffice.org 3.1

New Features in OpenOffice.org 3.1, an Early Look - OpenOffice.org Ninja
Control slideshow media

Before OpenOffice.org would play any movies and audio when the slide opened, but Impress 3.1 can flexibly start, pause, and stop media using custom animation effects.

1/22/2009

 

Peter Schiff was right in 2006-2007

What this guy is saying and the reaction of the other guys, *laughing*, at approx 1m30s...

http://www.youtube.com/watch?v=B8r-nDBx5Jg

 

ATAC introns

Minor spliceosome - Wikipedia, the free encyclopedia
The minor spliceosome is a ribonucleoprotein complex that catalyses the removal (splicing) of an atypical class of spliceosomal introns (U12-type) from eukaryotic messenger RNAs in plants, insects, vertebrates and some fungi (Rhizopus oryzae). This process is called noncanonical splicing, as opposed to U2-dependent canonical splicing. U12-type introns represent less than 1% of all introns in human cells. However, they are found in genes performing essential cellular functions.

1/21/2009

 

Who said there are no Drosophilas in Hinxton?

Drosophila Genetics and Genomics
Programme

* Introduction to Drosophila
* Advanced genetics
* Meiosis
* The Drosophila genomes
* Evolutionary genetics
* Genetic analysis of complex characters
* Genetic screens
* Mosaics and P-element systems
* Neurogenetics
* Genetic analysis of sex determination
* Transcriptomics and Proteomics
* Developmental genetics

Course organisers

* Professor Michael Ashburner (University of Cambridge, UK)
* Dr Scott Hawley (Stowers Institute for Medical Research, MO, USA)
* Dr Casey Bergman (University of Manchester)

Guest lecturers

* Steve Russell (University of Cambridge, UK)
* Ruth Lehmann (New York University School of Medicine, USA)
* Daniel St Johnston (The Gurdon Institute, Cambridge, UK)
* Kent Golic (University of Utah, USA)
* Brian Charlesworth (University of Edinburgh, UK)
* Trudy Mackay (North Carolina State University, USA)
* Leslie Vosshall (The Rockefeller University, NY, USA)
* Bruce Baker (Stanford University, CA, USA).

1/20/2009

 

Nice

Gizmodo UK : January 16, 2009
If you don't have home broadband then fear not, because there are lofty plans afoot to guarantee all UK households access to broadband.

A draft report by Lord Carter on the future of UK telecoms and the media sector proposes a "universal service commitment" to broadband along the lines of what's in place for postal and telephone services.

According to the FT, the report, due out in a few weeks, calls for a minimum of 2Mb broadband to be made available to every home in the UK by 2012 - no matter what cloud-scraping mountaintop or out of the way hamlet you're stuck in.

1/19/2009

 

Looks delicious

Cheat's Pappardelle with Slow-Braised Leeks and Crispy Porcini Pangrattato Recipe : : Food Network
Ingredients

* 5 big leeks, outer leaves trimmed back, washed
* Olive oil
* 3 good knobs butter, divided
* 3 cloves garlic, peeled and finely sliced
* A few sprigs fresh thyme, leaves picked
* A small wineglass white wine
* Sea salt and freshly ground black pepper
* 1 pint good-quality vegetable or chicken stock
* 12 slices ham, preferably Parma
* 2 (8-ounce) packages fresh lasagne sheets
* All-purpose flour, for dusting
* 2 handfuls freshly grated Parmesan, plus extra for serving

For the Pangrattato:

* 1 small handful dried porcini mushrooms
* 1/2 ciabatta bread, preferably stale, cut into chunks
* Sea salt and freshly ground black pepper
* Olive oil
* 2 cloves garlic, crushed
* 1 sprig fresh rosemary

Directions

Halve the leeks lengthways and cut at an angle into 1/2-inch slices. Heat a wide saucepan, add a splash of oil and a knob of butter, and when you hear a gentle sizzling add the sliced garlic, thyme leaves and leeks. Move the leeks around so every piece gets coated. Pour in the wine, season with pepper and stir in the stock. Cover the leeks with the slices of Parma ham, place a lid on the pan and cook gently for 25 to 30 minutes. Once the leeks are tender, take the pan off the heat.
To make the pangrattato:

Whiz the mushrooms and bread with a pinch of salt and pepper in a food processor until the mixture looks like bread crumbs. Heat a generous glug of olive oil in a frying pan. Add the garlic cloves and the rosemary and cook for a minute, then fry the bread crumbs in the oil until golden and crisp. Keep shaking the pan - don't let the bread crumbs catch on the bottom. Drain on paper towels, discard the rosemary and garlic and allow the bread crumbs to cool.

Bring a big pan of salted water to the boil. Lay the lasagne sheets on a clean working surface and sprinkle with a little flour. Place the sheets on top of each other and slice into 1/2-inch strips. Toss through your fingers to shake out the pappardelle, then cook in the boiling water 2 minutes or until al dente.

Remove the Parma ham from the saucepan, slice up and stir back into the leeks. Season to taste with salt and pepper, then stir in the Parmesan and the rest of the butter. Drain the pasta, reserving a little of the cooking water, and add the pasta to the leeks. Add a little of the cooking water if need be, to give you a silky, smooth sauce. Serve quickly, sprinkled with some pangrattato, extra Parmesan and any leftover thyme tips. Serve the rest of the pangrattato in a bowl on the side.

"Our agreement with the producers of "Jamie at Home" only permit us to make 2 recipes per episode available online. Food Network regrets the inconvenience to our viewers and foodnetwork.com users"

 

Becoming a fan of this guy, Robert Peston

When it comes to insightful analysis of UK finance, this guy has gained a prime pedestal in my heart. Sort of the Noam Chomsky of the UK financial world for me right now!
BBC NEWS | The Reporters | Robert Peston
As I said this morning, this is not a bank rescue plan. But if it had been, it would have failed miserably. Barclays' share price has fallen again today. At the current price of 90p, this bank's entire market value is £7.5bn. And remember, this is a bank that said on Friday night that its profits for 2008 were considerably more than £5.3bn. In other words, investors currently value this giant international bank at a little over one year's profits. Which is little short of extraordinary. And let's not even mention that Royal Bank of Scotland's shares are down by more than 50%, on the supposedly reassuring news that taxpayers will be sharing in its future pain. Confidence has drained from the banking system. And to state the obvious, today's myriad announcements from the Treasury have not succeeded in rebuilding that confidence, which is so vital to a functioning economy.

1/17/2009

 

Next week's new US presidency

I was when listening to this speech. When were you?

Barack Obama’s New Hampshire Primary Speech - New York Times
It was a creed written into the founding documents that declared the destiny of a nation: Yes, we can.

It was whispered by slaves and abolitionists as they blazed a trail towards freedom through the darkest of nights: Yes, we can.

It was sung by immigrants as they struck out from distant shores and pioneers who pushed westward against an unforgiving wilderness: Yes, we can.

It was the call of workers who organized, women who reached for the ballot, a president who chose the moon as our new frontier, and a king who took us to the mountaintop and pointed the way to the promised land: Yes, we can, to justice and equality.

Yes, we can, to opportunity and prosperity. Yes, we can heal this nation. Yes, we can repair this world. Yes, we can.

1/16/2009

 

Prediction of 16%

Spain: economy will shrink, unemployment will soar - Yahoo! News
Spain's economy will contract 1.6 percent this year and unemployment will jump to nearly 16 percent, the government predicted Friday in a desperately gloomy outlook for a country that had been one of Europe's great success stories.

1/13/2009

 

Giving the finger

BBC NEWS | Health | Finger size link to earning power
The length of a man's fingers may predict his success in the City, research findings suggest.

 

This is the wisest thing I have seen in a long time

BBC NEWS | England | London | Protesters buy up Heathrow land
Land earmarked for the construction of Heathrow's third runway has been bought by anti-expansion protesters.

1/12/2009

 

Nature vs nurture -- it's DNA methylation

Nutritional Control of Reproductive Status in Honeybees via DNA Methylation -- Kucharski et al. 319 (5871): 1827 -- Science
Fertile queens and sterile workers are alternative forms of the adult female honeybee that develop from genetically identical larvae following differential feeding with royal jelly. We show that silencing the expression of DNA methyltransferase Dnmt3, a key driver of epigenetic global reprogramming, in newly hatched larvae led to a royal jelly–like effect on the larval developmental trajectory; the majority of Dnmt3 small interfering RNA–treated individuals emerged as queens with fully developed ovaries. Our results suggest that DNA methylation in Apis is used for storing epigenetic information, that the use of that information can be differentially altered by nutritional input, and that the flexibility of epigenetic modifications underpins, profound shifts in developmental fates, with massive implications for reproductive and behavioral status.


DNMT3 has a role in DNA methylation and genomic imprinting, with DNMT3L (EnsemblCompara GeneTree) possibly required for germ line DNA methylation. DNMT3B has been associated with lung cancer and other genetic disorders.

1/11/2009

 

Worm genome for £25,000

Genome hunters set sights on creatures great and small | Science | guardian.co.uk
Professor Mark Blaxter of Edinburgh University hopes to redress this deficit by sequencing more invertebrate genomes, starting with the earthworm. Blaxter says his lab is taking a leap of faith by applying new sequencing technologies to the challenge of a whole genome, and is spending a mere £25,000 on the project, compared with previous genome projects costing millions. The whole process is surprisingly fast – the donor worm was squashed in October and he estimates the lab will have a completed genome by March 2009.

1/09/2009

 

ProbconsMorph

PLoS Genetics: Evolution of Regulatory Sequences in 12 Drosophila Species
Multiple Sequence Alignment and Insertion/Deletion Annotations

For the analysis of TFBS evolution, we developed a new multiple alignment program, “ProbconsMorph”, by integrating Probcons [41], a consistency based multiple sequence alignment program, and Morph [38], a pair-wise sequence alignment program that is specially designed to align regulatory modules. Morph uses a pair-HMM as a generative model for alignment of two orthologous CRMs, and is parameterized by the given motifs, as well as various evolutionary rate parameters that it fits to the data. It uses maximum likelihood inference to simultaneously perform TFBS annotation and alignment. It reports for every pair of positions in the two sequences, the posterior probability that they are aligned. Morph was run to produce such a probabilistic alignment of every pair of species. Probcons takes such pair-wise alignment probabilities and builds a multiple sequence alignment progressively, while using the “consistency transformation”: the probability of alignment of two nucleotides x and y is updated based on the alignment probabilities of x and z and of y and z, where z is a nucleotide from a third species. We have shown previously that Morph provides practical benefits for inference of evolutionary events and rates by computing a better alignment; ProbconsMorph is an effective and efficient extension of this program to more than two species. We made two simple modifications to Probcons to integrate it with Morph: firstly, Probcons was made to work on DNA sequences (the current implementation handles protein sequences only), and secondly, it was made to accept a phylogenetic tree as input, rather than estimate the tree at run-time. The ProbconsMorph software is publicly available at our site http://europa.cs.uiuc.edu/TFBSevolution/.
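
To get a feel for what a "consistency transformation" does, here is a minimal Python sketch. It is not the ProbconsMorph code: the function name, the toy matrices and the plain averaging of the direct evidence with the evidence routed through each third sequence are my own simplifying assumptions. The idea is that the alignment probability of a nucleotide in one species with a nucleotide in another goes up when both align well to the same nucleotide in a third species.

```python
def consistency_transform(p_xy, third_party):
    """Re-estimate pairwise alignment probabilities using third sequences.

    p_xy        -- matrix (list of rows): p_xy[i][j] is the posterior
                   probability that position i of x aligns to position j of y
    third_party -- list of (p_xz, p_zy) matrix pairs, one per third sequence z

    Simplifying assumption: the direct x-y evidence and each path through
    a third sequence get equal weight in the average.
    """
    rows, cols = len(p_xy), len(p_xy[0])
    n_terms = 1 + len(third_party)
    updated = [[p_xy[i][j] for j in range(cols)] for i in range(rows)]
    for p_xz, p_zy in third_party:
        for i in range(rows):
            for j in range(cols):
                # Probability of going from x_i to y_j through any position k of z.
                updated[i][j] += sum(
                    p_xz[i][k] * p_zy[k][j] for k in range(len(p_zy))
                )
    return [[value / n_terms for value in row] for row in updated]


# Toy example: x and y align only weakly to each other,
# but both align strongly to the same positions of z.
p_xy = [[0.5, 0.1], [0.1, 0.5]]
p_xz = [[0.9, 0.0], [0.0, 0.9]]
p_zy = [[0.9, 0.0], [0.0, 0.9]]
result = consistency_transform(p_xy, [(p_xz, p_zy)])
```

In the toy example the diagonal entries rise (the third species supports them) while the off-diagonal entries fall, which is the behaviour the quoted paragraph describes.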

 

CEGMA

Core eukaryotic genes dataset
CEGMA distribution contains several directories and files compressed in a tar.gz file. Source code and documentation files are included in the distribution, as well as several parameters files and other extra information. If you prefer not to install the software you can contact us at genparra(at)ucdavis.edu, and we will run the pipeline in your favorite genome.

 

Loading to Ensembl core

Plant Tech Tonics » Blog Archive » Loading an Ensembl species database from scratch
A question came up on the ensembl-dev mailing list to the effect “how do I create an Ensembl species core database from scratch”. As this is something that we have to do once in a while, I thought it may be a good topic for a blog post.

I will concentrate on getting a ‘minimal’ database populated, i.e. the sequences, genome assembly and genes (without additional annotation).

In my experience, the exact approach for each species differs, mainly due to the different formats and semantics of the source data. The following description will probably need tweaking for a particular species.

[...]

 

Garage genome hackers

Rise of the garage genome hackers - life - 07 January 2009 - New Scientist
KATHERINE AULL's laboratory in Cambridge, Massachusetts, lacks a few mod cons. "Down here I have a thermocycler I bought on eBay for 59 bucks," she says, pulling out a large, box-shaped device she uses to copy short strands of DNA. "The rest is just home brew," she adds, pointing to a centrifuge made out of a power drill and plastic food container, and a styrofoam incubator warmed with a heating pad normally used in terrariums.

In fact, Aull's lab is a closet less than 1 square metre in size in the shared apartment she lives in. Yet amid the piles of clothes she recently concocted vials of an entirely new genetically modified organism.

Aull, who works as a synthetic biologist for a biotech company by day, created her home lab after hearing about a contest on the science fiction website io9.com for "mad scientists with homebrew closet labs, grassroots geneticists, and garage genome hackers".

After two months of tinkering, she engineered a microbe that she says is capable of performing simple logic operations, which could be the forerunner to basic biological computers. "Biology is wet, squishy and imprecise. It drives engineers insane," Aull says. "This would allow us to take the noise out of biology."

 

Robert Peston on Osborne's three-pronged strategy to stimulate the economy

BBC NEWS | The Reporters | Robert Peston
The shadow chancellor, George Osborne, today called for a three-pronged strategy to stimulate the flow of credit: state insurance for £50bn of lending to business; a cut in the cost of the capital that's been provided by the Treasury to banks and also a reduction in the fees levied for guaranteeing interbank lending; and a new Bank of England facility to swap corporate loans for cash.

This would amount to a substantial extension of the nationalisation of the credit-creation process - and Osborne acknowledges that more draconian nationalisation may yet be necessary.

It's quite something when the Tories complain that a Labour-controlled Treasury is being too cautious in rolling back the boundaries of the private sector and too feeble when pumping up the role of the state.

 

Microbiomes

GenomeWeb news: GMU's Human Microbiome Research Center to Run Pyrosequencing Disease Studies
NEW YORK (GenomeWeb News) — George Mason University is starting a center that will focus on the human microbiome and will conduct genomics studies of microbes that could be involved in diseases, the university said this week. Funded in part by the National Institutes of Health and the US Department of Defense, the MicroBiome Analysis Center that will attempt to “map the world” of the bacteria, viruses, fungi, and protozoa that inhabit humans, and to study their effects on human health. The MBAC, which will focus on using multitag pyrosequencing, specifically will study microbial imbalances in the gut, mouth, respiratory tract, urinary, and reproductive systems. Using pyrosequencing will allow researchers at the center to “examine, count, and barcode hundreds of thousands of microorganisms per day” using samples from different parts of the human body. "This center will allow us to sequence and characterize these microorganisms in order to study their relationship to diseases such as obesity, cancer and irritable bowel syndrome," MBAC Director Patrick Gillevet said in a statement. Gillevet developed and patented the multitag pyrosequencing technology which will “serve as the backbone of the center’s research efforts,” GMU said. "Before this technology was developed, we would have been hard-pressed to identify a couple hundred of microbes per sample,” Gillevet said. “Now, we are identifying 50,000 or 60,000 microbes per sample. We can literally do in an afternoon what it took us 10 years to do in the past.” Gillevet’s team is currently working with others at Rush University Medical Center in Chicago to study the presences of microorganisms in patients with breast cancer, Crohn’s Disease, inflammatory bowel disease, cirrhosis of the liver, and HIV. "Finding the microbes responsible for particular diseases may increase the likelihood of developing new diagnostic tests and treatments for them,” he said.

 

GATTACA one step closer to reality, but in a good way

1/07/2009

 

R, the software, finds a place in the NYT

R, the Software, Finds Fans in Data Analysts - NYTimes.com
R first appeared in 1996, when the statistics professors Robert Gentleman, left, and Ross Ihaka released the code as a free software package.

1/02/2009

 

Interesting proteomics dataset for Arabidopsis

Discovery and revision of Arabidopsis genes by proteogenomics — PNAS
Gene annotation underpins genome science. Most often protein coding sequence is inferred from the genome based on transcript evidence and computational predictions. While generally correct, gene models suffer from errors in reading frame, exon border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides corresponded to 1 or more of 3 different translations of the genome: a 6-frame translation, an exon splice-graph, and the currently annotated proteome. The majority of the peptides (126,055) resided in existing gene models (12,769 confirmed proteins), comprising 40% of annotated genes. Surprisingly, 18,024 novel peptides were found that do not correspond to annotated genes. Using the gene finding program AUGUSTUS and 5,426 novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models. The remaining 13,449 novel peptides provide high quality annotation (>99% correct) for thousands of additional genes. Our observation that 18,024 of 144,079 peptides did not match current gene models suggests that 13% of the Arabidopsis proteome was incomplete due to approximately equal numbers of missing and incorrect gene models.


Plant genomics has always been one step behind human genomics in the amount of evidence available for building gene models. Even in the best-known plant model, Arabidopsis, there is a lot of room for improvement. This paper shows how the latest genome-wide tools for gathering protein-coding evidence, like tandem MS (or RNA-seq), can improve the current protein-coding gene set.
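
The headline percentage in the abstract is easy to check. A small Python sketch using only the peptide counts quoted above (the variable names are mine):

```python
total_peptides = 144_079   # distinct peptides identified by tandem MS
in_gene_models = 126_055   # peptides that fall within existing gene models
novel_peptides = 18_024    # peptides that match no annotated gene

# The two classes account for every peptide in the dataset.
assert in_gene_models + novel_peptides == total_peptides

novel_fraction = novel_peptides / total_peptides
print(f"novel peptides: {novel_fraction:.1%}")
```

The fraction comes out at roughly one peptide in eight, which is the figure the abstract rounds to 13%.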

1/01/2009

 

Prelude

» My window theme » as days pass by, by Stuart Langridge
Thomas Thurman, champion that he is, took my request for a theme for Metacity that clones the XFCE Prelude theme and went ahead and did it. What a hero.

Anyway, I’ve tweaked his work a little to make it closer to the XFCE theme (primarily making the border thinner and nudging some of the colours, and hiding the buttons on inactive windows) and you can grab the theme file. Save it as $HOME/.themes/Prelude/metacity-1/metacity-theme-1.xml, then say System > Preferences > Appearance > Customise > Window Border and choose Prelude.
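
For the lazy, the "save it as" step can be scripted. This is only a sketch: the function name is mine, and the name of the downloaded file is whatever your browser saved it as. The target path is the one given in the quote.

```python
import shutil
from pathlib import Path


def install_metacity_theme(theme_file, home=None, name="Prelude"):
    """Copy a downloaded theme file into the per-user Metacity theme folder.

    theme_file -- path to the downloaded XML file (name is up to you)
    home       -- home directory to install into; defaults to the current user's
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".themes" / name / "metacity-1"
    target_dir.mkdir(parents=True, exist_ok=True)  # create missing folders
    target = target_dir / "metacity-theme-1.xml"
    shutil.copyfile(theme_file, target)
    return target


# Hypothetical usage, assuming the file was saved in the current directory:
# install_metacity_theme("metacity-theme-1.xml")
```

After that, the theme should show up under the Window Border tab mentioned in the quote.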

12/31/2008

 

Latitude ON Reader and Ready

Latitude ON Reader... - Page 3 - Notebook Forums and Laptop Discussion
Reader = Software (Linux) booting from HDD where you can access contacts and email stored in outlook. NO internet connectivity.... but includes the internal antenna wiring required for internet connectivity if you upgrade to the latitudeON hardware.

LatitudeON = Additional hardware used to run the Linux environment. Can connect to the internet via wifi / WWAN / Ethernet ... can connect to more email account than just outlook... Includes web-browser.

Ready = The motherboard in your machine has a socket to support the latitudeON hardware.

 

GPU closer to general-purpose computing

AMD unleashes open-source 3D code • The Register
AMD has released "the fundamental Linux code" needed to develop open-source 3D-acceleration drivers for its R600 and R700 ATI graphic-processors series.
OpenCL - Wikipedia, the free encyclopedia
AMD has decided to back OpenCL (and DirectX 11) instead of its now deprecated Close to Metal (aka Stream) framework.[5][6] RapidMind announced their adoption of OpenCL underneath their development platform, in order to support GPUs from multiple vendors with one interface.[7] Nvidia announced on December 9, 2008 to add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit.[8]

If one thing characterizes the computing world, it is that there are so many idle hands in bedrooms in front of cheap PCs nowadays that, no matter how imaginative you are about new developments, somebody has already put those idle hands to work on them. But these idle hands need foundations: tools to use in their tweaking and hacking. This is why the GCC toolchain and the Linux kernel were of such importance a decade ago for the Googles, Yahoos and Facebooks of today. In another turn of the crank, SETI@home and the related "@home" projects took advantage of all those juicy idle CPUs in all those cheap PCs to enable developments never imagined before. And now there is another opportunity: juicy *G*PUs in combination with juicy CPUs. Why is the announcement by AMD so important? Well, when you release the full 3D documentation for these mighty GPUs, you are providing all those bedrooms with a lot of new tools to play around with. And it seems OpenCL may have been approved just in time to lay down a spec that everybody can build on. We'll see. Interesting...

 

The mighty power of *escaped* beaver...

BBC NEWS | England | Escaped beaver fells river trees
Escaped beaver fells river trees

12/27/2008

 

Git for Emacs

Magit - I Still Know What You Learned Last Summer
Magit is a spectacular Emacs add-on for interacting with git. Magit was designed with git in mind (unlike VC mode, which is a more generic utility), so git commands map quite straightforwardly onto Magit commands. M-x magit-status tells you about the current state of your repo and gives you one-key access to many common git commands. However, what really sold me on Magit was its patch editor, which completely obsoletes my use of git add, git add --interactive, and git add --patch. If Magit had this patch editor and nothing else, I would still use it. That's how great this is

12/25/2008

 

Linux versus Mac vs Windows graphics

Slashdot | Linux 2.6.28 Promises Year-End Presents
Maybe I'm misunderstanding your assertion, but hasn't MacOS X had universal GPU RAM management for many years? I don't think MS has any monopoly on this... it was my impression that it was just Linux that was on Microsoft's heels playing catch-up.

Yes you are misunderstanding, and NO the Mac has not...

OS X uses the 3D GPU as a bitmap composer for the display, and that is it.

OS X's composer is years behind most Linux desktop composers as well as Vista's DWM/Aero. Vista's DWM for example is Vector/Bitmap based, and works with the WDDM of Vista that gives it a lot of power. (WDDM is the new driver model in Vista)

Here are some of the things Apple needs to add to catch up to even Vista.

- GPU RAM Virtualization/sharing (something kind of like what they are trying to do with the Intel chipsets and Linux in this article - except Vista does this over the AGP/PCI bus with any Video card and works with or without dedicated GPU VRAM).

- GPU Scheduler - In Vista, the OS, not the applications controls the GPU, and Vista brings pre-emptive multi-tasking to the GPU. (And no this is not like OpenGL application yielding/cooperative multitasking, as DirectX also does what OpenGL does. This is an OS level management system that opens up a new way of thinking beyond one 3D application on screen at a time concepts that don't depend on applications yielding the GPU. Kind of like the move to the 32bit era where the Intel CPUs offered a pre-emptive scheduler.)

(Example: several games on screen at once in Vista, set transparent with a HD video waterfall playing in the background and losing very few FPS in each game and Aero also using the 3D GPU to do its things, like compose the Game Windows with a transparent waterfall behind them and do a shared texture combine write to the video card.) - This is not something you want to try on a Geforce 5200, but it will work, and on newer video cards, even the 7900 series from Geforce, you can do some really amazing things when running multiple games 'viewable' on the screen at once.

- Legacy application 3D acceleration. Apple tried to get this going with 10.4 as an optional switch, but it was too buggy and scrapped as a feature for 10.5. This means that OS X still renders content using good old fashion legacy 2D GPU features or SSE Intel extensions. On Vista, even Windows 3.1 applications get a performance boost as GDI drawing, Font Rendering, and even internal bitmap APIs are shoved through the 3D GPU because it is significantly faster than older 2D GPU rendering methods.

- Vector composer. On Vista when it is running newer WPF applications, instead of the DWM getting a bitmap that is composed to the final render of the screen, the WPF applications tell the DWM/Aero what changes are made, usually vector based (XAML), and the Vista composer makes the changes at the composer level instead of the application having to redraw the application and send a new bitmap of the Window to the composer to assemble. (This is also why RDP (Remote Desktop) on Vista is faster and more featured than XP, as it works at the DWM level and a lot of the operations sent over the network to render the screen are vector based and lightweight, leaving the client to do the heavy rendering instead of passing bitmaps all the time. This is why you can do Aero Glass and WPF 3D over a slow RDP connection remotely.)

----
Ok, I am going to stop here, as I am writing this off the top of my head and it would be better if you would just visit technet at www.microsoft.com and look up the Vista WDDM and DWM and WPF technologies.

The whole driver and video changes in Vista were dramatic and borrowed ideas from the XBox 360 development team and do some really impressive things, even though MS didn't put much into the 'cute' uses of it in the UI like they are doing with Windows7.

---

Linux/*nix also has some good composer technologies that make OS X's composer pretty sad in comparison.

With Linux there are still some major driver and kernel level hurdles

 

1000 Genomes project, or more?

It turns out that the Solexa machines are getting better at such a pace that the calculations the 1000 Genomes Project made no longer hold. Under the assumption that the read length and throughput of these NextGen machines would increase in 2008 and 2009, the project was given enough funding to fully sequence 1000 genomes from a panel of diverse ethnicities. The production capacity is currently led by the Sanger Institute, the Beijing Genomics Institute and the Broad Institute, then Baylor and WashU. The Max Planck offered to do some production in mid 2008, and Illumina, Roche and ABI are also contributing. Now the machines are better, which means that the project is going to aim at even more than 1000 genomes.

Where does the money come from? Well, it comes from biomedical research funding, as the aim of the project is to create a deep catalog of human genetic variation that will represent all rare shared variants in our species. This catalog will facilitate biomedical research by making it possible to survey phenotypes across all the sampled genotypes, and to link the two to identify the causes of human diseases and traits. Beyond this obvious goal, such a deep sampling of population genomics data will give us great clues about the evolutionary processes that took place in our genome in the last hundreds of thousands of years. In particular, one will be able to see what the polymorphism patterns in the chromosomes are, and how these correlate with all the genetic features we are getting from another big project, the scale-up ENCODE project. Add to that the comparative genomics information from closely related primates to compare divergence vs polymorphism levels, and you have a winner!

Now that we even have a browser for the 1000 genomes project, you can get a snippet of the kind of data the project will produce:

http://browser.1000genomes.org/Homo_sapiens/genesnpview?db=core;gene=ENSG00000128573;context=200
http://browser.1000genomes.org/Homo_sapiens/transcriptsnpview?db=core;transcript=ENST00000393489;context=200

Notice the "context=200" argument in the GeneSNPView and TranscriptSnpView URLs. Some people may mistakenly think that intronic sequences are depleted of variation, when one would expect most of the SNPs to be there. Well, they are there, but the context setting in the Gene and Transcript SNP views restricts intronic SNPs to 100bp to the left and right of each exon by default. This gives a more coding-centric view of variation which, according to the Ensembl HelpDesk tickets, is what people working in hospitals around the world really like about this view.
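
If you want to play with the context window, the URLs are easy to build by hand. A small Python sketch that follows the pattern of the two links above; the helper name is mine, and I am assuming (not promising) that the browser accepts values other than 200.

```python
BASE = "http://browser.1000genomes.org/Homo_sapiens"


def snp_view_url(view, id_param, stable_id, context=200):
    """Build a SNP view URL; parameters are separated by semicolons."""
    return f"{BASE}/{view}?db=core;{id_param}={stable_id};context={context}"


# The two links from this post.
gene_url = snp_view_url("genesnpview", "gene", "ENSG00000128573")
transcript_url = snp_view_url("transcriptsnpview", "transcript", "ENST00000393489")

# Same gene with a wider intronic window (assumed to be a valid value).
wide_url = snp_view_url("genesnpview", "gene", "ENSG00000128573", context=1000)
```

A larger context should pull more of the intronic SNPs into the view, at the price of a less coding-centric picture.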

I remember that when I joined the Ensembl project three years ago these new machines were only a rumour, something that was secretly happening in a small science park in Great Chesterford, something that people at the time were dismissing as an undelivered promise: "Oh, but I've heard that they can only sequence 25bp pairs...", etc, etc. It's been like that for a lot of other scientific and technological promises:

- production plug-in electric/hybrid cars -- "Oh, but I've heard that they only have a range of a few miles..."
- inexpensive solar energy on the roof of your house -- "Oh, but I've heard that it only pays for itself after 25 years..."
- your very own robotic butler -- "Oh, but I've heard that it doesn't even know how to make a good latte..."

12/22/2008

 

System notifications

Mark Shuttleworth on system notifications. Interesting read.

 

BBC iPlayer for Linux

I tried the BBC iPlayer for Linux this afternoon, and started downloading stuff onto my Ubuntu laptop for my Christmas journey.

Word of caution: the player will complain very soon that you have reached the download limit, unless you go to Settings and define a bigger "Hard disk space allocated".

I haven't found any glitches or problems so far, apart from the fact that one cannot download stuff that is older than a given date. I wanted to get the full collection of "Stephen Fry in America" but I can only download the latest one. Ugggh!

Thanks British Broadcasting Corporation for this Christmas gift!

Update: I've been pointed to the wonderful alternative called get_iplayer.pl, which provides even more functionality than the Windows and Mac official players...

12/21/2008

 

The G1 reviewed on the BBC News

BBC NEWS | Programmes | Click | Gadgets to keep you entertained
The G1 is one of my favourite phones of 2008. OK, it is not the best looking, but the full qwerty keyboard is a joy to use and the trackerball is responsive enough to keep Blackberry fans happy. It allows scrolling without having to put your finger over what you are reading!

Add a smooth responsive touchscreen for all those who wanted an iPhone but just could not bring themselves to jump on Apple's bandwagon and this phone just bleeds control options. It offers speedy connectivity via 3G and wi-fi.

Google maps works a treat with GPS, and satnav is free with no subscriptions or need to buy extra maps that you sometimes find when you go with other suppliers.

Marketplace - the G1's "App Store" - is a mine of free whizzy gadgety apps that are added to on a daily basis.

I downloaded the latest Opera Mini browser (4.2) - a speedy internet browser for mobiles and now I do not miss the pinching action I was able to do with my iPhone to zoom in and out.

When I zoom in on my G1 the text on each page automatically reformats to fit the screen. Nice! The G1 is all about choice.

Whilst Apple's iPhone does things with more style and more glamour, the G1 is a more than capable and fun alternative which will only get more useful each day.

12/19/2008

 

Wrong way to go

Spain proposes tougher laws for immigrants - Yahoo! News
Grappling with rising unemployment and a moribund economy, the Spanish government proposed new immigration rules Friday to limit the influx of immigrants.

The measures, which need Parliamentary approval, would let police hold undocumented aliens longer pending expulsion and make it harder for foreign-born residents to bring relatives over. They are yet another reflection of the dramatic turnabout in Spain's economy.


If one discourages immigration, the end result is a worsening economy. Albania, from de-Stalinization until the end of Enver Hoxha's rule (1960-1985), is an example of a country where the influx of people was nil for 25 years, and that didn't work out well for them...

12/18/2008

 

Merry Christmas!

BBC NEWS | Technology | BBC iPlayer now available on Mac
The BBC has created a version of the iPlayer that works with both Mac and Linux computers.

 

Production Plug-in Hybrid

BYD F3DM - Wikipedia, the free encyclopedia
The BYD F3DM is a production plug-in hybrid compact sedan for sale December 15th, 2008 in China[1] and 2010 in Europe.[2] The F3DM was introduced at the 2008 Geneva Motor Show. The automaker expects to boost total sales to 350,000 cars next year from an expected 180,000 this year, founder and Chairman Wang Chuanfu told reporters in Shenzhen Dec 15th, 2008. U.S. sales of the F3 DM will likely start in 2011, Wang said.

 

MySQL landscape

The New MySQL Landscape (by Jeremy Zawodny)
Interesting things are afoot in the MySQL world. You see, it used to be that the MySQL world consisted of about 20-40 employees of MySQL AB (this funny distributed Swedish company that built and supported the open source MySQL database server), a tiny handful of MySQL mailing lists, and large databases were counted in gigabytes not terabytes. A Pentium III was still a decent server. Replication was a new feature!

Hey, anyone remember the Gemini storage engine? :-)

How times have changed...

Nowadays MySQL is sort of a universe unto itself. There are multiple storage engines (though MyISAM and InnoDB are still the popular ones), version 5.1 is out (finally), and the whole company made it over 400 employees before it was gobbled up by Sun Microsystems (a smart move, IMHO, though history will judge that) a while back.

If I had to guess 5 years or so ago what would be interesting to me today about MySQL, I'd have been really, really wrong. The future rarely turns out like we think. Just ask Hillary Clinton.

Here's a little of what's rattling around in the MySQL part of my little brain these days...

Outside Support, Patches, and Forks

The single most interesting and surprising thing to me is both the number and necessity of third-party patches for enhancing various aspects of MySQL and InnoDB. Companies like Percona, Google, Proven Scaling, Prime Base Technologies, and Open Query are all doing so in one way or another.

On the one hand, it's excellent validation of the Open Source model. Thanks to reasonable licensing, companies other than Sun/MySQL are able to enhance and fix the software and give their changes back to the world.

Some organizations are providing just patches. Others, like Percona, are providing their own binaries--effectively forks of MySQL/InnoDB. Taking things a step further, the OurDelta project aims to aggregate these third-party patches and provide source and binaries for various platforms. In essence, you can get a "better" MySQL than the one Sun/MySQL gives you today. For free.

Meanwhile, development on InnoDB continues. Oh, did I mention the part where they were bought by Oracle (yes, *that* Oracle) a while back? Crazy shit, I tell you. But it makes sense if you squint right.

Anyway, the vibe I'm getting is that folks are frustrated because there's not a lot of communication coming out of the InnoDB development team these days. I can't personally verify that. It's been years since I corresponded with Heikki Tuuri (the creator of InnoDB). So folks like Mark Callaghan of Google have been busy analyzing and patching it to scale better for their needs.

And we all benefit.

Drizzle

Taking things a step further yet, the Drizzle project is a re-making of MySQL started primarily by Brian Aker, who worked as MySQL's Director of Architecture for years. Brian is now at Sun and, along with a handful of others at Sun and elsewhere, is ripping out a lot of the stuff in a fork of MySQL that doesn't get used much, needlessly complicates the code, or is simply no longer needed.

In essence, they're taking a hard look at MySQL and asking what it really needs to provide for a lot of its uses today: Web and "cloud" stuff. Brian visited us at Craigslist a few months ago to talk about the project a bit and get our input and feedback. I believe it was that day I joined one of the mailing lists and started following what's going on. Heck, I even build Drizzle on an Atom-powered MSI Wind PC regularly.

It's great to see a re-think of MySQL going on... keeping the good, getting rid of the bad, and modularizing the stuff that people often want to do differently (authentication, for example).

It's even better to see the group that's hacking on it. They really have their heads on straight.

Unanswered Questions

Why is all this even necessary? Are the "enterprise" customers and their demands taking focus away from what used to be the core use and users of MySQL? Is Sun hard to work with?

It's clear that both the MySQL and InnoDB teams could be doing more to help. But having worked at a large company for long enough, I realize that things are rarely as simple as they should be.

Will this stuff get integrated back into mainline MySQL? Will Linux distributions like Ubuntu, Debian, and Red Hat pick up OurDelta builds? What about Drizzle?

Will Drizzle hit its target and be the sleek and lean database kernel that MySQL once could have been?

Hard to say.

It's hard to guess what the future holds and too easy to play armchair quarterback about the work of others. But these are questions worth wondering about a bit.

What's it all mean?

Nowadays MySQL has a much slower release cycle than it used to. It's still available in "commercial" and free ("community") releases. There's still a company behind it--a much larger one in fact. But one that also has a vested interest in showing how it works better on their storage appliances or 256 "core" computers and whatnot.

Clustering is still very niche. Transactions are not.

Meanwhile, all the cutting-edge stuff (at least from the point of view of scaling) is happening outside Sun/MySQL and being integrated by OurDelta and even Drizzle. The OurDelta builds are gaining steam quickly and Drizzle is shaping up.

Heck, I'm hoping to get an OurDelta box or two on-line at work sometime soon. And I'd like to put a Drizzle node up too. I want to see how the InnoDB patches help and also play with the InnoDB plug-in (and its page compression).

The next few years are proving to be far more interesting than I might have expected from a project and technology that looked like it was on a track straight for Open Source maturity.

And you know what? I like it.

12/16/2008

 

12.8% and rising

Economic crisis means brisk business for Spanish pawnshops - Yahoo! News
Pawnshops and second-hand stores are doing a brisk business in Spain where the end of a decade-long property boom has pushed the country's unemployment rate to 12.8 percent in October, according to Eurostat, its highest level in over four years and the highest rate in the 27-nation European Union.

 

Advice which is obviously going to be ignored

BBC NEWS | Technology | Internet Explorer security alert
Users of the world's most common web browser have been advised to switch to another browser until a serious security flaw has been fixed.

 

GPU-HMMER

mpiHmmer
Installation Guide

Compiling and Installing GPU-HMMER for Linux

Prerequisites : You will need to have a C++ compiler (assumes g++), a C compiler, and the NVIDIA nvcc compiler in your path. Also take note of where you have installed the CUDA SDK. GPU-HMMER assumes that the CUDA SDK is in the default location (normally ${HOME}/NVIDIA_CUDA_SDK), but if it's not, you'll need to make a couple of changes, described later. GPU-HMMER was designed to support version 2 of the CUDA SDK.
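
Before running configure, the prerequisites above can be checked with a few lines of Python. This is only a sketch and not part of GPU-HMMER: the function names are made up for illustration, and "gcc" is an assumed stand-in for "a C compiler", which the guide does not name.

```python
import os
import shutil

# Assumed tool names: the guide names g++ and nvcc; "gcc" stands in for
# "a C compiler", which the guide leaves unspecified.
REQUIRED_TOOLS = ["g++", "gcc", "nvcc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def default_cuda_sdk_dir():
    """Default CUDA SDK location mentioned in the guide (${HOME}/NVIDIA_CUDA_SDK)."""
    return os.path.join(os.path.expanduser("~"), "NVIDIA_CUDA_SDK")


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"missing from PATH: {tool}")
    sdk = default_cuda_sdk_dir()
    if not os.path.isdir(sdk):
        # The guide says to point CUDAINCDIR in src/Makefile at a custom location.
        print(f"CUDA SDK not found at {sdk}; adjust CUDAINCDIR in src/Makefile")
```

If the script prints nothing, the compilers are on the path and the SDK sits in the default location, so configure and make should not need any Makefile changes on that account.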

Installation Steps:

1) Download and untar GPU-HMMER from the source archive.


* Get GPU-HMMER from our downloads section.

* tar xvzf GPU-HMMER-0.9.tar.gz

2) Configure and compile GPU-HMMER.

* run configure as you normally would :

./configure

* At this point, if you have any non-standard install options you need to apply those to the Makefile in the src directory. In particular, if you need to change the location of the CUDA SDK, you should change src/Makefile to point CUDAINCDIR to your custom location. Also, you can change the number of active sequences by changing CUDADEFS in src/Makefile. If you're using the 8800 GTX Ultra, you shouldn't need to make any changes. Otherwise, you can raise or lower the thread size and block size as your architecture allows. When you're done, build GPU-HMMER from the toplevel directory by executing:

make

* Make should complete without errors

3) Installing GPU-HMMER.

* You are not required to install the GPU-HMMER binaries in the typical system locations. Instead, GPU-HMMER can be left in-place and will run without errors. For users wishing to perform a true installation, please execute:

make install

12/15/2008

 

NYTProf and mod_perl

'A great new profiler which works with mod_perl - Devel::NYTProf' - MARC
Using it in mod_perl is as easy as adding this to your httpd.conf (before you load any other Perl modules):

PerlModule Devel::NYTProf::Apache

It is worth starting apache in single process mode, otherwise it writes to multiple log files:

apachectl -X

Once you've finished using your code (in order to generate stats), you can generate the reports with:

nytprofhtml -f /tmp/nytprof.$PID.out

If you've got your Perl modules in some directory other than the compiled-in @INC, you can use:

nytprofhtml -f /tmp/nytprof.$PID.out -lib /path/to/libs

I use a framework which autogenerates a number of classes, so I had to edit nytprofhtml to add a few lines in order to initialise my framework before running the report. Try this out - it is super easy, and very very useful

 

The Economist on netbooks

How to choose a netbook | Small is beautiful | The Economist
STEVE JOBS says Apple does not know how to make a $500 computer “that’s not a piece of junk”.[...]

The most basic model of the Acer Aspire One can be found for £179 in Britain and around $300 in America. It simply switches on and runs with the minimum of fuss. It has 8 gigabytes (GB) of flash storage and 512 megabytes of RAM, which is a bit puny. But that is perfectly adequate to run the customised version of Linux that comes pre-installed on it, along with a suite of software, including Open Office. With no hard drive, and a switch to turn off the wireless connection (not the fastest in the world), power can be conserved. So a bigger, bulkier battery may not be necessary either, unless you want to use the computer untethered for long periods. Because it boots up in a few seconds, rather than thinking of the Acer as a mini laptop it might make more sense to view it as a beefed-up personal digital assistant, such as an old PalmPilot or Psion, but with a better screen and a proper keyboard.



12/14/2008

 

Well-earned victory



FCB showed that they are the new, updated version of Johan Cruyff's team: playing fast, with great rhythm and full of confidence. Well done, and the best of luck!

12/13/2008

 

Efforts for war vs efforts for peace

Wind, water and sun beat other energy alternatives, study finds
Because the wind turbines would require a modest amount of spacing between them to allow room for the blades to spin, wind farms would occupy about 0.5 percent of all U.S. land, but this amount is more than 30 times less than that required for growing corn or grasses for ethanol. Land between turbines on wind farms would be simultaneously available as farmland or pasture or could be left as open space.

Indeed, a battery-powered U.S. vehicle fleet could be charged by 73,000 to 144,000 5-megawatt wind turbines, fewer than the 300,000 airplanes the U.S. produced during World War II and far easier to build. Additional turbines could provide electricity for other energy needs.
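
To put the quoted figures in perspective, here is the arithmetic as a small Python sketch. It only multiplies out the numbers in the excerpt; the function names are mine, and the result is nameplate (peak) capacity, which says nothing about the average output the study assumed.

```python
TURBINE_MW = 5                                   # turbine size quoted in the study
TURBINES_LOW, TURBINES_HIGH = 73_000, 144_000    # range quoted in the study
WW2_AIRPLANES = 300_000                          # airplanes quoted for comparison


def nameplate_gw(turbines, mw_each=TURBINE_MW):
    """Total nameplate capacity in gigawatts."""
    return turbines * mw_each / 1000


def fraction_of_ww2_airplanes(turbines):
    """Turbine count as a fraction of the WWII airplane count."""
    return turbines / WW2_AIRPLANES


for n in (TURBINES_LOW, TURBINES_HIGH):
    print(n, nameplate_gw(n), round(fraction_of_ww2_airplanes(n), 2))
# 73000 365.0 0.24
# 144000 720.0 0.48
```

So even the upper estimate is less than half the number of airplanes built during the war.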

12/11/2008

 

Stephen Fry's gPhone versus iPhone


The New Adventures of Mr Stephen Fry
Without most or all of these requests being implemented Apple will find itself in danger of falling behind. But hell, they know that better than me, and I’m sure they will surprise us with capabilities I haven’t begun to think of. I believe that not only can they now afford to open up but also that they cannot afford not to. Google’s Android is the reason they have to redouble their efforts as we shall see later on.

I disagree with Stephen Fry's comment on Apple's need to open up. I think they cannot afford to open up because their code is what currently makes the iPhone sleeker than the gPhone or other phones. This may stop being true at some point, if enough people pick up on the Open Handset Alliance, but right now Apple simply has better code on hardware that is not special in any way. This last statement is also true in some respects for Apple's laptops and desktop computers. They shifted to Intel processors a few years ago, so now any comparison at the hardware level is trivial, leaving only the smoother corners and slicker mice and keyboards as an excuse. For casual users of personal computers, I agree that Apple has a smoother user experience in today's computers, solely on the basis of their proprietary code. Their hardware is overpriced, to the extent that a 250GBP netbook with a hacked version of OSX Leopard seems pretty functional to me!

http://uk.youtube.com/watch?v=McI2zZHewsE


12/10/2008

 

If this is a prank, it's a very good one...

Blog of helios: Linux - Stop holding our kids back
This blog is momentarily interrupted to bring you a snippet of recently received email.

"...observed one of my students with a group of other children gathered around his laptop. Upon looking at his computer, I saw he was giving a demonstration of some sort. The student was showing the ability of the laptop and handing out Linux disks. After confiscating the disks I called a conference with the student and that is how I came to discover you and your organization. Mr. Starks, I am sure you strongly believe in what you are doing but I cannot either support your efforts or allow them to happen in my classroom. At this point, I am not sure what you are doing is legal. No software is free and spreading that misconception is harmful. These children look up to adults for guidance and discipline. I will research this as time allows and I want to assure you, if you are doing anything illegal, I will pursue charges as the law allows. Mr. Starks, I along with many others tried Linux during college and I assure you, the claims you make are grossly over-stated and hinge on falsehoods. I admire your attempts in getting computers in the hands of disadvantaged people but putting linux on these machines is holding our kids back.

This is a world where Windows runs on virtually every computer and putting on a carnival show for an operating system is not helping these children at all. I am sure if you contacted Microsoft, they would be more than happy to supply you with copies of an older version of Windows and that way, your computers would actually be of service to those receiving them..."

Karen xxxxxxxxx
xxxxxxxxx Middle School
AISD

12/09/2008

 

Light concentrator for solar devices

Efficiency records claimed for solar devices : Nature News
Light concentrator promises higher power output.

Katharine Sanderson

A transparent polymer plate has set a new record for concentrating the Sun's light on to a solar cell, boosting the promise of a technology that could lead to vastly improved solar-power capabilities at much cheaper cost.

12/08/2008

 

Performance tools in Linux

[Phoronix] Linux Kernel Performance Counter Subsystem
Thomas Gleixner has proposed a series of patches to the Linux kernel that would (finally) introduce a performance counter sub-system. This sub-system would make it possible to read performance-oriented data off special registers on modern processors such as the number of CPU instructions executed, cache misses, branches mis-predicted, etc.

Thomas describes this proposed performance counter subsystem as being very simple (it only takes a few lines of user-space code to read the counters) but still an extensible design that can implement a full range of features. Also posted on the Linux Kernel Mailing List was a simple monitoring demo. Thomas believes that the design of this subsystem is superior to that of some of the other recent patch sets that add similar functionality. However, there still is quite a bit of work left to be accomplished for this performance counter subsystem. Right now Intel Core 2 and newer CPUs are supported with their performance counting registers, but beyond that there isn't any non-Intel CPU support.

 

Mountaintop removal for coal

BBC NEWS | Science & Environment | Barack Obama's coal conundrum
The first sight of the impact of US hunger for coal takes the breath away: here at Kayford in West Virginia, there's a yawning chasm where a mountain used to stand.

And stretching a dozen barren miles to the horizon there's a series of hills with unnaturally flat tops - their peaks have been blasted off in a type of mining known as "mountaintop removal".

On a flight organised by the conservation charity SouthWings, pilot Susan Lapis tells me she's "horrified" to see how the quest for coal has devastated great tracts of landscape. Some estimates suggest that more than 400 tops have been demolished so far.

12/07/2008

 

Engineering and Physical Sciences in Cambridge, UK

BBC NEWS | England | Cambridgeshire | University secures new £6m centre
The University of Cambridge has won £6m funding for a centre aimed at helping create a new generation of scientists.

The Engineering and Physical Sciences Research Council has awarded the funding to the university to set up a new Doctoral Training Centre.

The centre will train students to tackle some of the biggest problems the world faces, the university said.

It will support more than 50 PhD students over the next five years and train them in a range of disciplines.

The centre will be one of 44 training facilities across the UK which will train more than 2,000 PhD students.

12/04/2008

 
PolITiGenomics » Next-Generation Sequencing Informatics
Next-Generation Sequencing Informatics

Below is a table with informatics and IT statistics for the major next-generation/massively parallel sequencing platforms. The information in the table is approximate and should only be used for general, informational purposes.
Next-Generation Sequencing Informatics Statistics
Vendor: Roche Illumina ABI
Technology: 454 Solexa SOLiD
Platform: GS 20 FLX Ti GA GA II 1 2
Reads: 500 k 500 k 1 M 28 M 80 M 40 M 115 M
Fragment
Read length: 100 200 350 35 50 75 25 35
Run time: 6 hr 7 hr 9 hr 3 d 3 d 4 d 6 d 5 d
Yield: 50 Mb 100 Mb 400 Mb 1 Gb 4 Gb 6 Gb 1 Gb 4 Gb
Images: 11 GB 13 GB 27 GB 500 GB 700 GB 900 GB 1.8 TB 2.5 TB
PA Disk: 3 GB 3 GB 15 GB 175 GB 300 GB 350 GB 300 GB 750 GB
PA CPU: 10 hr 140 hr 220 hr 100 hr 70 hr 100 hr NA NA
SRA: 1 GB 4 GB 30 GB 50 GB 75 GB 100 GB
Paired-end
Read length: 200 2×35 2×50 2×75 2×25 2×35
Insert: 3.5 kb 200 b 200 b 200 b 3 kb 3 kb
Run time: 7 hr 6 d 6 d 8 d 12 d 10 d
Yield: 100 Mb 2 Gb 8 Gb 11 Gb 2 Gb 8 Gb
Images: 13 GB 1 TB 1.3 TB 1.8 TB 3.6 TB 5 TB
PA Disk: 3 GB 350 GB 500 GB 600 GB 600 GB 1.5 TB
PA CPU: 140 hr 160 hr 120 hr 170 hr NA NA
SRA: 1 GB 60 GB 100 GB 150 GB 200 GB
Notes:

* Units: B - bytes, b - bases
* PA is primary analysis (includes image feature extraction and base calling)
* PA CPU is calculated as the wall clock multiplied by the number of CPU cores
* ABI SOLiD data are representative of a single slide
* ABI SOLiD primary analysis is done on the instrument cluster
* SRA is the size of the files (SFF or SRF) that are submitted to the NCBI Short Read Archive
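
Two of the relationships in the table are easy to check by hand. The sketch below is mine, not from PolITiGenomics: it assumes that yield is simply reads times read length, and it pairs 28 M reads with 35 bases (GA) and 80 M reads with 50 bases (GA II), which is my reading of the flattened columns. The 8-core, 12.5-hour run is a made-up example of the PA CPU rule in the notes.

```python
def yield_bases(reads, read_length):
    """Approximate run yield in bases: number of reads times read length."""
    return reads * read_length


def pa_cpu_hours(wall_clock_hours, cores):
    """PA CPU as defined in the notes: wall clock multiplied by CPU cores."""
    return wall_clock_hours * cores


print(yield_bases(28_000_000, 35))   # 980000000, close to the 1 Gb listed
print(yield_bases(80_000_000, 50))   # 4000000000, the 4 Gb listed
print(pa_cpu_hours(12.5, 8))         # 100.0 (hypothetical 8-core machine)
```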

 

Perl and property information

I've been playing around with Nestoria, which is a site for property information in the UK and a few other European countries. Nestoria sponsored this year's London Perl Mongers Workshop and uses Perl and other open source projects as the building blocks for its service. It's all googly-mappy and clean, but most of all, it shows how good visualization makes information a lot more useful...

12/03/2008

 

More on partitioning

Whatever....: mysql partitioning
LINEAR HASH partitioning -> This is very similar to hash partitioning, except that the algorithm used to divide the data is different. The syntax is also almost the same: we use PARTITION BY LINEAR HASH instead of PARTITION BY HASH.

The algorithm used is :

Given an expression expr, the partition in which the record is stored when linear hashing is used is partition number N from among num partitions, where N is derived according to the following algorithm:


* Find the next power of 2 greater than num. We call this value V; it can be calculated as:
V = POWER(2, CEILING(LOG(2, num)))

* Set N = F(column_list) & (V - 1).

* While N >= num
{
Set V = CEIL(V / 2)
Set N = N & (V - 1)
}
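
The steps above can be sketched in Python. This is an illustration of the quoted algorithm, not MySQL's own code: the function name is mine, the argument hash_value stands for the result of F(column_list), and V is found with a doubling loop instead of POWER/CEILING/LOG to stay in integer arithmetic.

```python
def linear_hash_partition(hash_value, num):
    """Return the partition number N for `hash_value` among `num` partitions."""
    # V: smallest power of two that is >= num (what the POWER/CEILING/LOG
    # formula computes).
    v = 1
    while v < num:
        v *= 2

    # N = F(column_list) & (V - 1)
    n = hash_value & (v - 1)

    # While N >= num: halve V and mask again.
    while n >= num:
        v = (v + 1) // 2      # CEIL(V / 2) in integer arithmetic
        n &= v - 1
    return n


# With 6 partitions, V starts at 8.
print(linear_hash_partition(2003, 6))   # 3  (2003 & 7 = 3, already < 6)
print(linear_hash_partition(1998, 6))   # 2  (1998 & 7 = 6, then 6 & 3 = 2)
```

The second call shows why the distribution is less even than with regular hashing: values that land beyond the last partition get folded back onto the lower-numbered ones.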



The advantage in partitioning by linear hash is that the adding, dropping, merging, and splitting of partitions is made much faster, which can be beneficial when dealing with tables containing extremely large amounts of data. The disadvantage is that data is less likely to be evenly distributed between partitions as compared with the distribution obtained using regular hash partitioning.

KEY partitioning ->
Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. The syntax used is PARTITION BY KEY instead of PARTITION BY HASH.

In most cases either a primary key or a unique key is used to create the partitions.

MySQL also supports subpartitioning, whereby a partition can be divided into further subpartitions of a similar type.

12/02/2008

 

Reboot circadian rhythm

BBC NEWS | Health | 'Time-bending drug' for jet lag
A new cure for jet lag could be on the market in the next few years after trials show a pill can reset the body's natural sleep rhythms.

Tasimelteon works by shifting the natural ebb and flow of the body's sleep hormone melatonin.

In trials, published in The Lancet, the drug helped troubled sleepers nod off quicker and stay asleep for longer.

Experts said the drug would be a welcome alternative to addictive sedatives like benzodiazepines.

Commenting on the work, Dr Daniel Cardinali from the University of Buenos Aires said the findings would be welcomed by millions of people - "shift-workers, airline crew, tourists, football teams, and many others."

 

Oh Lord!

BBC NEWS | Technology | Apple pushes anti-virus for Macs
Apple has urged Mac owners to use anti-virus software.

In a note posted on its support site in late November, Apple said it wanted to "encourage" people to use anti-virus to stay safe online.

The move is widely seen as a response to the growing trend among cyber criminals of booby-trapping webpages that can catch out Mac users.

Before now Mac users have been largely free of the security problems that plague Microsoft's Windows.

12/01/2008

 

Interesting blog entry

Dual systems « Chris’ Big Bond Blog
I’ve started to notice a cool tendency in the computer market. It seems like we are moving towards an age of dual systems. One system for mobility and one system for performance.

ASUS and Dell Latitude ON are delivering motherboards with a Linux OS burned into a chip on the motherboard. This way you don’t have to boot into Windows if you just want to do some easy surfing or listen to some music. It saves power and increases mobility for laptops.

Lenovo also incorporates this into their Ideapad machine. You get a serious graphics card, which you can turn on or off as you need it or not. Thereby, you can decide if you want performance or mobility.

This is a very interesting line of thought. I like to be able to have performance only when I need it. I wonder how far this will go. Will you eventually end up having a computer that is actually two computers?

 

Waiting time for MySQL 5.1

MySQL 5.1 was released, but it looks like it's worth waiting a bit before treating it as production-ready...

 

Proudly born and raised in Igualada, or not so much sometimes...

I was born and raised in Igualada, in the centre of Catalonia, one of the richest regions in Spain. Although Catalans are often associated with wealthy lives, expensive summer holidays and sports cars, at least in Igualada we have the highest rate of unemployment around, currently above 15%.

Industrial crisis in Anoia - Televisió de Catalunya
Industrial crisis in Anoia

The Anoia county cannot get back on its feet and has one of the highest unemployment rates in Catalonia. It exceeds 15% of the population, reaching 7,500 unemployed. There are currently many temporary and permanent layoff proceedings under way, affecting close to 900 more people. Today the concern turned into a demonstration in the streets of Igualada.

11/30/2008

 

NYTProf by Tim Bunce

This is a straight-to-the-point talk about NYTProf given by its main author, Tim Bunce. Highly recommended...

11/24/2008

 

Solar everywhere

BBC NEWS | World | Europe | Spain city sets up solar cemetery
Spain city sets up solar cemetery

 

UK Pre-Budget Report day

This one is going to be memorable... should be as important as Thanksgiving to the Americans :-)

BBC NEWS | Business | Q&A: Pre-Budget Report
What is the pre-Budget report?

Published each autumn, the pre-Budget report is the Treasury's chance to explain in advance the measures likely to be presented in the full budget the following spring, and to update its economic and budget forecasts.

11/19/2008

 

New Plant Research Centre in Cambridge, UK

BBC NEWS | England | Cambridgeshire | Gift funds plant research centre
Gift funds plant research centre
Botanic garden
The new building will be in an area of the Botanic Garden.

An £82m donation is to help pay for a new research laboratory for the study of plant development at the University of Cambridge.

11/18/2008

 

Coincidence? :-)


 

Coincidence?


11/17/2008

 

Supercomputing? It's a Linux game

Operating system Family share for 11/2008 | TOP500 Supercomputing Sites
Operating system Family share for 11/2008

In addition to the table below, you can view the visual charts using the TOP500 charts page. A direct link to the charts is also available.
Operating system Family | Count | Share % | Rmax Sum (GF) | Rpeak Sum (GF) | Processor Sum
Linux | 439 | 87.80 % | 13341108 | 20822363 | 2104191
Windows | 5 | 1.00 % | 328114 | 429555 | 54144
Unix | 23 | 4.60 % | 881289 | 1198012 | 85376
BSD Based | 1 | 0.20 % | 35860 | 40960 | 5120
Mixed | 31 | 6.20 % | 2356048 | 2933610 | 869676
Mac OS | 1 | 0.20 % | 16180 | 24576 | 3072
Totals | 500 | 100% | 16958600.19 | 25449076.20 | 3121579
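
The Share % column is just each family's count divided by the 500 systems on the list. A short Python sketch (the names are mine, the counts are copied from the table) reproduces it:

```python
# System counts per operating system family, copied from the table above.
COUNTS = {
    "Linux": 439,
    "Windows": 5,
    "Unix": 23,
    "BSD Based": 1,
    "Mixed": 31,
    "Mac OS": 1,
}


def shares(counts):
    """Return each family's share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


print(sum(COUNTS.values()))      # 500
print(shares(COUNTS)["Linux"])   # 87.8
```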

 

Adobe 64bit Flash for Linux is here

[Phoronix] Adobe Releases 64-bit Flash For Linux
Over on the Adobe Labs web-site they have released an alpha version of Flash Player 10 that's targeted at Linux x86_64. This 64-bit Flash plug-in works without the NSPlugin wrapper and will "just work" with your 64-bit web browser. The release notes for the 64-bit Linux edition can be read here.

Will Sun be next to improve the 64-bit Linux user-land by releasing an official 64-bit version of its Java plug-in? Let's hope.

11/10/2008

 

More Open Source graphics acceleration work

[Phoronix] ATI R600 DRI 3D Work Gets Closer
In late August, AMD had its first hardware-accelerated GL triangles running internally on the RV770 GPU using the open-source stack. AMD's Alex Deucher was foregoing the celebrating at XDS 2008 at the Edinburgh Zoo to prepare this RV770 code for release, but unfortunately this code has yet to surface.

Though by the looks of it, once this R600/700 code is released, it will hopefully be in a pretty usable state to end-users. Perhaps a nice Christmas gift from AMD?

 

Matthew's comments on user space

Advogato: Blog for mjg59
Userspace indicates what state it wants to go to and the kernel decides what's going to get powered down. This kind of coarse grained approach means that as your hardware setup becomes more complex you hit combinatorial explosion. Expressing all the useful combinations of hardware state simply becomes impractical if all you're exposing is a single variable. What would be more useful is the ability for userland to interact with individual pieces of hardware.

The amusing thing is that in many cases Linux already has this. Take a look at the backlight and LCD class drivers, for instance. They provide a trivial mechanism for userspace to indicate its desires and then modify the device power state. It's true that there are other pieces of hardware that don't currently have interfaces to provide this kind of information. And that's where cooperation with the existing community comes in. We've already successfully fleshed out interfaces for runtime power management for several hardware classes, with the main thing blocking us being a lack of awareness of what the use cases for the remaining classes are. But linux-pm has seen nobody from the Android team, and so we end up with a lump of code solving a problem that shouldn't exist.
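
To see how trivial the backlight interface is from userspace, here is a small Python sketch. It is my own illustration, not from Matthew's post: it assumes the usual sysfs layout, where each device under /sys/class/backlight exposes brightness and max_brightness files, and the function name is made up.

```python
from pathlib import Path


def read_backlights(root="/sys/class/backlight"):
    """Return {device name: (brightness, max_brightness)} for each backlight.

    Assumes the standard sysfs backlight class layout; devices whose files
    are missing or unreadable are skipped.
    """
    levels = {}
    base = Path(root)
    if not base.is_dir():
        return levels          # no backlight class on this machine
    for dev in sorted(base.iterdir()):
        try:
            current = int((dev / "brightness").read_text())
            maximum = int((dev / "max_brightness").read_text())
        except (OSError, ValueError):
            continue
        levels[dev.name] = (current, maximum)
    return levels


if __name__ == "__main__":
    for name, (current, maximum) in read_backlights().items():
        print(f"{name}: {current}/{maximum}")
```

Changing the level is, as far as I know, the same thing in reverse: write a number to the brightness file (usually as root). That is the per-device control the post is asking for, as opposed to a single global power state.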

 

11/07/2008

 

This is too much


11/06/2008

 

Benchmarks to compare apples to apples

My love for Phoronix reports keeps growing and growing. The latest one is the first extensive comparison between Mac OS X and Ubuntu. Given the results, I have to say that I wouldn't have expected Linux to win on Mac Mini hardware in either the OpenGL or the disk benchmarks, although the SQLite results are disappointing for Linux. BYTE Unix Benchmark, way to go!

[Phoronix] Mac OS X 10.5 vs. Ubuntu 8.10 Benchmarks
Apple's Mac OS X 10.5.5 "Leopard" had strong performance leads over Canonical's Ubuntu 8.10 "Intrepid Ibex" in the OpenGL performance with the integrated Intel graphics, disk benchmarking, and SQLite database in particular. Ubuntu on the other hand was leading in the compilation and BYTE Unix Benchmark. In the audio/video encoding and PHP XML tests the margins were smaller and no definitive leader had emerged. With the Java environment, Sunflow and Bork were faster in Mac OS X, but the Intrepid Ibex in SciMark 2 attacked the Leopard. These results though were all from an Apple Mac Mini.
