Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.
Interesting to see all the buzz that the Benjamin Franklin Award has generated in the blogosphere, twittersphere, facebooksphere and any of the other spheres out there... I still think there is something we should try to resolve in open source bioinformatics: how to promote Open Source software so that the scientific community becomes more aware of it. We need to reconcile the promotion of modularity and generality with the fact that giving credit to the scientists who contribute to Open Source bioinformatics software is still important.
Projects that have built very modular and generic components may be doing a lot for the bioinformatics community at large but, at the same time, the less atomic and single-purpose your software is, the more difficult it is to publish it in a prominent scientific journal. The same goes for citations in downstream publications: very atomic programs do very well in citation metrics, but infrastructure code does not. This means that well-designed, well-implemented and well-tested software is often not prominent enough for newcomers to notice, and too many bioinformaticians resort to writing their own glue code to build their bioinformatics infrastructure.
There wouldn't be anything wrong with rewriting your own code over and over again if it weren't for two things: (a) people spend too much time writing scaffolding code just to get to what is really new and interesting in their project, and (b) that code tends to be used and tested only internally and is almost never reused by any other party unless it has been very well designed and documented (hence the name scaffolding code).
There is a really good chance now to build an infrastructure that brings up a terminal next to the next-generation, petabyte-size data sources, using emerging "cloud" technologies. These technologies are already well advanced in fields other than bioinformatics, so we can leverage what has already been done for us and make extensive use of it. These terminals don't need to be dumb, and the community should provide as much prebuilt code in them as possible, so that the new breed of bioinformaticians gets used to having this software at their fingertips.
A few years ago all the effort went into building packages for different Linux distributions, so that people could easily install Open Source software on their in-house CPU clusters. I think we need to shift gears now towards making software accessible in the cloud. The good news is that everybody seems happy with a common Ubuntu system as a start. What I fear is a proliferation of iPhone-like SDKs that will make the existing bioinformatics software useless. In an era where everybody is acutely aware of governments having to pour our taxes into infrastructure that was already paid for, no one will like to see all existing bioinformatics software become a "toxic" or "legacy asset"!
Labels: genomics, nextgen sequencing, open source