Pleasingly Parallel MCMC: cracked wide open for MapReduce and Hadoop

MCMC methods guarantee an accurate enough result (say, parameter estimation for a phylogenetic tree), but usually only in the long run, and many burn-in steps might be necessary before they perform well. And as the data size grows, the number of operations needed to draw a sample grows too (N -> O(N)…

2 recent Global Alliance for Genomics and Health standard candidates: ADAM and Google Genomics

The Global Alliance for Genomics and Health (GA4GH for short) includes more than 150 health and research organizations working to accelerate the secure and responsible sharing of genomic and clinical data. GA4GH is something you will hear about more and more in the near future. In the context of genomics, think of standards mainly as data formats and code to access and process…

3 recent Hadoop/MapReduce applications in the life sciences: RNA structure prediction, neuroimaging genetics, EEG signal analysis

3 open access papers, 3 prototypes, source code available for only 1, and a healthy diversification of topics.

1. Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce

Code available: haven’t found it referenced in the paper.

Our previous research shows that cutting long sequences into shorter chunks, predicting secondary structures of…

Google invests into DNAnexus: aging-driven big data bioinformatics without the Hadoop Ecosystem?

The first time DNAnexus made me think a little about what they could achieve was when they came up with an alternative search and browse interface for the complete Sequence Read Archive (SRA) database. They came to the ‘rescue’ when NCBI decided to discontinue SRA in 2011, although NCBI later changed its mind, so SRA is still up and running there.…

Coming of age for proteogenomics: 10% less human protein coding genes based on mass spec proteomics data?

Guessing the number of real protein-coding genes is an ‘ancient’ bioinformatics game, and now a new argument and a newish research field have been applied to this problem. Proteogenomics can refer to different types of studies, but the basic idea is that mass spectrometry peptide/protein evidence is used to improve genome annotations. Now a joint Spanish –…

Three links in Aging, Regenerative Medicine & Healthy Lifespan Extension: 17 December 2013

1. DNA methylation age of human tissues and cell types by Steve Horvath: This is the type of relevant data mining study most bioinformaticians dream of: you pull together a large body of publicly available datasets (CpG methylation) that are not too heterogeneous (Infinium type II assay on Illumina 27K or Illumina 450K array platform), derive robust…

Three links in Aging, Regenerative Medicine & Healthy Lifespan Extension: 8 December 2013

1. Is aging linear or does it follow a step function? A good & simple question on Quora that surprised even Aubrey de Grey. If you are a bioinformatician out there – looking for a new pet project – go pull together some data & try to plot it! Let me know if you have something.…