2 recent Global Alliance for Genomics and Health standard candidates: ADAM and Google Genomics

The Global Alliance for Genomics and Health (GA4GH for short) includes more than 150 health and research organizations working to accelerate the secure and responsible sharing of genomic and clinical data. GA4GH is something you will hear about more and more in the near future.

In the context of genomics, think of standards mainly as data formats plus the code to access and process those formats (APIs, if you like the term; well, I don’t).

Here are 2 emerging projects in the genomics standards field. One of them is bleeding, the other one is cutting edge:

1. ADAM

“Global Alliance is looking at ADAM as a potential standard” (check slide 12 of this slideshow).

what is it: “ADAM is a genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.”

Currently it includes a complete variant calling pipeline, among other things.

main codebase: http://bdgenomics.org/

some folks behind it: @matt_massie @fnothaft and several other people from places like AMPLab at Berkeley, GenomeBridge, The Broad Institute, Mt Sinai.

2. Google Genomics

Announced by Jonathan Bingham on the Research at Google blog (http://googleresearch.blogspot.co.uk/2014/02/google-joins-global-alliance-for.html), introducing:

“a proposal for a simple web-based API to import, process, store, and search genomic data at scale

a preview implementation of the API built on Google’s cloud infrastructure, including sample data from public datasets like the 1,000 Genomes Project

a collection of in-progress open-source sample projects built around the common API”

It provides a genome browser, a command line interface and a MapReduce wrapper, among other things.

Update, 2014-03-06: Global genomic data-sharing effort kicks off

David Altshuler:

“For example, there is a set of file formats currently used that came out of the 1,000 Genomes Project because we needed it. We think the current generation requires not file formats but machine-readable application programmer interfaces (APIs), and this group is developing — together with academics and for-profit companies — an open-source public API for genome sequencing reads and for genetic variants. That’s a very concrete set of things that we think the field needs.”
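To make the "APIs, not file formats" point a bit more concrete, here is a minimal Python sketch of the difference. Everything in it is an assumption made for illustration: the `Variant` record, its field names and the `search_variants` function are hypothetical and are not taken from the GA4GH proposal, ADAM or Google Genomics. The only real-world detail is the column order of a VCF line (CHROM, POS, ID, REF, ALT, ...) and the fact that VCF positions are 1-based.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    reference_name: str
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive
    reference_bases: str
    alternate_bases: str


def parse_vcf_line(line: str) -> Variant:
    """File-format view: the caller has to know the column layout."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    start = int(pos) - 1  # VCF positions are 1-based
    return Variant(chrom, start, start + len(ref), ref, alt)


def search_variants(
    variants: Iterable[Variant], reference_name: str, start: int, end: int
) -> Iterator[Variant]:
    """API view: ask for variants overlapping [start, end) on one reference."""
    for v in variants:
        if v.reference_name == reference_name and v.start < end and v.end > start:
            yield v


# Three made-up VCF data lines (tab-separated), not real variants.
lines = [
    "1\t100\trs1\tA\tG\t.\t.\t.",
    "1\t250\trs2\tCT\tC\t.\t.\t.",
    "2\t100\trs3\tG\tT\t.\t.\t.",
]
store = [parse_vcf_line(line) for line in lines]
hits = list(search_variants(store, "1", 0, 200))
for hit in hits:
    print(hit)
```

The caller of `search_variants` never sees tabs or column positions; it only sees named fields and a query. In a real web-based API the in-memory list would sit behind a service, but the shape of the idea is the same.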