Sage Bionetworks is a not-for-profit organization developing an open-access “pre-competitive” platform for networked and annotated models of human disease. It’s a huge and unparalleled bioinformatics enterprise: it started with an anonymous $5 million donation and will soon make high-throughput, large-scale human and mouse biological data (largely from Merck) available on a scale comparable to what is already in the public domain today. The co-founders are real big shots: Stephen Friend, a former successful Merck executive, and Eric Schadt, now Chief Scientific Officer of Pacific Biosciences, who is “an industry leader in network biology with a number of high-profile publications over the past 5 years that have energized the systems biology community.”
For the last couple of months there was only minimal information available on the Sage website, but now interested scientists can get the big picture in more detail thanks to a significant update.
The strong motivation behind it is to build an open-access standard platform for human disease biology because
human disease biology has no common languages, no accessible communal repositories and no government, corporate or foundation investment in generating an inclusive resource….The experimental data underlying disease biology, like the genome itself, needs to be open access because the data is simply the beginning of the process….
Human disease biology is so complex, interconnected and expensive to research that the existing dominant business strategies of building and patenting unique models need to be replaced by a common standard. Like the internet, disease biology models will gain strength by their very nature as public platforms for interoperability and communication – this approach is at the very heart of that strength.
At the heart of the Sage model are the so-called Global Coherent Datasets, which will be available for the first time to scientists all around the world. We’re talking about a real goldmine here for researchers. And if that doesn’t sound good enough for a start, the following Sage Datasets will be available in 1 to 2 years:
- Extended D&O datasets to include >2,000 additional mouse and >1,000 additional human individuals (totals: >3,000 & >3,500 respectively)
- Extended neurological datasets for mouse to include sleep, anxiety and depression traits
- Extended CVD phenotypes for mouse and human to include additional relevant tissue-types (kidney, arterial wall, heart, plaque, etc)
- Age-related phenotypes relevant to sarcopenia (Mouse)
- Oncology datasets relevant to hepatocellular carcinoma including paired tumor/adjacent normal tissue and networks predictive of outcome. Also breast cancer and colon cancer datasets
- Human/mouse datasets relevant to respiratory and inflammatory disease
You name it.
As an approximate working definition, Global Coherent Data Sets are those “where DNA variation, genome-wide molecular phenotypes such as gene-expression, and clinical phenotypes have been measured in a sizable population of genetically diverse individuals.” Sizable means hundreds of individuals at least.
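To make the working definition concrete, here is a minimal Python sketch of what "coherent" could mean in practice: an individual counts only if all three data layers were measured for them. The `Cohort` class, its field names and the `min_individuals=100` threshold are my own illustrative assumptions (a stand-in for "hundreds at least"), not anything published by Sage.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    """Hypothetical container: each dict maps an individual ID to that layer's data."""
    genotypes: dict = field(default_factory=dict)   # DNA variation
    expression: dict = field(default_factory=dict)  # genome-wide molecular phenotypes
    clinical: dict = field(default_factory=dict)    # clinical phenotypes


def coherent_individuals(cohort):
    """Return the IDs of individuals measured in all three layers."""
    return set(cohort.genotypes) & set(cohort.expression) & set(cohort.clinical)


def is_globally_coherent(cohort, min_individuals=100):
    """Assumed reading of 'sizable': at least min_individuals fully measured individuals."""
    return len(coherent_individuals(cohort)) >= min_individuals
```

Under this reading, a cohort with 150 genotyped individuals of whom only 110 also have expression and clinical data would be judged on those 110, not on the 150.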
Meet the Sage Repository and Commons concepts: Sage Commons will contain the Global Coherent Data Sets, the network models derived from those sets, and the analytical methods and code used to generate the network models. Sage Repository will contain the same triplets, except that these “may not be annotated or coherent to the degree required for the Sage Commons, but which are nevertheless useful to the biological research community.”
How can you use the data once it’s available, probably starting around 2010? Two case studies are mentioned (identifying a novel target for obesity and repositioning a drug), of which I’d like to emphasize drug repurposing:
The pharmaceutical industry has an ever expanding portfolio of compounds that have been shown to be safe in human testing and to effectively modulate particular targets. In some cases these drugs show efficacy and go on to become marketed drugs whilst in other cases they may fail to show efficacy precluding their development for the selected indication. In either case there is a significant value to the industry in finding new indications for drugs with these characteristics (safe & modulate a known target) as a means to realize value on investment. Using network models tied to disease traits it is possible to generate potential new indications for compounds. Figure 2 illustrates how this was done for a compound originally developed for asthma. A molecular signature for the drug was interrogated against disease networks and a significant enrichment of the drug signature was noted for a particular pancreatic islet network linked to obesity and insulin resistance traits in a mouse F2 population. This led to the prediction that this drug may modulate these phenotypes through an effect on the islet module. This was tested in a mouse DIO model in which the compound helped normalize insulin and glucose levels.
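The quoted passage does not say which statistic was used to find the "significant enrichment". As a rough sketch of the general idea, the snippet below scores the overlap between a drug's gene signature and each network module with a one-sided hypergeometric test, which is a common choice for this kind of question but purely my assumption here. The function names and the gene and module names in the usage note are hypothetical.

```python
from math import comb


def enrichment_p_value(signature, module, universe):
    """One-sided hypergeometric p-value: the chance of seeing at least the
    observed overlap if the signature genes were drawn at random from the universe."""
    universe = set(universe)
    signature = set(signature) & universe
    module = set(module) & universe
    N, K, n = len(universe), len(module), len(signature)
    k = len(signature & module)
    # Sum the upper tail P(X >= k); math.comb returns 0 for impossible draws.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def rank_modules(signature, modules, universe):
    """Return (module_name, p_value) pairs, most enriched module first."""
    scores = [(name, enrichment_p_value(signature, genes, universe))
              for name, genes in modules.items()]
    return sorted(scores, key=lambda pair: pair[1])
```

Calling `rank_modules(drug_signature, {"islet_module": [...], "liver_module": [...]}, all_measured_genes)` would put the module sharing the most signature genes (relative to chance) at the top. In a real analysis the p-values would also need correcting for the number of modules tested.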