Positive, published scientific data form only the tip of the iceberg of all the data produced in labs. As at least 90% (my guess) of all experiments fail or lead to negative results, those data sets become “dark data”. But dark data are as important for making science happen as positive data, and this information must be free, argues Thomas Goetz, Wired’s deputy editor (and another SciFoo camper), in an opinionated piece called Mind the gaps in the October issue of Wired (available only offline at first; update: it is now online). The idea is to push open access science to its limits.
“Liberating dark data makes many scientists deeply uncomfortable, because it calls for them to reveal their “failures”. But in this data-intense age, those apparent dead ends could be more important than the breakthroughs….Your dead end may be another scientist’s missing link. Freeing up dark data could represent one of the biggest boons to research in decades, fueling advances in genetics, neuroscience, and biotech.”
“Advocating the release of dark data is one thing, but it’s quite another to actually collect it, juggling different formats and standards. There’s the issue of storage….Google, among others, is lending a hand with its Palimpsest project, offering to store and share monster-size data sets (making the data searchable isn’t part of the effort.)”
Stop for a minute! The Palimpsest project was entertainingly presented at SciFoo by Jon Trowbridge (my iPhone shot of one of his slides is published here with Jon’s permission), and my guess is that this presentation is the source of Thomas Goetz’s sentence. I hinted at this project in my post SciFoo Camp, 2007: data (Google) publishing (Nature) geeks (O’Reilly):
One of the most frequently used key term was “scientific data”. And the question is: how to collect, upload, organize and index them. With the exponentially increasing data sets, that are produced by scientists worldwide, it is obvious that we need really powerful tools to benefit them. After a couple of beta years it is highly probable that Google (according to its mission statement) will offer new ways to manage the enormous amount of valuable scientific data. Without that, the efficiency of the science industry will dramatically decline.
Here is my favorite part of Goetz’s article, about the cultural problem in science that stands in the way of freeing dark data:
“If their research is successful, many academics guard their data like Gollum, wringing all the publication opportunities they can out of it over years. If the research doesn’t pan out, there’s strong incentive to move on ASAP, and a disincentive to linger in eddies that may not advance one’s job prospects.”
Wait a sec! During the summer I did 2 experiments that failed (= negative data), but then I explored the literature to find out exactly why I failed, and now this knowledge and insight will presumably lead me to successful experiments. The failed experiments are crucial parts of my work, and if there is a coherent story to tell and publish in a peer-reviewed journal, those data will be there as a negative control or whatever. So for me, it would be harmful to publish them now (say, here on the blog), as they could easily give away the point I am trying to prove. But I promise here that after I have published my positive results (if our hypothesis is true and not a false alarm), I’ll publish the failed experiments here too, since they also serve as the heuristic path leading to something positive. Enough said: I recommend that negative data be freed post-publication, that is, only after the related positive data have been published. At least in this world, where open-access science is not the default.
What follows in the article is well-known amongst the readers of this blog and the well-connected science blogosphere, but is entirely new to the meganiche or micro mainstream Wired audience:
“There are some islands of innovation. Since 2002, the Journal of Negative Results in BioMedicine has offered a peer-reviewed home to results that go negative or against the grain. Earlier this year, the journal Nature started Nature Precedings, a Web-based forum for prepublication research and unpublished manuscripts in biomedicine, chemistry and the earth sciences. At Drexel University, chemist Jean-Claude Bradley practices “open notebook” science – chronicling his lab’s work and sharing data via blog and wiki. And PLoS is planning an open repository for research and data that is otherwise abandoned.”
Goetz goes further with journalistic enthusiasm: “These are great first steps. But freeing dark data should be the norm, not the exception. Once the storage and format problems are solved, scientists will need easy ways to search and retrieve each other’s data. Congress should mandate that all federally funded research be disseminated, whatever the results.”
Three or four years ago I read somewhere (but did not check it) that failed, unfunded N.I.H. grants are published on a website, and I thought the idea was terrific. Maybe now it’s time to check it.