Freeing dark, negative research data is the next step in open access science?

Positive, published scientific data form only the tip of the iceberg of the scientific data produced in labs. Since at least 90% (my guess) of all experiments fail or lead to negative results, those data sets become “dark data”. But dark data are as important for making science happen as positive data, and this information must be free – argues Thomas Goetz, Wired’s deputy editor (and another SciFoo camper), in an opinionated piece called Mind the gaps in the October issue of Wired (available only offline at this moment; update: it is now online). The idea is to push open access science to its limits.

“Liberating dark data makes many scientists deeply uncomfortable, because it calls for them to reveal their “failures”. But in this data-intense age, those apparent dead ends could be more important than the breakthroughs….Your dead end may be another scientist’s missing link. Freeing up dark data could represent one of the biggest boons to research in decades, fueling advances in genetics, neuroscience, and biotech.”

“Advocating the release of dark data is one thing, but it’s quite another to actually collect it, juggling different formats and standards. There’s the issue of storage….Google, among others, is lending a hand with its Palimpsest project, offering to store and share monster-size data sets (making the data searchable isn’t part of the effort.)”

Stop for a minute! The Palimpsest project was entertainingly presented at SciFoo by Jon Trowbridge (my iPhone shot of one of his slides is published here with Jon’s permission), and my guess is that this presentation is the source of Thomas Goetz’s sentence. I hinted at this project in my SciFoo Camp, 2007: data (Google) publishing (Nature) geeks (O’Reilly) post:

“scientific data”

One of the most frequently used key terms was “scientific data”. And the question is: how to collect, upload, organize and index them? With the exponentially increasing data sets produced by scientists worldwide, it is obvious that we need really powerful tools to benefit from them. After a couple of beta years, it is highly probable that Google (according to its mission statement) will offer new ways to manage the enormous amount of valuable scientific data. Without that, the efficiency of the science industry will dramatically decline.

But it was Deepak who later shared his experience of the presentation in detail:
Scifoo: Google and large scientific datasets

Here is my favorite part of Goetz’s article, about the science-culture problem of freeing dark data:
“If their research is successful, many academics guard their data like Gollum, wringing all the publication opportunities they can out of it over years. If the research doesn’t pan out, there’s strong incentive to move on ASAP, and a disincentive to linger in eddies that may not advance one’s job prospects.”

Wait a sec! During the summer I did two experiments that failed (= negative data), but then I explored in the literature why exactly I failed, and this knowledge and insight will presumably lead me to successful experiments. The failed experiments are a crucial part of my work, and if there is a coherent story to tell and publish in a peer-reviewed journal, those data will be there as negative controls or whatever. So for me, it would be harmful to publish them now (say, here on the blog), as they could easily give away the point I am trying to prove. But I promise that after I have published my positive results (if our hypothesis is true and not a false alarm), I will publish the failed experiments here, as they also mark the heuristic path leading to something positive. Enough said: I recommend that freeing negative data should happen post-publication, contingent on publishing the related positive data first. At least in this world, where open-access science is not the default.

What follows in the article is well known amongst the readers of this blog and the well-connected science blogosphere, but it is entirely new to the meganiche, or micro-mainstream, Wired audience:

“There are some islands of innovation. Since 2002, the Journal of Negative Results in BioMedicine has offered a peer-reviewed home to results that go negative or against the grain. Earlier this year, the journal Nature started Nature Precedings, a Web-based forum for prepublication research and unpublished manuscripts in biomedicine, chemistry and the earth sciences. At Drexel University, chemist Jean-Claude Bradley practices “open notebook” science – chronicling his lab’s work and sharing data via blog and wiki. And PLoS is planning an open repository for research and data that is otherwise abandoned.”

Goetz goes further with journalistic enthusiasm: “These are great first steps. But freeing dark data should be the norm, not the exception. Once the storage and format problems are solved, scientists will need easy ways to search and retrieve each other’s data. Congress should mandate that all federally funded research be disseminated, whatever the results.”

Three or four years ago I read somewhere (though I haven’t checked it) that failed, unfunded N.I.H. grant applications are published on a website, and I thought the idea was terrific. Maybe it’s time to check it now.

17 thoughts on “Freeing dark, negative research data is the next step in open access science?”

  1. Pingback: » Liberating negative data » business|bytes|genes|molecules

  2. I completely agree with this: dark data has to be open access. I think the biology world needs to adopt the “open source”/GNU ideology of the computer world. I use a lot of good open source software (like Firefox, BioTapestry, FreeMind – the list goes on…) and I always think the life sciences should (one day) be the same way. People should discuss and share more; another scientist should not have to run the very same experiments just because mine failed and I did not share them. This is a waste of resources, a waste of time, a waste of the workforce of qualified people.

    Open access sounds like a utopia to many people, though. It will be very hard to change people’s minds, and unfortunately the current grant system makes it a very ugly, competitive world.

    I hope these discussions will “open” new doors and will make us gain new insights.

    Thanks for bringing this up.

  3. This is a great idea whose time has come, thanks to the power of data storage and retrieval. There are no failures – only approaches we learn do not work for a particular experiment but may work in other ways for other scientists’ experiments or processes.

    In the end, it will actually save thousands of hours of work.

    Peace!

  4. Pingback: » Google and promiscuous distribution of data » business|bytes|genes|molecules

  5. Pingback: Science in the open » The need for open data is getting more mainstream

  6. Pingback: Pimeä data julki « kirjastokone

  7. Pingback: theorywatch » Blog Archive » Museum of negative representation

  9. Pingback: Science in the Open » Blog Archive » The need for open data is getting more mainstream

  10. Yes, there is definitely a bias operating when it comes to reporting only positive outcomes in research studies. I just had a good idea for a solution: there should be one main international databank for all research with negative findings (i.e., results contrary to the hypotheses). It could be called the “dark data bank”… what do you think, Attila?

Comments are closed.