Google’s Palimpsest project: promiscuous distribution of all science data sets
Posted by attilachordash on September 25, 2007
Google’s Palimpsest project, once realized (in the near future), has the potential to change the way science is done by accepting gigantic (raw?) data sets from all disciplines and making them open and free (including dark data?).
Jon Trowbridge from Google Inc. (you know, The Facebook of information) gave a presentation on the project at SciFoo 2007 at the Googleplex. That talk was not well documented, but you can download the slides of the version he presented at XTech 2007 in Paris this May: Making Massive Datasets Universally Accessible and Useful. You are not restricted to the zip file, as Jon kindly gave permission to publish his slides on SlideShare here.
From his intro: This talk will discuss a project underway at Google to collect and distribute large scientific datasets using a 21st century “Sneakernet”: multi-terabyte disk arrays shipped via FedEx and other common carriers.
The project is strictly non-profit, but fits well with Google’s mission.
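To get a feel for why a “Sneakernet” can make sense, here is a rough back-of-the-envelope sketch. The numbers are my own assumptions, not figures from Jon’s slides: a 3 TB disk array, a 24-hour courier transit, and a sustained 100 Mbit/s network link for comparison. The function names are made up for illustration.

```python
def effective_throughput_mbps(payload_terabytes: float, transit_hours: float) -> float:
    """Average throughput, in megabits per second, of physically shipping disks."""
    bits = payload_terabytes * 1e12 * 8  # decimal terabytes -> bits
    seconds = transit_hours * 3600
    return bits / seconds / 1e6


def network_transfer_hours(payload_terabytes: float, link_mbps: float) -> float:
    """Hours needed to push the same payload over a sustained network link."""
    bits = payload_terabytes * 1e12 * 8
    return bits / (link_mbps * 1e6) / 3600


if __name__ == "__main__":
    # Assumed values, for illustration only.
    payload_tb = 3.0      # size of the disk array
    transit_h = 24.0      # door-to-door courier time
    link_mbps = 100.0     # sustained network speed

    print(f"Shipping: {effective_throughput_mbps(payload_tb, transit_h):.2f} Mbit/s")
    print(f"Network:  {network_transfer_hours(payload_tb, link_mbps):.2f} hours")
```

Under these assumptions the shipped box averages roughly 278 Mbit/s, while the network upload would take close to three days. The estimate ignores the time needed to copy data on and off the disks, so treat it as an upper bound for shipping.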
September 25, 2007 at 3:55 pm
Attila
This is great. Thanks for posting.
September 26, 2007 at 9:10 am
[...] Google about Google’s efforts in this direction. While the talk from Scifoo is not available, Attila got permission to upload Jon’s talk up on Slideshare. The presentation is quite similar to the talk at [...]
September 27, 2007 at 4:15 am
Thanks for posting the slides. It is interesting. This is still very much aimed at very large data volumes, but maybe whatever they build around this (maybe a GBase segment for scientific data) could be used for smaller data sets uploaded over the net.
September 27, 2007 at 4:14 pm
Pedro,
At least for now they don’t necessarily have plans to do much with the data other than make it available on the web. Ideally, I think a Freebase/GBase type approach would be great. With an appropriate API and knowledge of the data structure, people could start building apps, and of course Google would do a great job of indexing the whole thing.
November 6, 2007 at 11:54 pm
[...] Google’s Palimpsest project [...]
January 18, 2008 at 9:08 pm
[...] (12:01pm): Attila Csordas of Pimm has a lot more details on the project, including a set of slides that Jon Trowbridge of Google gave at a presentation in [...]
January 18, 2008 at 9:45 pm
[...] “Palimpset”. The Wired piece also links to this blog, “Pimm”, which has a presentation about this project available on Slideshare. Pimm’s blog said that this project is strictly nonprofit (I [...]
January 19, 2008 at 6:16 am
[...] For more on Google and its data efforts, keep tabs on Attila’s blog, including his post on the promiscuous distribution of large datasets [...]
January 19, 2008 at 5:23 pm
[...] annotation and commentary solution. What does that mean, exactly? Heck if I know. Venture over to this Pimm blog post to cycle through a brief slide show to get some measure of what one will likely encounter on launch [...]
January 19, 2008 at 11:43 pm
[...] data on http://research.google.com/. Alexis Madrigal mentions more details in a post on Wired. Pimm’s post on the same topic displays the device to send data on slide 10/16. Google seems to plan to collect [...]
January 20, 2008 at 1:55 am
[...] (including about why Google intend to import data by shipping RAID arrays around the world) here and (more up to date) [...]
January 20, 2008 at 12:05 pm
[...] The project will be open to all scientists who wish to share their data with the community. Given the volume of data to be moved, the project will rely on the old computing adage: “nothing has the bandwidth of a truck driving down the highway loaded with hard disks.” And that is what they will do: scientists will receive a case containing a black box onto which they can load 3 TB of data, which will then be physically shipped to Google for inclusion in the big database. The dataset the project will launch with is the Hubble Space Telescope’s photographs. More details and some slides on the topic are available here. [...]
January 20, 2008 at 9:40 pm
[...] Wired is reporting that Google will begin hosting terabytes of open-source data at http://research.google.com (the project was supposed to open this week but missed the deadline; it will debut soon). The space will be free to scientists and the data will be available to all. The project is known as Palimpsest. The storage will allow scientists to explore amazing amounts of data. Tons more information on the project is available at Pimm. [...]
January 21, 2008 at 8:29 am
[...] Csordas of Pimm has more details on the project, including a set of slides that Jon Trowbridge of Google showed at a presentation in [...]
January 21, 2008 at 9:35 am
[...] [via Wired and Pimm] [...]
January 21, 2008 at 2:35 pm
[...] and discuss their studies and data with other scientists around the world. The project is named palimpsest, and it will be run from the domain [...]
January 21, 2008 at 4:46 pm
Google is playing to win in the 700 MHz auctions
Many say Google will bid to lose in the upcoming 700 MHz auctions and many more are equivocating. The idea is Google’s entry alone will induce enough openness, and besides they couldn’t afford to become an operator. This shows a
January 21, 2008 at 6:10 pm
That’s great.
I am curious what browse/search features Google will provide. It would be nice if the data were well annotated using semantic web technology.
January 22, 2008 at 7:46 am
[...] “Wired” reports, the project, named Palimpsest, was actually supposed to go online last week. However, the launch had to be postponed; it is supposed to [...]
January 22, 2008 at 11:43 pm
We at the Ctr. for Inherited Disease Research routinely ship data from genome scans to PIs and back-and-forth to NLM/NCBI on large encrypted disk arrays. We also continually archive and will eventually have to delete all the level 0 or “raw” data: the actual image data from which the genomic data is derived. I think someday we will regret deleting this data, since better algorithms are developed every day, yet many of these studies use the very last of the available DNA from a given research subject who may be dead or otherwise no longer available to provide more. Having someplace to store them for future re-analysis would, imho, be a great service.
Now, what about the data from extremely high-res 3D scans of the world’s entire collection of several hundred thousand cuneiform tablets, the world’s oldest written records and in many ways the foundational documents of human civilization? It might be a few petabytes or so: http://www.jhu.edu/digitalhammurabi/
February 10, 2008 at 12:29 pm
[...] method works well for all large datasets. A presentation by Jon Trowbridge at SciFoo (slides available here) makes a compelling argument that disk hardware capacity has consistently outpaced network [...]
February 29, 2008 at 8:19 am
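[...] Google has revealed a new project for the scientific community: Palimpsest, at research.google.com, for storing terabytes (3 TB at first, possibly expanding to 20 TB) of open scientific data. Scientists can store data for free and anyone can access it freely; the site was already pre-tested by scientists last August. Palimpsest will be based on the data visualization technology developed by Trendalyzer, the data analysis company Google acquired last year, plus its own information checking and query algorithms. The new site will offer YouTube-style annotation and commenting features. Here is a review, with slides, written last year by a biologist who took part in the testing. [...]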