Pimm – Partial immortalization

A Biotech Geek (micro)Blogger’s adventures through science, technology and the web…

  • licence

    Creative Commons License
Petabyte Age Wiredesque lesson on what science can learn from Google

Posted by attilachordash on June 22, 2008

I have argued many times here that biology-based biotechnology is the next information technology, but to become so, biotech should harness good IT patterns and mimic IT’s massive computing practices to handle the enormous amount of constantly accumulating data. Often this trend can be summarized simply: keep your eye on Google and conduct thought experiments in advance in which science is done in a Googleplex-like environment in terms of computing & financial resources and an algorithm-heavy engineering culture. Use Python and learn cluster computing and MapReduce. With the expected launch this year of Google’s massive scientific dataset hosting service – nicknamed Palimpsest – a direct interface between scientists and Googlers finally emerges and hopefully opens up possibilities for scientists to cooperate with Google. (Remember my joke on Google BioLabs back in 2006?) Ever since then I have been getting emails from biologists and bioinformaticians asking me how to get hired by Google.

As I tweeted yesterday: I increasingly have the impression that “being ambitious” today = ‘worked, is currently working, is going to work at/for Google’. Taking Google’s inter-industrial power into consideration, I see a real chance that some day the “Google of Biotechnology” title goes not to a startup yet to emerge, not to Genentech or to 23andMe but……to Google itself. No kidding here. Fortunately Google’s model is “to build a killer app then monetize it later”, says Andy Rubin, the man behind Google’s Android mobile software, in the July issue of Wired, so scientists working for the big G probably won’t have to worry about turning their scientific killer app into an instant cash machine.

And now in that very issue of Wired magazine (not online yet) there is an exciting cover story on the same pattern I talked about concerning the life sciences, but in the broader context of every kind of science, with the provocative, Fukuyama-like title The End of Science. There is a witty and short essay from editor-in-chief Chris Anderson entitled The End of Theory, followed by examples of the ‘new science’ like the Large Hadron Collider, expected to generate 10 petabytes of data/second, the Sloan Digital Sky Survey heaven catalog maker, which has accumulated 25 terabytes of data so far, the skeleton scanning project of Sharmila Majumdar and the Many Eyes project “where users can share their own dynamic, interactive representations of big data”.

For many people around the globe, Chris Anderson is a freeconomist & the author of a popular airport book, but fewer people are aware that he was actually trained as a (quantum) physicist and even worked at Los Alamos (after a 3 min search on Google Scholar and the like I gave up trying to find any peer-reviewed article Anderson coauthored). So when he writes on science, readers should keep in mind that what he really understood once was the practice of physics in the 80’s (looking forward to a biology-oriented Wired editor-in-chief in the near 21st century future, Thomas Goetz perhaps?). Anderson talks about the end of traditional science, or at least the end of science according to the most popular philosophical account of how science operates: a testable hypothesis for the underlying mechanism, causation – model – test, experiment – confirm/falsify.

“The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades is that we don’t know how to run the experiments that would falsify the hypotheses – energies are too high, accelerators too expensive, and so on.”

Anderson concludes:

“There is now a better way. Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.”

Interesting point, but isn’t it too early to call all this the End of Science, in typical overjournalistic lingo? From a cultural point of view, it is just not cool to borrow the vulgar philosophical (and vulgar Hegelian) generalization of a stone conservative philosopher/political theorist who is one of the biggest enemies of everything biotech, especially in the magazine famous for its cool technophilia and geekery. /Think about what Fukuyama would say about Venter’s synthetic biology, a biological example used by Anderson in his essay./

On the other hand, Anderson may be right about the other sciences he mentions, like psychology or sociology, and most notably his scientific home, physics, but individually conducted, hypothesis-based experimental biology is far from being over, or at least that is my own academic experience. Although more & more wet lab biologists find themselves dried out and choose bioinformatics or computational biology instead, it is still totally possible to master hypothesis-based, carefully designed (good controls!!!), beautiful experiments using fewer than, say, 20 main variables with an expected outcome, and I see no reason why that should change.

Even I (trained and working as an experimental scientist so far, minus 5 years of philosophy) now have growing bioinformatics, database building and large-scale pattern seeking dreams, but that is due to my main commitment – the reason I chose biology in the first place – to robust healthy life extension. One reason scientists in the biomedical sciences are switching to computationally heavy problem solving is that it is the way to tackle really big, complex issues, like cancer or diabetes or…..aging. But not every scientist wants to solve problems like these.

Anderson is right about the tendencies and trends (yes, science can and must learn a lot from Google, isn’t that obvious?) but even the petabyte age will need and produce Einstein-like scientists, brilliant theoreticians, creative thought experimenters and methodological outliers, not just perfect statisticians.

Update: What about the large-scale statistical models? – asks Fernando Pereira via Three-Toed Sloth via Bill Hooker on FriendFeed, where people are mostly underwhelmed by Anderson’s arguments.

6 Responses to “Petabyte Age Wiredesque lesson on what science can learn from Google”

  1. Deepak said

    Attila, there is a lot more to biological data than just managing it. What are the biological problems you are trying to solve? What questions are you trying to answer? Do you know how to present the information to different groups in your organization? These are problems that a non-scientist cannot answer. So what we need is a marriage of the two minds. I doubt Google has any interest in being a scientific company per se. It’s hard work and they’ll essentially have to be two companies. IBM has done some good science in its day (some brilliant science actually), but that’s still not its core business and shouldn’t be.

    What we need is the mindset, and the realization that we need to think about new ways of managing data and making the results available to scientists and decision makers. We need to start thinking about distributed computing paradigms, but those are things science has always learned from the tech industry. We’ve just been too slow doing that.

  2. Deepak said

    I’ll add that there are multiple avenues for Google to be an enabling platform (e.g. Android, Palimpsest, Google Code, Google Scholar, etc).

  3. Jon Rowley said

    I haven’t read my Wired yet, but I loved the cover and have been thinking about it since I got it – of course with respect to biotechnology. There has definitely been a trend in biotechnology towards screening-type experiments and away from individual hypothesis-driven experiments. There are arguments both ways on whether high throughput screening has led to the Demise of Big Pharma, but there are some cool ways to use the technology. I was involved in the first group to apply screening to ‘discovering’ environments that drive stem cell fate, which is currently being used internally for BD’s cell therapy initiative and for external commercialization (http://www.bd.com/technologies/discovery_platform/screening.asp). While the platform allows for screening hundreds of conditions, there are still a lot of hypotheses being asked per experiment. BD had a 10+ person informatics team to customize the informatics for the 30+ strong biology group – so the importance of computing power in pushing that forward was very obvious. I could totally see how that or other platforms could be extended, and with infinite resources (~$300M/year?) you could just screen everything and look for what works. While that isn’t happening today, I can see how 10-30 years down the line (where Chris Anderson & Wired try to ‘predict’) non-hypothesis-driven work will dramatically increase its ‘market share’ of even academic science. With that, there is the potential of a snowball effect: fewer scientists learning how to shape and test hypotheses, more screening/informatics approaches, towards the ‘end of science’ – that is the story that has developed in my head since seeing Wired’s cover. So while the End of Science is far from a reality, it isn’t so far-fetched that it won’t happen in this millennium.

  4. Jim H said

    I find it interesting that this concept appears diametrically opposed to the DIYbio movement: generation of petabytes of computational power will require enormous wealth. I understand the idea of pulling this power through pooled resources, but the necessity of a massive, centralized processor seems insurmountable.

    Am I missing something?

  5. Ward said

    Attila,

    very interesting post. Ever since reading your first post regarding the potential Google has in the science/health field I agreed with you – because I had similar thoughts. The data housing and mining of this data with Google’s infrastructure and brain power could lead to valuable insights and lead to a whole new field of science (even a paradigm shift).

    I will have to run out and get the new Wired magazine.

    ps GlaxoSmithKline just released a huge database of cancer cell line data (http://blog.wired.com/wiredscience/2008/06/massive-cancer.html)

    brainhealthhacks.com

  6. [...] at PIMM, Attila in his blog discusses a feature article in Wired magazine and offers a potential solution to this data [...]
