Pimm - Partial immortalization

A Biotech Geek Blogger’s adventures through science, technology and the web…

Archive for the 'bioinformatics' Category


BioBarCamp in the Valley before the SciFoo Camp!

Posted by attilachordash on April 28, 2008

It seems that my favorite unconference ever, the SciFoo Camp, will be flanked by an unconference of its own this year: a BioBarCamp. The whole idea of the BioBarCamp is based upon the SciFoo Camp, so it is by no means a competing event but a complementary one.

From the BarCamp wiki: “The BioBarCamp is an idea (fed by the tweets of the BioTwitterer community) to organize a life sciences - biotechnology - personalized genomics & medicine - bioinformatics unconference at the Bay Area around the 3rd SciFoo Camp time, which is 8-10th August. The SciFooCamp generates a lot of enthusiasm & activity but not just for those who are invited (only 200). On the other hand, it would be nice to organize a bio-related BarCamp, just like the Cambridge BarCamb, in which the bio-related SciFoo Campers and all the other biogeeks could gather together.”

The main activity is happening right now at the public BioBarCamp Google Group. If you are interested, please join there or just follow the discussions. We are currently looking for a proper venue and sponsors, and any help would be most welcome. At the moment 6 or 7 August seems to be the consensus day, and we have a very generous offer from The Institute for the Future in Palo Alto via Alex Soojung-Kim Pang (no response from 23andMe so far, see below).

It’s again a classic Twitter story, just like this one before. You can reconstruct the whole conversation with the Twitter search engine Tweet Scan by searching for the terms SciFoo, BioBarCamp and SciBarCamp, but here are my selected tweets:

Scene One, 04/10/08: how the idea was born on that day, in reverse chronological order:

Scene Two, 04/22/08: how the biospecificity and the name were born, along with a possible venue idea: Read the rest of this entry »

Posted in 23andMe, Bay Area, BioBarCamp, Sci Foo, SciFoo, USA, biohacking, bioinformatics, biology, biotechnology, partial immortalization, unconference | 8 Comments »

Human proteome project: 21000 genes/1 protein, 10 years, $1 billion?

Posted by attilachordash on April 23, 2008

In order to have the slightest chance of designing a robust, systemic life extension technology, we need to accumulate all the systemic macromolecular, cellular, tissue- and organ-level data of the normal, physiological human body, connect the trillions of nodes with scalable software algorithms, and later extract a draft of the proper sequence of consecutive treatment/regeneration steps. Fortunately, life extension technology is not the only field that needs systems biology projects (on its own it would not be enough to win grants); more importantly, the effective design of new drug targets and the discovery of disease biomarkers are clearly crying out for the systemic level. These urgent diagnostic and therapeutic demands are sufficient to launch international, many-lab projects.

Finally a complete ‘Human Proteome Project’ is in the pipeline (illustration via BioMed Search). It aims at complete tissue-level knowledge of the human proteome, revealing “which proteins are present in each tissue, where in the cell each of those proteins is located and which other proteins each is interacting with”. Keep in mind also that the roughly 21,000 human genes are thought to encode around 1 million different proteins, and that the effort cannot pinpoint exactly which cell types in a given tissue are producing which protein. According to Nature’s Helen Pearson in Biologists initiate plan to map human proteome:

“Those involved in the draft plan say that a human proteome project is now feasible partly because estimates of the number of protein-coding genes have shrunk. It was once thought that there might be around 50,000 or 100,000, but now, just 21,000 or so are thought to exist, making the scale of human proteomics more manageable. And the group plans to focus on only a single protein produced from each gene, rather than its many forms.

The plan is to tackle this with three different experimental approaches. One would use mass spectrometry to identify proteins and their quantities in tissue samples; another would generate antibodies to each protein and use these to show its location in tissues and cells; and the third would systematically identify, for each protein, which others it interacts with in protein complexes. The project would also involve a massive bioinformatics effort to ensure that the data could be pooled and accessed, and the production of shared reagents.”

The idea is to analyze and list all the proteins encoded by chromosome 21 within 3 years as a pilot study, and then finish the whole project within 10 years. Chromosome 21 is the smallest child in the family and likely contains between 200 and 400 genes, so the pilot study can yield a couple of hundred proteins. Another powerful idea (actually I prefer this one) is to start with the human mitochondrial proteome, which is around 1000-1500 proteins as far as I know, that is, several times as many as chromosome 21 encodes. Read the rest of this entry »

Posted in Nature, Nature Publishing Group, bioinformatics, biology, data, partial immortalization, proteome, science, systems biology | 6 Comments »

Blow your Brain Explorer out with the Human Allen Brain Atlas!

Posted by attilachordash on March 23, 2008

At the SciFoo Camp last year at the Googleplex I suggested a little unconference session (ok, there were some slides ready on my MacBook). One participant was Chinh Dang, Technology Director of the Allen Institute for Brain Science (another was this inventor), who gave the 9-10 attendees a little intro to the work of the Institute after this slide of mine:

SciFoo Brain Atlas

Paul Allen is the likable, Steve Wozniak-type co-founder of Microsoft, but I guess a bit richer (a friend of mine and I once estimated that he could buy all the condos in Budapest about 180 times over, or something like that).

But instead of doing that he provided, among other things, $100M in seed money to fund the Allen Brain Atlas.

Read the rest of this entry »

Posted in SciFoo, USA, bioinformatics, biology, biotechnology, brain, open science, science, technology | No Comments »

How much data is produced by a life scientist/day?

Posted by attilachordash on March 3, 2008

The current operational idea behind Google’s Palimpsest Project is to ship a 3TB drive array (Linux RAID-5; 1 terabyte = 1.0995 x 10^12 bytes) to scientists, who upload their data and FedEx the hard drives back to Google. Google then makes the data publicly available and manageable. This file transfer method was heavily criticized by Dai Davies in Ars Technica (“This is a bit like using Flintstones technology in the Internet era.”), although there are arguments behind this choice, see Jon Trowbridge’s 11th slide. Let’s forget about this uploading/updating problem for the length of this post. Here I only care about the end user, the scientist who is given whatever tool she needs to upload 3TB of research and measurement data on behalf of her research facility. While for an astronomer hundreds of gigabytes/day can seem like normal output, my angle is on how a life scientist and her data fit into this 3TB equation and eventually into the Palimpsest Project. Accordingly, my question is this:

How much data is produced by an average wet lab scientist, biomedical researcher/day?

I will try to come up with a rough guess in the hope of subtle corrections from the commenters. I assume the following (rather busy) daily production of data by our average scientist in an average lab:

- running a gel and making a gel photo: 300 KB .tiff

- preparing 5 samples for sequencing at the core facility: 500 KB - 1 MB of ab1 and seq files

- FACS sorting of different cell populations: 1 MB of special FACS files and a 100 KB pdf made from them
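To get a feeling for the scale, here is a minimal back-of-the-envelope sketch in Python. It only adds up the three items listed so far, takes the upper end of each size range, and assumes the sequencing figure is the total for all 5 samples (the list does not say whether it is per sample). The names `daily_items`, `daily_total_bytes` and `days_to_fill` are made up for this illustration, and the units follow the binary convention implied by 1 TB = 1.0995 x 10^12 bytes.

```python
KB = 1024
MB = 1024 * KB
TB = 1024 * 1024 * MB  # 2**40 bytes, about 1.0995e12

# Upper-end sizes for the items listed above.
# Assumption: the sequencing size is the total for all 5 samples.
daily_items = {
    "gel photo (.tiff)": 300 * KB,
    "sequencing output (ab1, seq)": 1 * MB,
    "FACS files": 1 * MB,
    "FACS pdf": 100 * KB,
}


def daily_total_bytes(items):
    """Sum the sizes (in bytes) of everything produced in one day."""
    return sum(items.values())


def days_to_fill(capacity_bytes, bytes_per_day):
    """Number of such days needed to reach the given capacity."""
    return capacity_bytes / bytes_per_day


total = daily_total_bytes(daily_items)
days = days_to_fill(3 * TB, total)

print(f"{total / MB:.2f} MB per day")
print(f"{days:,.0f} days to fill 3 TB")
```

With only these few items the daily total stays in the low megabytes, so a single wet lab scientist would be nowhere near filling the array; any further items in the estimate can simply be added to `daily_items`.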

Read the rest of this entry »

Posted in bioinformatics, biology, biotechnology, data, science, technology | 2 Comments »

Let’s compile a Biotech for IT folks book and publish it!

Posted by attilachordash on February 28, 2008

IT people are the dominant high-tech tribe today, especially on the web. But biotechnology (BT) is the next infotech, so it is no wonder that the IT crowd is increasingly curious about everything biotagged on the one hand, while on the other hand they are usually not too savvy in DNA-RNA-protein-organelle-cell-tissue-organ-organism related matters. Check for instance Tim O’Reilly at Nature: science meets bored tech-savvyness to find new things.

And what can biotech bloggers do to meet the growing demand? Well, here is a little conversation from my Twitter channel from the last 20 minutes:

Biotech for IT folks

Posted in bioinformatics, biology, biotechnology, blog, o'reilly, open science, open source, science, science blogs, science publishing, technology | 4 Comments »

The human mitochondrial consensus genome sequence by Robert Carter

Posted by attilachordash on February 19, 2008

For historical reasons the standard human mitochondrial sequence, the Revised Cambridge Reference Sequence (rCRS), is a reconstruction of a single European individual’s mtDNA and contains several rare alleles. That’s why a typical mtDNA sequence alignment must often appeal to phylogenetic historical reconstructions. The rCRS nevertheless provides a uniform nucleotide numbering scheme (1-16569). On the other hand, as thousands of high-quality, full-length mitochondrial sequences are now available, Robert Carter thought that it was time to construct and analyze a comprehensive human mitochondrial consensus sequence, and he published his efforts in Nucleic Acids Research, March 2007: Mitochondrial diversity within modern human populations. The sequence itself is available as supplementary material, but with the permission of the author I copy it into this post below.

According to Robert Carter:

So far, all feedback has been good. By introducing the idea of “poly-x” sites (see later), I successfully created a technique that avoids all pre-conceived ideas about genetic history. This also allows one to effectively deal with indels, something that many authors have avoided in the mtDNA literature.

Briefly, 827 sequences were used, a master sequence alignment was created in BioEdit, and BioPerl was used for all calculations, with the rCRS as the template for nucleotide numbering. Read the rest of this entry »
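For readers who have never built a consensus sequence, here is a minimal Python sketch of the general idea: take an alignment of equal-length sequences and pick the most frequent symbol in every column. This is plain majority rule, not Robert Carter's actual method (it does not implement his “poly-x” sites and does not use BioEdit or BioPerl), and the function name `consensus` and the toy alignment are invented for the illustration.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences.

    Gaps ('-') are treated like any other symbol. Ties go to the
    symbol that appears first in the column (Counter keeps insertion order).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    result = []
    for column in zip(*aligned):  # walk the alignment column by column
        counts = Counter(column)
        result.append(counts.most_common(1)[0][0])
    return "".join(result)


# Toy alignment, NOT real mtDNA data: one substitution and one deletion.
toy_alignment = [
    "GATCACAG",
    "GATCACAG",
    "GATTACAG",
    "GA-CACAG",
]
toy_consensus = consensus(toy_alignment)
print(toy_consensus)
```

A real analysis of 827 full-length genomes has to deal with indels and ambiguous positions far more carefully, which is exactly where the approach described in the paper comes in.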

Posted in bioinformatics, biology, biotechnology, genetics, genomics, mitochondria, science | No Comments »

MitoWheel 1.0: the human mitochondrial genome just got visual!

Posted by attilachordash on January 24, 2008

Your 16569 basepair long human mitochondrial genome does a lot for you and tells a lot about you. It encodes protein subunits that play a crucial role in the production and conversion of ATP, the body’s main chemical energy currency. On the other hand, the actual sequence of one’s mitochondrial DNA in a particular tissue or cell population gives a lot of health-associated (mitochondrial diseases, aging) and ancestry information.

So far users were restricted to non-intuitive and visually poor text-based databases every time they wanted to take a look at the mitochondrial DNA. But now with MitoWheel version 1.0 (yes, it is beta) the situation is about to change. From now on you can spin MitoWheel and play MitoRoulette (details on the game later)! MitoWheel is a graphical representation of the circular human mitochondrial genome, hence the name. The sequence used is the standard Revised Cambridge Reference Sequence. The 3 main components of the app are: a search box, a sequence bar and the wheel.

MitoWheel is the brainchild of Gábor Zsurka, a human mitochondrial geneticist we’ve already met in the post on The power links of the mitochondriologist. Gábor has done 100% of the programming too. Disclaimer: with some suggestions and testing I qualified myself to become a member of the developer team! The wheel was made with Flash Professional 8.0 and the code harnesses the power of ActionScript, a scripting language designed specially for Flash.

What are the basic things you can do with MitoWheel, if you are a scientist in the lab or a student in the seminar or just a tech geek eager to learn biology?

- spin the wheel: browse the genome by clicking the left and right arrows in the sequence bar

mitowheel sequence bar

- search for a nucleotide position or sequence in the search box with numbers: input: 15450 output: T
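The two operations above (looking up a position and searching for a sequence) are easy to picture in code. The sketch below is a Python illustration of how such lookups can work on a circular genome, not MitoWheel's actual ActionScript code. It assumes 1-based positions, and it uses a short made-up string instead of the real reference sequence; `base_at`, `window` and `find_motif` are hypothetical names.

```python
def base_at(genome, position):
    """Return the base at a 1-based position."""
    if not 1 <= position <= len(genome):
        raise ValueError("position out of range")
    return genome[position - 1]


def window(genome, position, flank):
    """Bases from position-flank to position+flank, wrapping around the circle."""
    n = len(genome)
    start = position - 1 - flank
    return "".join(genome[(start + i) % n] for i in range(2 * flank + 1))


def find_motif(genome, motif):
    """1-based start positions of motif, including hits that span the origin."""
    if not motif:
        raise ValueError("motif must not be empty")
    n = len(genome)
    # Append the first len(motif)-1 bases so matches can run past the end.
    extended = genome + genome[: len(motif) - 1]
    return [i + 1 for i in range(n) if extended.startswith(motif, i)]


# Toy circular "genome" of 10 bases, NOT the real reference sequence.
toy = "GATCACAGGT"

print(base_at(toy, 4))           # single-position lookup
print(window(toy, 1, 2))         # neighbourhood that wraps around position 1
print(find_motif(toy, "GTGA"))   # motif that spans the end and the start
```

The only trick is the wrap-around: because the molecule is a circle, the neighbours of position 1 include the last bases of the sequence, and a motif can start near the end and finish at the beginning.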

Read the rest of this entry »

Posted in DNA, bioinformatics, biology, biotechnology, mitoWheel, mitochondria, science | 2 Comments »

Xoogler goes biotech

Posted by attilachordash on January 23, 2008

I found this quote on John Battelle’s blog, taken from a recent CNET article on ex-Googlers by Stefanie Olsen, but I’d like to repeat it with a different emphasis, as I found every part of it interesting for the biotech community except the one sentence bolded by Battelle. So I bolded those other parts: Read the rest of this entry »

Posted in Silicon Valley, bioinformatics, biotechnology, google, medicine, technology | 2 Comments »

23andMe on the biparental inheritance of mitochondrial DNA and more

Posted by attilachordash on December 10, 2007

In my former blog post, F.A.Q. for 23andMe: what if I have mitochondrial DNA from Pa?, I meditated on 23andMe’s capability of detecting paternal mitochondrial DNA in their customers’ saliva with their Illumina microarray chips, which scan around 2000 mitochondrial single nucleotide variants. Published here, with permission, is the initial answer of the 23andMe Editorial Team to this fairly technical but nevertheless crucial question. Besides, I am happy to report that I am working on a blogterview with one of the key members of 23andMe’s Research Team. Hopefully I’ll be able to get back to you with some first-hand information on the science and technology behind the personal genome service of 23andMe and on how 23andMe can facilitate academic research.

Dear Attila Csordas,

Thank you for your interest in 23andMe’s research mission. The question of paternal inheritance of mtDNA is a fascinating one, and the debate in the literature has continued over the past couple of decades. Currently, there is little evidence for paternal inheritance of mtDNA, outside of isolated individuals. However, the array platform lets us resolve multiple SNP states independently. 23andMe’s technology and throughput may indeed provide a novel way to address the question. We will include the question in our consideration of research projects. In the meantime, here are a couple of articles discussing the subject:

Bandelt et al., “More evidence for non-maternal inheritance of mitochondrial DNA?”
Chinnery, “The Transmission and Segregation of Mitochondrial DNA in Homo Sapiens” in Human Mitochondrial DNA and the Evolution of Homo Sapiens.

Sincerely,

The Editorial Team at 23andMe

The question is crucial for a personalized genetics company like 23andMe, which provides a Maternal Ancestry Tree service to its customers based on the exclusively maternal inheritance of mitochondrial DNA. As one of my correspondents wrote: Read the rest of this entry »

Posted in 23andMe, DNA, USA, bioinformatics, biology, blogterview, genetics, mitochondria, peer-review, personalized genomics, science, technology | No Comments »

SciFoo Camp, 2007: data (Google) publishing (Nature) geeks (O’Reilly)

Posted by attilachordash on August 9, 2007

SciFoo is over, and I’ve just arrived back in New Orleans from SF. First of all: a big thanks to the organizers (Chris DiBona, Timo Hannay, Tim O’Reilly, Google, Nature, O’Reilly) and campers, it was really the highest end. Here is a quick summary of SciFoo key terms (photos and detailed accounts later):

“scientific data”

One of the most frequently used key terms was “scientific data”. And the question is how to collect, upload, organize and index it. With the exponentially growing data sets produced by scientists worldwide, it is obvious that we need really powerful tools to benefit from them. After a couple of beta years it is highly probable that Google (in line with its mission statement) will offer new ways to manage the enormous amount of valuable scientific data. Without that, the efficiency of the science industry will decline dramatically.

“science publishing”

Yes, the old question, ranging from open access science to different pre- and post-publishing opportunities and peer-review tools. A new and clear vocabulary is needed. The Nature people were honest about the problems and asked for optimal solutions.

“the geek factor”

Mainstream scientists are rather conservative folks: they can easily have revolutionary thoughts in their niche research fields, but are not too open-minded and experimental when it comes to new web and technology tools. The alpha geeks from O’Reilly Media reminded the science population of SciFoo (not the typical technology-neutral mainstream scientists) that there are many innovative things that could be done in and out of science too. (You don’t necessarily need the newest Mac gadgets for that, just try out some mind performance hacks.)

Posted in Bay Area, Nature, Nature Publishing Group, Natureplex, SciFoo, Silicon Valley, USA, bioinformatics, california, geek, google, googleplex, linux, networking, open science, open-access, partial immortalization, science, science publishing, unconference | No Comments »

23andMe: the early bird of web based biotech startups

Posted by attilachordash on February 12, 2007

23andMe is a biotech-focused web startup based in Mountain View, California (yes, the Googleplex neighbourhood), self-defined as “an early stage startup developing tools and producing content to help people make sense of their genetic information. Our goal is to take advantage of new genotyping technologies and help consumers explore their genetics, informed by cutting edge science. Genome deciphering technologies have reached affordable levels, allowing consumer access. For the individual, such information will provide personal insight into ancestry, genealogy and health. For society, the collection of genotypic and phenotypic information on a large scale will provide scientists with novel avenues for research.”
Briefly, they are concentrating on analyzing the enormous amount of genomics data we already have for customers. They are probably right, because in biotech, genomics could be the first field that has enough results, easy measurement methods (a little blood or a biopsy), an infotech background and enough commercial demand to make the business profitable within 1-2 years. Unfortunately, regenerative medicine and the stem cell frontier are not in this position yet. The next business step could be monetizing data from proteomics and transcriptomics. With its promising combination of computer science, biology and informatics, 23andMe is an early bird of the biotech-based web domain, because there will be times when all your genes, RNAs, peptides (and in my opinion: cells and tissues) will be taken into account, on your own initiative, to learn your future prospects, and a web-based service is a proper choice for managing all of your biodata. Security problems will emerge, of course.
Read the rest of this entry »

Posted in Bay Area, DNA, IT, IT&BT, USA, bioinformatics, biotechnology, business, california, google,