The current operational idea behind Google's Palimpsest Project is to ship a 3 TB drive array (Linux RAID-5; 1 terabyte = 1.0995 × 10^12 bytes) to scientists, who upload their data and FedEx the hard drives back to Google. Google then makes those data publicly available and manageable. This file transfer method was heavily criticized by Dai Davies in Ars Technica: "This is a bit like using Flintstones technology in the Internet era." There are arguments behind this choice, though; see Jon Trowbridge's 11th slide.
Let's set the uploading/updating problem aside for the purposes of this post. Here I only care about the end user, the scientist who is handed some tool to upload 3 TB of research and measurement data on behalf of her research facility. While hundreds of gigabytes per day can seem like normal output to an astronomer, my angle is how a life scientist and her data fit into this 3 TB equation, and eventually into the Palimpsest Project. Accordingly, my question is this:
How much data does an average wet-lab scientist or biomedical researcher produce per day?
I will try to come up with a rough guess, in the hope that commenters will correct it. I assume the following (rather busy) daily data production by our average scientist in an average lab:
running a gel and taking a gel photo: 300 KB .tiff
preparing 5 samples for sequencing at the core facility, output: 500 KB – 1 MB of ab1 and seq files
FACS sorting of different cell populations: 1 MB of special FACS files and a 100 KB pdf generated from them
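To put the items listed so far next to the 3 TB figure, here is a minimal back-of-the-envelope sketch in Python. It rests on several assumptions of mine: binary units (1 KB = 1024 bytes, which matches the 1.0995 × 10^12 bytes per terabyte quoted above), the sequencing figure taken as the total for all 5 samples rather than per sample, and every day being as busy as this one. The variable names are made up for illustration, and the numbers are only as good as the guesses in the list.

```python
# Rough estimate only: sizes are the guesses from the list above.
KB = 1024          # assumption: binary units, consistent with 1 TB = 2**40 bytes
MB = 1024 ** 2
TB = 1024 ** 4

ARRAY_CAPACITY = 3 * TB  # the 3 TB drive array shipped to the scientist

# (low, high) size in bytes for each item produced on one busy day
daily_output = {
    "gel photo (.tiff)": (300 * KB, 300 * KB),
    # assumption: 500 KB - 1 MB is the total for the 5 samples, not per sample
    "sequencing (ab1, seq)": (500 * KB, 1 * MB),
    "FACS files": (1 * MB, 1 * MB),
    "FACS pdf": (100 * KB, 100 * KB),
}

daily_low = sum(low for low, _ in daily_output.values())
daily_high = sum(high for _, high in daily_output.values())

# Days needed to fill the array at the high and low daily rates
days_to_fill_fastest = ARRAY_CAPACITY / daily_high
days_to_fill_slowest = ARRAY_CAPACITY / daily_low

print(f"Daily output: {daily_low / MB:.2f} to {daily_high / MB:.2f} MB")
print(f"Days to fill 3 TB: {days_to_fill_fastest:,.0f} to {days_to_fill_slowest:,.0f}")
print(f"Years to fill 3 TB (at the high rate): {days_to_fill_fastest / 365:,.0f}")
```

With these items alone the daily total is only a couple of megabytes, so a single scientist would need well over a million such days to fill the array; the estimate changes completely as soon as larger data types are added to the list.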





