My work involves analysing data from the Tara Oceans Expedition – a project that collected 35,000 seawater samples from all over the world. Part of that work was sequencing the genomes of all the organisms present in the Tara samples. We sequenced around 40 million genes, using a huge chunk of EMBL’s HPC cluster for several weeks. Jurij in IT Services helped me set this up as a low-priority job, so that I could use as much computational power as possible without causing problems for anyone else.
We’re also imaging the Tara samples, using machine learning to analyse microscope images and identify the organisms in them. With the computational firepower at EMBL, we’ve been able to do this in high throughput, analysing around 4 terabytes of data in just a few days. Having seen the computational infrastructure available elsewhere, I believe we got this work done much faster at EMBL than would have been possible at most other institutions.