Online Magazine of the European Molecular Biology
Building bioinformatics capacity in Latin America
The CABANA project was born out of a desire to strengthen bioinformatics capacity and accelerate data-driven biology in Latin America.
Having one bioinformatician on a research team used to be enough, but as biology becomes more data-driven, bioinformaticians are in high demand.
This is certainly the case in Latin America, where the data revolution is well underway in the life sciences. But one thing that is still missing is a critical mass of bioinformaticians to manage, analyse, and share the data more widely.
Bioinformatics – getting insights from big data
Bioinformatics is the science of analysing, managing, storing and sharing biological data, usually on a large scale. Discovery and innovation rely on scientists sharing the data generated by their experiments; this way, the data can be reused by others to explore different scientific questions and gain new insights.
But as the number and scale of experiments increase – and more data are generated – the need for specialist databases, analysis tools, and data experts becomes urgent.
Bioinformatics enables scientists to exploit the large datasets available in public data resources such as the ones managed by EMBL’s European Bioinformatics Institute (EMBL-EBI), to answer diverse research questions, for example:
How and why do we differ from one another?
Why are some people more susceptible to disease?
Why do some drugs work for certain people but not for others?
How can we make crops more resistant to a changing climate?
What microorganisms live in the oceans and what functions do they fulfil?
How do we identify and monitor biodiversity?
Bioinformatics is essential for cutting-edge research, such as drug discovery – developing new medicines or repurposing existing ones to treat different diseases – and ‘white biotechnology’, which aims to develop more useful products with less energy, while generating less waste. This could include, for example, enzymes that degrade plastic or that improve cleaning products by making them less toxic.
The applications of bioinformatics are endless, but to unlock them, researchers first need to be able to find and analyse molecular data from public databases. EMBL-EBI’s Training team enables life scientists to do exactly this and make the most of the biological data that is openly available, in order to expand their science and gain new insights.
Strengthening bioinformatics capacity
The CABANA project was born out of a desire to strengthen bioinformatics capacity and accelerate data-driven biology in Latin America. The project was developed by nine research organisations in the region and the EMBL-EBI Training team. Their vision resonated with UK Research and Innovation which, in 2017, funded the project through the Global Challenges Research Fund. Five years later, the project has come to an end, but the impact of this EMBL-EBI collaborative work continues.
“Our aim was to help researchers in Latin America participate in large global consortia equitably, and to contribute to solving global challenges, specifically biodiversity, food security and communicable diseases,” explained Cath Brooksbank, Head of Training at EMBL-EBI. “The only way to solve these big challenges is by bringing together a wealth of knowledge and experiences from all over the world; it simply cannot be done without our colleagues in Latin America.”
Alfredo Herrera-Estrella, CABANA Co-investigator at the National Laboratory of Genomics for Biodiversity in Mexico, said, “Through CABANA, we have the opportunity through genomics and bioinformatics in particular to find ways to contribute to solving or facing the problem of climate change.”
Training with an impact
“A lot of thought went into the project planning, to ensure the impact would be widespread and long-term,” explained Brooksbank. “We knew our funding was limited, so with our partners, we decided to develop a network of people and institutes across Latin America, which would continue to exist after the funding ran out.
“This way, the network would benefit from an initial wave of bioinformatics training, supported by knowledge exchange and links to other international consortia. As the project came to an end, the network could continue to jointly apply for funding to support new initiatives in their areas of interest.”
CABANA enabled the delivery of many bioinformatics workshops in Latin America, as well as the creation of bespoke e-learning courses and train-the-trainer activities. At the heart of the project were secondments that enabled Latin American scientists to visit other research institutes and embed themselves in another lab. The project also supported seven collaborative research projects in the region.
“Projects like CABANA also allow people in Latin America to build further bonds between bioinformatics groups, to be a part of this community and carry out research using bioinformatics,” said Guillermo Rangel-Piñeros, CABANA Secondee from the University of Los Andes in Colombia.
Guillermo Rangel-Piñeros, CABANA secondee from the University of Los Andes, Colombia, now a postdoctoral researcher at the University of Copenhagen.
Supporting the pandemic response
When COVID-19 was declared a global pandemic, the CABANA partners were among the many scientists who wanted to support the local and international response. CABANA allocated five of its partners a large innovation award for this purpose. Under the coordination of Alfredo Herrera-Estrella in Mexico, they supported the sequencing and analysis of COVID-19 samples from the region.
Despite some of the institutes not previously specialising in infectious disease, they built on their genomics and bioinformatics expertise to develop the open source PiPeCov pipeline to analyse COVID-19 genomic data. The project aimed to help understand the evolution and distribution of the virus in Latin America and contribute data from the region to international databases such as the European COVID-19 Platform, which was set up by EMBL-EBI in 2020.
“The pandemic was a stress test of the CABANA network,” said Brooksbank. “It was amazing to see our partners spring into action, and use their skills and expertise to address the unfolding global health crisis.”
A case for open data
The Latin America and Caribbean region is home to 60% of terrestrial life, many freshwater and marine species, as well as a multiethnic human population. But despite this diversity, the region isn’t well represented in open biological databases such as those managed by EMBL-EBI.
By involving Latin American researchers in large, collaborative projects and supporting them to generate and submit data to open databases whenever possible, EMBL hopes to make biological data generated in Latin America more easily accessible to everyone, while also enabling Latin American researchers to make the most of data generated elsewhere. Open access to data and tools has the potential to accelerate the rate of research and discovery worldwide.
“What we’ve learned from the COVID-19 pandemic is that sharing data is important to all of us, and no one should be working alone on this. All the new COVID-19 information that researchers have generated has had an impact on the health of everyone in every country,” said Josefina Campos, Coordinator of Genomics and Bioinformatic Platforms at INEI-ANLIS in Argentina.
When the going gets tough, the tough get creative
When the COVID-19 pandemic hit, CABANA was in full swing, with many in-person training courses and secondments left to go. This had been CABANA’s selling point: opportunities for researchers to visit other labs and embed themselves into a different research group and a new approach. But pandemic travel restrictions made this impossible.
The team had no choice but to adapt to the new pandemic reality – a world where people would be unable to travel for an unknown period of time. They put their heads together to figure out how to make the training sessions virtual, while maintaining their interactive nature, and how secondments could continue remotely.
“At first, we thought the pandemic would mean we had to put CABANA on hold,” explains Brooksbank. “But after the initial shock, we started to think of options to continue the project remotely, making the most of virtual collaboration tools. After a few intense months of fighting fires, everything seemed to fall into place. In fact, shifting the focus to online activities meant we could make our training accessible to a wider range of researchers.”
This is only the beginning
As the project came to an end in May 2022, one question remained: Would the CABANA network continue to exist, or would it fizzle out?
“We were pleased to see that the appetite for new collaborations had not diminished,” explains Ian Willis, CABANA Project Manager at EMBL-EBI. “The network has already submitted several funding proposals in the key interest areas. These include a project to sequence cacao species in four Latin American countries, and to improve how COVID data collected in the region is analysed locally, and shared with the world more widely.
“It’s excellent to see the experience gained during CABANA applied more widely. We’re also hoping that the network will expand to include other countries in the region, and partnerships on other key themes. We hope to see a snowball effect as more and more bioinformaticians are trained and projects are funded.”
“We wanted CABANA to be a framework for future projects to build bioinformatics capacity; the idea was for it to be easily replicated in other regions,” explained Brooksbank. “In our experience these are the key training requirements to build capacity: focusing on thematic areas, secondments to encourage knowledge exchange, train-the-trainer sessions to improve capacity quickly, and access to e-learning materials – ideally translated into the local language.”
The days when entire institutes and companies only had a token bioinformatician on the team are long gone. As big data takes its place at the heart of the life sciences, computational skills are more important than ever.