Career Accelerator for Research Infrastructure Scientists
Biocurators’ work forms the foundation of many biological databases. Such databases are essential for making sense of the vast quantities of biological data produced in the post-genomic era. To extract information from the scientific literature in a structured format that can serve such databases, biocurators rely on specialized tools. However, efficiently retrieving the desired information from literature searches remains a challenge in biocuration.
Under the supervision of Johanna McEntyre and Henning Hermjakob at EMBL-EBI, ARISE fellow Matt Jeffryes is currently developing LitSieve, a literature search tool for biocurators. Using 2 billion text-mined annotations, including genes, species, diseases, and chemicals, LitSieve will allow biocurators to efficiently filter their literature search across the content available in Europe PMC, a comprehensive database of life science literature.
“Although there are a variety of other tools available for biocuration of literature search, to our knowledge, no others can search based on this number of types of annotation,” said Matt.
According to Matt, LitSieve provides consistent and understandable search results, in contrast to some tools powered by machine learning (ML) and large language models (LLMs) like ChatGPT. “In these systems, users often don’t understand why the algorithm retrieves certain results or, in some cases, the root causes of their errors,” Matt pointed out. “By offering a user-specified query where biocurators can define their search filters, LitSieve overcomes these issues, providing full transparency and control over one’s search.”
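To make the idea of a user-specified, transparent filter concrete, here is a minimal Python sketch. It is not LitSieve's code or the Europe PMC API: the `Article` record, the `filter_articles` function, the annotation type names, and the sample data are all hypothetical, chosen only to illustrate how an explicit query over text-mined annotations can give the same, explainable result every time.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical record: an article ID plus text-mined annotations by type."""
    article_id: str
    annotations: dict[str, set[str]] = field(default_factory=dict)


def filter_articles(articles, required):
    """Keep articles that carry every required term of every annotation type.

    `required` maps an annotation type (e.g. "gene") to the set of terms
    that must all be present. Results keep the input order, so the same
    query over the same data always returns the same list.
    """
    matches = []
    for article in articles:
        if all(
            terms <= article.annotations.get(kind, set())
            for kind, terms in required.items()
        ):
            matches.append(article)
    return matches


# Illustrative data only; not real Europe PMC records.
articles = [
    Article("A1", {"gene": {"TP53"}, "species": {"Homo sapiens"}}),
    Article("A2", {"gene": {"TP53"}, "species": {"Mus musculus"}}),
    Article("A3", {"disease": {"melanoma"}, "species": {"Homo sapiens"}}),
]

# The query is stated explicitly by the user, so every hit can be explained.
query = {"gene": {"TP53"}, "species": {"Homo sapiens"}}
result_ids = [a.article_id for a in filter_articles(articles, query)]
print(result_ids)  # ['A1']
```

In this sketch an article is returned only because it carries every annotation the user asked for, which is the kind of traceability the quote contrasts with opaque ML-driven ranking.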
LitSieve also offers flexible features for organizing articles. For instance, articles found using LitSieve can be saved to lists, allowing biocurators to organize, prioritize, or group related articles as they wish. Users can also add private notes to articles, for example to highlight curatable passages or record other pertinent details.
“We have anticipated that by integrating biocuration-related features into a single application, biocuration workflows would be more efficient,” remarked Matt and his collaborators at EMBL-EBI.
“We believe our tool will improve biocuration efficiency. Consequently, this will increase the data available to train machine learning algorithms and develop AI technologies that will progress scientific discoveries, from protein structure prediction to drug design and disease treatment.”