{"id":15989,"date":"2019-07-10T14:19:10","date_gmt":"2019-07-10T12:19:10","guid":{"rendered":"https:\/\/news.embl.de\/?p=15989"},"modified":"2024-03-22T10:56:32","modified_gmt":"2024-03-22T09:56:32","slug":"programming-language","status":"publish","type":"post","link":"https:\/\/www.embl.org\/news\/science\/programming-language\/","title":{"rendered":"Programming: language"},"content":{"rendered":"<p>\u201cWe have 35 million records; that\u2019s about seven times the size of English Wikipedia,\u201d says Maria Levchenko, community manager at <a href=\"http:\/\/europepmc.org\">Europe PMC<\/a>. \u201cWe\u2019re in a very good position to utilise text mining.\u201d<\/p>\n<p>Hosted by EMBL\u2019s European Bioinformatics Institute (EMBL-EBI), Europe PMC is a database for life science literature. It aims to provide free, worldwide access to scientific research. To handle its vast collection of textual data, Europe PMC is one of the increasing number of organisations capitalising on the technological gold rush of text mining: using computer software to comb through existing text and extract new knowledge.<\/p>\n<p>One of Europe PMC\u2019s main goals is to use text mining to accelerate scientific discovery. \u201cYou could be researching genes, proteins or organisms, and with our tool SciLite [scientific highlighter] you can see them at a glance,\u201d says Levchenko. 
\u201cWe also link publications to data so it\u2019s easy to go from one to the other.\u201d These tools help scientists generate new insights from existing research, without spending the many lifetimes it would take to read all of the relevant scientific publications themselves.<em><br \/>\n<\/em><\/p>\n<figure id=\"attachment_15996\" aria-describedby=\"caption-attachment-15996\" style=\"width: 622px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/news.embl.de\/wp-content\/uploads\/2019\/06\/Text-mining_01.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-15996\" src=\"https:\/\/news.embl.de\/wp-content\/uploads\/2019\/06\/Text-mining_01-300x150.png\" alt=\"Stylised graphic displaying scientist writing with hot beverage while a machine picks out the important bits of a text for them.\" width=\"622\" height=\"311\" srcset=\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/Text-mining_01-300x150.png 300w, https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/Text-mining_01-768x384.png 768w, https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/Text-mining_01-1024x512.png 1024w\" sizes=\"auto, (max-width: 622px) 100vw, 622px\" \/><\/a><figcaption id=\"caption-attachment-15996\" class=\"wp-caption-text\">Text mining technology at Europe PMC speeds up scientific discovery by identifying key elements from large collections of text. IMAGE: Europe PMC<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Europe PMC is also a key contributor to <a href=\"https:\/\/www.opentargets.org\">Open Targets<\/a> \u2013 a collaboration between industrial and academic institutions uncovering new links between genes and diseases. By combining genetic theory and text mining, Open Targets hopes to identify genes that could be potential targets for new disease treatments. \u201cIt\u2019s a public\u2013private partnership,\u201d says Levchenko. 
\u201cIt\u2019s companies working together to make discoveries happen, and the lion\u2019s share of its new gene\u2013disease associations comes from text mining here at Europe PMC.\u201d<\/p>\n<h3>Reading between the lines<\/h3>\n<p>However, handling scientific publications presents a unique set of challenges. \u201cBiology literature can be very messy from a text mining standpoint,\u201d says Xiao Yang, Europe PMC\u2019s text mining specialist. \u201cThere are lots of abbreviations, acronyms and big, ambiguous words.\u201d Context is also important. \u201cThe same gene in <em>Drosophila\u00a0<\/em>and humans has the same name,\u201d says Levchenko, \u201cbut you often need to distinguish between these two very different things.\u201d<\/p>\n<p>There is also more to finding new gene\u2013disease associations at Open Targets than just searching publications for mentions of genes and diseases. For example, when analysing \u201cGene A does not cause disease B\u201d, the computer must be able to tell that the relationship is negative. In order for Europe PMC to overcome these challenges, their computers need a deeper understanding of how we communicate.<\/p>\n<blockquote><p>Any sort of input can be handled with the same computer science.<\/p><\/blockquote>\n<p>\u201cHuman language is ambiguous, fuzzy and imprecise,\u201d says Katja Ovchinnikova, natural language processing (NLP) expert at EMBL Heidelberg. NLP is the area of computer science that translates the meaning behind our natural language \u2013 how we typically speak to one another \u2013 into a form that computers can understand and use. It often involves the use of machine learning algorithms that give a computer an \u2018intuition\u2019 for language, like a child learning their mother tongue. \u201cYou don\u2019t say to children, \u2018These words mean the same thing, these words mean opposite things,\u2019\u201d says Ovchinnikova. 
\u201cThey just listen to speech and pick it up.\u201d Like children, the computers used by modern biologists learn how to understand our language by experiencing a million conversations.<\/p>\n<h3>Fluent microbe<\/h3>\n<p>The cornerstone of NLP is identifying how different words are related. One way to do this is by studying word context: which other words does a word often appear close to? Strings of words that appear together often enough to provide more information about a text than the individual words separately are called <em>n<\/em>-grams. For example, \u2018New York City\u2019, a 3-gram, tells you more about the locations of apartments in a database than its individual words.<\/p>\n<figure id=\"attachment_16010\" aria-describedby=\"caption-attachment-16010\" style=\"width: 300px\" class=\"wp-caption alignleft\"><a href=\"https:\/\/news.embl.de\/wp-content\/uploads\/2019\/06\/GephiNoLabels2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-16010 size-medium\" src=\"https:\/\/news.embl.de\/wp-content\/uploads\/2019\/06\/GephiNoLabels2-300x300.png\" alt=\"A spider web-like illustration of a network of connected objects.\" width=\"300\" height=\"300\" srcset=\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/GephiNoLabels2-300x300.png 300w, https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/GephiNoLabels2-150x150.png 150w, https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/GephiNoLabels2-768x768.png 768w, https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/GephiNoLabels2.png 1024w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-16010\" class=\"wp-caption-text\">NLP algorithms identify how strongly words are related by measuring how often they appear close together in text. IMAGE: Josh Tapley\/EMBL<\/figcaption><\/figure>\n<p><em>n<\/em>-grams can even be applied beyond conventional words. 
The Iqbal group at EMBL-EBI use <em>n<\/em>-grams \u2013 known as <em>k<\/em>-mers in computational biology \u2013 in their search engine <a href=\"https:\/\/news.embl.de\/science\/a-dna-search-engine-for-microbes\/\">Bitsliced Genomic Signature Index<\/a> (BIGSI). BIGSI substitutes microbial genetics for human language, using genes as <em>k<\/em>-mers and their nucleotides as words. Seeing a gene for antibiotic resistance as a <em>k<\/em>-mer of its nucleotides and searching for it with BIGSI quickly shows you all of the datasets and species in which this gene has been reported before.<\/p>\n<p>The big challenge when switching from human to genetic language is that new microbial genomes often contain new \u2018languages\u2019 that have never been seen before. BIGSI, unlike many NLP technologies associated with human language, was therefore developed with an emphasis on scalability to rapidly expanding vocabularies.<\/p>\n<h3>Beyond language<\/h3>\n<p>But why stop at words and letters? \u201cAny sort of input can be handled with the same computer science,\u201d says Ovchinnikova. She and the Alexandrov team at EMBL Heidelberg are using NLP algorithms to extract knowledge from a large-scale community knowledge base for spatial metabolomics called <a href=\"https:\/\/metaspace2020.eu\">METASPACE<\/a>.<\/p>\n<p>METASPACE contains spatial maps of the metabolites in many types of tissue. Metabolites are small molecules that are used by cells to steer their internal processes such as energy production, anti-tumour activity or intracellular communication. 
Hundreds of scientists from around the world use METASPACE to share the metabolomics data they produce using MALDI imaging mass spectrometry (MALDI-IMS).<\/p>\n<figure id=\"attachment_16025\" aria-describedby=\"caption-attachment-16025\" style=\"width: 300px\" class=\"wp-caption alignright\"><a href=\"https:\/\/news.embl.de\/wp-content\/uploads\/2019\/06\/MALDI-imaging-small-file.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-16025 size-medium\" src=\"https:\/\/news.embl.de\/wp-content\/uploads\/2019\/06\/MALDI-imaging-small-file-300x194.jpg\" alt=\"MALDI-IMS technology, a large mass spectrometer next to two computer screens displaying data.\" width=\"300\" height=\"194\" srcset=\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/MALDI-imaging-small-file-300x194.jpg 300w, https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/MALDI-imaging-small-file.jpg 750w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-16025\" class=\"wp-caption-text\">MALDI-IMS technology in the lab. IMAGE: Alexandrov team\/EMBL<\/figcaption><\/figure>\n<p>MALDI-IMS data represent a tissue as a 2D grid of pixels, each about the size of a typical cell. A laser is used to release molecules from the area of tissue corresponding to each pixel. These molecules are then sucked into a mass spectrometer and analysed. The resulting mass spectra reveal which molecules were present in each pixel.<\/p>\n<p>Making sense of MALDI-IMS data can be a challenge. However, the team has found that the NLP algorithm <a href=\"https:\/\/code.google.com\/archive\/p\/word2vec\/\">Word2Vec<\/a>, originally developed at Google to measure the relationships between words, is surprisingly well suited for mining the terabytes of data in METASPACE. 
Analysing spatial metabolomics data, it turns out, has a key similarity with analysing textual data: both aim to find patterns in datasets full of a large number of related but spatially distributed objects.<\/p>\n<h3>Cell.txt<\/h3>\n<p>Word2Vec uses a sliding \u2018window\u2019 that moves across a body of text and records which words are often found close together. \u201cIf two words occur in the same window, they are said to occur in the same context,\u201d says group leader Theodore Alexandrov. \u201cIf words often occur in the same context, they are related.\u201d<\/p>\n<p>The algorithm has proved adaptable to a wide range of research topics. \u201cWe\u2019re modelling a cell as a text document and a metabolite as a word in this document. We want to find functionally related metabolites, so we\u2019re applying Word2Vec to the 2D spatial context of metabolites seen in the MALDI-IMS data.\u201d<\/p>\n<figure id=\"attachment_16007\" aria-describedby=\"caption-attachment-16007\" style=\"width: 598px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/news.embl.de\/wp-content\/uploads\/2019\/06\/spatial-heterogenity-of-cells-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-16007\" src=\"https:\/\/news.embl.de\/wp-content\/uploads\/2019\/06\/spatial-heterogenity-of-cells-1-300x194.png\" alt=\"A cluster of cells are coloured according to the predominant metabolite inside them. 
Here, a lone green cell is surrounded by red ones with more greens visible at the edges of the image.\" width=\"598\" height=\"386\" srcset=\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/spatial-heterogenity-of-cells-1-300x194.png 300w, https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/spatial-heterogenity-of-cells-1-768x496.png 768w, https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/spatial-heterogenity-of-cells-1-1024x661.png 1024w, https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/spatial-heterogenity-of-cells-1.png 1802w\" sizes=\"auto, (max-width: 598px) 100vw, 598px\" \/><\/a><figcaption id=\"caption-attachment-16007\" class=\"wp-caption-text\">Illustration of the spatial heterogeneity of two types of co-cultured cells. The Alexandrov team uses MALDI-IMS to look at the metabolites in each cell and predict which cell is of which type. IMAGE: Alexandrov team\/EMBL<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Metabolites that are identified together when a cell performs a particular task could be a part of, or related to, the same reaction or metabolic pathway. Cancer cells, for example, have their metabolisms reprogrammed and accumulate particular metabolites, sometimes called \u2018oncometabolites\u2019, that are remarkably different from those found in healthy cells. The team aims to use the data from METASPACE to build a Word2Vec-powered network of metabolite relationships. By looking at which metabolites are related to known oncometabolites, this network will hopefully allow scientists to identify new ones not yet associated with cancer.<\/p>\n<p>Research like this has great potential for other applications in the clinic. But the current focus for Alexandrov and Ovchinnikova is to explore how combining NLP algorithms with biological techniques can help answer fundamental research questions. 
Indeed, these algorithms \u2013 and the computing power that drives them \u2013 are helping researchers throughout EMBL to take the science designed to help computers make sense of language and use it to deepen our understanding of biology.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How computer processing of human language is harnessed by EMBL scientists<\/p>\n","protected":false},"author":67,"featured_media":15992,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,17591],"tags":[416,125,189,36,428,43,779,219,763],"embl_taxonomy":[],"class_list":["post-15989","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-science","category-science-technology","tag-alexandrov","tag-big-data","tag-computational-biology","tag-embl-ebi","tag-europepmc","tag-heidelberg","tag-iqbal","tag-metabolomics","tag-open-targets"],"acf":{"vfwp-news_embl_taxonomy":null,"featured":null,"show_featured_image":null,"field_target_display":"embl","field_article_language":{"value":"english","label":"English"},"article_intro":"How computer processing of human language is harnessed by EMBL scientists","related_links":"6","source_article":null,"in_this_article":null,"press_contact":null,"article_translations":null,"languages":null},"embl_taxonomy_terms":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Programming: language | EMBL<\/title>\n<meta name=\"description\" content=\"How computer processing techniques designed for analysing human language are being harnessed for biological research at EMBL\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.embl.org\/news\/science\/programming-language\/\" \/>\n<meta property=\"og:locale\" 
content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Programming: language | EMBL\" \/>\n<meta property=\"og:description\" content=\"How computer processing techniques designed for analysing human language are being harnessed for biological research at EMBL\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.embl.org\/news\/science\/programming-language\/\" \/>\n<meta property=\"og:site_name\" content=\"EMBL\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/embl.org\/\" \/>\n<meta property=\"article:published_time\" content=\"2019-07-10T12:19:10+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-03-22T09:56:32+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"620\" \/>\n\t<meta property=\"og:image:height\" content=\"425\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Josh Tapley\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@embl\" \/>\n<meta name=\"twitter:site\" content=\"@embl\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Josh Tapley\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"NewsArticle\",\"@id\":\"https:\/\/www.embl.org\/news\/science\/programming-language\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.embl.org\/news\/science\/programming-language\/\"},\"author\":{\"name\":\"Josh Tapley\",\"@id\":\"https:\/\/www.embl.org\/news\/#\/schema\/person\/d242d2d21f1166a7e8e67e3e28fd5479\"},\"headline\":\"Programming: language\",\"datePublished\":\"2019-07-10T12:19:10+00:00\",\"dateModified\":\"2024-03-22T09:56:32+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.embl.org\/news\/science\/programming-language\/\"},\"wordCount\":1378,\"publisher\":{\"@id\":\"https:\/\/www.embl.org\/news\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.embl.org\/news\/science\/programming-language\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg\",\"keywords\":[\"alexandrov\",\"big data\",\"computational biology\",\"embl-ebi\",\"europepmc\",\"heidelberg\",\"iqbal\",\"metabolomics\",\"open targets\"],\"articleSection\":[\"Science\",\"Science &amp; Technology\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.embl.org\/news\/science\/programming-language\/\",\"url\":\"https:\/\/www.embl.org\/news\/science\/programming-language\/\",\"name\":\"Programming: language | 
EMBL\",\"isPartOf\":{\"@id\":\"https:\/\/www.embl.org\/news\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.embl.org\/news\/science\/programming-language\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.embl.org\/news\/science\/programming-language\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg\",\"datePublished\":\"2019-07-10T12:19:10+00:00\",\"dateModified\":\"2024-03-22T09:56:32+00:00\",\"description\":\"How computer processing techniques designed for analysing human language are being harnessed for biological research at EMBL\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.embl.org\/news\/science\/programming-language\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.embl.org\/news\/science\/programming-language\/#primaryimage\",\"url\":\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg\",\"contentUrl\":\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg\",\"width\":620,\"height\":425,\"caption\":\"A word cloud showing the most frequently used words in issue 93 of the EMBLetc. magazine. Text mining can quickly tell us a lot about the content of a text. 
IMAGE: Josh Tapley\/EMBL\"},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.embl.org\/news\/#website\",\"url\":\"https:\/\/www.embl.org\/news\/\",\"name\":\"European Molecular Biology Laboratory News\",\"description\":\"News from the European Molecular Biology Laboratory\",\"publisher\":{\"@id\":\"https:\/\/www.embl.org\/news\/#organization\"},\"alternateName\":\"EMBL News\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.embl.org\/news\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.embl.org\/news\/#organization\",\"name\":\"European Molecular Biology Laboratory\",\"alternateName\":\"EMBL\",\"url\":\"https:\/\/www.embl.org\/news\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.embl.org\/news\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2025\/09\/EMBL_logo_colour-1-300x144-1.png\",\"contentUrl\":\"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2025\/09\/EMBL_logo_colour-1-300x144-1.png\",\"width\":300,\"height\":144,\"caption\":\"European Molecular Biology Laboratory\"},\"image\":{\"@id\":\"https:\/\/www.embl.org\/news\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/embl.org\/\",\"https:\/\/x.com\/embl\",\"https:\/\/www.instagram.com\/embl_org\/\",\"https:\/\/www.linkedin.com\/company\/15813\/\",\"https:\/\/www.youtube.com\/user\/emblmedia\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.embl.org\/news\/#\/schema\/person\/d242d2d21f1166a7e8e67e3e28fd5479\",\"name\":\"Josh 
Tapley\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.embl.org\/news\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/0622fdbfdbdb2386706fcc255a84dfbeda9dc52061e18574539c7db8be545318?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/0622fdbfdbdb2386706fcc255a84dfbeda9dc52061e18574539c7db8be545318?s=96&d=mm&r=g\",\"caption\":\"Josh Tapley\"},\"description\":\"Josh is a science writer at EMBL with a master's degree in astrophysics. He loves science, education and the 'Eureka!' moment when you wrap your head around that tricky scientific concept.\",\"url\":\"https:\/\/www.embl.org\/news\/author\/josh-tapley\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Programming: language | EMBL","description":"How computer processing techniques designed for analysing human language are being harnessed for biological research at EMBL","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.embl.org\/news\/science\/programming-language\/","og_locale":"en_US","og_type":"article","og_title":"Programming: language | EMBL","og_description":"How computer processing techniques designed for analysing human language are being harnessed for biological research at EMBL","og_url":"https:\/\/www.embl.org\/news\/science\/programming-language\/","og_site_name":"EMBL","article_publisher":"https:\/\/www.facebook.com\/embl.org\/","article_published_time":"2019-07-10T12:19:10+00:00","article_modified_time":"2024-03-22T09:56:32+00:00","og_image":[{"width":620,"height":425,"url":"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg","type":"image\/jpeg"}],"author":"Josh Tapley","twitter_card":"summary_large_image","twitter_creator":"@embl","twitter_site":"@embl","twitter_misc":{"Written by":"Josh 
Tapley","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/www.embl.org\/news\/science\/programming-language\/#article","isPartOf":{"@id":"https:\/\/www.embl.org\/news\/science\/programming-language\/"},"author":{"name":"Josh Tapley","@id":"https:\/\/www.embl.org\/news\/#\/schema\/person\/d242d2d21f1166a7e8e67e3e28fd5479"},"headline":"Programming: language","datePublished":"2019-07-10T12:19:10+00:00","dateModified":"2024-03-22T09:56:32+00:00","mainEntityOfPage":{"@id":"https:\/\/www.embl.org\/news\/science\/programming-language\/"},"wordCount":1378,"publisher":{"@id":"https:\/\/www.embl.org\/news\/#organization"},"image":{"@id":"https:\/\/www.embl.org\/news\/science\/programming-language\/#primaryimage"},"thumbnailUrl":"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg","keywords":["alexandrov","big data","computational biology","embl-ebi","europepmc","heidelberg","iqbal","metabolomics","open targets"],"articleSection":["Science","Science &amp; Technology"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.embl.org\/news\/science\/programming-language\/","url":"https:\/\/www.embl.org\/news\/science\/programming-language\/","name":"Programming: language | EMBL","isPartOf":{"@id":"https:\/\/www.embl.org\/news\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.embl.org\/news\/science\/programming-language\/#primaryimage"},"image":{"@id":"https:\/\/www.embl.org\/news\/science\/programming-language\/#primaryimage"},"thumbnailUrl":"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg","datePublished":"2019-07-10T12:19:10+00:00","dateModified":"2024-03-22T09:56:32+00:00","description":"How computer processing techniques designed for analysing human language are being harnessed for biological research at 
EMBL","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.embl.org\/news\/science\/programming-language\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.embl.org\/news\/science\/programming-language\/#primaryimage","url":"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg","contentUrl":"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg","width":620,"height":425,"caption":"A word cloud showing the most frequently used words in issue 93 of the EMBLetc. magazine. Text mining can quickly tell us a lot about the content of a text. IMAGE: Josh Tapley\/EMBL"},{"@type":"WebSite","@id":"https:\/\/www.embl.org\/news\/#website","url":"https:\/\/www.embl.org\/news\/","name":"European Molecular Biology Laboratory News","description":"News from the European Molecular Biology Laboratory","publisher":{"@id":"https:\/\/www.embl.org\/news\/#organization"},"alternateName":"EMBL News","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.embl.org\/news\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.embl.org\/news\/#organization","name":"European Molecular Biology Laboratory","alternateName":"EMBL","url":"https:\/\/www.embl.org\/news\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.embl.org\/news\/#\/schema\/logo\/image\/","url":"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2025\/09\/EMBL_logo_colour-1-300x144-1.png","contentUrl":"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2025\/09\/EMBL_logo_colour-1-300x144-1.png","width":300,"height":144,"caption":"European Molecular Biology 
Laboratory"},"image":{"@id":"https:\/\/www.embl.org\/news\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/embl.org\/","https:\/\/x.com\/embl","https:\/\/www.instagram.com\/embl_org\/","https:\/\/www.linkedin.com\/company\/15813\/","https:\/\/www.youtube.com\/user\/emblmedia\/"]},{"@type":"Person","@id":"https:\/\/www.embl.org\/news\/#\/schema\/person\/d242d2d21f1166a7e8e67e3e28fd5479","name":"Josh Tapley","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.embl.org\/news\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/0622fdbfdbdb2386706fcc255a84dfbeda9dc52061e18574539c7db8be545318?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/0622fdbfdbdb2386706fcc255a84dfbeda9dc52061e18574539c7db8be545318?s=96&d=mm&r=g","caption":"Josh Tapley"},"description":"Josh is a science writer at EMBL with a master's degree in astrophysics. He loves science, education and the 'Eureka!' moment when you wrap your head around that tricky scientific 
concept.","url":"https:\/\/www.embl.org\/news\/author\/josh-tapley\/"}]}},"field_target_display":"embl","field_article_language":{"value":"english","label":"English"},"fimg_url":"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg","featured_image_src":"https:\/\/www.embl.org\/news\/wp-content\/uploads\/2019\/06\/EMBLetc_wordcloud_for_web-ib.jpg","_links":{"self":[{"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/posts\/15989","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/users\/67"}],"replies":[{"embeddable":true,"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/comments?post=15989"}],"version-history":[{"count":36,"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/posts\/15989\/revisions"}],"predecessor-version":[{"id":16609,"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/posts\/15989\/revisions\/16609"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/media\/15992"}],"wp:attachment":[{"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/media?parent=15989"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/categories?post=15989"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/tags?post=15989"},{"taxonomy":"embl_taxonomy","embeddable":true,"href":"https:\/\/www.embl.org\/news\/wp-json\/wp\/v2\/embl_taxonomy?post=15989"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}