Predicting how gene expression varies
Discoveries at EMBL will help researchers to interpret one of the most common types of experiments in genomics and medical studies
By studying the genomes of fruit flies, scientists from the Zaugg and Furlong groups at EMBL Heidelberg have identified a set of genomic features that can predict how much the expression of a gene varies between individuals in a species. Their findings, published in Molecular Systems Biology, show that these same features can also be used to predict whether the expression of specific genes is likely to respond to environmental changes. This will help researchers to interpret a common type of experiment known as RNA sequencing (RNA-seq), which is used to compare levels of gene expression between experimental groups.
Adapting to the environment
The study was motivated by one of the fundamental questions in biology: how do organisms respond to environmental changes, such as shifts in temperature or pressure, in order to survive? At the genetic level, this depends on changing gene expression – the process in which the instructions encoded in a gene are converted into a functioning product, like a protein. Gene expression is regulated at multiple levels, for example through the spatial rearrangement of chromatin – the complex of DNA and its associated proteins – the activation of DNA regions known as promoters, or the activity of proteins known as repressors. Whether you are a fly or a human, the expression of some genes must be tightly regulated, while the organism needs mechanisms to rapidly change the expression of other genes to respond to its environment.
“If you have an organism that is developing from a single cell, some genes have to respond to environmental changes if the organism wants to survive,” says EMBL group leader Judith Zaugg, who co-led the study. “But, at the same time, there are nearby genes that are responsible for developing the brain, for example, or some other specific cells. These have to stay within their controlled range and should not be affected by changes in the regulation patterns. The question is, how does the organism manage to do that?”
Finding predictive features
The scientists examined embryos of the fruit fly Drosophila, using a machine learning approach to identify the features that are fundamental to gene expression variation.
“We had 75 ‘individuals’ – fruit fly embryos with differences in their genetic background,” explains Olga Sigalova, a PhD student in the Furlong and Zaugg groups. “For each individual, we could measure the expression level of specific genes in the embryo. This meant we could also study their variation: which genes had similar and which had different expression levels at the embryo stage across the 75 individuals.”
By doing this, the scientists were able to identify a set of around 100 genetic features that made the expression of some genes especially robust, allowing them to maintain their normal function while the organism reacts to environmental change. Knowing these features now makes it possible to predict whether a gene will show high or low expression variation – also known as expression ‘noise’.
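To make the idea of expression variation across individuals concrete, here is a minimal sketch in Python. It assumes variation is summarised as the coefficient of variation (standard deviation divided by mean); the article does not state which measure the study used, and the gene names and expression values below are invented purely for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(levels):
    """Sample standard deviation divided by the mean; higher means noisier expression."""
    m = mean(levels)
    if m == 0:
        raise ValueError("mean expression is zero; variation is undefined")
    return stdev(levels) / m


# Hypothetical expression levels for two genes measured in five individuals.
# These numbers are made up and are not data from the study.
expression = {
    "gene_stable": [100.0, 102.0, 98.0, 101.0, 99.0],
    "gene_variable": [40.0, 160.0, 90.0, 20.0, 190.0],
}

# One variation score per gene, then genes ordered from noisiest to most robust.
variation = {gene: coefficient_of_variation(levels) for gene, levels in expression.items()}
ranked = sorted(variation, key=variation.get, reverse=True)
```

In this toy example both genes have the same average expression, but one stays within a narrow range across individuals while the other swings widely, which is the distinction between robust and noisy genes that the study set out to predict.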
“Although enhancers are essential to tell genes when and where to be expressed,” says EMBL group leader Eileen Furlong, who co-led the study, “I was very surprised when our model uncovered promoter features as the most important for expression noise.” Promoters are sequences of DNA at the beginning of a gene where proteins bind to initiate the gene’s expression. The scientists found that these promoter regions are similarly useful for predicting gene expression variation in humans. That the same pattern is observed in both fruit flies and humans indicates that this is an ancient mechanism for controlling noisy gene expression, conserved across a huge evolutionary timespan.
Knowledge of these genetic features will have a significant impact on one of the most common types of experiments in the genomics and medical community, known as differential expression experiments. These experiments are used to compare gene expression levels between two groups, for example patients receiving treatment and a control group.
Using the set of features that the scientists identified, it is possible to predict many of the changes that are typically observed in differential expression experiments. This will help researchers to interpret these kinds of experiments, because they can now identify which genes are variable by nature and which variations are specific to the experimental conditions, such as a particular medical treatment.
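As a rough illustration of how such predictions might be used when reading a differential expression result, the sketch below sorts a list of hits by a predicted baseline variation score. Everything here is hypothetical: the function name, the scores, the threshold and the gene names are assumptions made for the example, not part of the published method.

```python
def split_hits_by_baseline_variation(de_genes, predicted_variation, threshold):
    """Partition differential-expression hits by their predicted baseline variation.

    Returns three lists: genes predicted to be variable by nature, genes whose
    change is more likely specific to the experimental condition, and genes
    for which no prediction is available.
    """
    variable_by_nature = []
    condition_specific = []
    unclassified = []
    for gene in de_genes:
        score = predicted_variation.get(gene)
        if score is None:
            unclassified.append(gene)
        elif score >= threshold:
            variable_by_nature.append(gene)
        else:
            condition_specific.append(gene)
    return variable_by_nature, condition_specific, unclassified


# Hypothetical inputs: hits from a treatment-versus-control comparison and
# made-up predicted variation scores between 0 and 1.
hits = ["gene_a", "gene_b", "gene_c", "gene_d"]
scores = {"gene_a": 0.9, "gene_b": 0.2, "gene_c": 0.7}

noisy, specific, unknown = split_hits_by_baseline_variation(hits, scores, threshold=0.5)
```

Genes in the first list change easily under many conditions, so their appearance among the hits says less about the treatment itself; the threshold is arbitrary here and would need to be chosen with care in practice.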
The researchers also discovered that a specific set of genetic features was characteristic of genes known to be targeted by existing drugs. These features can provide an additional in silico assessment of potential new drug targets.