A classification tool for transcription factors

The software diffTF quantifies the activity of transcription factors and predicts their mode of action

Transcription factors bound to DNA. IMAGE: Adobe Stock.

Transcription factors are proteins that bind to specific DNA sites and regulate the transcription of genes. Whether they act mainly as activators or as repressors shapes many important cellular processes, and this role can change with context, such as the cell type, the presence of cofactors, or disease. Many transcription factors remain unclassified: it is not known whether they act mainly as repressors or as activators.

Researchers in the Zaugg group at EMBL Heidelberg have developed software called diffTF. “It identifies differentially active transcription factors and captures their dominant mode of action, if there is one,” explains Christian Arnold, a bioinformatician in the group. “We classify whether a transcription factor functions mainly as a repressor or as an activator.” diffTF is the first data-driven tool to provide a general classification of transcription factors by their mode of action.

The development of diffTF

The Zaugg group had previously worked on many projects involving chromatin accessibility. Chromatin is the complex of DNA and proteins in the cell nucleus. It can be tightly packed, or open and accessible to DNA-binding proteins, which makes gene transcription possible. The group’s work involved comparing the activity of a large number of transcription factors based on changes in the accessibility of their binding sites across different conditions. “We wanted to follow up on some of the transcription factors and know whether they act as activators or repressors,” says group leader Judith Zaugg. “However, when we looked into the literature we simply could not find a database with that information.” This prompted the development of diffTF’s classification mode.
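To make the idea of comparing binding-site accessibility across conditions more concrete, here is a toy sketch in Python. It is not diffTF’s implementation. It assumes per-site accessibility counts for two conditions and a precomputed map from each transcription factor to its binding sites; the function names `log2_fold_change` and `differential_activity` are hypothetical. A transcription factor scores as more active in condition B when its sites open up more than sites in general do.

```python
import math
from statistics import mean


def log2_fold_change(count_a: float, count_b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of accessibility in condition B over condition A.

    The pseudocount (an illustrative assumption) avoids division by zero
    for sites with no reads.
    """
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))


def differential_activity(site_counts: dict, tf_sites: dict) -> dict:
    """Toy differential-activity score per transcription factor (hypothetical helper).

    site_counts: site_id -> (count in condition A, count in condition B)
    tf_sites:    tf_name -> list of site_ids bound by that factor
    Returns tf_name -> mean log2 fold change at the factor's sites minus the
    mean log2 fold change over all sites (used here as the background).
    """
    lfc = {site: log2_fold_change(a, b) for site, (a, b) in site_counts.items()}
    background = mean(lfc.values())
    return {tf: mean(lfc[s] for s in sites) - background for tf, sites in tf_sites.items()}


# Usage with made-up counts: TF_A's sites open up, TF_B's sites close.
counts = {"s1": (10, 40), "s2": (20, 80), "s3": (30, 30), "s4": (50, 12)}
scores = differential_activity(counts, {"TF_A": ["s1", "s2"], "TF_B": ["s3", "s4"]})
print(scores)
```

A positive score suggests higher activity in condition B and a negative score lower activity. A real analysis would also need normalisation and a significance test; this sketch only shows the direction of the signal.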

To develop and benchmark diffTF, the group used a dataset from chronic lymphocytic leukaemia (CLL) cells. CLL is a blood cancer with two major subtypes. When the group started developing the software, this was the largest chromatin accessibility dataset available. By correlating gene expression with chromatin accessibility, the group could predict the main mode of action of a transcription factor. The group experimentally validated its predictions in collaboration with the Molecular Medicine Partnership Unit headed by Sascha Dietrich and Wolfgang Huber. “Chromatin accessibility has become a very popular topic in the last 10 years,” says Ivan Berest, a predoctoral fellow in the Zaugg group. “Researchers are generating more and more chromatin data and also need more tools to generalise their ideas. That’s what diffTF is meant to do.”
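The idea of predicting a mode of action by correlating gene expression with chromatin accessibility can be illustrated with another toy sketch. This is an assumption-laden illustration, not diffTF’s published method: it correlates a transcription factor’s expression with the accessibility of each peak across samples, then compares the median correlation at peaks containing the factor’s binding site with the median at all other peaks. The helper names, the median-shift rule and the `threshold` value are all hypothetical. If the factor’s peaks tend to open as its expression rises, it is labelled an activator; if they tend to close, a repressor.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def classify_mode(tf_expression, peaks, tf_peaks, threshold=0.1):
    """Hypothetical activator/repressor call for one transcription factor.

    tf_expression: expression of the factor in each sample
    peaks:         peak_id -> accessibility of that peak in the same samples
    tf_peaks:      set of peak_ids containing the factor's binding site
    threshold:     illustrative cut-off on the shift in median correlation
    """
    corr = {p: pearson(tf_expression, acc) for p, acc in peaks.items()}
    with_site = [c for p, c in corr.items() if p in tf_peaks]
    without_site = [c for p, c in corr.items() if p not in tf_peaks]
    shift = median(with_site) - median(without_site)
    if shift > threshold:
        return "activator"
    if shift < -threshold:
        return "repressor"
    return "undetermined"


# Made-up data: five samples, five peaks.
expression = [1, 2, 3, 4, 5]
peaks = {
    "p1": [2, 4, 6, 8, 10],  # opens as expression rises
    "p2": [1, 3, 2, 5, 6],   # mostly opens
    "p3": [5, 4, 3, 2, 1],   # closes as expression rises
    "p4": [3, 3, 4, 3, 3],   # uncorrelated
    "p5": [6, 5, 5, 3, 2],   # mostly closes
}
print(classify_mode(expression, peaks, {"p1", "p2"}))  # activator
print(classify_mode(expression, peaks, {"p3", "p5"}))  # repressor
```

Comparing against peaks without the binding site, rather than against zero, is meant to discount global trends that affect all peaks. It also mirrors the article’s point that a factor may have no dominant mode of action, which the sketch reports as "undetermined".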

An open tool for science

The mode of action is one of the first things researchers need to know once a transcription factor has been identified. “Maybe you need to downregulate the transcription factor because it’s an activator, or you have to activate it because it’s a repressor. Those are important things to know if you want to achieve a specific effect,” says Zaugg.

The researchers explain that diffTF can be applied in any research project that uses chromatin-based data on a large scale. For example, the tool can be used with ATAC-seq, a technique to assess genome-wide chromatin accessibility, or with ChIP-seq of histone marks, a method used to map chemical modifications of the histone proteins around which DNA is wrapped. The Zaugg group runs annual courses to teach these methods. The researchers emphasise the importance of Arnold’s careful documentation of diffTF. “In the end, the most important thing about our study is that we have a tool that other people can use,” says Zaugg.

diffTF will be important for the future work of the Zaugg group, and many new projects use it as a basic tool. “For example, we are now building networks between patients with autoimmune disease and healthy individuals,” says Zaugg. Applying diffTF to these networks allows scientists to identify differences in transcription factor activity and how this affects target genes.

“It’s been a lot of fun working with so many transcription factors,” Zaugg says, laughing. “We have worked with over 600 of them and know many by heart now; some come up in every project.”