Galaxy@EMBL
Local Galaxy instance
At your side to solve your daily data management and data analysis challenges
While Galaxy is the most accessible and efficient option for standard analyses, more advanced statistical modeling or visualization usually requires specialized code, which can be written and executed in R on our RStudio Server instance.
Workflow modeling with Galaxy and other Workflow Management Systems (WMS), to achieve better analysis automation and reproducibility, is also within our area of expertise, and we can provide advice and support to researchers from beginner to advanced level.
Galaxy is a web app for performing reproducible data analyses through a user-friendly graphical interface.
Everyone at EMBL has access to our Galaxy instance. Log in with your EMBL credentials.
Galaxy can be used by anyone, but it proves to be a particularly valuable asset for bench scientists with little computing experience, as it makes it easy to run the most commonly used bioinformatics software.
A variety of bioinformatics tools are available in a few clicks. This includes the most widely used NGS, proteomics, and image data software. More tools can be deployed on demand: if you need a specific one, do not hesitate to contact us.
Resource-intensive jobs launched with Galaxy are automatically executed on EMBL’s high-performance computing infrastructure (maintained by ITS). This means no additional hassle for researchers who have never used an HPC cluster before.
A quota of 200 GB is allocated to each user. We encourage users to download useful analysis results to their group share as soon as they are produced, and we expect users to clean and purge useless data from their history in order to recover disk space.
Galaxy is not to be used as storage, and we cannot guarantee that the data will be kept long term.
This is the quickest option, but we do not recommend it for larger files or for files that are already stored on your group share, as it unnecessarily counts against your quota and may duplicate data.
The data available on your group share at /g/<groupname> can directly be linked to your Galaxy data library.
This has to be done by an admin. To do so, please open a request with us, including the list of files that need to be made available. This avoids unnecessary data duplication, which saves your group's resources (disk space, for instance).
Connections have been established between our Galaxy instance and LabID, our data management platform. Datasets can be transferred from LabID to Galaxy in a few clicks (and without data duplication). Sending data back from Galaxy to LabID is currently in beta testing. This makes it possible to permanently store Galaxy’s analysis results and to reference them in lab notes, linking them to samples, annotations, protocols, reagents, etc.
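To prepare the list of files for such a request, a small shell helper can collect the paths for you. The following is only a sketch: the function name `list_files_for_request`, the project folder, and the `*.fastq.gz` pattern are illustrative assumptions, not part of any EMBL tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every regular file under a directory whose
# name matches a glob pattern, one path per line, sorted alphabetically.
list_files_for_request() {
    local dir="$1" pattern="$2"
    find "$dir" -type f -name "$pattern" | sort
}

# Example usage (path and pattern are placeholders to adapt):
#   list_files_for_request /g/<groupname>/my_project '*.fastq.gz' > galaxy_link_request.txt
```

The resulting text file can then be attached to the request.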
Local Galaxy instance
Collection of tutorials developed and maintained by the worldwide Galaxy community
Internal chatroom for our Galaxy users, to get advice and troubleshoot issues
RStudio – sometimes now referred to as Posit™ Workbench – is a powerful Integrated Development Environment for R, the go-to programming language for bioinformaticians and statisticians aiming to extract valuable information from experimental data.
Everyone at EMBL can access RStudio via JupyterHub (at https://jupyterhub.embl.de). Log in with your EMBL credentials.


Local RStudio instance
Internal chatroom for our RStudio users, to get advice and troubleshoot issues
To achieve better automation and reproducibility of analyses, we strongly encourage the use of analysis workflows and Workflow Management Systems (WMS).
We will assist less computer-savvy colleagues with their standard NGS data analysis (RNA-seq, ChIP-seq, ATAC-seq, HiC, scRNA-seq…) by providing ready-to-use Galaxy workflows.
Non-standard analysis workflows have to be developed by you; nevertheless, we can teach you the basics of Galaxy so that you can assemble your own workflow in no time.
Our expertise in domains other than NGS is limited; however, we can help you assemble your own workflow.
MODIS regularly provides training internally, and the Galaxy Training Network provides live material to learn by yourself. This covers a wide range of domains, including sequencing, microscopy, proteomics, metabolomics, etc.

For bioinformaticians proficient with command line tools, we advise looking into command-line based WMS. The most commonly used at EMBL are Nextflow and Snakemake*.
(*) We cannot recommend one WMS over another. Snakemake and Nextflow are both powerful tools, and other WMS also exist. Picking the right tool is a hot topic in life sciences: many aspects have to be considered, and the choice is ultimately up to you. That said, we at MODIS have more expertise with Nextflow.
If your group is part of the GB Unit, we can provide further support and collaborate on workflow development. For example, this can mean either developing a custom Galaxy or Nextflow workflow for you, or collaborating on the development of a Nextflow workflow with bioinformaticians in your group in order to teach them best practices of software development with git and of modular workflow development.
We maintain a supercomputer named Seneca, which we use to run RStudio Server. This computer can be accessed via ssh and is connected to your group share. It can be used to run basic Unix commands and lightweight processing.
Everyone at EMBL has access to Seneca. Log in remotely via ssh to seneca.embl.de (when connected to the EMBL network).
Seneca is configured as a SLURM submit host and can therefore be used to submit cluster jobs, just like login01.cluster.embl.de or login02.cluster.embl.de. Find more information on the ITS Cluster Wiki.
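As an illustration of what submitting a job from a SLURM submit host can look like, here is a minimal sketch. The file name `hello_job.sh` and all resource values are assumptions chosen for the example; the EMBL cluster may require additional options (partition, account, etc.), so check the ITS Cluster Wiki before relying on it.

```shell
#!/usr/bin/env bash
# Write a minimal SLURM batch script. Resource values are illustrative only.
cat > hello_job.sh <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=hello
#SBATCH --time=00:05:00
#SBATCH --mem=1G
#SBATCH --cpus-per-task=1
#SBATCH --output=hello_%j.out

echo "Running on $(hostname)"
EOF

# Submit only where sbatch exists (e.g. on a SLURM submit host).
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh      # queue the job
    squeue -u "$USER"        # show your pending and running jobs
else
    echo "sbatch not found; run this on a SLURM submit host." >&2
fi
```

The job itself then runs on a cluster node rather than on the submit host, which keeps heavy processing off the shared machine.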
The majority of software and their versions are handled with EasyBuild, the software framework used and maintained by ITS. Software is compiled specifically for the platform it runs on and is therefore optimised. A specific version of a piece of software, compiled with a specific toolchain, is referred to as an environment module. Modules are loaded into the user environment on demand, by the users themselves, using the module command. Loading a given module also loads all the software dependencies it needs.
EasyBuild builds software modules. The module command-line tool (provided here by Lmod) is used to interact with them: typically to load them into your environment, list the available and/or loaded ones, etc.
module avail lists all modules.
module avail <string> lists all modules with <string> in their name (case insensitive), e.g. module avail python returns Python and IPython modules, etc.
module spider and module spider <string> do a similar job.
module load <module_name> [<module2_name> ...] loads the given module(s), e.g. module load Python/3.10.8-GCCcore-12.2.0 SciPy-bundle/2023.02-gfbf-2022b loads both Python and SciPy. Find names with the avail or spider commands.
When possible, load matching toolchain versions, i.e. versions that have been compiled with the same toolchain.
NB 1: When loading multiple modules and hitting a dependency conflict, the last loaded module wins, i.e. the last module that needs the dependency dictates the loaded version of said dependency.
module list lists all the loaded modules. Even after explicitly loading a single module, the list may contain multiple modules. This is because loading a module means loading the given one and all the modules it depends on. For example, loading R-bundle-Bioconductor/3.16-foss-2022b-R-4.2.2 effectively loads R, Bioconductor, as well as 123 other dependencies.
module unload <module_name> unloads a given module and all obsolete dependencies.
module purge unloads all the loaded modules.
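Put together, a typical session with the commands above might look like the following sketch. The module names are the ones quoted in this section; whether those exact versions are installed on the machine you are using is an assumption. The guard and the `MODULE_DEMO_STATUS` variable are only there so the snippet runs harmlessly on a machine without the module command.

```shell
#!/usr/bin/env bash
# Sketch of a typical module session. Module names are taken from the
# examples above and may not exist on every machine.
if type module >/dev/null 2>&1; then
    module purge                      # start from a clean environment
    module avail python               # search modules with "python" in their name
    module load Python/3.10.8-GCCcore-12.2.0 SciPy-bundle/2023.02-gfbf-2022b
    module list                       # shows the loaded modules plus their dependencies
    module unload SciPy-bundle/2023.02-gfbf-2022b
    module purge                      # unload everything again
    MODULE_DEMO_STATUS="ran"
else
    echo "The module command is not available on this machine." >&2
    MODULE_DEMO_STATUS="skipped"
fi
```

Starting with module purge is a design choice rather than a requirement: it avoids surprises from modules loaded earlier in the same session.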
You cannot install your own software with EasyBuild*.
When you identify a piece of software that is not available, you can request its installation from us or from IT Services. On our side, installing should not take long, provided that either (1) an official EasyBuild recipe exists, or (2) the install procedure is standard and follows best practices.
As an alternative, you may also use virtual environment managers (like conda), but we provide only limited support for them.
* Strictly speaking, you could maintain your own EasyBuild install, but this is advanced usage and out of the scope of this document.
The machine running RStudio Server is powerful but is a shared resource accessible to all EMBL scientists. Be mindful of others.
Do not run resource-intensive jobs on this machine, or they will be killed.