Analysis and computational expertise
In the previous posts, we saw how to get started with snakemake, reduce command-line options, submit your jobs to a cluster, define resources and threads, and handle memory and timeout errors. In this last post about snakemake profiles, I will show how to use singularity containers. If you followed the previous posts, deactivate your environment and leave the previous project folder before continuing:
#!/usr/bin/bash
conda deactivate
cd ..
Explaining how to create and use singularities is out of the scope of this tutorial. Be aware that Singularity joined the Linux Foundation and rebranded as Apptainer. Therefore, this part might quickly become out of date.
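Because of this rebranding, it can help to check which command is actually installed on your machine before going further. The snippet below is only a sketch: the function name find_container_runtime is mine, and it assumes that an Apptainer installation exposes an apptainer binary, possibly together with a singularity compatibility alias.

```shell
#!/usr/bin/bash
# Print the name of the first container runtime found on the PATH.
# "singularity" is tried first because it is the command name used in this post.
find_container_runtime() {
    local candidate
    for candidate in singularity apptainer; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    # Neither command is available
    echo "none"
    return 1
}

runtime=$(find_container_runtime) || true
echo "Container runtime found: $runtime"
```

If the result is none, install Singularity or Apptainer (or load the corresponding module on your cluster) before continuing.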
In this section, we will use a singularity on a “toy example” rule as we did before. I will assume that you are able to create the singularity. Start with creating a project folder:
#!/usr/bin/bash
# Create the folder containing the files needed for this tutorial
mkdir snakemake-profile-singularity
# Enter the created folder
cd snakemake-profile-singularity
# Create an empty file containing the snakemake code
touch snakeFile
# Create toy input files
mkdir inputs
echo "toto" > inputs/hello.txt
echo "totoBis" > inputs/helloBis.txt
# Create an empty folder to create a conda environment
# This is done to make sure that you use the same snakemake version as I do
mkdir envs
touch envs/environment.yaml
# Create an empty folder to create a profile
mkdir profile
touch profile/config.yaml
# Create a folder that will hold the singularity
mkdir singularities
Use the following recipe to build the fastqcv0119.sif singularity and copy it to the singularities folder:
Bootstrap: docker
From: biocontainers/fastqc:v0.11.9_cv7

%runscript
    echo "Running container biocontainers/fastqc:v0.11.9_cv7, FastQC v0.11.9"
    exec /bin/bash "$@"
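If you have never built an image, the following sketch shows one possible way to do it. It assumes that you saved the recipe above in a file called fastqcv0119.def (a name I chose for illustration) in the project folder, and that you have root privileges on the machine where you build. On a cluster without sudo you will probably have to build elsewhere and copy the .sif file over.

```shell
#!/usr/bin/bash
# Hypothetical file name for the recipe shown above
recipe="fastqcv0119.def"
# Location expected by the snakemake rule used in this post
image="singularities/fastqcv0119.sif"

if command -v singularity >/dev/null 2>&1 && [[ -f "$recipe" ]]; then
    # Building from a recipe usually requires root privileges
    sudo singularity build "$image" "$recipe"
else
    echo "singularity or $recipe not found, skipping the build"
fi
```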
Copy the following content to envs/environment.yaml (the indentations consist of two spaces):
channels:
  - bioconda
dependencies:
  - snakemake-minimal=6.15.1
Then execute the following commands to create and use a conda environment containing snakemake v6.15.1:
#!/usr/bin/bash
conda env create -p envs/smake --file envs/environment.yaml
conda activate envs/smake
Copy the following content to snakeFile:
onstart:
    print("##### TEST #####\n")
    print("\t Creating jobs output subfolders...\n")
    shell("mkdir -p jobs/helloSingularity")

FILESNAMES=["hello", "helloBis"]

rule all:
    input:
        expand("results/{recipient}-sing.txt", recipient=FILESNAMES)

rule helloSingularity:
    input:
        "inputs/{recipient}.txt"
    output:
        "results/{recipient}-sing.txt"
    threads: 1
    singularity: "singularities/fastqcv0119.sif"
    shell:
        """
        cat {input} > {output}
        """
We added a new singularity section to the rule; it can contain the absolute or relative path to fastqcv0119.sif. In profile/config.yaml, add the options use-singularity: True and singularity-args: "--bind mypath/snakemake-profile-singularity" (do not forget to replace mypath with the actual path on your system). The --bind instruction enables the singularity to access the input files (inputs/hello.txt and inputs/helloBis.txt). The bound folder should always sit above your files in the folder hierarchy, that is, it must contain them. Copy the following content to profile/config.yaml:
---
snakefile: snakeFile
latency-wait: 60
reason: True
show-failed-logs: True
keep-going: True
printshellcmds: True
# Cluster submission
jobname: "{rule}.{jobid}" # Provide a custom name for the jobscript that is submitted to the cluster.
max-jobs-per-second: 1 #Maximal number of cluster/drmaa jobs per second, default is 10, fractions allowed.
max-status-checks-per-second: 10 #Maximal number of job status checks per second, default is 10
jobs: 400 #Use at most N CPU cluster/cloud jobs in parallel.
cluster: "sbatch --output=\"jobs/{rule}/slurm_%x_%j.out\" --error=\"jobs/{rule}/slurm_%x_%j.log\" --mem={resources.mem_mb} --time={resources.runtime} --parsable"
cluster-status: "./profile/status-sacct.sh" # Use to handle timeout exception, do not forget to chmod +x
# singularity
use-singularity: True
singularity-args: "--bind mypath/snakemake-profile-singularity"
# Job resources
set-resources:
  - helloSingularity:mem_mb=1000
  - helloSingularity:runtime=00:03:00
# For some reason, time needs quotes to be read by snakemake
default-resources:
  - mem_mb=500
  - runtime="00:01:00"
# Define the number of threads used by rules
set-threads:
  - helloSingularity=1
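The mypath placeholder in singularity-args has to be replaced by hand. As a sketch, the replacement can also be scripted. The function name set_bind_path is mine, and the snippet assumes GNU sed (for the -i option) and a project path that contains neither "|" nor "&".

```shell
#!/usr/bin/bash
# Replace the "mypath/snakemake-profile-singularity" placeholder of a profile
# with the absolute path of the project folder.
set_bind_path() {
    local config_file="$1"
    local project_dir="$2"
    # "|" is used as the sed delimiter because paths contain "/"
    sed -i "s|--bind mypath/snakemake-profile-singularity|--bind ${project_dir}|" "$config_file"
}

# Run from inside the snakemake-profile-singularity folder
if [[ -f profile/config.yaml ]]; then
    set_bind_path profile/config.yaml "$(pwd)"
    # Show the resulting line for a visual check
    grep "singularity-args" profile/config.yaml
fi
```

Binding the project folder itself works here because inputs/ and results/ are both located inside it.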
Create a profile/status-sacct.sh (see the previous posts for details) with the following content:
#!/usr/bin/env bash
# Check status of Slurm job
jobid="$1"
if [[ "$jobid" == Submitted ]]
then
    echo smk-simple-slurm: Invalid job ID: "$jobid" >&2
    echo smk-simple-slurm: Did you remember to add the flag --parsable to your sbatch call? >&2
    exit 1
fi
# Keep only the state of the first line reported by sacct
output=$(sacct -j "$jobid" --format State --noheader | head -n 1 | awk '{print $1}')
if [[ $output =~ ^(COMPLETED).* ]]
then
    echo success
elif [[ $output =~ ^(RUNNING|PENDING|COMPLETING|CONFIGURING|SUSPENDED).* ]]
then
    echo running
else
    echo failed
fi
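To see how this script maps Slurm states to the three answers that snakemake expects, without submitting any job, the same tests can be isolated in a small function. This is only an illustration: classify_state is a name I made up, and the states are passed by hand instead of being read from sacct.

```shell
#!/usr/bin/bash
# Map a Slurm job state to the answer expected by snakemake,
# using the same regular expressions as status-sacct.sh.
classify_state() {
    local state="$1"
    if [[ $state =~ ^(COMPLETED).* ]]; then
        echo success
    elif [[ $state =~ ^(RUNNING|PENDING|COMPLETING|CONFIGURING|SUSPENDED).* ]]; then
        echo running
    else
        # Any other state (TIMEOUT, OUT_OF_MEMORY, CANCELLED, ...) is a failure
        echo failed
    fi
}

for state in COMPLETED COMPLETING PENDING TIMEOUT; do
    echo "$state -> $(classify_state "$state")"
done
```

Note that a job in the COMPLETING state is still reported as running, because its state does not start with COMPLETED.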
Make profile/status-sacct.sh executable and perform a run:
#!/usr/bin/bash
chmod +x profile/status-sacct.sh
snakemake --profile profile/
If in jobs/helloSingularity/*log you get the error message FATAL: container creation failed: unable to add, this means that the path in the singularity-args section of profile/config.yaml is incorrect. Otherwise, you should see in the log files:
Activating singularity image singularities/fastqcv0119.sif
This line confirms that the code of your rule was run in your singularity environment.
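Rather than opening each log by hand, the two messages mentioned above can be searched for with grep. The following is a sketch, not part of snakemake: check_singularity_logs is a helper name I chose, and it only looks for the two messages quoted in this post.

```shell
#!/usr/bin/bash
# Report, for each log file of a folder, whether the container was activated
# or whether the bind path was rejected.
check_singularity_logs() {
    local log_dir="$1"
    local log
    for log in "$log_dir"/*log; do
        # Skip the unexpanded pattern when no log file exists
        [[ -e "$log" ]] || continue
        if grep -q "FATAL: container creation failed" "$log"; then
            echo "$log: bind error"
        elif grep -q "Activating singularity image" "$log"; then
            echo "$log: container used"
        else
            echo "$log: no singularity message"
        fi
    done
}

check_singularity_logs jobs/helloSingularity
```

A "bind error" line means that singularity-args in profile/config.yaml needs to be corrected before running snakemake again.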