27 April 2022

Snakemake profile – 4: Defining resources and threads

In the previous posts, we saw how to get started with snakemake, reduce command-line options, and submit your jobs to a cluster. I ended the previous post by mentioning resources that were set by default. In this post, we will see how to adapt the amount of RAM, time, and the number of threads to a particular job through the profile config.yaml file.

If you followed the previous posts, you can skip the following section.

Preparation of files

For more details about the steps described in this section, see the previous posts. Run the following script to create the folder structure:

#!/usr/bin/bash
# Create the folder containing the files needed for this tutorial
mkdir snakemake-profile-demo
# Enter the created folder
cd snakemake-profile-demo
# Create an empty file containing the snakemake code
touch snakeFile
# Create toy input files
mkdir inputs
echo "toto" > inputs/hello.txt
echo "totoBis" > inputs/helloBis.txt
# Create the folder containing the configuration file, it can be named differently
mkdir profile
# Create a config.yaml that will contain all the configuration parameters
touch profile/config.yaml
# Create an empty folder to create a conda environment
# This is done to make sure that you use the same snakemake version as I do
mkdir envs
touch envs/environment.yaml

Copy the following content to snakeFile:

onstart:
    print("##### Creating profile pipeline #####\n") 
    print("\t Creating jobs output subfolders...\n")
    shell("mkdir -p jobs/printContent")
rule all:
  input:
    expand("results/{sampleName}.txt", sampleName=["hello", "helloBis"])
rule printContent:
  input:
    "inputs/{sampleName}.txt"
  output:
    "results/{sampleName}.txt"
  shell:
    """
    cat {input} > {output}
    """

Copy the following content to envs/environment.yaml:

channels:
  - bioconda
dependencies:
  - snakemake-minimal=6.15.1

Copy the following content to profile/config.yaml:

---
snakefile: snakeFile
cores: 1
latency-wait: 60
reason: True
show-failed-logs: True
keep-going: True
printshellcmds: True
rerun-incomplete: True
restart-times: 3
# Cluster submission
jobname: "{rule}.{jobid}"              # Provide a custom name for the jobscript that is submitted to the cluster.
max-jobs-per-second: 1                 #Maximal number of cluster/drmaa jobs per second, default is 10, fractions allowed.
max-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10
jobs: 400                              #Use at most N CPU cluster/cloud jobs in parallel.
cluster: "sbatch --output=\"jobs/{rule}/slurm_%x_%j.out\" --error=\"jobs/{rule}/slurm_%x_%j.log\""

Create and activate the conda environment:

#!/usr/bin/bash
conda env create -p envs/smake --file envs/environment.yaml
conda activate envs/smake

Defining resources

When submitting a job to a cluster, you usually want to control the amount of time and RAM needed to perform your calculation. We will not bother with the disk_mb parameter here. In profile/config.yaml, we now add the specifications of the job printContent in a set-resources section:

---
snakefile: snakeFile
cores: 1
latency-wait: 60
reason: True
show-failed-logs: True
keep-going: True
printshellcmds: True
rerun-incomplete: True
restart-times: 3
# Cluster submission
jobname: "{rule}.{jobid}"              # Provide a custom name for the jobscript that is submitted to the cluster.
max-jobs-per-second: 1                 #Maximal number of cluster/drmaa jobs per second, default is 10, fractions allowed.
max-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10
jobs: 400                              #Use at most N CPU cluster/cloud jobs in parallel.
cluster: "sbatch --output=\"jobs/{rule}/slurm_%x_%j.out\" --error=\"jobs/{rule}/slurm_%x_%j.log\""
# Job resources
set-resources:
    - printContent:mem_mb=2000
    - printContent:runtime=00:01:00

We specified that a printContent job should run for at most one minute and use at most 2 GB of memory (mem_mb=2000). These parameters now need to be passed to sbatch in the cluster: section. We hence add the slurm options --mem={resources.mem_mb} and --time={resources.runtime}. The curly braces indicate placeholders that snakemake fills in for each job; the values mem_mb and runtime are accessed through the resources object. Keep in mind that resources can also be defined directly in rules (as we did for input and output). If you do define resources within a rule, be aware that the values defined in the profile will override them. Overall, profile/config.yaml should look like this:

---
snakefile: snakeFile
cores: 1
latency-wait: 60
reason: True
show-failed-logs: True
keep-going: True
printshellcmds: True
rerun-incomplete: True
restart-times: 3
# Cluster submission
jobname: "{rule}.{jobid}"              # Provide a custom name for the jobscript that is submitted to the cluster.
max-jobs-per-second: 1                 #Maximal number of cluster/drmaa jobs per second, default is 10, fractions allowed.
max-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10
jobs: 400                              #Use at most N CPU cluster/cloud jobs in parallel.
cluster: "sbatch --output=\"jobs/{rule}/slurm_%x_%j.out\" --error=\"jobs/{rule}/slurm_%x_%j.log\" --mem={resources.mem_mb} --time={resources.runtime}"
# Job resources
set-resources:
    - printContent:mem_mb=2000
    - printContent:runtime=00:01:00
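
To make the placeholder substitution concrete, here is a minimal Python sketch of how the cluster: string could be filled in for one printContent job. This is not snakemake's actual code: the helper name fill_cluster_command and the use of str.format with a SimpleNamespace are assumptions made for illustration. Only the sbatch options and the resource values come from the profile above.

```python
from types import SimpleNamespace

# The sbatch template from the cluster: entry of profile/config.yaml
CLUSTER_TEMPLATE = (
    'sbatch --output="jobs/{rule}/slurm_%x_%j.out" '
    '--error="jobs/{rule}/slurm_%x_%j.log" '
    "--mem={resources.mem_mb} --time={resources.runtime}"
)


def fill_cluster_command(template, rule, resources):
    """Hypothetical helper: substitute {rule} and {resources.*} in the template."""
    # SimpleNamespace lets str.format resolve {resources.mem_mb} by attribute access
    return template.format(rule=rule, resources=SimpleNamespace(**resources))


command = fill_cluster_command(
    CLUSTER_TEMPLATE,
    rule="printContent",
    resources={"mem_mb": 2000, "runtime": "00:01:00"},
)
print(command)
```

The %x and %j parts are left untouched by the substitution because they are interpreted later by slurm itself, not by snakemake.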

Perform a dry run (after removing results/ if necessary):

#!/usr/bin/bash
# Remove the results folder if necessary
rm -rf results/
# Perform dry run
snakemake --profile profile/ -n

As you can see in your console, the newly defined resources were integrated by snakemake:

Building DAG of jobs...
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
all                 1              1              1
printContent        2              1              1
total               3              1              1
[Fri Mar  4 16:03:41 2022]
rule printContent:
    input: inputs/hello.txt
    output: results/hello.txt
    jobid: 1
    reason: Missing output files: results/hello.txt
    wildcards: sampleName=hello
    resources: tmpdir=/tmp, mem_mb=2000, runtime=00:01:00
    cat inputs/hello.txt > results/hello.txt
[Fri Mar  4 16:03:41 2022]
rule printContent:
    input: inputs/helloBis.txt
    output: results/helloBis.txt
    jobid: 2
    reason: Missing output files: results/helloBis.txt
    wildcards: sampleName=helloBis
    resources: tmpdir=/tmp, mem_mb=2000, runtime=00:01:00
    cat inputs/helloBis.txt > results/helloBis.txt
[Fri Mar  4 16:03:41 2022]
localrule all:
    input: results/hello.txt, results/helloBis.txt
    jobid: 0
    reason: Input files updated by another job: results/hello.txt, results/helloBis.txt
    resources: tmpdir=/tmp
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
all                 1              1              1
printContent        2              1              1
total               3              1              1
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Finally, define the resources that should be given to jobs whose resources are not defined in the set-resources section. For this, we can add a new section called default-resources:

---
snakefile: snakeFile
cores: 1
latency-wait: 60
reason: True
show-failed-logs: True
keep-going: True
printshellcmds: True
rerun-incomplete: True
restart-times: 3
# Cluster submission
jobname: "{rule}.{jobid}"              # Provide a custom name for the jobscript that is submitted to the cluster.
max-jobs-per-second: 1                 #Maximal number of cluster/drmaa jobs per second, default is 10, fractions allowed.
max-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10
jobs: 400                              #Use at most N CPU cluster/cloud jobs in parallel.
cluster: "sbatch --output=\"jobs/{rule}/slurm_%x_%j.out\" --error=\"jobs/{rule}/slurm_%x_%j.log\" --mem={resources.mem_mb} --time={resources.runtime}"
# Job resources
set-resources:
    - printContent:mem_mb=2000
    - printContent:runtime=00:01:00
# For some reason, time needs quotes to be read by snakemake
default-resources:
  - mem_mb=500
  - runtime="00:01:00"
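
The precedence described so far (default-resources first, then resources declared in a rule, then set-resources from the profile) can be pictured as a simple dictionary merge. The sketch below is only an illustration in plain Python: the function resolve_resources and its data structures are hypothetical and do not reflect how snakemake implements this internally.

```python
def resolve_resources(rule_name, default_resources, rule_resources, set_resources):
    """Hypothetical helper: later sources override earlier ones."""
    resolved = dict(default_resources)                   # 1. default-resources
    resolved.update(rule_resources.get(rule_name, {}))   # 2. resources declared in the rule
    resolved.update(set_resources.get(rule_name, {}))    # 3. set-resources from the profile
    return resolved


# Values taken from profile/config.yaml
default_resources = {"mem_mb": 500, "runtime": "00:01:00"}
set_resources = {"printContent": {"mem_mb": 2000, "runtime": "00:01:00"}}
# The toy snakeFile declares no resources inside its rules
rule_resources = {}

for rule_name in ("printContent", "all"):
    print(rule_name, resolve_resources(rule_name, default_resources, rule_resources, set_resources))
```

With these values, printContent ends up with mem_mb=2000 while a rule without its own entry, such as all, falls back to mem_mb=500.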

Defining threads

Our rule printContent is very simple and does not need several CPUs to perform the calculation. This is however not the case for many bioinformatics tools, such as short-read sequence aligners, which need several threads. A famous one is Bowtie2. Below is what a Bowtie2 rule looks like in a real pipeline:

rule bowtie2_best_paired:
  input:
    trimmedFq1 = "ChIPSeq-subworkflow/results/data/trimGalore/paired/{pairedEndName}_1_val_1.fq.gz",
    trimmedFq2 = "ChIPSeq-subworkflow/results/data/trimGalore/paired/{pairedEndName}_2_val_2.fq.gz"
  output:
    sam = temp("ChIPSeq-subworkflow/results/bam/paired/bowtie2_results/{genome}/{pairedEndName}_trimmed_best.sam"),
    log = "ChIPSeq-subworkflow/results/bam/paired/bowtie2_results/{genome}/{pairedEndName}_trimmed_best.log"
  threads: 20
  singularity: "ChIPSeq-subworkflow/singularities-ChIPSeq/bowtie2v241.sif"
  params:
    indexPath = lambda wildcards: "ChIPSeq-subworkflow/data/bowtie2_index/mm39/Mus_musculus.GRCm39" if(wildcards.genome == "mm39") else "ChIPSeq-subworkflow/data/bowtie2_index/FVB/Mus_musculus_fvbnj.FVB_NJ_v1"
  shell:
    """
    bowtie2 -q -p {threads} -x {params.indexPath} --no-mixed --no-discordant --dovetail -1 {input.trimmedFq1} -2 {input.trimmedFq2} -S {output.sam} 2> {output.log}
    """
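
The indexPath entry in params is an ordinary Python function of the job's wildcards. As a rough standalone illustration, the same choice can be written as a named function. Here SimpleNamespace merely stands in for snakemake's wildcards object, and the function name index_path is made up for this sketch; the two paths are those of the rule above.

```python
from types import SimpleNamespace


def index_path(wildcards):
    """Return the Bowtie2 index prefix for the genome wildcard (same logic as the lambda)."""
    if wildcards.genome == "mm39":
        return "ChIPSeq-subworkflow/data/bowtie2_index/mm39/Mus_musculus.GRCm39"
    # Any other genome value falls back to the FVB index, as in the original lambda
    return "ChIPSeq-subworkflow/data/bowtie2_index/FVB/Mus_musculus_fvbnj.FVB_NJ_v1"


# SimpleNamespace is a stand-in for the wildcards object of a job
print(index_path(SimpleNamespace(genome="mm39")))
print(index_path(SimpleNamespace(genome="FVB")))
```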

For the sake of clarity, and for practical reasons, we are going to create a “toy rule” that we will be able to run. Add the rule bowtie2 at the end of your snakeFile. Then add its output to rule all:

onstart:
    print("##### Creating profile pipeline #####\n") 
    print("\t Creating jobs output subfolders...\n")
    shell("mkdir -p jobs/printContent")
    shell("mkdir -p jobs/bowtie2")
rule all:
  input:
    expand("results/{sampleName}.txt", sampleName=["hello", "helloBis"]),
    expand("results/{sampleName}-bowtie2.txt", sampleName=["hello", "helloBis"])
rule printContent:
  input:
    "inputs/{sampleName}.txt"
  output:
    "results/{sampleName}.txt"
  shell:
    """
    cat {input} > {output}
    """
rule bowtie2:
  input:
    "inputs/{sampleName}.txt"
  output:
    "results/{sampleName}-bowtie2.txt"
  shell:
    """
    cat {input} > {output}
    """
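
For readers unfamiliar with expand, here is a plain-Python emulation of what the two calls in rule all evaluate to. The function expand_pattern is a hypothetical, simplified stand-in that handles a single wildcard only; snakemake's own expand is more general.

```python
def expand_pattern(pattern, **wildcards):
    """Hypothetical, simplified stand-in for snakemake's expand (one wildcard only)."""
    # Unpack the single (name, values) pair; more than one wildcard raises ValueError
    (name, values), = wildcards.items()
    return [pattern.format(**{name: value}) for value in values]


targets = (
    expand_pattern("results/{sampleName}.txt", sampleName=["hello", "helloBis"])
    + expand_pattern("results/{sampleName}-bowtie2.txt", sampleName=["hello", "helloBis"])
)
print(targets)
```

These four paths are the files that rule all requests, which is why snakemake schedules two printContent jobs and two bowtie2 jobs.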

As we usually do not like to repeat code, let's store the list ["hello", "helloBis"] in a SAMPLENAMES variable:


onstart:
    print("##### Creating profile pipeline #####\n") 
    print("\t Creating jobs output subfolders...\n")
    shell("mkdir -p jobs/printContent")
    shell("mkdir -p jobs/bowtie2")
SAMPLENAMES=["hello", "helloBis"]
rule all:
  input:
    expand("results/{sampleName}.txt", sampleName=SAMPLENAMES),
    expand("results/{sampleName}-bowtie2.txt", sampleName=SAMPLENAMES)
rule printContent:
  input:
    "inputs/{sampleName}.txt"
  output:
    "results/{sampleName}.txt"
  shell:
    """
    cat {input} > {output}
    """
rule bowtie2:
  input:
    "inputs/{sampleName}.txt"
  output:
    "results/{sampleName}-bowtie2.txt"
  shell:
    """
    cat {input} > {output}
    """

Copy the above content to your snakeFile and perform a dry run:

#!/usr/bin/bash
snakemake --profile profile/ -n

You should obtain:

Building DAG of jobs...
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
all                 1              1              1
bowtie2             2              1              1
printContent        2              1              1
total               5              1              1
[Sat Mar  5 17:33:42 2022]
rule bowtie2:
    input: inputs/hello.txt
    output: results/hello-bowtie2.txt
    jobid: 3
    reason: Missing output files: results/hello-bowtie2.txt
    wildcards: sampleName=hello
    resources: mem_mb=500, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00
    cat inputs/hello.txt > results/hello-bowtie2.txt
[Sat Mar  5 17:33:42 2022]
rule bowtie2:
    input: inputs/helloBis.txt
    output: results/helloBis-bowtie2.txt
    jobid: 4
    reason: Missing output files: results/helloBis-bowtie2.txt
    wildcards: sampleName=helloBis
    resources: mem_mb=500, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00
    cat inputs/helloBis.txt > results/helloBis-bowtie2.txt
[Sat Mar  5 17:33:42 2022]
rule printContent:
    input: inputs/helloBis.txt
    output: results/helloBis.txt
    jobid: 2
    reason: Missing output files: results/helloBis.txt
    wildcards: sampleName=helloBis
    resources: mem_mb=2000, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00
    cat inputs/helloBis.txt > results/helloBis.txt
[Sat Mar  5 17:33:42 2022]
rule printContent:
    input: inputs/hello.txt
    output: results/hello.txt
    jobid: 1
    reason: Missing output files: results/hello.txt
    wildcards: sampleName=hello
    resources: mem_mb=2000, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00
    cat inputs/hello.txt > results/hello.txt
[Sat Mar  5 17:33:42 2022]
localrule all:
    input: results/hello.txt, results/helloBis.txt, results/hello-bowtie2.txt, results/helloBis-bowtie2.txt
    jobid: 0
    reason: Input files updated by another job: results/helloBis-bowtie2.txt, results/hello.txt, results/helloBis.txt, results/hello-bowtie2.txt
    resources: mem_mb=500, disk_mb=<TBD>, tmpdir=/tmp, runtime=00:01:00
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
all                 1              1              1
bowtie2             2              1              1
printContent        2              1              1
total               5              1              1
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Notice that bowtie2 uses the default resources that we defined in the previous section:

rule bowtie2:
    input: inputs/helloBis.txt
    output: results/helloBis-bowtie2.txt
    jobid: 4
    reason: Missing output files: results/helloBis-bowtie2.txt
    wildcards: sampleName=helloBis
    resources: mem_mb=500, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00

Now define the number of threads that the (fake) rule bowtie2 should use in a set-threads section. ATTENTION: This section might not work with a more recent version of snakemake. The behavior was fixed on 03/05/2022 in this pull request.

We also define the resources of bowtie2 after those of printContent. Also remove cores from profile/config.yaml; otherwise, all your jobs will use only one CPU:

---
snakefile: snakeFile
latency-wait: 60
reason: True
show-failed-logs: True
keep-going: True
printshellcmds: True
rerun-incomplete: True
restart-times: 3
# Cluster submission
jobname: "{rule}.{jobid}"              # Provide a custom name for the jobscript that is submitted to the cluster.
max-jobs-per-second: 1                 #Maximal number of cluster/drmaa jobs per second, default is 10, fractions allowed.
max-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10
jobs: 400                              #Use at most N CPU cluster/cloud jobs in parallel.
cluster: "sbatch --output=\"jobs/{rule}/slurm_%x_%j.out\" --error=\"jobs/{rule}/slurm_%x_%j.log\" --mem={resources.mem_mb} --time={resources.runtime}"
# Job resources
set-resources:
    - printContent:mem_mb=2000
    - printContent:runtime=00:01:00
    - bowtie2:mem_mb=1000
    - bowtie2:runtime=00:03:00
# For some reason, time needs quotes to be read by snakemake
default-resources:
  - mem_mb=500
  - runtime="00:01:00"
# Define the number of threads used by rules
set-threads:
  - bowtie2=3

We instruct snakemake to use three threads when running the rule bowtie2. Perform a dry run:

#!/usr/bin/bash
snakemake --profile profile/ -n

You should obtain:

Building DAG of jobs...
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
all                 1              1              1
bowtie2             2              1              1
printContent        2              1              1
total               5              1              1
[Sat Mar  5 17:41:24 2022]
rule bowtie2:
    input: inputs/hello.txt
    output: results/hello-bowtie2.txt
    jobid: 3
    reason: Missing output files: results/hello-bowtie2.txt
    wildcards: sampleName=hello
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/tmp, runtime=00:03:00
    cat inputs/hello.txt > results/hello-bowtie2.txt
[Sat Mar  5 17:41:24 2022]
rule printContent:
    input: inputs/hello.txt
    output: results/hello.txt
    jobid: 1
    reason: Missing output files: results/hello.txt
    wildcards: sampleName=hello
    resources: mem_mb=2000, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00
    cat inputs/hello.txt > results/hello.txt
[Sat Mar  5 17:41:24 2022]
rule printContent:
    input: inputs/helloBis.txt
    output: results/helloBis.txt
    jobid: 2
    reason: Missing output files: results/helloBis.txt
    wildcards: sampleName=helloBis
    resources: mem_mb=2000, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00
    cat inputs/helloBis.txt > results/helloBis.txt
[Sat Mar  5 17:41:24 2022]
rule bowtie2:
    input: inputs/helloBis.txt
    output: results/helloBis-bowtie2.txt
    jobid: 4
    reason: Missing output files: results/helloBis-bowtie2.txt
    wildcards: sampleName=helloBis
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/tmp, runtime=00:03:00
    cat inputs/helloBis.txt > results/helloBis-bowtie2.txt
[Sat Mar  5 17:41:24 2022]
localrule all:
    input: results/hello.txt, results/helloBis.txt, results/hello-bowtie2.txt, results/helloBis-bowtie2.txt
    jobid: 0
    reason: Input files updated by another job: results/hello-bowtie2.txt, results/hello.txt, results/helloBis.txt, results/helloBis-bowtie2.txt
    resources: mem_mb=500, disk_mb=<TBD>, tmpdir=/tmp, runtime=00:01:00
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
all                 1              1              1
bowtie2             2              1              1
printContent        2              1              1
total               5              1              1
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

As you can see, although bowtie2 uses the new resources, the number of threads did not change. This is because, with the snakemake version used here, the number of threads also needs to be defined in each rule. Indeed, set-threads does not set threads by itself; it overrides the thread count declared in the rules. Hence, modify your snakeFile to add the number of threads to each rule. Set 1 thread for printContent and 2 threads for bowtie2:

onstart:
    print("##### Creating profile pipeline #####\n") 
    print("\t Creating jobs output subfolders...\n")
    shell("mkdir -p jobs/printContent")
    shell("mkdir -p jobs/bowtie2")
SAMPLENAMES=["hello", "helloBis"]
rule all:
  input:
    expand("results/{sampleName}.txt", sampleName=SAMPLENAMES),
    expand("results/{sampleName}-bowtie2.txt", sampleName=SAMPLENAMES)
rule printContent:
  input:
    "inputs/{sampleName}.txt"
  output:
    "results/{sampleName}.txt"
  threads: 1
  shell:
    """
    cat {input} > {output}
    """
rule bowtie2:
  input:
    "inputs/{sampleName}.txt"
  output:
    "results/{sampleName}-bowtie2.txt"
  threads: 2
  shell:
    """
    cat {input} > {output}
    """

Perform a real run and check the information about one bowtie2 job on the cluster with scontrol show jobid myjobid:

#!/usr/bin/bash
snakemake --profile profile/

You should obtain:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 400
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
all                 1              1              1
bowtie2             2              3              3
printContent        2              1              1
total               5              1              3
##### Creating profile pipeline #####
         Creating jobs output subfolders...
mkdir -p jobs/printContent
mkdir -p jobs/bowtie2
Select jobs to execute...
[Wed Apr 27 21:47:24 2022]
rule bowtie2:
    input: inputs/helloBis.txt
    output: results/helloBis-bowtie2.txt
    jobid: 4
    reason: Missing output files: results/helloBis-bowtie2.txt
    wildcards: sampleName=helloBis
    threads: 3
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/tmp, runtime=00:03:00
    cat inputs/helloBis.txt > results/helloBis-bowtie2.txt
Submitted job 4 with external jobid 'Submitted batch job 39158096'.
[Wed Apr 27 21:47:24 2022]
rule printContent:
    input: inputs/hello.txt
    output: results/hello.txt
    jobid: 1
    reason: Missing output files: results/hello.txt
    wildcards: sampleName=hello
    resources: mem_mb=2000, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00
    cat inputs/hello.txt > results/hello.txt
Submitted job 1 with external jobid 'Submitted batch job 39158097'.
[Wed Apr 27 21:47:24 2022]
rule printContent:
    input: inputs/helloBis.txt
    output: results/helloBis.txt
    jobid: 2
    reason: Missing output files: results/helloBis.txt
    wildcards: sampleName=helloBis
    resources: mem_mb=2000, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00
    cat inputs/helloBis.txt > results/helloBis.txt
Submitted job 2 with external jobid 'Submitted batch job 39158098'.
[Wed Apr 27 21:47:24 2022]
rule bowtie2:
    input: inputs/hello.txt
    output: results/hello-bowtie2.txt
    jobid: 3
    reason: Missing output files: results/hello-bowtie2.txt
    wildcards: sampleName=hello
    threads: 3
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/tmp, runtime=00:03:00
    cat inputs/hello.txt > results/hello-bowtie2.txt
Submitted job 3 with external jobid 'Submitted batch job 39158099'.
Waiting at most 60 seconds for missing files.
[Wed Apr 27 21:47:54 2022]
Finished job 4.
1 of 5 steps (20%) done
[Wed Apr 27 21:47:54 2022]
Finished job 1.
2 of 5 steps (40%) done
[Wed Apr 27 21:47:54 2022]
Finished job 2.
3 of 5 steps (60%) done
[Wed Apr 27 21:47:54 2022]
Finished job 3.
4 of 5 steps (80%) done
Select jobs to execute...
[Wed Apr 27 21:47:54 2022]
localrule all:
    input: results/hello.txt, results/helloBis.txt, results/hello-bowtie2.txt, results/helloBis-bowtie2.txt
    jobid: 0
    reason: Input files updated by another job: results/helloBis-bowtie2.txt, results/helloBis.txt, results/hello.txt, results/hello-bowtie2.txt
    resources: mem_mb=500, disk_mb=1000, tmpdir=/tmp, runtime=00:01:00
[Wed Apr 27 21:47:54 2022]
Finished job 0.
5 of 5 steps (100%) done
Complete log: /g/romebioinfo/tmp/snakemake-profile-demo/.snakemake/log/2022-04-27T214724.373022.snakemake.log

At first glance, everything looks good. The rule bowtie2 indeed uses 3 threads. However, if you checked the information of one bowtie2 job on the cluster with scontrol show jobid myjobid, you should have noticed that it uses only 1 CPU:

NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*

This is because we did not tell the cluster how to handle threads. In the cluster section, add the option --cpus-per-task={threads}. Remember that {threads} is a snakemake placeholder whose value needs to be defined in each rule. The profile/config.yaml should look like this:

---
snakefile: snakeFile
latency-wait: 60
reason: True
show-failed-logs: True
keep-going: True
printshellcmds: True
rerun-incomplete: True
restart-times: 3
# Cluster submission
jobname: "{rule}.{jobid}"              # Provide a custom name for the jobscript that is submitted to the cluster.
max-jobs-per-second: 1                 #Maximal number of cluster/drmaa jobs per second, default is 10, fractions allowed.
max-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10
jobs: 400                              #Use at most N CPU cluster/cloud jobs in parallel.
cluster: "sbatch --output=\"jobs/{rule}/slurm_%x_%j.out\" --error=\"jobs/{rule}/slurm_%x_%j.log\" --mem={resources.mem_mb} --time={resources.runtime} --cpus-per-task={threads}"
# Job resources
set-resources:
    - printContent:mem_mb=2000
    - printContent:runtime=00:01:00
    - bowtie2:mem_mb=1000
    - bowtie2:runtime=00:03:00
# For some reason, time needs quotes to be read by snakemake
default-resources:
  - mem_mb=500
  - runtime="00:01:00"
# Define the number of threads used by rules
set-threads:
  - bowtie2=3

Delete the results/ folder and perform a real run again:

#!/usr/bin/bash
rm -r results/
snakemake --profile profile/

You should now be able to see that bowtie2 jobs use the correct number of CPUs per task (scontrol show jobid myjobid):

NumNodes=1 NumCPUs=3 NumTasks=1 CPUs/Task=3 ReqB:S:C:T=0:0:*:*
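
If you want to check this value in a script rather than by eye, the job information line is a series of KEY=VALUE tokens separated by spaces and is easy to parse. The following is a small sketch in Python; the function name parse_key_values is made up for illustration, and the input line is simply copied from the output above.

```python
def parse_key_values(line):
    """Split a whitespace-separated line of KEY=VALUE tokens into a dict."""
    # Split on the first "=" only, so values such as 0:0:*:* are kept intact
    return dict(token.split("=", 1) for token in line.split())


line = "NumNodes=1 NumCPUs=3 NumTasks=1 CPUs/Task=3 ReqB:S:C:T=0:0:*:*"
fields = parse_key_values(line)
print(fields["NumCPUs"], fields["CPUs/Task"])
```

All values are kept as strings; convert them with int() if you need to compare them with the thread count set in the profile.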

Conclusion

We are now able to customize the resources and number of threads of the jobs submitted to the cluster. However, if one of your jobs fails, snakemake will not be able to detect it with the current settings. In the next post, we will see how to handle errors.

Bioinformatics Services
