{"id":318,"date":"2022-04-27T20:11:07","date_gmt":"2022-04-27T20:11:07","guid":{"rendered":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/?p=318"},"modified":"2022-05-09T19:27:06","modified_gmt":"2022-05-09T19:27:06","slug":"snakemake-profile-4-defining-resources-and-threads","status":"publish","type":"post","link":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/04\/snakemake-profile-4-defining-resources-and-threads\/","title":{"rendered":"Snakemake profile &#8211; 4: Defining resources and threads"},"content":{"rendered":"\n<div style=\"height:43px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>In the previous posts, we saw how to <a rel=\"noreferrer noopener\" href=\"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/03\/snakemake-profile-1-getting-started-with-snakemake\/\" target=\"_blank\">get started with snakemake<\/a>, <a rel=\"noreferrer noopener\" href=\"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/03\/snakemake-profile-2-reducing-command-line-options-with-profile\/\" target=\"_blank\">reduce command-line options<\/a>, and<a rel=\"noreferrer noopener\" href=\"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/04\/snakemake-profile-3-cluster-submission-defining-parameters\/\" target=\"_blank\"> submit your jobs to a cluster<\/a>. I ended the previous post by mentioning resources that were set by default. In this post, we will see how to adapt the amount of  RAM, time, and the number of threads to a particular job through the profile <code>config.yaml<\/code> file.<\/p>\n\n\n\n<p>If you followed the previous posts, you can skip the following section.<\/p>\n\n\n\n<div style=\"height:23px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Preparation of files<\/h2>\n\n\n\n<p>For more details about the steps described in this section, see the previous posts. 
Run the following script to create the folder structure:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n \n# Create the folder containing the files needed for this tutorial\nmkdir snakemake-profile-demo\n \n# Enter the created folder\ncd snakemake-profile-demo\n \n# Create an empty file containing the snakemake code\ntouch snakeFile\n \n# Create toy input files\nmkdir inputs\necho \"toto\" &gt; inputs\/hello.txt\necho \"totoBis\" &gt; inputs\/helloBis.txt\n\n# Create the folder containing the configuration file, it can be named differently\nmkdir profile\n\n# Create a config.yaml that will contain all the configuration parameters\ntouch profile\/config.yaml\n\n# Create an empty folder to create a conda environment\n# This is done to make sure that you use the same snakemake version as I do\nmkdir envs\ntouch envs\/environment.yaml\n<\/code><\/pre>\n\n\n\n<p>Copy the following content to&nbsp;<code>snakeFile<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>onstart:\n    print(\"##### Creating profile pipeline #####\\n\") \n    print(\"\\t Creating jobs output subfolders...\\n\")\n    shell(\"mkdir -p jobs\/printContent\")\n\nrule all:\n  input:\n    expand(\"results\/{sampleName}.txt\", sampleName=&#91;\"hello\", \"helloBis\"])\n\nrule printContent:\n  input:\n    \"inputs\/{sampleName}.txt\"\n  output:\n    \"results\/{sampleName}.txt\"\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    \"\"\"<\/code><\/pre>\n\n\n\n<p>Copy the following content to&nbsp;<code>envs\/environment.yaml<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>channels:\n  - bioconda\ndependencies:\n  - snakemake-minimal=6.15.1\n<\/code><\/pre>\n\n\n\n<p>Copy the following content to&nbsp;<code>profile\/config.yaml<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\nsnakefile: snakeFile\ncores: 1\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\nrerun-incomplete: True\nrestart-times: 3\n# Cluster 
submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\"\"<\/code><\/pre>\n\n\n\n<p>Create and activate the conda environment:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\nconda env create -p envs\/smake --file envs\/environment.yaml\nconda activate envs\/smake<\/code><\/pre>\n\n\n\n<div style=\"height:23px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Defining resources<\/h2>\n\n\n\n<p>When submitting a job to a cluster, you usually want to control the amount of time and RAM needed to perform your calculation. We will not bother with the <code>disk_mb<\/code> parameter here. 
In the <code>profile\/config.yaml<\/code> we now add the specifications of the job <code>printContent<\/code> in a <code>set-resources<\/code> section:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: snakeFile\ncores: 1\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\nrerun-incomplete: True\nrestart-times: 3\n\n# Cluster submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\"\"\n\n# Job resources\nset-resources:\n    - printContent:mem_mb=2000\n    - printContent:runtime=00:01:00\n<\/code><\/pre>\n\n\n\n<p>We defined that the job <code>printContent<\/code> should run for one minute and use <strong>at most<\/strong> 2 GB of memory. We now need to take these parameters into account when calling <code>sbatch<\/code> in the <code>cluster:<\/code> section. We hence add the Slurm options <code>--mem={resources.mem_mb}<\/code> and <code>--time={resources.runtime}<\/code>. Notice first that the curly brackets indicate that these are snakemake wildcards. The values <code>mem_mb<\/code> and <code>runtime<\/code> are accessed through the <code>resources<\/code> wildcard. Keep in mind that <a rel=\"noreferrer noopener\" href=\"https:\/\/snakemake.readthedocs.io\/en\/stable\/snakefiles\/rules.html?highlight=resources#resources\" target=\"_blank\">resources<\/a> can be defined directly in rules (as we did for input and output). 
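<\/p>\n\n\n\n<p>For illustration, a rule-level definition could look like the following sketch (hypothetical; in this tutorial we keep the resources in the profile instead):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rule printContent:\n  input:\n    \"inputs\/{sampleName}.txt\"\n  output:\n    \"results\/{sampleName}.txt\"\n  resources:\n    mem_mb=2000,\n    runtime=\"00:01:00\"\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    \"\"\"<\/code><\/pre>\n\n\n\n<p>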
If you do define <code>resources<\/code> within each rule, be aware that the values defined in the profile will overwrite those. Overall, <code>profile\/config.yaml<\/code> should look like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: snakeFile\ncores: 1\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\nrerun-incomplete: True\nrestart-times: 3\n\n# Cluster submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\" --mem={resources.mem_mb} --time={resources.runtime}\"\n\n# Job resources\nset-resources:\n    - printContent:mem_mb=2000\n    - printContent:runtime=00:01:00\n<\/code><\/pre>\n\n\n\n<p>Perform a dry run (after removing <code>results\/<\/code> if necessary):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n# Remove the results folder if necessary\nrm -r results\/\n\n# Perform dry run\nsnakemake --profile profile\/ -n\n<\/code><\/pre>\n\n\n\n<p>As you can see in your console, the newly defined resources were integrated by snakemake:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nprintContent        2              1              1\ntotal               3              1              1\n\n\n&#91;Fri Mar  4 16:03:41 2022]\nrule printContent:\n    input: inputs\/hello.txt\n    output: 
results\/hello.txt\n    jobid: 1\n    reason: Missing output files: results\/hello.txt\n    wildcards: sampleName=hello\n    resources: tmpdir=\/tmp, mem_mb=2000, runtime=00:01:00\n\n\n    cat inputs\/hello.txt &gt; results\/hello.txt\n    \n\n&#91;Fri Mar  4 16:03:41 2022]\nrule printContent:\n    input: inputs\/helloBis.txt\n    output: results\/helloBis.txt\n    jobid: 2\n    reason: Missing output files: results\/helloBis.txt\n    wildcards: sampleName=helloBis\n    resources: tmpdir=\/tmp, mem_mb=2000, runtime=00:01:00\n\n\n    cat inputs\/helloBis.txt &gt; results\/helloBis.txt\n    \n\n&#91;Fri Mar  4 16:03:41 2022]\nlocalrule all:\n    input: results\/hello.txt, results\/helloBis.txt\n    jobid: 0\n    reason: Input files updated by another job: results\/hello.txt, results\/helloBis.txt\n    resources: tmpdir=\/tmp\n\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nprintContent        2              1              1\ntotal               3              1              1\n\nThis was a dry-run (flag -n). The order of jobs does not reflect the order of execution.<\/code><\/pre>\n\n\n\n<p>Finally, define the default resources assigned to jobs whose resources are not listed in the <code>set-resources<\/code> section. 
For this, we can add a new section called <code>default-resources<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: snakeFile\ncores: 1\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\nrerun-incomplete: True\nrestart-times: 3\n\n# Cluster submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\" --mem={resources.mem_mb} --time={resources.runtime}\"\n\n# Job resources\nset-resources:\n    - printContent:mem_mb=2000\n    - printContent:runtime=00:01:00\n\n# For some reason, time needs quotes to be read by snakemake\ndefault-resources:\n  - mem_mb=500\n  - runtime=\"00:01:00\"<\/code><\/pre>\n\n\n\n<div style=\"height:23px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Defining threads<\/h2>\n\n\n\n<p>Our rule <code>printContent<\/code> is very simple and does not need several CPUs to perform the calculation. This is, however, not the case for many bioinformatics tools, such as short-read sequence aligners, which need several <a href=\"https:\/\/snakemake.readthedocs.io\/en\/stable\/snakefiles\/rules.html?highlight=threads#threads\" target=\"_blank\" rel=\"noreferrer noopener\">threads<\/a>. A famous one is <a href=\"http:\/\/bowtie-bio.sourceforge.net\/bowtie2\/manual.shtml\" target=\"_blank\" rel=\"noreferrer noopener\">Bowtie2<\/a>. 
Below is what a Bowtie2 rule looks like in a real pipeline:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rule bowtie2_best_paired:\n  input:\n    trimmedFq1 = \"ChIPSeq-subworkflow\/results\/data\/trimGalore\/paired\/{pairedEndName}_1_val_1.fq.gz\",\n    trimmedFq2 = \"ChIPSeq-subworkflow\/results\/data\/trimGalore\/paired\/{pairedEndName}_2_val_2.fq.gz\"\n  output:\n    sam = temp(\"ChIPSeq-subworkflow\/results\/bam\/paired\/bowtie2_results\/{genome}\/{pairedEndName}_trimmed_best.sam\"),\n    log = \"ChIPSeq-subworkflow\/results\/bam\/paired\/bowtie2_results\/{genome}\/{pairedEndName}_trimmed_best.log\"\n  threads: 20\n  singularity: \"ChIPSeq-subworkflow\/singularities-ChIPSeq\/bowtie2v241.sif\"\n  params:\n    indexPath = lambda wildcards: \"ChIPSeq-subworkflow\/data\/bowtie2_index\/mm39\/Mus_musculus.GRCm39\" if(wildcards.genome == \"mm39\") else \"ChIPSeq-subworkflow\/data\/bowtie2_index\/FVB\/Mus_musculus_fvbnj.FVB_NJ_v1\"\n  shell:\n    \"\"\"\n    bowtie2 -q -p {threads} -x {params.indexPath} --no-mixed --no-discordant --dovetail -1 {input.trimmedFq1} -2 {input.trimmedFq2} -S {output.sam} 2&gt; {output.log}\n    \"\"\"\n<\/code><\/pre>\n\n\n\n<p>For the sake of clarity, and for practical reasons, we are going to create a &#8220;toy rule&#8221; that we will be able to run. Add the rule <code>bowtie2<\/code> at the end of your <code>snakeFile<\/code>. 
Then add the output to <code>rule all<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>onstart:\n    print(\"##### Creating profile pipeline #####\\n\") \n    print(\"\\t Creating jobs output subfolders...\\n\")\n    shell(\"mkdir -p jobs\/printContent\")\n    shell(\"mkdir -p jobs\/bowtie2\")\n\nrule all:\n  input:\n    expand(\"results\/{sampleName}.txt\", sampleName=&#91;\"hello\", \"helloBis\"]),\n    expand(\"results\/{sampleName}-bowtie2.txt\", sampleName=&#91;\"hello\", \"helloBis\"])\n\nrule printContent:\n  input:\n    \"inputs\/{sampleName}.txt\"\n  output:\n    \"results\/{sampleName}.txt\"\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    \"\"\"\n\nrule bowtie2:\n  input:\n    \"inputs\/{sampleName}.txt\"\n  output:\n    \"results\/{sampleName}-bowtie2.txt\"\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    \"\"\"\n<\/code><\/pre>\n\n\n\n<p>As we usually do not like to repeat code, let&#8217;s store <code>sampleName=[\"hello\", \"helloBis\"]<\/code> in a <code>SAMPLENAMES<\/code> variable:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\n\nonstart:\n    print(\"##### Creating profile pipeline #####\\n\") \n    print(\"\\t Creating jobs output subfolders...\\n\")\n    shell(\"mkdir -p jobs\/printContent\")\n    shell(\"mkdir -p jobs\/bowtie2\")\n\nSAMPLENAMES=&#91;\"hello\", \"helloBis\"]\n\nrule all:\n  input:\n    expand(\"results\/{sampleName}.txt\", sampleName=SAMPLENAMES),\n    expand(\"results\/{sampleName}-bowtie2.txt\", sampleName=SAMPLENAMES)\n\nrule printContent:\n  input:\n    \"inputs\/{sampleName}.txt\"\n  output:\n    \"results\/{sampleName}.txt\"\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    \"\"\"\n\nrule bowtie2:\n  input:\n    \"inputs\/{sampleName}.txt\"\n  output:\n    \"results\/{sampleName}-bowtie2.txt\"\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    \"\"\"\n<\/code><\/pre>\n\n\n\n<p>Copy the above content to your <code>snakeFile<\/code> and perform a dry run:<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nsnakemake --profile profile\/ -n\n<\/code><\/pre>\n\n\n\n<p>You should obtain:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nbowtie2             2              1              1\nprintContent        2              1              1\ntotal               5              1              1\n\n\n&#91;Sat Mar  5 17:33:42 2022]\nrule bowtie2:\n    input: inputs\/hello.txt\n    output: results\/hello-bowtie2.txt\n    jobid: 3\n    reason: Missing output files: results\/hello-bowtie2.txt\n    wildcards: sampleName=hello\n    resources: mem_mb=500, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n\n    cat inputs\/hello.txt &gt; results\/hello-bowtie2.txt\n\n\n&#91;Sat Mar  5 17:33:42 2022]\nrule bowtie2:\n    input: inputs\/helloBis.txt\n    output: results\/helloBis-bowtie2.txt\n    jobid: 4\n    reason: Missing output files: results\/helloBis-bowtie2.txt\n    wildcards: sampleName=helloBis\n    resources: mem_mb=500, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n\n    cat inputs\/helloBis.txt &gt; results\/helloBis-bowtie2.txt\n\n\n&#91;Sat Mar  5 17:33:42 2022]\nrule printContent:\n    input: inputs\/helloBis.txt\n    output: results\/helloBis.txt\n    jobid: 2\n    reason: Missing output files: results\/helloBis.txt\n    wildcards: sampleName=helloBis\n    resources: mem_mb=2000, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n\n    cat inputs\/helloBis.txt &gt; results\/helloBis.txt\n\n\n&#91;Sat Mar  5 17:33:42 2022]\nrule printContent:\n    input: inputs\/hello.txt\n    output: results\/hello.txt\n    jobid: 1\n    reason: Missing output files: results\/hello.txt\n    wildcards: sampleName=hello\n    resources: mem_mb=2000, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n\n    cat inputs\/hello.txt &gt; 
results\/hello.txt\n\n\n&#91;Sat Mar  5 17:33:42 2022]\nlocalrule all:\n    input: results\/hello.txt, results\/helloBis.txt, results\/hello-bowtie2.txt, results\/helloBis-bowtie2.txt\n    jobid: 0\n    reason: Input files updated by another job: results\/helloBis-bowtie2.txt, results\/hello.txt, results\/helloBis.txt, results\/hello-bowtie2.txt\n    resources: mem_mb=500, disk_mb=&lt;TBD&gt;, tmpdir=\/tmp, runtime=00:01:00\n\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nbowtie2             2              1              1\nprintContent        2              1              1\ntotal               5              1              1\n\nThis was a dry-run (flag -n). The order of jobs does not reflect the order of execution.\n<\/code><\/pre>\n\n\n\n<p>Notice that <code>bowtie2<\/code> uses the default resources that we defined in the previous section:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rule bowtie2:\n    input: inputs\/helloBis.txt\n    output: results\/helloBis-bowtie2.txt\n    jobid: 4\n    reason: Missing output files: results\/helloBis-bowtie2.txt\n    wildcards: sampleName=helloBis\n    resources: mem_mb=500, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00<\/code><\/pre>\n\n\n\n<p>Now define the number of threads that the (fake) rule <code>bowtie2<\/code> should use in a <code>set-threads<\/code> section. <strong>ATTENTION: This section might not work with a more recent version of snakemake. It was fixed on 03\/05\/2022 in this <a href=\"https:\/\/github.com\/snakemake\/snakemake\/pull\/1617\" target=\"_blank\" rel=\"noreferrer noopener\">pull request<\/a><\/strong>.<\/p>\n\n\n\n<p>We also define the resources of <code>bowtie2<\/code> after those of <code>printContent<\/code>. 
<strong>Also,<\/strong> remove <code>cores<\/code> in <code>profile\/config.yaml<\/code>, otherwise <strong>all your jobs will use only one CPU<\/strong>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: snakeFile\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\nrerun-incomplete: True\nrestart-times: 3\n\n# Cluster submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\" --mem={resources.mem_mb} --time={resources.runtime}\"\n\n# Job resources\nset-resources:\n    - printContent:mem_mb=2000\n    - printContent:runtime=00:01:00\n    - bowtie2:mem_mb=1000\n    - bowtie2:runtime=00:03:00\n    \n# For some reason, time needs quotes to be read by snakemake\ndefault-resources:\n  - mem_mb=500\n  - runtime=\"00:01:00\"\n  \n# Define the number of threads used by rules\nset-threads:\n  - bowtie2=3<\/code><\/pre>\n\n\n\n<p>We instruct snakemake to use three threads when running the rule <code>bowtie2<\/code>. 
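<\/p>\n\n\n\n<p>To make the wildcard substitution concrete, here is roughly the submission command that snakemake would compose for a <code>bowtie2<\/code> job with the settings above (an illustrative sketch; the jobscript path at the end is a placeholder for the script snakemake generates and appends itself):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sbatch --output=\"jobs\/bowtie2\/slurm_%x_%j.out\" --error=\"jobs\/bowtie2\/slurm_%x_%j.log\" --mem=1000 --time=00:03:00 \/path\/to\/jobscript.sh<\/code><\/pre>\n\n\n\n<p>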
Perform a dry run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nsnakemake --profile profile\/ -n<\/code><\/pre>\n\n\n\n<p>You should obtain:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nbowtie2             2              1              1\nprintContent        2              1              1\ntotal               5              1              1\n\n\n&#91;Sat Mar  5 17:41:24 2022]\nrule bowtie2:\n    input: inputs\/hello.txt\n    output: results\/hello-bowtie2.txt\n    jobid: 3\n    reason: Missing output files: results\/hello-bowtie2.txt\n    wildcards: sampleName=hello\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat inputs\/hello.txt &gt; results\/hello-bowtie2.txt\n\n\n&#91;Sat Mar  5 17:41:24 2022]\nrule printContent:\n    input: inputs\/hello.txt\n    output: results\/hello.txt\n    jobid: 1\n    reason: Missing output files: results\/hello.txt\n    wildcards: sampleName=hello\n    resources: mem_mb=2000, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n\n    cat inputs\/hello.txt &gt; results\/hello.txt\n\n\n&#91;Sat Mar  5 17:41:24 2022]\nrule printContent:\n    input: inputs\/helloBis.txt\n    output: results\/helloBis.txt\n    jobid: 2\n    reason: Missing output files: results\/helloBis.txt\n    wildcards: sampleName=helloBis\n    resources: mem_mb=2000, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n\n    cat inputs\/helloBis.txt &gt; results\/helloBis.txt\n\n\n&#91;Sat Mar  5 17:41:24 2022]\nrule bowtie2:\n    input: inputs\/helloBis.txt\n    output: results\/helloBis-bowtie2.txt\n    jobid: 4\n    reason: Missing output files: results\/helloBis-bowtie2.txt\n    wildcards: sampleName=helloBis\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat inputs\/helloBis.txt 
&gt; results\/helloBis-bowtie2.txt\n\n\n&#91;Sat Mar  5 17:41:24 2022]\nlocalrule all:\n    input: results\/hello.txt, results\/helloBis.txt, results\/hello-bowtie2.txt, results\/helloBis-bowtie2.txt\n    jobid: 0\n    reason: Input files updated by another job: results\/hello-bowtie2.txt, results\/hello.txt, results\/helloBis.txt, results\/helloBis-bowtie2.txt\n    resources: mem_mb=500, disk_mb=&lt;TBD&gt;, tmpdir=\/tmp, runtime=00:01:00\n\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nbowtie2             2              1              1\nprintContent        2              1              1\ntotal               5              1              1\n\nThis was a dry-run (flag -n). The order of jobs does not reflect the order of execution.<\/code><\/pre>\n\n\n\n<p>As you can see, even though <code>bowtie2<\/code> uses the new resources, the number of threads did not change. <strong>This is because the number of threads also needs to be defined in each rule<\/strong>. Indeed, <code>set-threads<\/code> does not set threads; it <strong>overwrites the thread usage of rules<\/strong>. Hence, modify your <code>snakeFile<\/code> to add the number of threads to each rule. 
Set 1 thread for <code>printContent<\/code> and 2 threads for <code>bowtie2<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>onstart:\n    print(\"##### Creating profile pipeline #####\\n\") \n    print(\"\\t Creating jobs output subfolders...\\n\")\n    shell(\"mkdir -p jobs\/printContent\")\n    shell(\"mkdir -p jobs\/bowtie2\")\n\nSAMPLENAMES=&#91;\"hello\", \"helloBis\"]\n\nrule all:\n  input:\n    expand(\"results\/{sampleName}.txt\", sampleName=SAMPLENAMES),\n    expand(\"results\/{sampleName}-bowtie2.txt\", sampleName=SAMPLENAMES)\n\nrule printContent:\n  input:\n    \"inputs\/{sampleName}.txt\"\n  output:\n    \"results\/{sampleName}.txt\"\n  threads: 1\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    \"\"\"\n\nrule bowtie2:\n  input:\n    \"inputs\/{sampleName}.txt\"\n  output:\n    \"results\/{sampleName}-bowtie2.txt\"\n  threads: 2\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    \"\"\"<\/code><\/pre>\n\n\n\n<p>Perform a <strong>real<\/strong> run and check the information about one <code>bowtie2<\/code> job on the cluster with <code>scontrol show job myjobid<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nsnakemake --profile profile\/<\/code><\/pre>\n\n\n\n<p>You should obtain:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nUsing shell: \/usr\/bin\/bash\nProvided cluster nodes: 400\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nbowtie2             2              3              3\nprintContent        2              1              1\ntotal               5              1              3\n\n##### Creating profile pipeline #####\n\n         Creating jobs output subfolders...\n\nmkdir -p jobs\/printContent\nmkdir -p jobs\/bowtie2\nSelect jobs to execute...\n\n&#91;Wed Apr 27 21:47:24 2022]\nrule bowtie2:\n    input: inputs\/helloBis.txt\n    output: 
results\/helloBis-bowtie2.txt\n    jobid: 4\n    reason: Missing output files: results\/helloBis-bowtie2.txt\n    wildcards: sampleName=helloBis\n    threads: 3\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat inputs\/helloBis.txt &gt; results\/helloBis-bowtie2.txt\n\nSubmitted job 4 with external jobid 'Submitted batch job 39158096'.\n\n&#91;Wed Apr 27 21:47:24 2022]\nrule printContent:\n    input: inputs\/hello.txt\n    output: results\/hello.txt\n    jobid: 1\n    reason: Missing output files: results\/hello.txt\n    wildcards: sampleName=hello\n    resources: mem_mb=2000, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n\n    cat inputs\/hello.txt &gt; results\/hello.txt\n\nSubmitted job 1 with external jobid 'Submitted batch job 39158097'.\n\n&#91;Wed Apr 27 21:47:24 2022]\nrule printContent:\n    input: inputs\/helloBis.txt\n    output: results\/helloBis.txt\n    jobid: 2\n    reason: Missing output files: results\/helloBis.txt\n    wildcards: sampleName=helloBis\n    resources: mem_mb=2000, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n\n    cat inputs\/helloBis.txt &gt; results\/helloBis.txt\n\nSubmitted job 2 with external jobid 'Submitted batch job 39158098'.\n\n&#91;Wed Apr 27 21:47:24 2022]\nrule bowtie2:\n    input: inputs\/hello.txt\n    output: results\/hello-bowtie2.txt\n    jobid: 3\n    reason: Missing output files: results\/hello-bowtie2.txt\n    wildcards: sampleName=hello\n    threads: 3\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat inputs\/hello.txt &gt; results\/hello-bowtie2.txt\n\nSubmitted job 3 with external jobid 'Submitted batch job 39158099'.\nWaiting at most 60 seconds for missing files.\n&#91;Wed Apr 27 21:47:54 2022]\nFinished job 4.\n1 of 5 steps (20%) done\n&#91;Wed Apr 27 21:47:54 2022]\nFinished job 1.\n2 of 5 steps (40%) done\n&#91;Wed Apr 27 21:47:54 2022]\nFinished job 2.\n3 of 5 steps (60%) done\n&#91;Wed Apr 27 21:47:54 2022]\nFinished job 3.\n4 
of 5 steps (80%) done\nSelect jobs to execute...\n\n&#91;Wed Apr 27 21:47:54 2022]\nlocalrule all:\n    input: results\/hello.txt, results\/helloBis.txt, results\/hello-bowtie2.txt, results\/helloBis-bowtie2.txt\n    jobid: 0\n    reason: Input files updated by another job: results\/helloBis-bowtie2.txt, results\/helloBis.txt, results\/hello.txt, results\/hello-bowtie2.txt\n    resources: mem_mb=500, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n&#91;Wed Apr 27 21:47:54 2022]\nFinished job 0.\n5 of 5 steps (100%) done\nComplete log: \/g\/romebioinfo\/tmp\/snakemake-profile-demo\/.snakemake\/log\/2022-04-27T214724.373022.snakemake.log<\/code><\/pre>\n\n\n\n<p>On the surface, everything looks good. The rule <code>bowtie2<\/code> indeed uses 3 threads. However, if you checked the information of one <code>bowtie2<\/code> job on the cluster with <code>scontrol show job myjobid<\/code>, you should have noticed that it uses only 1 CPU:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>NumNodes=1 NumCPUs=1 NumTasks=1 CPUs\/Task=1 ReqB:S:C:T=0:0:*:*<\/code><\/pre>\n\n\n\n<p>This is because we did not tell the cluster how to handle threads. In the <code>cluster<\/code> section, add the option <code>--cpus-per-task={threads}<\/code>. Remember that <a href=\"https:\/\/snakemake.readthedocs.io\/en\/stable\/snakefiles\/rules.html?highlight=threads#threads\">{threads}<\/a> is a snakemake wildcard that needs to be defined in each rule. 
The <code>profile\/config.yaml<\/code> should look like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: snakeFile\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\nrerun-incomplete: True\nrestart-times: 3\n\n# Cluster submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\" --mem={resources.mem_mb} --time={resources.runtime} --cpus-per-task={threads}\"\n\n# Job resources\nset-resources:\n    - printContent:mem_mb=2000\n    - printContent:runtime=00:01:00\n    - bowtie2:mem_mb=1000\n    - bowtie2:runtime=00:03:00\n\n# For some reason, time needs quotes to be read by snakemake\ndefault-resources:\n  - mem_mb=500\n  - runtime=\"00:01:00\"\n\n# Define the number of threads used by rules\nset-threads:\n  - bowtie2=3\n<\/code><\/pre>\n\n\n\n<p>Delete the <code>results\/<\/code> folder and perform a real run again:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nrm -r results\/\nsnakemake --profile profile\/\n<\/code><\/pre>\n\n\n\n<p>You should now be able to see that <code>bowtie2<\/code> jobs use the correct number of CPUs per task (<code>scontrol show job myjobid<\/code>):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>NumNodes=1 NumCPUs=3 NumTasks=1 CPUs\/Task=3 ReqB:S:C:T=0:0:*:*<\/code><\/pre>\n\n\n\n<div style=\"height:23px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>We are now able to customize the resources and number of threads of the jobs submitted to the 
cluster. However if one of your jobs fails,  with the current settings, snakemake will not be able to see it. In the next post, we will see how to handle errors.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous posts, we saw how to get started with snakemake, reduce command-line options, and submit your jobs to a cluster. I ended the previous post by mentioning resources that were set by default. In this post, we will see how to adapt the amount of RAM, time, and the number of threads to&hellip;<\/p>\n","protected":false},"author":5,"featured_media":346,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[4096],"tags":[5428,4098,5430],"embl_taxonomy":[],"class_list":["post-318","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical","tag-resources","tag-snakemake","tag-threads"],"acf":[],"embl_taxonomy_terms":[],"featured_image_src":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-content\/uploads\/2022\/04\/hopper.jpg","_links":{"self":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts\/318"}],"collection":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/comments?post=318"}],"version-history":[{"count":16,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts\/318\/revisions"}],"predecessor-version":[{"id":474,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts\/318\/revisions\/474"}],"wp:featuredmedia":[{"embeddable":true,"href":
"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/media\/346"}],"wp:attachment":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/media?parent=318"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/categories?post=318"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/tags?post=318"},{"taxonomy":"embl_taxonomy","embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/embl_taxonomy?post=318"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}