{"id":246,"date":"2022-03-31T08:26:58","date_gmt":"2022-03-31T08:26:58","guid":{"rendered":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/?p=246"},"modified":"2022-04-07T10:11:45","modified_gmt":"2022-04-07T10:11:45","slug":"snakemake-profile-2-reducing-command-line-options-with-profile","status":"publish","type":"post","link":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/03\/snakemake-profile-2-reducing-command-line-options-with-profile\/","title":{"rendered":"Snakemake profile &#8211; 2: Reducing command-line options with profile"},"content":{"rendered":"\n<div style=\"height:29px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>A <code>profile<\/code> is a folder that contains all the configuration parameters to successfully run your pipeline. Of note, if you have used a <code>cluster.json<\/code> file before, be aware that it has been <a href=\"https:\/\/snakemake.readthedocs.io\/en\/stable\/snakefiles\/configuration.html?highlight=cluster.json#cluster-configuration-deprecated\">deprecated<\/a>.<\/p>\n\n\n\n<div style=\"height:29px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Preparation of files (if you skipped the first <a href=\"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/03\/snakemake-profile-1-getting-started-with-snakemake\/\" target=\"_blank\" rel=\"noreferrer noopener\">post<\/a>)<\/h2>\n\n\n\n<p>Run the following script to create the folder structure:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n# Create the folder containing the files needed for this tutorial\nmkdir snakemake-profile-demo\n\n# Enter the created folder\ncd snakemake-profile-demo\n\n# Create an empty file containing the snakemake code\ntouch snakeFile\n\n# Create toy input files\nmkdir inputs\necho \"toto\" &gt; inputs\/hello.txt\necho \"totoBis\" &gt; inputs\/helloBis.txt\n\n# Create an empty folder to create a conda environment\n# This is done to 
make sure that you use the same snakemake version as I do\nmkdir envs\ntouch envs\/environment.yaml<\/code><\/pre>\n\n\n\n<p>Copy the following content to <code>snakeFile<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rule all:\n  input:\n    expand(\"results\/{sampleName}.txt\", sampleName=&#91;\"hello\", \"helloBis\"])\n\nrule printContent:\n  input:\n    \"inputs\/{sampleName}.txt\"\n  output:\n    \"results\/{sampleName}.txt\"\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    \"\"\"<\/code><\/pre>\n\n\n\n<p>Copy the following content to <code>environment.yaml<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>channels:\n  - bioconda\ndependencies:\n  - snakemake-minimal=6.15.1<\/code><\/pre>\n\n\n\n<p>Create and activate the conda environment:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nconda env create -p envs\/smake --file envs\/environment.yaml\nconda activate envs\/smake<\/code><\/pre>\n\n\n\n<p>Test the pipeline:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nsnakemake --snakefile snakeFile --cores=1<\/code><\/pre>\n\n\n\n<div style=\"height:27px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Snakemake options<\/h2>\n\n\n\n<p>In this section, I am going to detail the process of profile creation. This will increase progressively in complexity and we will need to add rules to the <code>snakeFile<\/code>. First create a <code>config.yaml<\/code> in a <code>profile<\/code> folder:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n# Create the folder containing the configuration file, it can be named differently\nmkdir profile\n\n# Create a config.yaml that will contain all the configuration parameters\ntouch profile\/config.yaml<\/code><\/pre>\n\n\n\n<p>The first thing we are going to do is to define some general snakemake parameters. To get a complete list of them try <code>snakemake --help<\/code>. 
The choice of parameters is subjective and depends on what you want to achieve. However, I found the ones below pretty useful on a daily basis. Let&#8217;s start with the parameters that we already used. Add the following content to <code>profile\/config.yaml<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: snakeFile\ncores: 1<\/code><\/pre>\n\n\n\n<p>The <code>---<\/code> at the beginning of the file indicates the start of the YAML document. It is not mandatory in our case; it is just a convention. Now run snakemake after deleting the <code>results<\/code> folder:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n# Delete the results\/ folder if present (-f avoids an error if it does not exist)\nrm -rf results\/\n\n# Run snakemake with a dry run mode (option -n)\nsnakemake --profile profile\/ -n<\/code><\/pre>\n\n\n\n<p>A dry run means that the snakemake pipeline will be evaluated but that no files will be produced. You should obtain:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nprintContent        2              1              1\ntotal               3              1              1\n\n\n&#91;Fri Mar  4 08:44:12 2022]\nrule printContent:\n    input: inputs\/helloBis.txt\n    output: results\/helloBis.txt\n    jobid: 2\n    wildcards: sampleName=helloBis\n    resources: tmpdir=\/tmp\n\n\n&#91;Fri Mar  4 08:44:12 2022]\nrule printContent:\n    input: inputs\/hello.txt\n    output: results\/hello.txt\n    jobid: 1\n    wildcards: sampleName=hello\n    resources: tmpdir=\/tmp\n\n\n&#91;Fri Mar  4 08:44:12 2022]\nlocalrule all:\n    input: results\/hello.txt, results\/helloBis.txt\n    jobid: 0\n    resources: tmpdir=\/tmp\n\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1        
      1              1\nprintContent        2              1              1\ntotal               3              1              1\n\nThis was a dry-run (flag -n). The order of jobs does not reflect the order of execution.<\/code><\/pre>\n\n\n\n<p>As you can see, we were able to reduce the snakemake call from <code>snakemake --snakefile snakeFile --cores=1<\/code> to <code>snakemake --profile profile\/<\/code>. Therefore, the profile enables the definition of all the snakemake options (and more). <\/p>\n\n\n\n<p>Let&#8217;s now add more options to <code>profile\/config.yaml<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: snakeFile\ncores: 1\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\nrerun-incomplete: True\nrestart-times: 3<\/code><\/pre>\n\n\n\n<p><code>latency-wait<\/code> is useful as your system can sometimes be &#8220;slower&#8221; than snakemake. This means that even if an output file has been created, snakemake might not see it immediately. The default value is 5 seconds; I usually set it to 60. <\/p>\n\n\n\n<p>If a job fails, for whatever reason, it is possible to ask snakemake to try it again by setting <code>restart-times<\/code> to the number of attempts you want to allow (3 here). Setting <code>rerun-incomplete<\/code> to True additionally tells snakemake to re-run any job whose output is recognized as incomplete, for instance after an interrupted run. If a job is run, it can be because the file it produces does not exist yet or because a file on which the job depends (i.e. the input file) has not been created yet and will be produced by another job. Indeed, the point of using snakemake is to write pipelines. Therefore, you will design a series of jobs that depend on one another. <\/p>\n\n\n\n<p>You can see the reason why a job is triggered by setting <code>reason<\/code> to True. <code>show-failed-logs<\/code> will display logs of failed jobs. <code>keep-going<\/code> tells snakemake to continue with independent jobs if one fails. In other words, snakemake will run as many rules as it can before terminating the pipeline. <code>printshellcmds<\/code> will print the code that you introduced in the <code>shell<\/code> section of your rules. 
Finally, with experience, you will notice that even if you carefully define the resources needed for each job (covered in the next post), the process can be prone to hiccups. By setting <code>rerun-incomplete<\/code> and <code>restart-times<\/code>, you minimize the chance of your pipeline failing even if it is well coded.<\/p>\n\n\n\n<p>Now replace the content of <code>profile\/config.yaml<\/code> with the above code and perform a dry run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n# Run snakemake with a dry run mode (option -n)\nsnakemake --profile profile\/ -n<\/code><\/pre>\n\n\n\n<p>You can see below that the <code>cat<\/code> instruction now appears in your terminal with the sampleName wildcards replaced by the actual values:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nJob stats:\njob             count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nprintContent        2              1              1\ntotal               3              1              1\n\n\n&#91;Fri Mar  4 09:34:13 2022]\nrule printContent:\n    input: inputs\/helloBis.txt\n    output: results\/helloBis.txt\n    jobid: 2\n    reason: Missing output files: results\/helloBis.txt\n    wildcards: sampleName=helloBis\n    resources: tmpdir=\/tmp\n\n\n    cat inputs\/helloBis.txt &gt; results\/helloBis.txt\n\n\n&#91;Fri Mar  4 09:34:13 2022]\nrule printContent:\n    input: inputs\/hello.txt\n    output: results\/hello.txt\n    jobid: 1\n    reason: Missing output files: results\/hello.txt\n    wildcards: sampleName=hello\n    resources: tmpdir=\/tmp\n\n\n    cat inputs\/hello.txt &gt; results\/hello.txt\n\n\n&#91;Fri Mar  4 09:34:13 2022]\nlocalrule all:\n    input: results\/hello.txt, results\/helloBis.txt\n    jobid: 0\n    reason: Input files updated by another job: results\/helloBis.txt, results\/hello.txt\n    resources: tmpdir=\/tmp\n\nJob stats:\njob            
 count    min threads    max threads\n------------  -------  -------------  -------------\nall                 1              1              1\nprintContent        2              1              1\ntotal               3              1              1\n\nThis was a dry-run (flag -n). The order of jobs does not reflect the order of execution.<\/code><\/pre>\n\n\n\n<p>Overall, we reduced the snakemake command from <code>snakemake --snakefile snakeFile --cores 1 --latency-wait 60 --restart-times 3 --rerun-incomplete --reason --show-failed-logs --keep-going --printshellcmds<\/code><br>to a shorter call <code>snakemake --profile profile\/<\/code>.<\/p>\n\n\n\n<p>Next week, we will see how to submit your jobs to a cluster. Stay tuned! (<a href=\"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/04\/snakemake-profile-3-cluster-submission-defining-parameters\/\" target=\"_blank\" rel=\"noreferrer noopener\">Next post<\/a>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A profile is a folder that contains all the configuration parameters to successfully run your pipeline. Of note, if you have used a cluster.json file before, be aware that it has been deprecated. 
Preparation of files (if you skipped the first post) Run the following script to create the folder&hellip;<\/p>\n","protected":false},"author":5,"featured_media":252,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[4096],"tags":[4102,4466,4100,4098],"embl_taxonomy":[],"class_list":["post-246","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical","tag-beginner","tag-introduction","tag-profile","tag-snakemake"],"acf":[],"embl_taxonomy_terms":[],"featured_image_src":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-content\/uploads\/2022\/03\/picasso.jpg","_links":{"self":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts\/246"}],"collection":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/comments?post=246"}],"version-history":[{"count":3,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts\/246\/revisions"}],"predecessor-version":[{"id":314,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts\/246\/revisions\/314"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/media\/252"}],"wp:attachment":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/media?parent=246"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/categories?post=246"},{"taxonomy":"post_tag","embeddable":tr
ue,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/tags?post=246"},{"taxonomy":"embl_taxonomy","embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/embl_taxonomy?post=246"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}