{"id":1622,"date":"2022-07-15T11:03:26","date_gmt":"2022-07-15T11:03:26","guid":{"rendered":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/?p=1622"},"modified":"2022-07-22T09:47:15","modified_gmt":"2022-07-22T09:47:15","slug":"dealing-with-subworkflows-in-snakemake","status":"publish","type":"post","link":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/07\/dealing-with-subworkflows-in-snakemake\/","title":{"rendered":"Dealing with subworkflows in Snakemake"},"content":{"rendered":"\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p>Subdividing your pipeline into subworkflows is a powerful means of structuring complex analyses. It also helps to avoid code repetition. Typically, if one has to<br>analyse different types of data such as ChIP-seq and RNA-seq, and then perform multi-omics data integration, it becomes handy to create two subworkflows (ChIPSeq, RNASeq) and<br>connect the different types of data in a MultiOmics workflow.<\/p>\n\n\n\n<p>In this tutorial, we will use toy examples that will lead to creating the subworkflows ChIPSeq and RNASeq. These subworkflows will be connected into a third one. 
Check the previous posts on &#8220;<a href=\"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/03\/snakemake-profile-1-getting-started-with-snakemake\/\" target=\"_blank\" rel=\"noreferrer noopener\">getting started with snakemake<\/a>&#8220;, &#8220;<a href=\"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/03\/snakemake-profile-2-reducing-command-line-options-with-profile\/\" target=\"_blank\" rel=\"noreferrer noopener\">creating a profile<\/a>&#8220;, and<br>&#8220;<a href=\"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/blog\/2022\/04\/snakemake-profile-3-cluster-submission-defining-parameters\/\" target=\"_blank\" rel=\"noreferrer noopener\">submitting jobs to a cluster<\/a>&#8221; if you are not familiar with these concepts.<\/p>\n\n\n\n<p>We start by preparing the folder structure, testing the subworkflows one after the other, and finish by coordinating the merging of the data into one Snakefile.<\/p>\n\n\n\n<div style=\"height:22px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Preparing the files<\/h2>\n\n\n\n<p>Run the following script to create the folder structure:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n## Create the project folder\nmkdir demo-subworkflows\ncd demo-subworkflows\n\n## create the folder and the file for the conda environment\nmkdir envs\ntouch envs\/environment.yaml\n\n## create the folder and the file for the profile\nmkdir profile\ntouch profile\/config.yaml\n\n## create the Snakefile that will connect the subworkflows ChIPSeq and RNASeq\ntouch Snakefile\n\n## Create the ChIPSeq workflow files\nmkdir ChIPSeq\ntouch ChIPSeq\/Snakefile-ChIPSeq\nmkdir ChIPSeq\/profileChIPSeq\ntouch ChIPSeq\/profileChIPSeq\/config.yaml\ntouch ChIPSeq\/profileChIPSeq\/status-sacct.sh\nmkdir ChIPSeq\/inputs\ntouch ChIPSeq\/inputs\/exp1.fastq  # This is a toy example, the file is empty\ntouch ChIPSeq\/inputs\/exp2.fastq  # This is a toy example, 
the file is empty\nmkdir ChIPSeq\/external_rules\ntouch ChIPSeq\/external_rules\/mockrule.smk\ntouch ChIPSeq\/external_rules\/callscript.smk\nmkdir ChIPSeq\/scripts\ntouch ChIPSeq\/scripts\/mockscript.R\n\n## Create the RNASeq workflow files\nmkdir RNASeq\ntouch RNASeq\/Snakefile-RNASeq\nmkdir RNASeq\/profileRNASeq\ntouch RNASeq\/profileRNASeq\/config.yaml\ntouch RNASeq\/profileRNASeq\/status-sacct.sh\nmkdir RNASeq\/inputs\ntouch RNASeq\/inputs\/exp1.fastq  # This is a toy example, the file is empty\ntouch RNASeq\/inputs\/exp2.fastq  # This is a toy example, the file is empty\nmkdir RNASeq\/external_rules\ntouch RNASeq\/external_rules\/mockrule.smk\ntouch RNASeq\/external_rules\/callscript.smk\nmkdir RNASeq\/scripts\ntouch RNASeq\/scripts\/mockscript.R\n\n## Create the MultiOmics workflow files\nmkdir MultiOmics\ntouch MultiOmics\/Snakefile-MultiOmics\nmkdir MultiOmics\/profileMultiOmics\ntouch MultiOmics\/profileMultiOmics\/config.yaml\nmkdir MultiOmics\/external_rules\ntouch MultiOmics\/external_rules\/mockrule.smk\n<\/code><\/pre>\n\n\n\n<p>For more information about the <code>YAML<\/code> format see <a rel=\"noreferrer noopener\" href=\"https:\/\/www.cloudbees.com\/blog\/yaml-tutorial-everything-you-need-get-started\" target=\"_blank\">here<\/a>. For the next step, you need to have<br><a rel=\"noreferrer noopener\" href=\"https:\/\/docs.conda.io\/projects\/conda\/en\/latest\/user-guide\/install\/index.html\" target=\"_blank\">conda<\/a> installed on your computer. 
Copy the following content to <code>envs\/environment.yaml<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>channels:\n  - bioconda\ndependencies:\n  - snakemake=7.8.2<\/code><\/pre>\n\n\n\n<p>Then execute the following command to create a conda environment containing snakemake v7.8.2:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nconda env create -p envs\/smake --file envs\/environment.yaml<\/code><\/pre>\n\n\n\n<p>If you get stuck with the installation at <code>Solving environment<\/code>, follow these steps:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nconda create -p envs\/smake python=3.9.13\nconda activate envs\/smake\nconda install -c bioconda snakemake=7.8.2\n<\/code><\/pre>\n\n\n\n<p>You can verify that the correct version of snakemake has been installed with:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n## If you did not already activate the environment: conda activate envs\/smake\nsnakemake --version\nconda deactivate\n<\/code><\/pre>\n\n\n\n<p>In the next section, we are going to build and test the ChIPSeq subworkflow.<\/p>\n\n\n\n<div style=\"height:22px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">The ChIPSeq subworkflow<\/h2>\n\n\n\n<p>If you are not familiar with the different steps to process ChIP-seq data, read this <a href=\"https:\/\/training.galaxyproject.org\/training-material\/topics\/epigenetics\/tutorials\/formation_of_super-structures_on_xi\/tutorial.html\" target=\"_blank\" rel=\"noreferrer noopener\">tutorial<\/a><br>by the galaxy training network.<\/p>\n\n\n\n<p>Copy the following content into <code>ChIPSeq\/Snakefile-ChIPSeq<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Load pipeline: snakemake --profile profileChIPSeq\/ -n\n\nonstart:\n    print(\"##### ChIPSeq workflow #####\\n\") \n    print(\"\\t Creating jobs output subfolders...\\n\")\n    shell(\"mkdir -p jobs\/mockAlignment\")\n    
shell(\"mkdir -p jobs\/mockpeakDetection\")\n\n#########\n# Variable definition\n########\n\nFILESNAMES=&#91;\"exp1\", \"exp2\"]\n\n#########\n# Rules\n########\n\nrule all:\n    input:\n        expand(\"results\/alignment\/{sampleName}.bam\", sampleName=FILESNAMES),\n        expand(\"results\/peaks\/{sampleName}.peaks\", sampleName=FILESNAMES)\n\ninclude: \"external_rules\/mockrule.smk\"\ninclude: \"external_rules\/callscript.smk\"<\/code><\/pre>\n\n\n\n<p>The rule to copy in <code>ChIPSeq\/external_rules\/mockrule.smk<\/code> is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rule mockAlignment:\n  input:\n    \"inputs\/{sampleName}.fastq\"\n  output:\n    \"results\/alignment\/{sampleName}.bam\"\n  threads: 1\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    sleep 10s\n    \"\"\"<\/code><\/pre>\n\n\n\n<p>In a real context, the shell section of the code above will be more complex. It would contain instructions to align your fastq files to a reference genome. Your fastq files would be in <code>gz<\/code> format and would have been trimmed. Above is just a toy example that creates an empty file.<\/p>\n\n\n\n<p>The second rule calls an R script from which a mock peak detection is performed. In a real ChIPSeq pipeline, more <a rel=\"noreferrer noopener\" href=\"https:\/\/training.galaxyproject.org\/training-material\/topics\/epigenetics\/tutorials\/formation_of_super-structures_on_xi\/tutorial.html\" target=\"_blank\">steps<\/a> would be required. <strong>Important:<\/strong> Note that in the code below, the path to the script is relative to the position of <code>ChIPSeq\/external_rules\/callscript.smk<\/code>. 
Add this rule to<br><code>ChIPSeq\/external_rules\/callscript.smk<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rule mockpeakDetection:\n  input:\n    \"results\/alignment\/{sampleName}.bam\"\n  output:\n    \"results\/peaks\/{sampleName}.peaks\"\n  threads: 1\n  script:\n    \"..\/scripts\/mockscript.R\"<\/code><\/pre>\n\n\n\n<p>The input of the rule <code>mockpeakDetection<\/code> is the output of the rule <code>mockAlignment<\/code>. We say that <code>mockpeakDetection<\/code> depends on <code>mockAlignment<\/code>. In other words, in order to perform the peak detection (on the bam files), you first need to generate the bam files by aligning the fastq files. Let&#8217;s visualize the rule graph, a condensed view of the <a rel=\"noreferrer noopener\" href=\"https:\/\/snakemake.readthedocs.io\/en\/stable\/tutorial\/basics.html?highlight=dag#step-4-indexing-read-alignments-and-visualizing-the-dag-of-jobs\" target=\"_blank\">directed acyclic graph<\/a> of the jobs. It shows the dependencies between the rules of a workflow (the <code>dot<\/code> command is part of Graphviz, which must be installed):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/sh\n\nconda activate envs\/smake\ncd ChIPSeq\nsnakemake --snakefile Snakefile-ChIPSeq --cores 1 -n --rulegraph | dot -Tpng &gt; dag.png<\/code><\/pre>\n\n\n\n<p>You should obtain the following graph:<\/p>\n\n\n\n<figure class=\"vf-figure wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" class=\"vf-figure__image\" src=\"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-content\/uploads\/2022\/07\/dag.png\" alt=\"\" class=\"wp-image-1642\" width=\"104\" height=\"126\"\/><\/figure>\n\n\n\n<p>The file <code>scripts\/mockscript.R<\/code> must contain:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>outputFile &lt;- snakemake@output&#91;&#91;1]]\nmessage(\"Writing to \", outputFile)\ntowrite &lt;- \"hello\"\nwrite(towrite, file=outputFile, ncolumns=1)<\/code><\/pre>\n\n\n\n<p>Before running the workflow, we need to create the profile in order to 
submit the jobs to a cluster. For more information, see the links at the top of this page. If you are not using slurm, you will need to modify the code below. The command to run the workflow locally is also given further down. Copy the following content into <code>profileChIPSeq\/config.yaml<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: Snakefile-ChIPSeq\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\n\n# Cluster submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\" --mem={resources.mem_mb} --time={resources.runtime} --parsable\"\ncluster-status: \".\/profileChIPSeq\/status-sacct.sh\" #  Use to handle timeout exception, do not forget to chmod +x\n\n# Job resources\nset-resources:\n  - mockAlignment:mem_mb=1000\n  - mockAlignment:runtime=00:03:00\n  - mockpeakDetection:mem_mb=1000\n  - mockpeakDetection:runtime=00:03:00\n    \n# For some reason time needs quotes to be read by snakemake\ndefault-resources:\n  - mem_mb=500\n  - runtime=\"00:01:00\"\n  \n# Define the number of threads used by rules\nset-threads:\n  - mockAlignment=1\n  - mockpeakDetection=1<\/code><\/pre>\n\n\n\n<p>Add <code>profileChIPSeq\/status-sacct.sh<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/env bash\n\n# Check status of Slurm job\n\njobid=\"$1\"\n\nif &#91;&#91; \"$jobid\" == Submitted ]]\nthen\n  echo smk-simple-slurm: Invalid job ID: \"$jobid\" &gt;&amp;2\n  echo smk-simple-slurm: Did you remember 
to add the flag --parsable to your sbatch call? &gt;&amp;2\n  exit 1\nfi\n\noutput=`sacct -j \"$jobid\" --format State --noheader | head -n 1 | awk '{print $1}'`\n\nif &#91;&#91; $output =~ ^(COMPLETED).* ]]\nthen\n  echo success\nelif &#91;&#91; $output =~ ^(RUNNING|PENDING|COMPLETING|CONFIGURING|SUSPENDED).* ]]\nthen\n  echo running\nelse\n  echo failed\nfi<\/code><\/pre>\n\n\n\n<p>Now run the pipeline, either using the profile to submit to the cluster, or the second command to run it locally. You can also remove the cluster-related lines below the <code># Cluster submission<\/code> comment in the profile and run the workflow locally with the first command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n## Make `profileChIPSeq\/status-sacct.sh` executable if you submit to a cluster\nchmod +x profileChIPSeq\/status-sacct.sh\n \n## Submitting to the cluster (or running locally if you deleted the relevant lines in the profile)\nsnakemake --profile profileChIPSeq\/\n\n## OR\n## Running locally without the profile\nsnakemake --snakefile Snakefile-ChIPSeq --cores=1<\/code><\/pre>\n\n\n\n<p>You should see:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nUsing shell: \/usr\/bin\/bash\nProvided cluster nodes: 400\nJob stats:\njob                  count    min threads    max threads\n-----------------  -------  -------------  -------------\nall                      1              1              1\nmockAlignment            2              1              1\nmockpeakDetection        2              1              1\ntotal                    5              1              1\n  \n##### ChIPSeq workflow #####\n\n     Creating jobs output subfolders...\n\nmkdir -p jobs\/mockAlignment\nmkdir -p jobs\/mockpeakDetection\nSelect jobs to execute...\n\n&#91;Thu Mar 31 16:31:49 2022]\nrule mockAlignment:\n    input: inputs\/exp1.fastq\n    output: results\/alignment\/exp1.bam\n    jobid: 1\n    reason: Missing output files: results\/alignment\/exp1.bam\n    wildcards: sampleName=exp1\n    
resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat inputs\/exp1.fastq &gt; results\/alignment\/exp1.bam\n    sleep 10s\n    \nSubmitted job 1 with external jobid '37058084'.\n\n&#91;Thu Mar 31 16:31:49 2022]\nrule mockAlignment:\n    input: inputs\/exp2.fastq\n    output: results\/alignment\/exp2.bam\n    jobid: 2\n    reason: Missing output files: results\/alignment\/exp2.bam\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat inputs\/exp2.fastq &gt; results\/alignment\/exp2.bam\n    sleep 10s\n    \nSubmitted job 2 with external jobid '37058085'.\nWaiting at most 60 seconds for missing files.\n&#91;Thu Mar 31 16:32:40 2022]\nFinished job 1.\n1 of 5 steps (20%) done\nSelect jobs to execute...\n\n&#91;Thu Mar 31 16:32:40 2022]\nrule mockpeakDetection:\n    input: results\/alignment\/exp1.bam\n    output: results\/peaks\/exp1.peaks\n    jobid: 3\n    reason: Missing output files: results\/peaks\/exp1.peaks; Input files updated by another job: results\/alignment\/exp1.bam\n    wildcards: sampleName=exp1\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\nSubmitted job 3 with external jobid '37058227'.\n&#91;Thu Mar 31 16:32:40 2022]\nFinished job 2.\n2 of 5 steps (40%) done\nSelect jobs to execute...\n\n&#91;Thu Mar 31 16:32:40 2022]\nrule mockpeakDetection:\n    input: results\/alignment\/exp2.bam\n    output: results\/peaks\/exp2.peaks\n    jobid: 4\n    reason: Missing output files: results\/peaks\/exp2.peaks; Input files updated by another job: results\/alignment\/exp2.bam\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\nSubmitted job 4 with external jobid '37058228'.\nWaiting at most 60 seconds for missing files.\n&#91;Thu Mar 31 16:33:10 2022]\nFinished job 3.\n3 of 5 steps (60%) done\n&#91;Thu Mar 31 16:33:10 2022]\nFinished job 4.\n4 of 5 steps (80%) done\nSelect jobs to 
execute...\n\n&#91;Thu Mar 31 16:33:10 2022]\nlocalrule all:\n    input: results\/alignment\/exp1.bam, results\/alignment\/exp2.bam, results\/peaks\/exp1.peaks, results\/peaks\/exp2.peaks\n    jobid: 0\n    reason: Input files updated by another job: results\/peaks\/exp2.peaks, results\/peaks\/exp1.peaks, results\/alignment\/exp2.bam, results\/alignment\/exp1.bam\n    resources: mem_mb=500, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n&#91;Thu Mar 31 16:33:10 2022]\nFinished job 0.\n5 of 5 steps (100%) done\nComplete log: demo-subworkflows\/ChIPSeq\/.snakemake\/log\/2022-03-31T163148.828799.snakemake.log<\/code><\/pre>\n\n\n\n<p>Let&#8217;s verify that the rule in <code>ChIPSeq\/external_rules\/mockrule.smk<\/code> correctly created the bam files:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n## It should print: results\/alignment\/exp1.bam  results\/alignment\/exp2.bam\nls results\/alignment\/*<\/code><\/pre>\n\n\n\n<p>Now check that the &#8220;peak detection&#8221; created the files correctly. 
Each file should contain &#8220;hello&#8221; that was added by <code>scripts\/mockscript.R<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n## Checking that the files were created\n## It should print: results\/peaks\/exp1.peaks  results\/peaks\/exp2.peaks\nls results\/peaks\/*\n\n## Checking the content of each file, You should see the string \"hello\" in each file.\nmore results\/peaks\/*<\/code><\/pre>\n\n\n\n<p>Change the current shell location to <code>RNASeq<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\ncd ..\/RNASeq<\/code><\/pre>\n\n\n\n<div style=\"height:22px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">The RNA-Seq subworkflow<\/h2>\n\n\n\n<p>If you are not familiar with the different steps to process RNA-seq data, read this <a href=\"https:\/\/training.galaxyproject.org\/training-material\/topics\/transcriptomics\/tutorials\/ref-based\/tutorial.html\" target=\"_blank\" rel=\"noreferrer noopener\">tutorial<\/a><br>by the galaxy training network. 
Below we will create two toy rules <code>mockFeatureCounts<\/code> and <code>mockDifferentialAnalysis<\/code> that will be contained in <code>external_rules\/mockrule.smk<\/code> and<br><code>external_rules\/callscript.smk<\/code>.<\/p>\n\n\n\n<p>Copy the following content into <code>Snakefile-RNASeq<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Load pipeline: snakemake --profile profileRNASeq\/ -n\n\nonstart:\n    print(\"##### RNASeq workflow #####\\n\") \n    print(\"\\t Creating jobs output subfolders...\\n\")\n    shell(\"mkdir -p jobs\/mockFeatureCounts\")\n    shell(\"mkdir -p jobs\/mockDifferentialAnalysis\")\n\n#########\n# Variable definition\n########\n\nFILESNAMES=&#91;\"exp1\", \"exp2\"]\n\n#########\n# Rules\n########\n\nrule all:\n    input:\n        expand(\"results\/featureCounts\/{sampleName}.txt\", sampleName=FILESNAMES),\n        expand(\"results\/differentialAnalysis\/{sampleName}.csv\", sampleName=FILESNAMES)\n\ninclude: \"external_rules\/mockrule.smk\"\ninclude: \"external_rules\/callscript.smk\"<\/code><\/pre>\n\n\n\n<p>The rule to copy in <code>external_rules\/mockrule.smk<\/code> is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rule mockFeatureCounts:\n  input:\n    \"inputs\/{sampleName}.fastq\"\n  output:\n    \"results\/featureCounts\/{sampleName}.txt\"\n  threads: 1\n  shell:\n    \"\"\"\n    cat {input} &gt; {output}\n    sleep 10s\n    \"\"\"<\/code><\/pre>\n\n\n\n<p>The rule to copy in <code>external_rules\/callscript.smk<\/code> is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rule mockDifferentialAnalysis:\n  input:\n    \"results\/featureCounts\/{sampleName}.txt\"\n  output:\n    \"results\/differentialAnalysis\/{sampleName}.csv\"\n  threads: 1\n  script:\n    \"..\/scripts\/mockscript.R\"<\/code><\/pre>\n\n\n\n<p>Add the following content to <code>scripts\/mockscript.R<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>outputFile &lt;- snakemake@output&#91;&#91;1]]\nmessage(\"Writing to \", 
outputFile)\ntowrite &lt;- \"This is a differential gene expression analysis normally performed with a package such as deseq2\"\nwrite(towrite, file=outputFile, ncolumns=1)<\/code><\/pre>\n\n\n\n<p>As we did in the previous section, create <code>profileRNASeq\/config.yaml<\/code> with the following content:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: Snakefile-RNASeq\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\n\n# Cluster submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\" --mem={resources.mem_mb} --time={resources.runtime} --parsable\"\ncluster-status: \".\/profileRNASeq\/status-sacct.sh\" #  Use to handle timeout exception, do not forget to chmod +x\n\n# Job resources\nset-resources:\n  - mockFeatureCounts:mem_mb=1000\n  - mockFeatureCounts:runtime=00:03:00\n  - mockDifferentialAnalysis:mem_mb=1000\n  - mockDifferentialAnalysis:runtime=00:03:00\n    \n# For some reasons time needs quotes to be read by snakemake\ndefault-resources:\n  - mem_mb=500\n  - runtime=\"00:01:00\"\n  \n# Define the number of threads used by rules\nset-threads:\n  - mockFeatureCounts=1\n  - mockDifferentialAnalysis=1<\/code><\/pre>\n\n\n\n<p>Exactly as before, add <code>profileRNASeq\/status-sacct.sh<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/env bash\n\n# Check status of Slurm job\n\njobid=\"$1\"\n\nif &#91;&#91; \"$jobid\" == Submitted ]]\nthen\n  echo smk-simple-slurm: Invalid job ID: 
\"$jobid\" &gt;&amp;2\n  echo smk-simple-slurm: Did you remember to add the flag --parsable to your sbatch call? &gt;&amp;2\n  exit 1\nfi\n\noutput=`sacct -j \"$jobid\" --format State --noheader | head -n 1 | awk '{print $1}'`\n\nif &#91;&#91; $output =~ ^(COMPLETED).* ]]\nthen\n  echo success\nelif &#91;&#91; $output =~ ^(RUNNING|PENDING|COMPLETING|CONFIGURING|SUSPENDED).* ]]\nthen\n  echo running\nelse\n  echo failed\nfi<\/code><\/pre>\n\n\n\n<p>Make it executable and run the workflow either on the cluster or locally:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n## Make `profile\/status-sacct.sh` executable if you submit to a cluster\nchmod +x profileRNASeq\/status-sacct.sh\n\n## Submitting to the cluster (or in local if you deleted the relevant lines in the profile)\nsnakemake --profile profileRNASeq\/\n\n## OR\n## Running in local without profile\nsnakemake --snakefile Snakefile-RNASeq --cores=1\n<\/code><\/pre>\n\n\n\n<p>You should see in your terminal:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nUsing shell: \/usr\/bin\/bash\nProvided cluster nodes: 400\nJob stats:\njob                         count    min threads    max threads\n------------------------  -------  -------------  -------------\nall                             1              1              1\nmockDifferentialAnalysis        2              1              1\nmockFeatureCounts               2              1              1\ntotal                           5              1              1\n\n##### RNASeq workflow #####\n\n     Creating jobs output subfolders...\n\nmkdir -p jobs\/mockFeatureCounts\nmkdir -p jobs\/mockDifferentialAnalysis\nSelect jobs to execute...\n\n&#91;Thu Jun 16 16:06:27 2022]\nrule mockFeatureCounts:\n    input: inputs\/exp1.fastq\n    output: results\/featureCounts\/exp1.txt\n    jobid: 1\n    reason: Missing output files: results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    resources: mem_mb=1000, 
disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat inputs\/exp1.fastq &gt; results\/featureCounts\/exp1.txt\n    sleep 10s\n    \nSubmitted job 1 with external jobid '42024859'.\n\n&#91;Thu Jun 16 16:06:27 2022]\nrule mockFeatureCounts:\n    input: inputs\/exp2.fastq\n    output: results\/featureCounts\/exp2.txt\n    jobid: 2\n    reason: Missing output files: results\/featureCounts\/exp2.txt\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat inputs\/exp2.fastq &gt; results\/featureCounts\/exp2.txt\n    sleep 10s\n    \nSubmitted job 2 with external jobid '42024860'.\n&#91;Thu Jun 16 16:06:57 2022]\nFinished job 1.\n1 of 5 steps (20%) done\nSelect jobs to execute...\n\n&#91;Thu Jun 16 16:06:57 2022]\nrule mockDifferentialAnalysis:\n    input: results\/featureCounts\/exp1.txt\n    output: results\/differentialAnalysis\/exp1.csv\n    jobid: 3\n    reason: Missing output files: results\/differentialAnalysis\/exp1.csv; Input files updated by another job: results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\nSubmitted job 3 with external jobid '42024877'.\n&#91;Thu Jun 16 16:06:57 2022]\nFinished job 2.\n2 of 5 steps (40%) done\nSelect jobs to execute...\n\n&#91;Thu Jun 16 16:06:57 2022]\nrule mockDifferentialAnalysis:\n    input: results\/featureCounts\/exp2.txt\n    output: results\/differentialAnalysis\/exp2.csv\n    jobid: 4\n    reason: Missing output files: results\/differentialAnalysis\/exp2.csv; Input files updated by another job: results\/featureCounts\/exp2.txt\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\nSubmitted job 4 with external jobid '42024878'.\nWaiting at most 60 seconds for missing files.\n&#91;Thu Jun 16 16:07:28 2022]\nFinished job 3.\n3 of 5 steps (60%) done\n&#91;Thu Jun 16 16:07:28 2022]\nFinished job 4.\n4 of 5 steps 
(80%) done\nSelect jobs to execute...\n\n&#91;Thu Jun 16 16:07:28 2022]\nlocalrule all:\n    input: results\/featureCounts\/exp1.txt, results\/featureCounts\/exp2.txt, results\/differentialAnalysis\/exp1.csv, results\/differentialAnalysis\/exp2.csv\n    jobid: 0\n    reason: Input files updated by another job: results\/differentialAnalysis\/exp2.csv, results\/differentialAnalysis\/exp1.csv, results\/featureCounts\/exp2.txt, results\/featureCounts\/exp1.txt\n    resources: mem_mb=500, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n&#91;Thu Jun 16 16:07:28 2022]\nFinished job 0.\n5 of 5 steps (100%) done\nComplete log: .snakemake\/log\/2022-06-16T160626.178736.snakemake.log<\/code><\/pre>\n\n\n\n<p>Let&#8217;s verify that the rule in <code>RNASeq\/external_rules\/mockrule.smk<\/code> correctly created the featureCounts files:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n## It should print: results\/featureCounts\/exp1.txt  results\/featureCounts\/exp2.txt\nls results\/featureCounts\/*<\/code><\/pre>\n\n\n\n<p>Now check that the &#8220;differential analysis&#8221; created the files correctly. 
Each file should contain the sentence &#8220;This is a differential gene expression analysis normally performed with a package such as deseq2&#8221; that was added by <code>scripts\/mockscript.R<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\n## Checking that the files were created\n## It should print: results\/differentialAnalysis\/exp1.csv  results\/differentialAnalysis\/exp2.csv\nls results\/differentialAnalysis\/*\n\n## Checking the content of each file\nmore results\/differentialAnalysis\/*<\/code><\/pre>\n\n\n\n<p>Go to the <code>MultiOmics<\/code> folder of the project:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\ncd ..\/MultiOmics\n<\/code><\/pre>\n\n\n\n<div style=\"height:22px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Connecting the ChIP-Seq and RNA-Seq subworkflows<\/h2>\n\n\n\n<p>Connecting the subworkflows in one Snakefile lets us run them in a single call and also develop extra rules on top of them. One could thus build re-usable blocks that can be plugged into new projects and pipelines. We first run the connecting workflow locally. We will then see how to modify the files to run it on an HPC cluster.<\/p>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Connecting subworkflows locally<\/h3>\n\n\n\n<p>With Snakemake 6.0 and later, it is possible to define external workflows as modules, from which rules can be used by explicitly \u201cimporting\u201d them. 
First, make sure that the user is using a correct version of snakemake by enforcing a minimum version (keep reading before adding to <code>Snakefile-MultiOmics<\/code>):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from snakemake.utils import min_version\nmin_version(\"6.0.0\")\n\nonstart:\n    print(\"##### ChIP-seq and RNA-seq workflows #####\\n\") \n    print(\"\\t Reading samples and metadata....\\n\")<\/code><\/pre>\n\n\n\n<p>Import the <code>ChIPSeq<\/code> and <code>RNASeq<\/code> subworkflows as modules (keep reading before adding):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from snakemake.utils import min_version\nmin_version(\"6.0.0\")\n\nonstart:\n    print(\"##### ChIP-seq and RNA-seq workflows #####\\n\") \n    print(\"\\t Reading samples and metadata....\\n\")\n\nmodule RNASeq:\n    snakefile:\n        \"..\/RNASeq\/Snakefile-RNASeq\"\n\n## This instruction should come right after the module\nuse rule * from RNASeq as RNASeq_*\n\nmodule ChIPSeq:\n    snakefile:\n        \"..\/ChIPSeq\/Snakefile-ChIPSeq\"\n\n## This instruction should come right after the module\nuse rule * from ChIPSeq as ChIPSeq_*<\/code><\/pre>\n\n\n\n<p>Let&#8217;s add the <code>rule all<\/code> (you can now copy the content to <code>Snakefile-MultiOmics<\/code>):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from snakemake.utils import min_version\nmin_version(\"6.0.0\")\n\nonstart:\n    print(\"##### ChIP-seq and RNA-seq workflows #####\\n\") \n    print(\"\\t Reading samples and metadata....\\n\")\n\nmodule RNASeq:\n    snakefile:\n        \"..\/RNASeq\/Snakefile-RNASeq\"\n\n## This instruction should come right after the module\nuse rule * from RNASeq as RNASeq_*\n\nmodule ChIPSeq:\n    snakefile:\n        \"..\/ChIPSeq\/Snakefile-ChIPSeq\"\n\n## This instruction should come right after the module\nuse rule * from ChIPSeq as ChIPSeq_*\n\nrule all:\n    input:\n        rules.RNASeq_all.input,\n        rules.ChIPSeq_all.input<\/code><\/pre>\n\n\n\n<p>Now try a 
dry-run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nsnakemake --snakefile Snakefile-MultiOmics --cores 1 -n\n<\/code><\/pre>\n\n\n\n<p>You should get:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nMissingInputException in line 1 of \/g\/romebioinfo\/tmp\/demo-subworkflows\/RNASeq\/external_rules\/mockrule.smk:\nMissing input files for rule RNASeq_mockFeatureCounts:\n    output: results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    affected files:\n        inputs\/exp1.fastq\n<\/code><\/pre>\n\n\n\n<p>Snakemake indicates that it cannot find the file <code>inputs\/exp1.fastq<\/code>, which is the input of the rule <code>mockFeatureCounts<\/code> defined in <code>RNASeq\/external_rules\/mockrule.smk<\/code>. Indeed, since we are currently in the <code>MultiOmics<\/code> folder, this relative path does not point to the right location. In order for snakemake to find the correct path, we must use the <code>prefix<\/code> directive of the module. The string given to <code>prefix<\/code> is prepended to the input and output paths of the rules imported from that module. 
Modify <code>Snakefile-MultiOmics<\/code> as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from snakemake.utils import min_version\nmin_version(\"6.0.0\")\n\nonstart:\n    print(\"##### ChIP-seq and RNA-seq workflows #####\\n\") \n    print(\"\\t Reading samples and metadata....\\n\")\n\nmodule RNASeq:\n    prefix: \"..\/RNASeq\"\n    snakefile:\n        \"..\/RNASeq\/Snakefile-RNASeq\"\n\n## This instruction should come right after the module\nuse rule * from RNASeq as RNASeq_*\n\nmodule ChIPSeq:\n    prefix: \"..\/ChIPSeq\"\n    snakefile:\n        \"..\/ChIPSeq\/Snakefile-ChIPSeq\"\n\n## This instruction should come right after the module\nuse rule * from ChIPSeq as ChIPSeq_*\n\n\nrule all:\n    input:\n        rules.RNASeq_all.input,\n        rules.ChIPSeq_all.input<\/code><\/pre>\n\n\n\n<p>Try a dry-run again:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\nsnakemake --snakefile Snakefile-MultiOmics --cores 1 -n\n<\/code><\/pre>\n\n\n\n<p>You should get:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nNothing to be done (all requested files are present and up to date).<\/code><\/pre>\n\n\n\n<p>The call to snakemake now works. But since we have already run the subworkflows <code>ChIPSeq<\/code> and <code>RNASeq<\/code>, there is nothing to do. 
Delete the <code>jobs<\/code> and <code>results<\/code> folders of each subworkflow and perform a dry-run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nrm -r ..\/RNASeq\/jobs\nrm -r ..\/RNASeq\/results\nrm -r ..\/ChIPSeq\/jobs\nrm -r ..\/ChIPSeq\/results\n\nsnakemake --snakefile Snakefile-MultiOmics --cores 1 -n\n<\/code><\/pre>\n\n\n\n<p>You should get:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nJob stats:\njob                                count    min threads    max threads\n-------------------------------  -------  -------------  -------------\nRNASeq_all                             1              1              1\nRNASeq_mockDifferentialAnalysis        2              1              1\nRNASeq_mockFeatureCounts               2              1              1\ntotal                                  5              1              1\n\n\n&#91;Wed Jul 13 18:19:50 2022]\nrule RNASeq_mockFeatureCounts:\n    input: ..\/RNASeq\/inputs\/exp2.fastq\n    output: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    jobid: 2\n    reason: Missing output files: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    wildcards: sampleName=exp2\n    resources: tmpdir=\/tmp\n\n\n&#91;Wed Jul 13 18:19:50 2022]\nrule RNASeq_mockFeatureCounts:\n    input: ..\/RNASeq\/inputs\/exp1.fastq\n    output: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    jobid: 1\n    reason: Missing output files: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    resources: tmpdir=\/tmp\n\n\n&#91;Wed Jul 13 18:19:50 2022]\nrule RNASeq_mockDifferentialAnalysis:\n    input: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    output: ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv\n    jobid: 4\n    reason: Missing output files: ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv; Input files updated by another job: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    wildcards: sampleName=exp2\n    resources: tmpdir=\/tmp\n\n&#91;Wed 
Jul 13 18:19:50 2022]\nrule RNASeq_mockDifferentialAnalysis:\n    input: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    output: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv\n    jobid: 3\n    reason: Missing output files: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv; Input files updated by another job: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    resources: tmpdir=\/tmp\n\n&#91;Wed Jul 13 18:19:50 2022]\nlocalrule RNASeq_all:\n    input: ..\/RNASeq\/results\/featureCounts\/exp1.txt, ..\/RNASeq\/results\/featureCounts\/exp2.txt, ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv, ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv\n    jobid: 0\n    reason: Input files updated by another job: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv, ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv, ..\/RNASeq\/results\/featureCounts\/exp2.txt, ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    resources: tmpdir=\/tmp\n\nJob stats:\njob                                count    min threads    max threads\n-------------------------------  -------  -------------  -------------\nRNASeq_all                             1              1              1\nRNASeq_mockDifferentialAnalysis        2              1              1\nRNASeq_mockFeatureCounts               2              1              1\ntotal                                  5              1              1\n\n\nThis was a dry-run (flag -n). The order of jobs does not reflect the order of execution.<\/code><\/pre>\n\n\n\n<p>You see that only the RNASeq subworkflow is scheduled. By default, snakemake targets the first rule it encounters, which here is <code>RNASeq_all<\/code>, imported by the first <code>use rule<\/code> statement. To collect the targets of both <code>RNASeq<\/code> and <code>ChIPSeq<\/code>, we need to add the instruction <code>default_target: True<\/code> to <code>rule all<\/code>. 
Modify <code>Snakefile-MultiOmics<\/code> as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from snakemake.utils import min_version\nmin_version(\"6.0.0\")\n\nonstart:\n    print(\"##### ChIP-seq and RNA-seq workflows #####\\n\") \n    print(\"\\t Reading samples and metadata....\\n\")\n\nmodule RNASeq:\n    prefix: \"..\/RNASeq\"\n    snakefile:\n        \"..\/RNASeq\/Snakefile-RNASeq\"\n\n## This instruction should come right after the module\nuse rule * from RNASeq as RNASeq_*\n\nmodule ChIPSeq:\n    prefix: \"..\/ChIPSeq\"\n    snakefile:\n        \"..\/ChIPSeq\/Snakefile-ChIPSeq\"\n\n## This instruction should come right after the module\nuse rule * from ChIPSeq as ChIPSeq_*\n\n\nrule all:\n    input:\n        rules.RNASeq_all.input,\n        rules.ChIPSeq_all.input\n    default_target: True<\/code><\/pre>\n\n\n\n<p>Perform a real run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n snakemake --snakefile Snakefile-MultiOmics --cores 1\n<\/code><\/pre>\n\n\n\n<p>You should obtain:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nUsing shell: \/usr\/bin\/bash\nProvided cores: 1 (use --cores to define parallelism)\nRules claiming more threads will be scaled down.\nJob stats:\njob                                count    min threads    max threads\n-------------------------------  -------  -------------  -------------\nChIPSeq_mockAlignment                  2              1              1\nChIPSeq_mockpeakDetection              2              1              1\nRNASeq_mockDifferentialAnalysis        2              1              1\nRNASeq_mockFeatureCounts               2              1              1\nall                                    1              1              1\ntotal                                  9              1              1\n\n##### ChIP-seq and RNA-seq workflows #####\n\n         Reading samples and metadata....\n\nSelect jobs to execute...\n\n&#91;Wed Jul 13 18:22:54 2022]\nrule 
RNASeq_mockFeatureCounts:\n    input: ..\/RNASeq\/inputs\/exp1.fastq\n    output: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    jobid: 1\n    reason: Missing output files: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    resources: tmpdir=\/tmp\n\n&#91;Wed Jul 13 18:23:04 2022]\nFinished job 1.\n1 of 9 steps (11%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 18:23:04 2022]\nrule RNASeq_mockDifferentialAnalysis:\n    input: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    output: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv\n    jobid: 3\n    reason: Missing output files: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv; Input files updated by another job: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    resources: tmpdir=\/tmp\n\n&#91;Wed Jul 13 18:23:04 2022]\nError in rule RNASeq_mockDifferentialAnalysis:\n    jobid: 3\n    output: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv\n\nRuleException:\nNameError in line 8 of \/g\/romebioinfo\/tmp\/demo-subworkflows\/RNASeq\/external_rules\/callscript.smk:\nname 'config' is not defined\n  File \"\/g\/romebioinfo\/tmp\/demo-subworkflows\/RNASeq\/external_rules\/callscript.smk\", line 8, in __rule_mockDifferentialAnalysis\n  File \"\/g\/romebioinfo\/tmp\/demo-subworkflows\/envs\/smake\/lib\/python3.9\/concurrent\/futures\/thread.py\", line 58, in run\nShutting down, this might take some time.\nExiting because a job execution failed. 
Look above for error message\nComplete log: .snakemake\/log\/2022-07-13T182254.046552.snakemake.log<\/code><\/pre>\n\n\n\n<p>Notice the error:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>RuleException:\nNameError in line 8 of \/g\/romebioinfo\/tmp\/demo-subworkflows\/RNASeq\/external_rules\/callscript.smk:\nname 'config' is not defined\n  File \"\/g\/romebioinfo\/tmp\/demo-subworkflows\/RNASeq\/external_rules\/callscript.smk\", line 8, in __rule_mockDifferentialAnalysis\n  File \"\/g\/romebioinfo\/tmp\/demo-subworkflows\/envs\/smake\/lib\/python3.9\/concurrent\/futures\/thread.py\", line 58, in run<\/code><\/pre>\n\n\n\n<p>For some reason, when a rule calls a script, snakemake looks for a <code>config<\/code> object even if the subworkflows do not use one. As a workaround, create a <code>config.yaml<\/code> in the current folder with the following empty fields:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>RNASeq:\n    summaryFile: \"\"\n\nChIPSeq:\n    summaryFile: \"\"<\/code><\/pre>\n\n\n\n<p>Now add the <code>config<\/code> instruction to <code>Snakefile-MultiOmics<\/code> as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from snakemake.utils import min_version\nmin_version(\"6.0.0\")\n\nonstart:\n    print(\"##### ChIP-seq and RNA-seq workflows #####\\n\") \n    print(\"\\t Reading samples and metadata....\\n\")\n\nmodule RNASeq:\n    prefix: \"..\/RNASeq\"\n    snakefile:\n        \"..\/RNASeq\/Snakefile-RNASeq\"\n    config:\n        config&#91;\"RNASeq\"]\n\n## This instruction should come right after the module\nuse rule * from RNASeq as RNASeq_*\n\nmodule ChIPSeq:\n    prefix: \"..\/ChIPSeq\"\n    snakefile:\n        \"..\/ChIPSeq\/Snakefile-ChIPSeq\"\n    config:\n        config&#91;\"ChIPSeq\"]\n\n## This instruction should come right after the module\nuse rule * from ChIPSeq as ChIPSeq_*\n\n\nrule all:\n    input:\n        rules.RNASeq_all.input,\n        rules.ChIPSeq_all.input\n    default_target: True\n<\/code><\/pre>\n\n\n\n<p>Perform a 
real run giving the configuration file as a parameter:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n snakemake --snakefile Snakefile-MultiOmics --cores 1 --configfile config.yaml\n<\/code><\/pre>\n\n\n\n<p>You should get:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nUsing shell: \/usr\/bin\/bash\nProvided cores: 1 (use --cores to define parallelism)\nRules claiming more threads will be scaled down.\nJob stats:\njob                                count    min threads    max threads\n-------------------------------  -------  -------------  -------------\nChIPSeq_mockAlignment                  2              1              1\nChIPSeq_mockpeakDetection              2              1              1\nRNASeq_mockDifferentialAnalysis        2              1              1\nRNASeq_mockFeatureCounts               1              1              1\nall                                    1              1              1\ntotal                                  8              1              1\n\n##### ChIP-seq and RNA-seq workflows #####\n\n         Reading samples and metadata....\n\nSelect jobs to execute...\n\n&#91;Wed Jul 13 20:11:17 2022]\nrule RNASeq_mockDifferentialAnalysis:\n    input: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    output: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv\n    jobid: 3\n    reason: Missing output files: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv\n    wildcards: sampleName=exp1\n    resources: tmpdir=\/tmp\n\nWriting to ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv\n&#91;Wed Jul 13 20:11:22 2022]\nFinished job 3.\n1 of 8 steps (12%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 20:11:22 2022]\nrule RNASeq_mockFeatureCounts:\n    input: ..\/RNASeq\/inputs\/exp2.fastq\n    output: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    jobid: 2\n    reason: Missing output files: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    wildcards: sampleName=exp2\n    resources: 
tmpdir=\/tmp\n\n&#91;Wed Jul 13 20:11:32 2022]\nFinished job 2.\n2 of 8 steps (25%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 20:11:33 2022]\nrule ChIPSeq_mockAlignment:\n    input: ..\/ChIPSeq\/inputs\/exp2.fastq\n    output: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    jobid: 6\n    reason: Missing output files: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    wildcards: sampleName=exp2\n    resources: tmpdir=\/tmp\n\n&#91;Wed Jul 13 20:11:43 2022]\nFinished job 6.\n3 of 8 steps (38%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 20:11:43 2022]\nrule ChIPSeq_mockpeakDetection:\n    input: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    output: ..\/ChIPSeq\/results\/peaks\/exp2.peaks\n    jobid: 8\n    reason: Missing output files: ..\/ChIPSeq\/results\/peaks\/exp2.peaks; Input files updated by another job: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    wildcards: sampleName=exp2\n    resources: tmpdir=\/tmp\n\nWriting to ..\/ChIPSeq\/results\/peaks\/exp2.peaks\n&#91;Wed Jul 13 20:11:46 2022]\nFinished job 8.\n4 of 8 steps (50%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 20:11:46 2022]\nrule RNASeq_mockDifferentialAnalysis:\n    input: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    output: ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv\n    jobid: 4\n    reason: Missing output files: ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv; Input files updated by another job: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    wildcards: sampleName=exp2\n    resources: tmpdir=\/tmp\n\nWriting to ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv\n&#91;Wed Jul 13 20:11:48 2022]\nFinished job 4.\n5 of 8 steps (62%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 20:11:48 2022]\nrule ChIPSeq_mockAlignment:\n    input: ..\/ChIPSeq\/inputs\/exp1.fastq\n    output: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    jobid: 5\n    reason: Missing output files: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    wildcards: sampleName=exp1\n    resources: 
tmpdir=\/tmp\n\n&#91;Wed Jul 13 20:11:59 2022]\nFinished job 5.\n6 of 8 steps (75%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 20:11:59 2022]\nrule ChIPSeq_mockpeakDetection:\n    input: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    output: ..\/ChIPSeq\/results\/peaks\/exp1.peaks\n    jobid: 7\n    reason: Missing output files: ..\/ChIPSeq\/results\/peaks\/exp1.peaks; Input files updated by another job: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    wildcards: sampleName=exp1\n    resources: tmpdir=\/tmp\n\nWriting to ..\/ChIPSeq\/results\/peaks\/exp1.peaks\n&#91;Wed Jul 13 20:12:01 2022]\nFinished job 7.\n7 of 8 steps (88%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 20:12:01 2022]\nlocalrule all:\n    input: ..\/RNASeq\/results\/featureCounts\/exp1.txt, ..\/RNASeq\/results\/featureCounts\/exp2.txt, ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv, ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv, ..\/ChIPSeq\/results\/alignment\/exp1.bam, ..\/ChIPSeq\/results\/alignment\/exp2.bam, ..\/ChIPSeq\/results\/peaks\/exp1.peaks, ..\/ChIPSeq\/results\/peaks\/exp2.peaks\n    jobid: 0\n    reason: Input files updated by another job: ..\/ChIPSeq\/results\/alignment\/exp1.bam, ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv, ..\/ChIPSeq\/results\/alignment\/exp2.bam, ..\/ChIPSeq\/results\/peaks\/exp1.peaks, ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv, ..\/ChIPSeq\/results\/peaks\/exp2.peaks, ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    resources: tmpdir=\/tmp\n\n&#91;Wed Jul 13 20:12:01 2022]\nFinished job 0.\n8 of 8 steps (100%) done\nComplete log: .snakemake\/log\/2022-07-13T201116.342815.snakemake.log\n<\/code><\/pre>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Running MultiOmics on an HPC<\/h3>\n\n\n\n<p>Let&#8217;s start by deleting the previously created results to be able to run the workflow again:<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nrm -r ..\/RNASeq\/results\nrm -r ..\/ChIPSeq\/results\n<\/code><\/pre>\n\n\n\n<p>As we did for the other workflows, copy the following content to <code>profileMultiOmics\/config.yaml<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: Snakefile-MultiOmics\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\nconfigfile: \"config.yaml\"\n\n# Cluster submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\" --mem={resources.mem_mb} --time={resources.runtime} --parsable\"\ncluster-status: \".\/profileMultiOmics\/status-sacct.sh\" #  Use to handle timeout exception, do not forget to chmod +x\n<\/code><\/pre>\n\n\n\n<p>Exactly as before, add <code>profileMultiOmics\/status-sacct.sh<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/env bash\n\n# Check status of Slurm job\n\njobid=\"$1\"\n\nif &#91;&#91; \"$jobid\" == Submitted ]]\nthen\n  echo smk-simple-slurm: Invalid job ID: \"$jobid\" &gt;&amp;2\n  echo smk-simple-slurm: Did you remember to add the flag --parsable to your sbatch call? 
&gt;&amp;2\n  exit 1\nfi\n\noutput=`sacct -j \"$jobid\" --format State --noheader | head -n 1 | awk '{print $1}'`\n\nif &#91;&#91; $output =~ ^(COMPLETED).* ]]\nthen\n  echo success\nelif &#91;&#91; $output =~ ^(RUNNING|PENDING|COMPLETING|CONFIGURING|SUSPENDED).* ]]\nthen\n  echo running\nelse\n  echo failed\nfi\n<\/code><\/pre>\n\n\n\n<p>Make it executable:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n\nchmod +x profileMultiOmics\/status-sacct.sh\n<\/code><\/pre>\n\n\n\n<p>Perform a real run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n snakemake --profile profileMultiOmics\/\n<\/code><\/pre>\n\n\n\n<p>You should get the error:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>WorkflowError:\nError when formatting 'sbatch --output=\"jobs\/{rule}\/slurm_%x_%j.out\" --error=\"jobs\/{rule}\/slurm_%x_%j.log\" --mem={resources.mem_mb} --time={resources.runtime} --parsable' for rule ChIPSeq_mockAlignment. 'Resources' object has no attribute 'runtime'\n<\/code><\/pre>\n\n\n\n<p>Snakemake cannot find the <code>runtime<\/code> resource required by the rule <code>ChIPSeq_mockAlignment<\/code>. Indeed, we did not define the resources used by each job as we did in the previous parts. Attention: the rule names must now carry the prefixes <code>RNASeq_<\/code> and <code>ChIPSeq_<\/code> set by the <code>use rule<\/code> statements. Copy the resource definitions of each subworkflow to <code>profileMultiOmics\/config.yaml<\/code>. 
Take care to add the corresponding prefixes:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>---\n\nsnakefile: Snakefile-MultiOmics\n\nlatency-wait: 60\nreason: True\nshow-failed-logs: True\nkeep-going: True\nprintshellcmds: True\nconfigfile: \"config.yaml\"\n\n# Cluster submission\njobname: \"{rule}.{jobid}\"              # Provide a custom name for the jobscript that is submitted to the cluster.\nmax-jobs-per-second: 1                 #Maximal number of cluster\/drmaa jobs per second, default is 10, fractions allowed.\nmax-status-checks-per-second: 10       #Maximal number of job status checks per second, default is 10\njobs: 400                              #Use at most N CPU cluster\/cloud jobs in parallel.\ncluster: \"sbatch --output=\\\"jobs\/{rule}\/slurm_%x_%j.out\\\" --error=\\\"jobs\/{rule}\/slurm_%x_%j.log\\\" --mem={resources.mem_mb} --time={resources.runtime} --parsable\"\ncluster-status: \".\/profileMultiOmics\/status-sacct.sh\" #  Use to handle timeout exception, do not forget to chmod +x\n\n# Job resources\nset-resources:\n  - RNASeq_mockFeatureCounts:mem_mb=1000\n  - RNASeq_mockFeatureCounts:runtime=00:03:00\n  - RNASeq_mockDifferentialAnalysis:mem_mb=1000\n  - RNASeq_mockDifferentialAnalysis:runtime=00:03:00\n  - ChIPSeq_mockAlignment:mem_mb=1000\n  - ChIPSeq_mockAlignment:runtime=00:03:00\n  - ChIPSeq_mockpeakDetection:mem_mb=1000\n  - ChIPSeq_mockpeakDetection:runtime=00:03:00\n\n# For some reason, time needs quotes to be read by snakemake\ndefault-resources:\n  - mem_mb=500\n  - runtime=\"00:01:00\"\n\n# Define the number of threads used by rules\nset-threads:\n  - RNASeq_mockFeatureCounts=1\n  - RNASeq_mockDifferentialAnalysis=1\n  - ChIPSeq_mockAlignment=1\n  - ChIPSeq_mockpeakDetection=1\n<\/code><\/pre>\n\n\n\n<p>Perform a real run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n snakemake --profile profileMultiOmics\/\n<\/code><\/pre>\n\n\n\n<p>You should get:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building 
DAG of jobs...\nUsing shell: \/usr\/bin\/bash\nProvided cluster nodes: 400\nJob stats:\njob                                count    min threads    max threads\n-------------------------------  -------  -------------  -------------\nChIPSeq_mockAlignment                  2              1              1\nChIPSeq_mockpeakDetection              2              1              1\nRNASeq_mockDifferentialAnalysis        2              1              1\nRNASeq_mockFeatureCounts               2              1              1\nall                                    1              1              1\ntotal                                  9              1              1\n\n##### ChIP-seq and RNA-seq workflows #####\n\n         Reading samples and metadata....\n\nSelect jobs to execute...\n\n&#91;Wed Jul 13 20:45:22 2022]\nrule ChIPSeq_mockAlignment:\n    input: ..\/ChIPSeq\/inputs\/exp1.fastq\n    output: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    jobid: 5\n    reason: Missing output files: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    wildcards: sampleName=exp1\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat ..\/ChIPSeq\/inputs\/exp1.fastq &gt; ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    sleep 10s\n\nSubmitted job 5 with external jobid '44401149'.\n\n&#91;Wed Jul 13 20:45:22 2022]\nrule ChIPSeq_mockAlignment:\n    input: ..\/ChIPSeq\/inputs\/exp2.fastq\n    output: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    jobid: 6\n    reason: Missing output files: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat ..\/ChIPSeq\/inputs\/exp2.fastq &gt; ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    sleep 10s\n\nSubmitted job 6 with external jobid '44401150'.\n\n&#91;Wed Jul 13 20:45:22 2022]\nrule RNASeq_mockFeatureCounts:\n    input: ..\/RNASeq\/inputs\/exp2.fastq\n    output: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    jobid: 2\n 
   reason: Missing output files: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat ..\/RNASeq\/inputs\/exp2.fastq &gt; ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    sleep 10s\n\nSubmitted job 2 with external jobid '44401151'.\n\n&#91;Wed Jul 13 20:45:22 2022]\nrule RNASeq_mockFeatureCounts:\n    input: ..\/RNASeq\/inputs\/exp1.fastq\n    output: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    jobid: 1\n    reason: Missing output files: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat ..\/RNASeq\/inputs\/exp1.fastq &gt; ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    sleep 10s\n\nSubmitted job 1 with external jobid '44401152'.\n&#91;Wed Jul 13 20:46:24 2022]\nError in rule ChIPSeq_mockAlignment:\n    jobid: 5\n    output: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    shell:\n\n    cat ..\/ChIPSeq\/inputs\/exp1.fastq &gt; ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    sleep 10s\n\n        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)\n    cluster_jobid: 44401149\n\nError executing rule ChIPSeq_mockAlignment on cluster (jobid: 5, external: 44401149, jobscript: \/g\/romebioinfo\/tmp\/demo-subworkflows\/MultiOmics\/.snakemake\/tmp.lvuuylhl\/ChIPSeq_mockAlignment.5). 
For error details see the cluster log and the log files of the involved rule(s).\n&#91;Wed Jul 13 20:46:24 2022]\nError in rule ChIPSeq_mockAlignment:\n    jobid: 6\n    output: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    shell:\n\n    cat ..\/ChIPSeq\/inputs\/exp2.fastq &gt; ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    sleep 10s\n\n        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)\n    cluster_jobid: 44401150\n\nError executing rule ChIPSeq_mockAlignment on cluster (jobid: 6, external: 44401150, jobscript: \/g\/romebioinfo\/tmp\/demo-subworkflows\/MultiOmics\/.snakemake\/tmp.lvuuylhl\/ChIPSeq_mockAlignment.6). For error details see the cluster log and the log files of the involved rule(s).\n&#91;Wed Jul 13 20:46:24 2022]\nError in rule RNASeq_mockFeatureCounts:\n    jobid: 2\n    output: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    shell:\n\n    cat ..\/RNASeq\/inputs\/exp2.fastq &gt; ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    sleep 10s\n\n        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)\n    cluster_jobid: 44401151\n\nError executing rule RNASeq_mockFeatureCounts on cluster (jobid: 2, external: 44401151, jobscript: \/g\/romebioinfo\/tmp\/demo-subworkflows\/MultiOmics\/.snakemake\/tmp.lvuuylhl\/RNASeq_mockFeatureCounts.2). 
For error details see the cluster log and the log files of the involved rule(s).\n&#91;Wed Jul 13 20:46:24 2022]\nError in rule RNASeq_mockFeatureCounts:\n    jobid: 1\n    output: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    shell:\n\n    cat ..\/RNASeq\/inputs\/exp1.fastq &gt; ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    sleep 10s\n\n        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)\n    cluster_jobid: 44401152\n\nError executing rule RNASeq_mockFeatureCounts on cluster (jobid: 1, external: 44401152, jobscript: \/g\/romebioinfo\/tmp\/demo-subworkflows\/MultiOmics\/.snakemake\/tmp.lvuuylhl\/RNASeq_mockFeatureCounts.1). For error details see the cluster log and the log files of the involved rule(s).\nExiting because a job execution failed. Look above for error message\nComplete log: .snakemake\/log\/2022-07-13T204521.378559.snakemake.log\n<\/code><\/pre>\n\n\n\n<p>Snakemake throws several errors and points to the log files of the rules and to the cluster logs. However, you will notice that the <code>jobs<\/code> folder was not created. Slurm therefore cannot write the files given to <code>--output<\/code> and <code>--error<\/code>, and the jobs fail. Delete the <code>onstart:<\/code> sections of <code>..\/RNASeq\/Snakefile-RNASeq<\/code> and <code>..\/ChIPSeq\/Snakefile-ChIPSeq<\/code>. Add the creation of the relevant folders to the <code>onstart:<\/code> section in <code>Snakefile-MultiOmics<\/code> as follows. 
<strong>Note that you must add the prefixes to the output folders<\/strong>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from snakemake.utils import min_version\nmin_version(\"6.0.0\")\n\nonstart:\n    print(\"##### ChIP-seq and RNA-seq workflows #####\\n\") \n    print(\"\\t Reading samples and metadata....\\n\")\n    shell(\"mkdir -p jobs\/ChIPSeq_mockAlignment\")\n    shell(\"mkdir -p jobs\/ChIPSeq_mockpeakDetection\")\n    shell(\"mkdir -p jobs\/RNASeq_mockFeatureCounts\")\n    shell(\"mkdir -p jobs\/RNASeq_mockDifferentialAnalysis\")\n\nmodule RNASeq:\n    prefix: \"..\/RNASeq\"\n    snakefile:\n        \"..\/RNASeq\/Snakefile-RNASeq\"\n    config:\n        config&#91;\"RNASeq\"]\n\n## This instruction should come right after the module\nuse rule * from RNASeq as RNASeq_*\n\nmodule ChIPSeq:\n    prefix: \"..\/ChIPSeq\"\n    snakefile:\n        \"..\/ChIPSeq\/Snakefile-ChIPSeq\"\n    config:\n        config&#91;\"ChIPSeq\"]\n\n## This instruction should come right after the module\nuse rule * from ChIPSeq as ChIPSeq_*\n\n\nrule all:\n    input:\n        rules.RNASeq_all.input,\n        rules.ChIPSeq_all.input\n    default_target: True\n<\/code><\/pre>\n\n\n\n<p>Perform a real run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/bash\n snakemake --profile profileMultiOmics\/\n<\/code><\/pre>\n\n\n\n<p>The pipeline should now terminate without errors. 
You should get:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Building DAG of jobs...\nUsing shell: \/usr\/bin\/bash\nProvided cluster nodes: 400\nJob stats:\njob                                count    min threads    max threads\n-------------------------------  -------  -------------  -------------\nChIPSeq_mockAlignment                  2              1              1\nChIPSeq_mockpeakDetection              2              1              1\nRNASeq_mockDifferentialAnalysis        2              1              1\nRNASeq_mockFeatureCounts               2              1              1\nall                                    1              1              1\ntotal                                  9              1              1\n\n##### ChIP-seq and RNA-seq workflows #####\n\n         Reading samples and metadata....\n\nmkdir -p jobs\/ChIPSeq_mockAlignment\nmkdir -p jobs\/ChIPSeq_mockpeakDetection\nmkdir -p jobs\/RNASeq_mockFeatureCounts\nmkdir -p jobs\/RNASeq_mockDifferentialAnalysis\nSelect jobs to execute...\n\n&#91;Wed Jul 13 21:02:50 2022]\nrule ChIPSeq_mockAlignment:\n    input: ..\/ChIPSeq\/inputs\/exp1.fastq\n    output: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    jobid: 5\n    reason: Missing output files: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    wildcards: sampleName=exp1\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat ..\/ChIPSeq\/inputs\/exp1.fastq &gt; ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    sleep 10s\n\nSubmitted job 5 with external jobid '44401764'.\n\n&#91;Wed Jul 13 21:02:51 2022]\nrule ChIPSeq_mockAlignment:\n    input: ..\/ChIPSeq\/inputs\/exp2.fastq\n    output: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    jobid: 6\n    reason: Missing output files: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat ..\/ChIPSeq\/inputs\/exp2.fastq &gt; ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    
sleep 10s\n\nSubmitted job 6 with external jobid '44401766'.\n\n&#91;Wed Jul 13 21:02:51 2022]\nrule RNASeq_mockFeatureCounts:\n    input: ..\/RNASeq\/inputs\/exp2.fastq\n    output: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    jobid: 2\n    reason: Missing output files: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat ..\/RNASeq\/inputs\/exp2.fastq &gt; ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    sleep 10s\n\nSubmitted job 2 with external jobid '44401767'.\n\n&#91;Wed Jul 13 21:02:51 2022]\nrule RNASeq_mockFeatureCounts:\n    input: ..\/RNASeq\/inputs\/exp1.fastq\n    output: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    jobid: 1\n    reason: Missing output files: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\n\n    cat ..\/RNASeq\/inputs\/exp1.fastq &gt; ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    sleep 10s\n\nSubmitted job 1 with external jobid '44401768'.\n&#91;Wed Jul 13 21:03:41 2022]\nFinished job 5.\n1 of 9 steps (11%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 21:03:41 2022]\nrule ChIPSeq_mockpeakDetection:\n    input: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    output: ..\/ChIPSeq\/results\/peaks\/exp1.peaks\n    jobid: 7\n    reason: Missing output files: ..\/ChIPSeq\/results\/peaks\/exp1.peaks; Input files updated by another job: ..\/ChIPSeq\/results\/alignment\/exp1.bam\n    wildcards: sampleName=exp1\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\nSubmitted job 7 with external jobid '44401828'.\n&#91;Wed Jul 13 21:03:42 2022]\nFinished job 6.\n2 of 9 steps (22%) done\nWaiting at most 60 seconds for missing files.\n&#91;Wed Jul 13 21:03:51 2022]\nFinished job 2.\n3 of 9 steps (33%) done\n&#91;Wed Jul 13 21:03:51 2022]\nFinished job 1.\n4 of 9 steps (44%) done\nSelect jobs to 
execute...\n\n&#91;Wed Jul 13 21:03:52 2022]\nrule RNASeq_mockDifferentialAnalysis:\n    input: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    output: ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv\n    jobid: 4\n    reason: Missing output files: ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv; Input files updated by another job: ..\/RNASeq\/results\/featureCounts\/exp2.txt\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\nSubmitted job 4 with external jobid '44401835'.\n\n&#91;Wed Jul 13 21:03:53 2022]\nrule ChIPSeq_mockpeakDetection:\n    input: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    output: ..\/ChIPSeq\/results\/peaks\/exp2.peaks\n    jobid: 8\n    reason: Missing output files: ..\/ChIPSeq\/results\/peaks\/exp2.peaks; Input files updated by another job: ..\/ChIPSeq\/results\/alignment\/exp2.bam\n    wildcards: sampleName=exp2\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\nSubmitted job 8 with external jobid '44401837'.\n\n&#91;Wed Jul 13 21:03:53 2022]\nrule RNASeq_mockDifferentialAnalysis:\n    input: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    output: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv\n    jobid: 3\n    reason: Missing output files: ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv; Input files updated by another job: ..\/RNASeq\/results\/featureCounts\/exp1.txt\n    wildcards: sampleName=exp1\n    resources: mem_mb=1000, disk_mb=1000, tmpdir=\/tmp, runtime=00:03:00\n\nSubmitted job 3 with external jobid '44401838'.\n&#91;Wed Jul 13 21:04:34 2022]\nFinished job 3.\n5 of 9 steps (56%) done\n&#91;Wed Jul 13 21:04:44 2022]\nFinished job 7.\n6 of 9 steps (67%) done\n&#91;Wed Jul 13 21:04:44 2022]\nFinished job 4.\n7 of 9 steps (78%) done\n&#91;Wed Jul 13 21:04:44 2022]\nFinished job 8.\n8 of 9 steps (89%) done\nSelect jobs to execute...\n\n&#91;Wed Jul 13 21:04:44 2022]\nlocalrule all:\n    input: 
..\/RNASeq\/results\/featureCounts\/exp1.txt, ..\/RNASeq\/results\/featureCounts\/exp2.txt, ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv, ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv, ..\/ChIPSeq\/results\/alignment\/exp1.bam, ..\/ChIPSeq\/results\/alignment\/exp2.bam, ..\/ChIPSeq\/results\/peaks\/exp1.peaks, ..\/ChIPSeq\/results\/peaks\/exp2.peaks\n    jobid: 0\n    reason: Input files updated by another job: ..\/ChIPSeq\/results\/alignment\/exp2.bam, ..\/RNASeq\/results\/differentialAnalysis\/exp1.csv, ..\/RNASeq\/results\/featureCounts\/exp2.txt, ..\/ChIPSeq\/results\/alignment\/exp1.bam, ..\/ChIPSeq\/results\/peaks\/exp1.peaks, ..\/RNASeq\/results\/differentialAnalysis\/exp2.csv, ..\/RNASeq\/results\/featureCounts\/exp1.txt, ..\/ChIPSeq\/results\/peaks\/exp2.peaks\n    resources: mem_mb=500, disk_mb=1000, tmpdir=\/tmp, runtime=00:01:00\n\n&#91;Wed Jul 13 21:04:44 2022]\nFinished job 0.\n9 of 9 steps (100%) done\nComplete log: .snakemake\/log\/2022-07-13T210249.161330.snakemake.log\n<\/code><\/pre>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Acknowledgements<\/h4>\n\n\n\n<p>Thank you to Thomas Weber for his comments and suggestions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Subdividing your pipeline into subworkflows is a powerful means to achieve complex analysis. It also helps to avoid code repetition. 
Typically, if one has to analyse different types of data such as ChIP-seq and RNA-seq, and then perform multi-omics data integration, it becomes handy to&hellip;<\/p>\n","protected":false},"author":5,"featured_media":1678,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[4096],"tags":[4098,5438],"embl_taxonomy":[],"class_list":["post-1622","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical","tag-snakemake","tag-subworkflow"],"acf":[],"embl_taxonomy_terms":[],"featured_image_src":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-content\/uploads\/2022\/07\/charlie-chaplin-in-modern-times.jpg","_links":{"self":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts\/1622"}],"collection":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/comments?post=1622"}],"version-history":[{"count":32,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts\/1622\/revisions"}],"predecessor-version":[{"id":1796,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/posts\/1622\/revisions\/1796"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/media\/1678"}],"wp:attachment":[{"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/media?parent=1622"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/categories?
post=1622"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/tags?post=1622"},{"taxonomy":"embl_taxonomy","embeddable":true,"href":"https:\/\/www.embl.org\/groups\/bioinformatics-rome\/wp-json\/wp\/v2\/embl_taxonomy?post=1622"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}