#!/usr/bin/env python
# coding: utf-8

# # for SO https://stackoverflow.com/q/78892547/8508004 Snakemake - adapt the checkpoints example pipeline to the new version 8.18.1
# 
# Set-up for situation presented.
# 
# (Used session launched from [here](https://gist.github.com/fomightez/6773dedf6d5132795dd4245a18c066eb); go there and click on '`launch binder`' to get started. That was because when I tried with an older Python version from sessions launched from [binder's requirements.txt example repo](https://github.com/binder-examples/requirements), `pip` reported that Snakemake 8.18.1 needed Python 3.11(?). [Although installing with `pip` on Anaconda cloud with Python 3.10 seemed to work.])

# In[1]:


get_ipython().run_line_magic('pip', 'install snakemake==8.18.1')


# In[2]:


get_ipython().system('snakemake --version')


# Worked! Installed specified version.
# 
# 
# ----------------
# 
# 
# Now to prepare for the Snakemake snakefile put forth...
# 
# Code put forth as example is old and was from [bottom here at EdwardsLab's post 'Snakemake - How to use snakemake checkpoints'](https://edwards.flinders.edu.au/how-to-use-snakemake-checkpoints/).
# 
# The issue is that the code there uses the `directory()` flag under the `input` directive of the rule `make_third_files`, so you get the error `The flag 'directory' used in rule make_third_files is only valid for outputs, not inputs.`.
# 
# The change needed is to stop using the `directory()` flag in `input` directives, as covered [here](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#directories-as-outputs). **However, directories themselves are allowed as `input` for Snakemake so just use the path to them without the flag.**
# Plus to get this to work, you need the two output directories as `input` for the default rule.
# (I'm not sure how it ever worked before, because I don't see how rules `make_some_files` & `make_more_files` would get triggered otherwise.)
# 
# To work this out, I originally put the provided code between the `'''` below, then edited it and re-ran the next two cells, iterating until it worked. (I later started a fresh session with what I worked out and re-ran to get the final-run 'clean' version seen here.)

# In[3]:


ns='''OUTDIR = "first_directory"
SNDDIR = "second_directory"
THRDIR = "third_directory"


def combine(wildcards):
    # read the first set of outputs
    ck_output = checkpoints.make_some_files.get(**wildcards).output[0]
    FIRSTS, = glob_wildcards(os.path.join(ck_output, "{sample}.txt"))
    # read the second set of outputs
    sn_output = checkpoints.make_more_files.get(**wildcards).output[0]
    SECONDS, = glob_wildcards(os.path.join(sn_output, "{smpl}.txt"))
    return expand(os.path.join(THRDIR, "{first}.{second}.tsv"), first=FIRSTS, second=SECONDS)


rule all:
    input:
        OUTDIR,
        SNDDIR,
        combine


checkpoint make_some_files:
    output:
        directory(OUTDIR)
    shell:
        """
        mkdir {output};
        N=$(((RANDOM%5)+1));
        for D in $(seq $N); do
            touch {output}/$RANDOM.txt
        done
        """


checkpoint make_more_files:
    output:
        directory(SNDDIR)
    shell:
        """
        mkdir {output};
        N=$(((RANDOM%5)+1));
        for D in $(seq $N); do
            touch {output}/$RANDOM.txt
        done
        """


rule make_third_files:
    input:
        OUTDIR,
        SNDDIR,
    output:
        os.path.join(THRDIR, "{first}.{second}.tsv")
    shell:
        """
        touch {output}
        """
'''
get_ipython().run_line_magic('store', 'ns >Snakefile')


# In[4]:


get_ipython().system('snakemake -c 1')


# Works.
# 
# Let's show the contents of the directories as verification.
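
# Beyond eyeballing the directory contents, the pairing can also be checked programmatically. What follows is only a minimal sketch under stated assumptions: the helper name `missing_pairings` is hypothetical (it is not part of Snakemake), the directory names default to the ones set in the Snakefile above, and it compares file names only.

```python
from pathlib import Path


def missing_pairings(first_dir="first_directory",
                     second_dir="second_directory",
                     third_dir="third_directory"):
    """Return the expected '{first}.{second}.tsv' names absent from third_dir.

    Hypothetical helper for illustration; not part of Snakemake.
    """
    # Stems of the randomly named .txt files each checkpoint produced
    firsts = {p.stem for p in Path(first_dir).glob("*.txt")}
    seconds = {p.stem for p in Path(second_dir).glob("*.txt")}
    # Every first/second combination should have a matching .tsv file
    expected = {f"{f}.{s}.tsv" for f in firsts for s in seconds}
    present = {p.name for p in Path(third_dir).glob("*.tsv")}
    return expected - present
```

# After a successful run, calling `missing_pairings()` from the working directory should give an empty set; any names it returns would be pairings that `make_third_files` did not produce.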

# In[5]:


get_ipython().run_line_magic('ls', 'first_directory/')


# In[6]:


get_ipython().run_line_magic('ls', 'second_directory/')


# In[7]:


get_ipython().run_line_magic('ls', 'third_directory/')


# The content of the working `Snakefile` is below **for easier copying and pasting**:

# In[8]:


get_ipython().run_line_magic('cat', 'Snakefile')


# ------
# 
# To clean up before re-running the cell containing `!snakemake -c 1` above, change the type of the next cell to '`Code`' and run it:

# !rm -rf first_directory
# !rm -rf second_directory
# !rm -rf third_directory

# In[ ]:
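
# For reference, the list of targets that `combine` requests from `expand` can be sketched in plain Python. Snakemake's `expand` combines the supplied values with `itertools.product` by default, so every first-directory sample is paired with every second-directory sample. The function name `expected_targets` is hypothetical and the sample names are made up for illustration; this is a sketch of the idea, not Snakemake's own implementation.

```python
import os
from itertools import product


def expected_targets(firsts, seconds, third_dir="third_directory"):
    """Build one .tsv path per (first, second) combination.

    Hypothetical stand-in for the expand() call inside combine().
    """
    return [
        os.path.join(third_dir, f"{first}.{second}.tsv")
        for first, second in product(firsts, seconds)
    ]


# Made-up sample names: 2 firsts x 3 seconds gives 6 targets
targets = expected_targets(["111", "222"], ["7", "8", "9"])
print(len(targets))
```

# Because the number of targets is the product of the two file counts, and each checkpoint makes at most five files, `third_directory` should hold at most 25 files.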