✨ Miscellaneous useful things ✨

🐍 Snakemake 🐍

📝 Log/progress parsing setup 📝

I am really happy with my log parsing setup for running Snakemake jobs. You might have your own setup, but I wanted to share mine anyway: maybe it is useful for someone, and maybe I get some feedback on how to make it even better. This works well if you

use conductor jobs (i.e., don't run Snakemake interactively but submit it via sbatch; this is not how it is meant to be used, but really useful imo)
always use the same structure; in my case, all conductor logs are named ~/projects/*/results/logs/snake_sbatch/*conductor*.out

Here are my useful aliases:

find and show the errors in an easily parseable manner (this is my favourite)

alias finderr='python ~/projects/useful_scripts/src/snakemake/find_error_logs_in_conductor.py > ~/tmp/finderr.txt; less ~/tmp/finderr.txt'

optimized for Snakemake 9 (the log structures sometimes change between versions)
counts the types of errors & sorts the file paths by them in an easily copy-pasteable way → no more guessing if the 300 error files are all the same error or 20 different errors

| Category                                                                   | Count |
+----------------------------------------------------------------------------+-------+
| Out Of Memory (OOM)                                                        |     0 |
| Killed (not OOM)                                                           |     0 |
| KeyError: 'cell_type'                                                      |     3 |
| Error in .subset(x, j) : invalid subscript type 'list'                     |     5 |
| AssertionError: This script only works for groups being cell types for now |     2 |
+----------------------------------------------------------------------------+-------+

Out Of Memory (OOM)
-------------------

Killed (not OOM)
----------------

KeyError: 'cell_type'
---------------------
fig1_marker_plot_selected_genes 2025-10-06 22:30:36 |||||
  /path/to/repo/.snakemake/slurm_logs/rule_fig1_marker_plot_selected_genes/ATAC_TSS_1000_500/10552392.log
fig1_marker_plot_selected_genes 2025-10-06 22:30:36 |||||
  /path/to/repo/.snakemake/slurm_logs/rule_fig1_marker_plot_selected_genes/ATAC_TSS_500_100/10552394.log
fig1_marker_plot_selected_genes 2025-10-06 22:30:36 |||||
  /path/to/repo/.snakemake/slurm_logs/rule_fig1_marker_plot_selected_genes/ATAC_TSS_100_100/10552396.log

Error in .subset(x, j) : invalid subscript type 'list'
------------------------------------------------------
confounding_factor_quantification_stat_test 2025-10-06 22:29:46 |||||
  /path/to/repo/.snakemake/slurm_logs/rule_confounding_factor_quantification_stat_test/all_L4_RNA/10552408.log
confounding_factor_quantification_stat_test 2025-10-06 22:29:45 |||||
  /path/to/repo/.snakemake/slurm_logs/rule_confounding_factor_quantification_stat_test/sample_type__not_all_metadata__False_L4_RNA/10552409.log
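The grouping idea behind this output can be sketched in a few lines of Python. This is not the code of find_error_logs_in_conductor.py, which is not shown here; the patterns, the function names, and the "Unknown" fallback are all assumptions made for illustration. Each log is assigned to the first matching category, and the paths are collected per category.

```python
import re
from collections import defaultdict

# Hypothetical patterns, checked in order; the first match wins.
PATTERNS = [
    ("Out Of Memory (OOM)", re.compile(r"oom[-_ ]kill|out of memory", re.IGNORECASE)),
    ("Killed (not OOM)", re.compile(r"\bkilled\b", re.IGNORECASE)),
]
# Fallback: the first line that starts with something like "KeyError" or "Error in".
GENERIC = re.compile(r"^\s*(\w*Error\b.*)$", re.MULTILINE)


def categorize(log_text):
    """Return a category name for the text of one log file."""
    for name, pattern in PATTERNS:
        if pattern.search(log_text):
            return name
    match = GENERIC.search(log_text)
    return match.group(1).strip() if match else "Unknown"


def group_by_category(logs):
    """Map category -> list of log paths; `logs` maps path -> log text."""
    groups = defaultdict(list)
    for path, text in logs.items():
        groups[categorize(text)].append(path)
    return dict(groups)


def format_summary(groups):
    """Render a small count table, most frequent category first."""
    width = max([len("Category"), *map(len, groups)])
    lines = [f"| {'Category':<{width}} | Count |"]
    for category, paths in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        lines.append(f"| {category:<{width}} | {len(paths):>5} |")
    return "\n".join(lines)
```

Using the first error-like line as the category name is what lets identical failures collapse into one row, so a large pile of error files becomes a handful of distinct messages.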

print my queue including job names

alias myq="squeue -u rbednarsky -o '%.12i %.4P %.5j %.80k %.8M %.4C %.9m %.6D %R'"

give me the most recent log across projects

alias log_cond='LASTLOG=$(ls -Atd ~/projects/*/results/logs/snake_sbatch/*conductor*.{out,log} | head -1); echo "$LASTLOG"; tail -n 100 "$LASTLOG"; echo "$LASTLOG"; echo "----------------------------------------"'

continuously print how many errors there are in your pipeline (this is useful to notice early if something doesn't work)

alias ccounterr='while true; do LASTLOG=$(ls -Atd ~/projects/*/results/logs/snake_sbatch/*conductor*.{out,log} | head -1); echo ........................................................; ls -l "$LASTLOG" | awk '\''{print $6, $7, $8, $9}'\''; grep "Error" "$LASTLOG" | sort | uniq -c; sleep 5; done'
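For anyone who prefers Python over a shell loop, the two building blocks of these aliases (pick the newest conductor log, then count identical error lines) could look roughly like this. It is only a sketch: the function names are made up, and the directory layout is the one assumed above.

```python
from collections import Counter
from pathlib import Path


def newest_conductor_log(projects_root):
    """Return the most recently modified conductor log under projects_root, or None."""
    root = Path(projects_root)
    candidates = [
        path
        for suffix in ("out", "log")
        for path in root.glob(f"*/results/logs/snake_sbatch/*conductor*.{suffix}")
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def count_error_lines(log_path):
    """Count identical lines containing 'Error', similar to grep | sort | uniq -c."""
    with open(log_path, errors="replace") as handle:
        return Counter(line.rstrip("\n") for line in handle if "Error" in line)
```

Calling newest_conductor_log(Path.home() / "projects") and passing the result to count_error_lines inside a loop with time.sleep(5) would mimic ccounterr.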

🧑‍💻 🤝 🐍 Interactive coding with Snakemake 🧑‍💻 🤝 🐍

This relates to code in src/snakemake/interactive_snakemake_object.py

My aim here is to work interactively, while developing a workflow, in two ways:

When writing a script for the first time, I want to write it so that it is easy to adapt for running in the Snakemake workflow.
Once I think the script works and the workflow has already run it once, I may still want to change something. I then want to start a session that looks as if Snakemake were running the script right now, i.e., there is an object in memory called snakemake that contains the input, output, wildcards, etc.

Here is how I do it:

During development, I use the SnakelikeObject class as an interactive stand-in for the snakemake object. It takes a nested dictionary: the top-level keys are input, output, wildcards, etc., and the second-level keys are the names of the input/output files or directories, with their paths as values.

snakemake = SnakelikeObject({
  "input": {
    "adata_superset": "/path/to/adata_superset.h5ad",
    "marker_genes": "/path/to/marker_genes.csv"
  },
  "output": {
    "fig1_marker_plot_selected_genes": "/path/to/fig1_marker_plot_selected_genes.png"
  },
  "wildcards": {
    "cell_type": "L4_RNA",
    "tss_distance": "1000_500"
  },
})

You can then use this object just as you would use the snakemake object, accessing entries like snakemake.input['adata_superset'].
Once you are ready to run via Snakemake, this structure is easy to transfer into a rule.
At the top of your script, save the object that Snakemake injects into your environment as a JSON file, so that you can load it for interactive coding later.

# save snakemake object content as json
snakemake_object_to_json(snakemake)

Once Snakemake has run, you can load the snakemake object from the JSON file to code interactively.

snakemake = read_json_into_smk_obj(PROJECT_ROOT / 'results/smk_objects/rule_name/wildcards.json')
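To make the round trip concrete, here is a minimal sketch of the idea. It is not the code in src/snakemake/interactive_snakemake_object.py: the names SnakelikeSketch, dump_sketch and load_sketch are hypothetical stand-ins for SnakelikeObject, snakemake_object_to_json and read_json_into_smk_obj, and the sketch assumes the content is already a plain nested dictionary. An object injected by Snakemake would first need converting to plain dicts, which is not attempted here.

```python
import json
from pathlib import Path
from types import SimpleNamespace


class SnakelikeSketch(SimpleNamespace):
    """Expose the top-level keys of a nested dict as attributes (input, output, ...)."""

    def __init__(self, content):
        super().__init__(**content)


def dump_sketch(obj, path):
    """Write the attribute dict to JSON, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(vars(obj), indent=2))


def load_sketch(path):
    """Rebuild the stand-in object from a JSON file written by dump_sketch."""
    return SnakelikeSketch(json.loads(Path(path).read_text()))
```

With this stand-in, snakemake.input['adata_superset'] works the same way before and after the round trip. It only supports key access on the second level; the real snakemake object also allows attribute access such as snakemake.input.adata_superset, which this sketch leaves out.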
