Microbe-CRISPR-Library is a Python toolkit for CRISPR library design across microbial genomes, including bacterial and fungal workflows. It contains multiple design scripts for Cas9 and CASTs applications, with versioned pipelines that preserve reproducibility while enabling iterative optimization.
This repository is designed for batch library generation rather than one-by-one guide picking. Typical outputs are:
- a success CSV containing final oligo-ready designs
- a failure/partial CSV summarizing genes that did not meet the target design count
The toolkit supports:
- knockout library design
- knockdown library design
- promoter replacement workflows
- C-terminal fusion workflows
- CASTs insertion workflows
The codebase uses a modular, script-per-mode architecture.
- Keep each biological task in a dedicated script
- Keep versioned files to preserve old behavior
- Add new constraints as forward-compatible layers
- Bact-CRISPR-Library.py: Main dispatcher for multi-mode usage
- Cas9_knockout_designer_v11.py: Current Cas9 knockout entry point (V11, dynamic spacing adaptive version)
- Cas9_knockdown_designer_v1.py
- Cas9_PromoterChange_designer_v2.py
- Cas9_Cfusion_designer_v1.py
- CASTs_designer_v3.py
- CRISPR_knockin_v6_standalone.py: Self-contained dual-mode knockin designer (N_start and C_stop) with inlined logic and no dependency on external knockin_J23119RBS_V*.py files
For fungal knockout library generation, use:
Cas9_knockout_designer_v11.py
CRISPR_knockin_v6_standalone.py is a fully self-contained knockin designer that supports two insertion models in one script:
- --model N_start: insert payload before the gene start codon
  - V46-equivalent start-codon targeting workflow
- --model C_stop: insert payload before the gene stop codon
  - intended for C-terminal fusion design (tag fusion use case)
The script accepts --payload as either a literal sequence or a FASTA file, and writes oligo-ready CSV output directly from a single standalone file.
Both modes share the same high-level pipeline:
- parse CDS features from GBFF
- build strand-aware junction-centered sequence context
- scan PAMs around junction ± search_window
- generate candidates by strategy priority
- apply arm sanitization and RE filtering
- perform mutation-aware HA balancing and oligo assembly
- rank and select top designs per gene
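The PAM-scanning step above can be illustrated with a minimal sketch. It assumes an SpCas9-style NGG PAM, a 20-nt spacer, and 0-based forward-strand coordinates. The function names (`scan_pams`, `reverse_complement`) and the default window are hypothetical and are not taken from the script itself.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_pams(genome: str, junction: int, search_window: int = 50):
    """List (strand, protospacer_start, spacer) for NGG PAMs near a junction.

    Assumption: a PAM counts as "near" when its first forward-strand base
    lies in [junction - search_window, junction + search_window).
    """
    genome = genome.upper()
    lo = max(0, junction - search_window)
    hi = min(len(genome), junction + search_window)
    hits = []
    # Forward strand: 20-nt spacer immediately followed by NGG.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", genome):
        pam_start = m.start() + 20
        if lo <= pam_start < hi:
            hits.append(("+", m.start(), m.group(1)))
    # Reverse strand: appears as CCN followed by 20 nt on the forward strand.
    for m in re.finditer(r"(?=CC[ACGT]([ACGT]{20}))", genome):
        pam_start = m.start()
        if lo <= pam_start < hi:
            hits.append(("-", m.start() + 3, reverse_complement(m.group(1))))
    return hits
```

The lookahead patterns let overlapping candidates be reported, which matters in GC-rich regions where PAMs cluster.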
N_start mode:
- Junction: gene start codon boundary
- LHA: upstream region ending at the start-codon boundary
- RHA: coding-side region starting from the start codon
- Strategy family:
  - Priority1 (deletion)
  - Priority2 (bridge)
  - Priority3 (RHA mutation)
C_stop mode:
- Junction: stop codon boundary
- LHA: coding-tail region ending before the stop codon
- RHA: downstream region after the stop codon
- Strategy family:
  - CStop_P1_Del_Downstream
  - CStop_P2_Bridge
  - CStop_P2_Bridge_Mut
  - CStop_P3_Mut_LHA
- Additional post-CDS validation:
  - stop codon must be one of TAA/TAG/TGA
- Overlap-aware handling keeps the fusion open while protecting neighboring CDS sequence when adjacent genes share bases
Mutation logic is applied to both LHA and RHA in both modes:
- Level 1: silent/synonymous mutation first
- Level 2: conservative amino-acid-group substitution fallback
This keeps PAM-breaking behavior consistent while minimizing coding impact.
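A minimal sketch of the Level 1 idea (silent PAM-breaking mutation) follows. It assumes the standard genetic code, an NGG PAM on the forward strand, and a CDS string that starts in frame at index 0. The function name `silent_pam_break` is hypothetical, and the Level 2 conservative-substitution fallback is not shown.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def silent_pam_break(cds: str, pam_start: int):
    """Return a copy of `cds` with one G of the NGG PAM replaced silently.

    `pam_start` is the 0-based index of the N in NGG. Returns None when
    no synonymous single-base change can break the PAM.
    """
    for pos in (pam_start + 1, pam_start + 2):  # the two G positions
        codon_start = pos - pos % 3
        codon = cds[codon_start:codon_start + 3]
        if len(codon) < 3:
            continue  # PAM base falls in an incomplete trailing codon
        offset = pos % 3
        for base in "ACT":  # any non-G base destroys the NGG motif
            new_codon = codon[:offset] + base + codon[offset + 1:]
            if CODON_TABLE[new_codon] == CODON_TABLE[codon]:
                return cds[:codon_start] + new_codon + cds[codon_start + 3:]
    return None
```

For example, in `ATG CTG GGT TAA` the PAM `TGG` at index 4 can be broken by changing `CTG` to `CTA`, which still encodes leucine. A PAM that lies entirely inside a tryptophan codon (`TGG`) has no silent option, so the function returns `None` and a fallback level would be needed.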
- all knockin logic is inlined into one file for easier reuse and repository distribution
- genome-wide sgRNA specificity filtering is available through --max_offtargets
- --rank2_sim_max controls diversity-aware Rank2 selection in C_stop mode
- --barcode_seed controls deterministic barcode generation
  - same inputs + same seed produce reproducible outputs
- balanced HA trimming, flexible HA lengths, and restriction-site exclusion are retained in the standalone workflow
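Seeded barcode generation can be sketched as follows. This is an illustration of the reproducibility property only: the GC range, the homopolymer limit, and the function name `generate_barcodes` are assumptions, not the script's actual thresholds or API.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def generate_barcodes(n, length=11, seed=42, gc_range=(0.4, 0.6),
                      max_homopolymer=3,
                      restriction_sites=("GGTCTC", "GAAGAC"),
                      max_attempts=100_000):
    """Draw `n` unique barcodes from a private, seeded random generator."""
    rng = random.Random(seed)  # private RNG: global random state is untouched
    forbidden = set(restriction_sites) | {reverse_complement(s) for s in restriction_sites}
    seen, barcodes = set(), []
    for _ in range(max_attempts):
        if len(barcodes) == n:
            break
        bc = "".join(rng.choice("ACGT") for _ in range(length))
        gc = (bc.count("G") + bc.count("C")) / length
        if bc in seen or not gc_range[0] <= gc <= gc_range[1]:
            continue
        # Reject runs longer than max_homopolymer (e.g. AAAA when the limit is 3).
        if any(base * (max_homopolymer + 1) in bc for base in "ACGT"):
            continue
        if any(site in bc for site in forbidden):
            continue
        seen.add(bc)
        barcodes.append(bc)
    if len(barcodes) < n:
        raise RuntimeError("could not generate enough barcodes under the constraints")
    return barcodes
```

Because the generator is created from the seed inside the function, two calls with the same arguments return the same list in the same order.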
python CRISPR_knockin_v6_standalone.py \
--model N_start \
--gbff MG1655_genomic.gbff \
--payload J23119_RBS \
--template Knockin_J23100RBS_library_oligo_template.fasta \
--output CRISPR_Nstart_v6.csv \
--num_designs 2 \
--lha_len 70 --rha_len 70 \
--barcode_seed 42 \
--max_offtargets 0 \
--restriction_site GGTCTC --restriction_site GAAGAC

python CRISPR_knockin_v6_standalone.py \
--model C_stop \
--gbff MG1655_genomic.gbff \
--payload J23119_RBS \
--template Cfusion_library_oligo_template.fasta \
--output CRISPR_Cstop_v6.csv \
--num_designs 2 \
--lha_len 70 --rha_len 70 \
--barcode_seed 42 \
--rank2_sim_max 50 \
--max_offtargets 0 \
--restriction_site GGTCTC --restriction_site GAAGAC

python CRISPR_knockin_v6_standalone.py \
--model C_stop \
--target_gene b0002 \
--gbff MG1655_genomic.gbff \
--payload J23119_RBS \
--template Cfusion_library_oligo_template.fasta \
--output debug_b0002_cstop.csv

Cas9_knockout_designer_v11.py wraps and extends the evolutionary chain from v9/v10.
- Parse input genome and annotations
  - FASTA+GFF3 or GBFF mode
- Build gene/CDS coordinate model
  - strand-aware 5' information
- Enumerate sgRNA candidates
  - PAM scanning
  - strand-aware reverse complement handling
  - restriction-site filtering
- Compute cut site per sgRNA
- Generate deletion candidates
  - strategy depends on --dele_model
- Build homology arms
  - constrained by sequence availability and oligo budget
- Generate barcodes
  - uniqueness + GC + repeat + restriction constraints
- Assemble final synthesis oligo
  - via template placeholders
- Rank candidates and select per-gene top designs
- Write success and failure CSV outputs
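Two of the steps above, cut-site computation and restriction-site filtering, are small enough to sketch. The sketch assumes SpCas9, which cuts 3 bp upstream of the PAM, a 20-nt protospacer, and 0-based forward-strand coordinates. The helper names are hypothetical and may not match the script's internals.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def cut_site(protospacer_start: int, strand: str, spacer_len: int = 20) -> int:
    """Forward-strand index of the first base to the right of the blunt cut.

    `protospacer_start` is the leftmost forward-strand index of the
    protospacer, regardless of which strand the guide targets.
    """
    if strand == "+":
        # PAM sits to the right of the protospacer; the cut is 3 bp before it.
        return protospacer_start + spacer_len - 3
    if strand == "-":
        # PAM (seen as CCN) sits to the left; the cut is 3 bp after it.
        return protospacer_start + 3
    raise ValueError(f"unknown strand: {strand!r}")


def has_restriction_site(seq: str, sites=("GGTCTC", "GAAGAC")) -> bool:
    """True if any site occurs in `seq` in either orientation."""
    seq = seq.upper()
    return any(s in seq or reverse_complement(s) in seq for s in sites)
```

Checking both orientations matters because sites such as GGTCTC are not palindromic, so a site on the opposite strand appears as its reverse complement in the stored sequence.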
V9 introduces a strategy router:
--dele_model {normal, Mt}
- default: normal
- Mt: force Mt deletion strategy (PAM-direction cut-window logic)
- normal: force legacy/V7-style length-constrained deletion logic
Startup audit lines are printed for traceability:
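- [V9审计] 当前采用 Mt 删除策略 (English: "[V9 audit] Currently using the Mt deletion strategy")
- [V9审计] 当前采用 normal 删除策略 (English: "[V9 audit] Currently using the normal deletion strategy")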
V11 builds on the V10 global-candidate architecture with additional optimizations.
V11 introduces CDS-length-aware spacing between multiple designs for the same gene:
- CDS length >= 2 × min_design_spacing: keep the standard spacing constraint
- CDS length < 2 × min_design_spacing: automatically switch spacing to 0 bp
With the default --min_design_spacing 100, this means:
- CDS >= 200 bp: target 100 bp spacing between designs
- CDS < 200 bp: allow overlapping designs to maximize two-design coverage
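The spacing rule reduces to a single comparison. The sketch below restates it as a function; the name `effective_spacing` is hypothetical.

```python
def effective_spacing(cds_length: int, min_design_spacing: int = 100) -> int:
    """Spacing (bp) to request between designs for one gene.

    CDS of at least twice the requested spacing keep it;
    shorter CDS fall back to 0 bp so overlapping designs are allowed.
    """
    if cds_length >= 2 * min_design_spacing:
        return min_design_spacing
    return 0
```

With the default of 100, a 200 bp CDS keeps 100 bp spacing and a 199 bp CDS drops to 0 bp.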
For genes that still cannot satisfy the preferred spacing, V11 keeps the staged fallback logic:
- first pass: target dynamic spacing
- second pass: target half spacing
- final pass: 0 bp
When spacing falls back to 0, V11 still tries to maximize dispersion across already selected cut sites rather than simply taking adjacent top-ranked candidates.
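One way to picture the staged fallback is the sketch below. It is an assumption-laden illustration, not the V11 implementation: candidates are `(cut_site, score)` pairs, higher scores rank first, and the final 0 bp pass uses a greedy farthest-point heuristic to spread the picks. All function names are hypothetical.

```python
def pick_with_spacing(ranked, n, min_gap):
    """Greedy pick in rank order, keeping only cut sites >= min_gap apart."""
    chosen = []
    for cut, _score in ranked:
        if all(abs(cut - c) >= min_gap for c in chosen):
            chosen.append(cut)
            if len(chosen) == n:
                break
    return chosen


def pick_dispersed(ranked, n):
    """0 bp pass: start from the best candidate, then repeatedly add the
    candidate farthest from everything already chosen."""
    if not ranked:
        return []
    chosen = [ranked[0][0]]
    remaining = [cut for cut, _score in ranked[1:]]
    while remaining and len(chosen) < n:
        # max() keeps the first of equal distances, i.e. the higher-ranked one.
        best = max(remaining, key=lambda c: min(abs(c - x) for x in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def select_designs(candidates, n=2, spacing=100):
    """Try full spacing, then half spacing, then the dispersed 0 bp pass."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for gap in (spacing, spacing // 2):
        if gap > 0:
            chosen = pick_with_spacing(ranked, n, gap)
            if len(chosen) == n:
                return chosen
    return pick_dispersed(ranked, n)
```

If fewer than `n` candidates exist, the function returns what it has, which corresponds to a partial result for that gene.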
Recommended:
- Python 3.8+
- Dependencies:
  - pandas
  - biopython
  - gffutils

pip install pandas biopython gffutils

Choose one mode:
- FASTA + GFF3
  - --input_fna
  - --input_gff
- GBFF
  - --input_gbff
Do not mix FASTA/GFF with GBFF in the same run.
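The input-mode rule can be checked up front, before any parsing starts. The sketch below reuses the three flag names from this section; the helper names and error messages are hypothetical, and the real script may validate its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="input-mode check (sketch)")
    parser.add_argument("--input_fna")
    parser.add_argument("--input_gff")
    parser.add_argument("--input_gbff")
    return parser


def resolve_input_mode(args: argparse.Namespace) -> str:
    """Return 'gbff' or 'fasta_gff', or raise on a mixed or incomplete set."""
    has_fasta_or_gff = bool(args.input_fna or args.input_gff)
    if args.input_gbff and has_fasta_or_gff:
        raise ValueError("use either --input_gbff or --input_fna/--input_gff, not both")
    if args.input_gbff:
        return "gbff"
    if args.input_fna and args.input_gff:
        return "fasta_gff"
    raise ValueError("provide --input_gbff, or both --input_fna and --input_gff")
```

Failing early on a mixed set avoids a long run that silently uses only one of the two annotation sources.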
For --output X.csv, the pipeline writes:
- X.csv: successful designs
- X_failed.csv: failed/partial genes
Status definitions:
- Failed: no valid design found
- Partial: fewer designs than --sgRNA_num
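The output naming and status rules can be written down in a few lines. This is a sketch: the helper names are hypothetical, and the "Success" label for fully satisfied genes is an assumption, since only Failed and Partial are defined here.

```python
from pathlib import Path


def failed_csv_path(output: str) -> Path:
    """Derive X_failed.csv from X.csv, keeping the directory and suffix."""
    path = Path(output)
    return path.with_name(f"{path.stem}_failed{path.suffix}")


def gene_status(n_designs: int, sgRNA_num: int) -> str:
    """Classify one gene by how many designs were found."""
    if n_designs == 0:
        return "Failed"
    if n_designs < sgRNA_num:
        return "Partial"
    return "Success"  # assumed label; such genes appear only in the success CSV
```

For instance, `--output Mt_V11_Mt_KO.csv` would pair with `Mt_V11_Mt_KO_failed.csv` under this naming rule.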
- --output
- --species
- --synthesis_template
- --sgRNA_num
- --barcode_len
- --restriction_site
- --max_oligo_length
- --num_workers
- --dele_model {normal,Mt}
- --deletion_mode {auto,cut_window,legacy_length}
- --min_design_spacing (V11: short CDS genes auto-relax to 0 bp)
Deletion-specific:
- Mt / cut_window route:
  - --cut_window (e.g. 20:100)
- normal / legacy_length route:
  - --del_length_per (e.g. 10%:80%)
  - --del_length_bp (e.g. 300:1000)
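The range-style values above share a `low:high` shape, with an optional percent sign on both ends. A minimal parser for that shape might look like the sketch below. The function name `parse_range` and the returned tuple layout are assumptions, and how the script combines the percent and bp limits is not shown.

```python
def parse_range(text: str):
    """Parse 'low:high' or 'low%:high%' into (low, high, is_percent)."""
    parts = text.split(":")
    if len(parts) != 2:
        raise ValueError(f"expected low:high, got {text!r}")
    low_text, high_text = (p.strip() for p in parts)
    low_pct, high_pct = low_text.endswith("%"), high_text.endswith("%")
    if low_pct != high_pct:
        raise ValueError(f"mixed percent and absolute bounds in {text!r}")
    low = float(low_text.rstrip("%"))
    high = float(high_text.rstrip("%"))
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound in {text!r}")
    return low, high, low_pct
```

Percent values are returned as written (10 and 80, not 0.1 and 0.8), so the caller decides how to scale them against the CDS length.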
python Cas9_knockout_designer_v11.py \
--dele_model Mt \
--input_gbff Mt_genomic.gbff \
--output Mt_V11_Mt_KO.csv \
--synthesis_template Mt_knockout_library_oligo_template.txt \
--species M_thermophila \
--barcode_len 11 \
--max_oligo_length 300 \
--restriction_site GGTCTC GAAGAC

python Cas9_knockout_designer_v11.py \
--dele_model normal \
--input_gbff Mt_genomic.gbff \
--output Mt_V11_normal_KO.csv \
--synthesis_template Mt_knockout_library_oligo_template.txt \
--species M_thermophila \
--barcode_len 11 \
--max_oligo_length 300 \
--restriction_site GGTCTC GAAGAC \
--del_length_per 10%:80% \
--del_length_bp 300:1000

python Cas9_knockout_designer_v11.py \
--input_gbff Mt_genomic.gbff \
--output Mt_V11_spacing80.csv \
--synthesis_template Mt_knockout_library_oligo_template.txt \
--species M_thermophila \
--dele_model Mt \
--min_design_spacing 80

With --min_design_spacing 80, genes with a CDS shorter than 160 bp will automatically run with 0 bp spacing.