This folder contains a small Sanger AB1 assembly workflow with two entry points:
- assemble.py: process one sample folder that normally contains one forward read and one reverse read, with an optional single-strand fallback mode.
- main.py: batch-run the same assembly logic across all subfolders of a parent folder with multiprocessing, a progress bar, and optional combined FASTA output.
In this README, AB1 file and ABI file mean the same Sanger trace file format.
Given paired ABI trace files, the toolkit:
- reads sequence, Phred quality values, ABI trace channels, base positions, and selected metadata from .ab1 files
- trims low-quality ends from each read
- reverse-complements the reverse read and reverse trace into forward orientation
- aligns the trimmed forward and reverse-complement reads in overlap style
- builds a consensus sequence with conservative per-position rules
- optionally uses paired-read IUPAC mixture calling
- always detects single-strand candidate mixtures for review, and can optionally apply selected confidence levels back into consensus
- writes per-sample FASTA, QA HTML, alignment HTML, and Warning.html when assembly cannot complete for that sample
- batch-processes many sample folders in parallel, shows progress, cleans each sample folder before assembly, can write one combined FASTA, and can generate categorized batch warning index HTML files
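Reverse-complementing a read touches more than the base string: the Phred scores and peak positions must be reversed, and each trace channel must move to its complementary base. The sketch below assumes a simple in-memory layout (a base string, a list of Phred scores, a dict of per-channel intensities, and a list of peak scan indices); this layout and the function name are illustrative, not the toolkit's actual data structures.

```python
# Complement table covering IUPAC ambiguity codes (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVNacgtrykmswbdhvn",
                           "TGCAYRMKSWVHDBNtgcayrmkswvhdbn")
CHANNEL_SWAP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_read(seq, quals, traces, peaks):
    """Flip a read, its qualities, its trace channels, and its peak positions.

    seq: base calls; quals: one Phred score per base;
    traces: {"A": [...], "C": [...], "G": [...], "T": [...]} intensity per scan point;
    peaks: scan index of each base call (hypothetical layout).
    """
    rc_seq = seq.translate(COMPLEMENT)[::-1]
    rc_quals = list(reversed(quals))
    n_scans = len(next(iter(traces.values())))
    # Complementing swaps channels (A<->T, C<->G); reversing flips the scan axis.
    rc_traces = {CHANNEL_SWAP[base]: list(reversed(signal)) for base, signal in traces.items()}
    # A peak at scan p lands at scan n_scans - 1 - p; reverse the list so bases stay in order.
    rc_peaks = [n_scans - 1 - p for p in reversed(peaks)]
    return rc_seq, rc_quals, rc_traces, rc_peaks
```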
Run assemble.py on a folder that contains:
- one .ab1 file whose stem ends with F
- one .ab1 file whose stem ends with R
With --allow-single-strand, the folder may instead contain only one of those files. The available read is mirrored into the missing strand orientation so the normal QA and consensus pipeline can still run.
Example:
sample_001/
  isolate123F.ab1
  isolate123R.ab1
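Pairing relies only on the stem suffix. A minimal sketch of that lookup follows, using the hypothetical helper name find_reads; treating the .ab1 extension case-insensitively and rejecting folders with duplicate F or R files are assumptions, not documented behavior.

```python
from pathlib import Path


def find_reads(folder, allow_single_strand=False):
    """Return (forward, reverse) .ab1 paths for one sample folder.

    A forward read is any .ab1 file whose stem ends with "F", a reverse read one
    whose stem ends with "R". Either path may be None only when
    allow_single_strand is True.
    """
    ab1 = sorted(p for p in Path(folder).iterdir()
                 if p.is_file() and p.suffix.lower() == ".ab1")
    forward = [p for p in ab1 if p.stem.endswith("F")]
    reverse = [p for p in ab1 if p.stem.endswith("R")]
    if len(forward) > 1 or len(reverse) > 1:
        raise ValueError(f"more than one *F.ab1 or *R.ab1 file in {folder}")
    fwd = forward[0] if forward else None
    rev = reverse[0] if reverse else None
    if fwd is None and rev is None:
        raise FileNotFoundError(f"no *F.ab1 or *R.ab1 file in {folder}")
    if (fwd is None or rev is None) and not allow_single_strand:
        raise FileNotFoundError(f"need both *F.ab1 and *R.ab1 in {folder}")
    return fwd, rev
```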
Run main.py on a parent folder where each subfolder is one sample:
batch_run/
  sample_001/
    isolate123F.ab1
    isolate123R.ab1
  sample_002/
    isolate456F.ab1
    isolate456R.ab1
Install dependencies:
uv sync
Assemble one sample folder:
uv run python assemble.py /path/to/sample_folder
Enable paired-read mixture calling:
uv run python assemble.py --use-paired-mixture /path/to/sample_folder
Apply only high-confidence single-strand mixtures back into consensus:
uv run python assemble.py --use-single-mixture high /path/to/sample_folder
Apply high- and medium-confidence single-strand mixtures back into consensus:
uv run python assemble.py --use-single-mixture medium /path/to/sample_folder
Use different forward and reverse Phred thresholds:
uv run python assemble.py --min-phred-score-per-base 18:25 /path/to/sample_folder
Tune trimming and overlap thresholds:
uv run python assemble.py \
  --min-phred-score-per-base 20:20 \
  --min-consecutive-high-quality-bases 10 \
  --min-overlap 40 \
  /path/to/sample_folder
Clean up one sample folder and exit:
uv run python assemble.py --clean /path/to/sample_folder
Allow a single forward-only or reverse-only file:
uv run python assemble.py --allow-single-strand /path/to/sample_folder
Batch-run all sample subfolders:
uv run python main.py /path/to/parent_folder
Batch-run with a specific worker count:
uv run python main.py --processes 8 /path/to/parent_folder
Clean up all sample subfolders and exit:
uv run python main.py --clean /path/to/parent_folder
Batch-run and remap combined FASTA headers with an Excel sheet:
uv run python main.py --mapping-xlsx /path/to/mapping.xlsx /path/to/parent_folder
Batch-run with single-strand fallback enabled:
uv run python main.py --allow-single-strand /path/to/parent_folder
The mapping file passed to --mapping-xlsx is interpreted as:
- column 1: sample ID
- column 2: sample name
Only the combined FASTA headers are remapped. Per-sample FASTA headers are left unchanged.
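Header remapping only needs an ID-to-name dictionary built from the first two columns. The sketch below assumes that dictionary has already been read from the workbook (for example with openpyxl) and that the sample ID is the first whitespace-delimited token of each combined-FASTA header; unmapped headers are left alone. The function name is hypothetical.

```python
def remap_fasta_headers(fasta_text, id_to_name):
    """Rewrite '>ID ...' headers using an ID -> name mapping; keep unmapped headers.

    id_to_name is assumed to come from column 1 (sample ID) and column 2
    (sample name) of the mapping sheet. Sequence lines pass through unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split(maxsplit=1)
            sample_id = fields[0] if fields else ""
            if sample_id in id_to_name:
                # Preserve any description that followed the ID.
                rest = f" {fields[1]}" if len(fields) > 1 else ""
                line = f">{id_to_name[sample_id]}{rest}"
        out.append(line)
    return "\n".join(out) + ("\n" if fasta_text.endswith("\n") else "")
```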
For each sample folder, successful assembly writes:
sample_001/
  sample_001.fasta
  forward_trimmed.fasta
  reverse_rc_trimmed.fasta
  sample_001_alignment.html
  sample_001_QA.html
If trimming or assembly fails, the folder instead gets:
sample_001/
  Warning.html
Batch mode may also write:
parent_folder/
  parent_folder_combined.fasta
  parent_folder_trim_warning.html
  parent_folder_assembly_warning.html
<folder_name>_QA.html includes:
- a single-strand warning banner at the top when --allow-single-strand mirrored one read into the missing strand
- Quality Plot: forward and reverse-complement quality plots with threshold, median, and trim-boundary markers
- Chromatogram: forward and reverse-complement trace plots with shared x-axis controls
- forward QA table
- reverse QA table
- AB1 / ABI: a short explanation of AB1 files, instrument metadata, key ABI sections, and an ASCII hierarchy view
<folder_name>_alignment.html includes:
- a single-strand warning banner at the top when --allow-single-strand mirrored one read into the missing strand
- forward and reverse trimming summaries
- overlap and alignment parameters
- resolve_consensus_base rule summary
- the aligned forward, reverse-complement, consensus, read-position, and Phred rows
- a low-quality table
- a merged Single Strand Mixture panel with:
  - parameter summary
  - table-header explanation
  - forward candidate mixture table
  - reverse-complement candidate mixture table
All HTML tables support client-side sorting and CSV download.
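The overlap and alignment parameters on the alignment page refer to an overlap-style alignment of the trimmed forward read against the trimmed reverse-complement read. The toolkit's aligner and scoring scheme are not described here, so the following is a generic end-gap-free (semi-global) dynamic-programming sketch with illustrative scores, not the toolkit's implementation.

```python
def overlap_align(a, b, match=1, mismatch=-1, gap=-2):
    """End-gap-free alignment of read a against read b.

    Leading and trailing overhangs cost nothing, so the best score comes from
    the region where the reads overlap. Returns (score, aligned_a, aligned_b)
    for that overlapping region only. Scores are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j]: best score for a[:i] vs b[:j]; row and column 0 stay 0 (free leading overhang).
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Free trailing overhang: the alignment may end anywhere in the last row or column.
    end_cells = [(n, j) for j in range(m + 1)] + [(i, m) for i in range(n + 1)]
    i, j = max(end_cells, key=lambda cell: score[cell[0]][cell[1]])
    best = score[i][j]
    # Trace back until one read is exhausted; that point is the start of the overlap.
    top, bottom = [], []
    while i > 0 and j > 0:
        s = score[i][j]
        if s == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif s == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

In this sketch the overlap length is the number of aligned columns, so a check against the default --min-overlap would be len(aligned_a) >= 40; how the toolkit actually counts overlap may differ.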
Consensus calling is intentionally conservative:
- if both aligned bases match and both are at least --min-phred-score-for-paired-base, accept that base
- if aligned bases match and only one strand is above its own per-read threshold, accept that base
- if only one strand contributes a base, accept it only if that strand is above its own per-read threshold
- if bases disagree and both are above their own per-read thresholds, emit an IUPAC mixture only when --use-paired-mixture is enabled
- otherwise fall back to N
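The rules above map onto a small per-column decision function. This is a hypothetical re-implementation for illustration, not the toolkit's resolve_consensus_base: it reads "above a threshold" as "at least", treats a gap or a missing base as a strand that does not contribute, and uses the standard two-base IUPAC codes for paired mixtures.

```python
# Standard IUPAC codes for two-base mixtures.
IUPAC_PAIR = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("AC"): "M", frozenset("CG"): "S", frozenset("AT"): "W",
}


def resolve_base(fwd, rev, fq, rq, fwd_min=20, rev_min=20, paired_min=10,
                 use_paired_mixture=False):
    """Resolve one aligned column into a consensus base (illustrative sketch).

    fwd/rev: aligned bases, "-" for a gap, None where the strand has no coverage.
    fq/rq: their Phred scores. Defaults mirror the CLI defaults 20:20 and 10.
    """
    f_present = fwd not in (None, "-")
    r_present = rev not in (None, "-")
    f_ok = f_present and fq >= fwd_min
    r_ok = r_present and rq >= rev_min
    if f_present and r_present and fwd == rev:
        # Agreement: accept if both pass the paired floor or either passes its own threshold.
        if (fq >= paired_min and rq >= paired_min) or f_ok or r_ok:
            return fwd
        return "N"
    if not (f_present and r_present):
        # Only one strand contributes: it must pass its own per-read threshold.
        if f_ok:
            return fwd
        if r_ok:
            return rev
        return "N"
    # Disagreement: IUPAC mixture only when both strands are confident and the flag is on.
    if f_ok and r_ok and use_paired_mixture:
        return IUPAC_PAIR.get(frozenset((fwd, rev)), "N")
    return "N"
```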
When --use-single-mixture high or --use-single-mixture medium is enabled, selected single-strand mixture calls are mapped back onto trimmed read positions before consensus resolution.
assemble.py arguments:
- folder: sample folder containing one *F.ab1 and one *R.ab1 (or only one of them with --allow-single-strand)
- --clean: remove all non-.ab1 outputs in the sample folder and exit
- --use-paired-mixture: allow two-strand IUPAC mixture calls for high-confidence disagreements
- --use-single-mixture {high,medium}: apply single-strand mixtures of the selected confidence back into consensus; disabled by default
- --allow-single-strand: allow a single forward-only or reverse-only .ab1 file and mirror it into the missing strand
- --min-phred-score-per-base: per-read threshold in forward:reverse format; default 20:20
- --min-phred-score-for-paired-base: minimum Phred score accepted when both strands agree on the same base; default 10
- --min-consecutive-high-quality-bases: run length used to define trim boundaries; default 10
- --min-overlap: minimum aligned overlap after trimming; default 40
- --detect-single-strand-mixture: currently enabled by default in code
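The forward:reverse threshold and the consecutive-run length together define each read's trim boundaries. A minimal sketch follows, assuming the kept region runs from the first to the last window of --min-consecutive-high-quality-bases bases that all reach the per-read threshold; the toolkit's exact trimming rule may differ, and both function names are hypothetical.

```python
def parse_per_base_threshold(value):
    """Parse a 'forward:reverse' threshold such as '18:25' into (18, 25)."""
    parts = value.split(":")
    if len(parts) != 2:
        raise ValueError(f"expected forward:reverse, got {value!r}")
    return int(parts[0]), int(parts[1])


def trim_bounds(quals, min_phred, min_run):
    """Return (start, end) of the kept region (end exclusive), or None if no run qualifies."""
    # Scan from the left for the first run of min_run bases with Phred >= min_phred.
    run, start = 0, None
    for i, q in enumerate(quals):
        run = run + 1 if q >= min_phred else 0
        if run == min_run:
            start = i - min_run + 1
            break
    if start is None:
        return None  # trimming failed; the sample would get a Warning.html
    # Scan from the right for the last such run.
    run = 0
    for i in range(len(quals) - 1, -1, -1):
        run = run + 1 if quals[i] >= min_phred else 0
        if run == min_run:
            return start, i + min_run
    return None
```

For example, with --min-phred-score-per-base 18:25 and the default run length, the forward read would be trimmed with trim_bounds(fwd_quals, 18, 10) and the reverse read with trim_bounds(rev_quals, 25, 10).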
Single-strand mixture detection and QA use these internal defaults:
- min_peak_ratio=0.25
- min_secondary_snr=5.0
- noise_window_radius=20
- noise_window_exclude_radius=4
- high_phred_quality=30
- high_peak_ratio=0.33
- high_secondary_snr=8.0
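The parameter names suggest a peak-ratio and signal-to-noise test on the strongest non-called channel at each base. The sketch below is one plausible reading of those defaults, not the toolkit's detector: it assumes noise is the median of the secondary channel within noise_window_radius scans of the peak, excluding the central noise_window_exclude_radius scans, and it does not model high_phred_quality because its role is not documented.

```python
from statistics import median


def classify_mixture(traces, called_base, peak,
                     min_peak_ratio=0.25, min_secondary_snr=5.0,
                     noise_window_radius=20, noise_window_exclude_radius=4,
                     high_peak_ratio=0.33, high_secondary_snr=8.0):
    """Return (secondary_base, "high" or "medium") for a candidate mixture, else None.

    traces maps "A", "C", "G", "T" to intensity lists; peak is the scan index of
    the base call. The noise model here is an assumption.
    """
    primary = traces[called_base][peak]
    if primary <= 0:
        return None
    others = {b: signal[peak] for b, signal in traces.items() if b != called_base}
    secondary_base = max(others, key=others.get)
    secondary = others[secondary_base]
    ratio = secondary / primary
    # Background of the secondary channel around the peak, skipping the peak itself.
    signal = traces[secondary_base]
    lo = max(0, peak - noise_window_radius)
    hi = min(len(signal), peak + noise_window_radius + 1)
    background = [signal[i] for i in range(lo, hi) if abs(i - peak) > noise_window_exclude_radius]
    noise = max(median(background), 1.0) if background else 1.0  # floor avoids division by zero
    snr = secondary / noise
    if ratio < min_peak_ratio or snr < min_secondary_snr:
        return None
    level = "high" if ratio >= high_peak_ratio and snr >= high_secondary_snr else "medium"
    return secondary_base, level
```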
main.py exposes the same assembly parameters plus:
- --clean: clean each sample subfolder and continue to the next one without assembly
- --mapping-xlsx: remap headers in the combined FASTA using column 1 = ID and column 2 = name
- --processes: number of worker processes used for batch assembly; default 8
Normal batch runs also clean each sample subfolder before assembly starts.
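Cleaning keeps the raw traces and removes everything else. A minimal sketch, assuming only top-level files are deleted (subdirectories are left in place) and that the .ab1 check is case-insensitive; the function name is hypothetical.

```python
from pathlib import Path


def clean_sample_folder(folder):
    """Delete every top-level file except .ab1 traces; return the removed file names."""
    removed = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() != ".ab1":
            path.unlink()
            removed.append(path.name)
    return removed
```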
At the end of a batch run it prints:
Consensus ready: X/Y
Warning: N
During the batch run it also shows a progress bar on stderr.
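The end-of-run counts can be derived from the per-sample outputs. This sketch assumes a sample counts as ready when <folder>/<folder>.fasta exists and as a warning when <folder>/Warning.html exists, matching the per-sample layout described earlier; it is not the code main.py uses, and the two-line output shape is an assumption.

```python
from pathlib import Path


def summarize_batch(parent):
    """Count sample subfolders, finished consensus FASTAs, and Warning.html pages."""
    samples = sorted(p for p in Path(parent).iterdir() if p.is_dir())
    ready = sum((s / f"{s.name}.fasta").is_file() for s in samples)
    warnings = sum((s / "Warning.html").is_file() for s in samples)
    return f"Consensus ready: {ready}/{len(samples)}\nWarning: {warnings}"
```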
If warnings exist, main.py writes one HTML file per warning category, for example <parent_folder>_trim_warning.html and <parent_folder>_assembly_warning.html, each with:
- the total warning count for that category
- one section per warning file
- a relative link to each Warning.html
- an embedded view of each warning page
Notes and limitations:
- The default workflow still expects two reads per sample unless --allow-single-strand is enabled.
- File pairing depends on the filename stem suffixes F and R.
- The single-strand detector thresholds are still code defaults, not CLI parameters.
- In single-strand mode, the missing strand is synthesized from the available read, which keeps the pipeline uniform but does not add new experimental evidence.
- --detect-single-strand-mixture is effectively redundant at the moment because the parser already enables it by default.
- Batch warning detection is based on whether Warning.html exists.
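Single-strand mode can be pictured as synthesizing the missing read as the reverse complement of the real one, so that after the usual reverse-complement step both strands carry identical bases and qualities. A minimal sketch of that idea follows; the function name is hypothetical, and traces and peak positions are omitted for brevity.

```python
# Complement table covering IUPAC ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVNacgtrykmswbdhvn",
                           "TGCAYRMKSWVHDBNtgcayrmkswvhdbn")


def mirror_single_strand(seq, quals, present="F"):
    """Synthesize the missing strand from the only available read.

    Returns ((fwd_seq, fwd_quals), (rev_seq, rev_quals)) in native read
    orientation. The synthetic read is the reverse complement of the real one,
    so it adds no independent evidence.
    """
    mirrored = (seq.translate(COMPLEMENT)[::-1], list(reversed(quals)))
    if present == "F":
        return (seq, list(quals)), mirrored
    if present == "R":
        return mirrored, (seq, list(quals))
    raise ValueError("present must be 'F' or 'R'")
```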