barQC is a powerful tool for processing Split-seq reads, focusing on generating detailed statistical reports and visualizations. It provides comprehensive insights into barcode distributions, including heatmaps and UMI statistics, to facilitate quality control and data analysis. Additionally, barQC can extract, correct, and tag barcodes for further downstream analysis.
barQC enables efficient processing of Split-seq reads by:
- Extracting, correcting, and tagging barcodes from FASTQ files.
- Generating heatmaps of barcode distributions.
- Providing detailed statistics on barcodes and UMI (Unique Molecular Identifier) counts.
The tool is designed for large datasets: it supports multithreading (--threads) and a configurable memory limit (--memory_limit).
Clone the repository and navigate to the project directory:
git clone https://github.com/mayflylab/barQC
cd barQC
Install the required dependencies using the provided Conda environment file:
conda env create -f barQC_conda_recipe.yml
conda activate barQC
bbmap is a required dependency for alignment. Unfortunately, it is not available via Conda. Follow the steps below to install it:
- Download the latest version of BBTools from the official BBTools website.
- Extract the downloaded archive to a directory of your choice.
- Add the bbmap directory to your system’s PATH. For example:
  export PATH=/path/to/bbmap:$PATH
- Test the installation by running:
  bbmap.sh
If the command runs successfully, bbmap is installed and ready to use.
For detailed installation instructions, refer to the BBTools Installation Guide.
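The PATH check described above can also be scripted. The following is a minimal sketch, not part of barQC: the helper name `find_bbmap` is hypothetical, and it only confirms that `bbmap.sh` is discoverable on PATH, not that it runs correctly.

```python
import shutil


def find_bbmap(executable="bbmap.sh"):
    """Return the full path of the bbmap launcher if it is on PATH, else None.

    `find_bbmap` is a hypothetical helper for illustration; barQC itself
    may locate bbmap differently.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_bbmap()
    if path is None:
        print("bbmap.sh not found on PATH; add the bbmap directory to PATH first.")
    else:
        print(f"bbmap.sh found at {path}")
```

Running this inside the activated barQC environment shows whether the `export PATH=...` step took effect in the current shell.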
Run the tool with the following command:
python barQC.py -f1 <read1 fastq> -f2 <read2 fastq> -o <output name> [optional arguments]
Example:
python barQC.py -f1 data/clean_reads_1.fastq -f2 data/clean_reads_2.fastq -o results/output -b barcode_list -t 24 --stats
Required Arguments
- -f1, --read1_fastq: Path to the FASTQ file for read 1 (mRNA reads without adapters).
- -f2, --read2_fastq: Path to the FASTQ file for read 2 (barcode reads without adapters).
- -o, --output_name: Path and prefix for the output files.
Optional Arguments
- -b, --bc_dir: Directory with expected barcode files (default: current directory).
- -q, --qval: Quality threshold for barcode evaluation (default: 10).
- -t, --threads: Number of threads for parallel processing (default: 20).
- -s, --stats: Save statistics to output files.
- -v, --verbose: Enable verbose logging for debugging.
- --skip_tagging: Skip barcode tagging and only generate statistics.
- --memory_limit: Limit memory usage in GB (default: system estimate).
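To make the option list concrete, here is a minimal argparse sketch that mirrors the flags and defaults documented above. It is an illustration only, not the parser barQC actually ships: the function name `build_parser`, the argument types (integer for --qval and --threads, float for --memory_limit), and the use of `None` to stand for "system estimate" are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented barQC options (sketch only)."""
    parser = argparse.ArgumentParser(prog="barQC.py")

    # Required arguments
    parser.add_argument("-f1", "--read1_fastq", required=True,
                        help="FASTQ file for read 1 (mRNA reads without adapters)")
    parser.add_argument("-f2", "--read2_fastq", required=True,
                        help="FASTQ file for read 2 (barcode reads without adapters)")
    parser.add_argument("-o", "--output_name", required=True,
                        help="path and prefix for the output files")

    # Optional arguments; types are assumed, defaults follow the README
    parser.add_argument("-b", "--bc_dir", default=".",
                        help="directory with expected barcode files")
    parser.add_argument("-q", "--qval", type=int, default=10,
                        help="quality threshold for barcode evaluation")
    parser.add_argument("-t", "--threads", type=int, default=20,
                        help="number of threads for parallel processing")
    parser.add_argument("-s", "--stats", action="store_true",
                        help="save statistics to output files")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose logging for debugging")
    parser.add_argument("--skip_tagging", action="store_true",
                        help="skip barcode tagging and only generate statistics")
    parser.add_argument("--memory_limit", type=float, default=None,
                        help="memory limit in GB (None stands for the system estimate)")
    return parser


if __name__ == "__main__":
    # Parse the example command line from the usage section.
    example = ["-f1", "data/clean_reads_1.fastq", "-f2", "data/clean_reads_2.fastq",
               "-o", "results/output", "-b", "barcode_list", "-t", "24", "--stats"]
    print(build_parser().parse_args(example))
```

Flags not given on the command line fall back to the documented defaults, so the example run uses a quality threshold of 10 with 24 threads.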
Output Files
- <output_name>.bam: BAM file with tagged barcodes.
- <output_name>.log: Log file with processing details.
- <output_name>_debug.log: Debug log file (if --verbose mode is enabled).
- <output_name>_barcode_stats.csv: Detailed barcode statistics (if --stats mode is enabled).
- <output_name>_umi_stats.csv: UMI statistics with counts per unique barcode combination (if --stats mode is enabled).
- <output_name>_heatmap_<barcode>.png: Heatmap visualizations of barcode distributions per Split-seq plate.
- <output_name>_UMI_counts_distribution.png: Histogram of UMI counts.
- <output_name>_UMI_percell_distribution.png: Distribution of mean UMI counts per unique barcode combination.
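The naming scheme above can be turned into a small helper for checking a finished run. This is a sketch under assumptions: the function names `expected_outputs` and `heatmap_files` are hypothetical, the two UMI plots are treated as always produced because the list gives no condition for them, and the effect of --skip_tagging on the BAM file is not documented here, so that flag is ignored.

```python
from pathlib import Path


def expected_outputs(output_name, stats=False, verbose=False):
    """List the output paths the README describes for a given prefix (sketch)."""
    names = [f"{output_name}.bam", f"{output_name}.log"]
    if verbose:
        names.append(f"{output_name}_debug.log")
    if stats:
        names.append(f"{output_name}_barcode_stats.csv")
        names.append(f"{output_name}_umi_stats.csv")
    # Assumed unconditional: the README lists these plots without a flag.
    names.append(f"{output_name}_UMI_counts_distribution.png")
    names.append(f"{output_name}_UMI_percell_distribution.png")
    return names


def heatmap_files(output_name):
    """Find heatmaps by pattern, since the <barcode> part of the name is not fixed."""
    prefix = Path(output_name)
    return sorted(prefix.parent.glob(f"{prefix.name}_heatmap_*.png"))


if __name__ == "__main__":
    for name in expected_outputs("results/output", stats=True):
        status = "ok" if Path(name).is_file() else "missing"
        print(f"{status:8s}{name}")
    print(f"heatmaps found: {len(heatmap_files('results/output'))}")
```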
- Barcode Files: Expected barcode files and invariant sequences must be provided in the barcode_list directory. These files are crucial for the correct functioning of barQC.
  Important:
  - The CSV files for expected barcodes must maintain a specific structure:
    - Columns: WellPosition, Name, and Barcode.
    - All barcodes in each file must have the same length.
  - Ensure the filenames match the expected format: expected_barcodes_1.csv, expected_barcodes_2.csv, expected_barcodes_3.csv.
  - Include the invariant linker sequences in invariable-linker-sequences.fasta.
  Any deviation from the required structure or file naming may lead to errors during processing.
- Conda Environment File: barQC_conda_recipe.yml contains the dependencies required to run barQC.
- Test Data Directory: Example FASTQ files are provided in the test_data directory for validating the installation.
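Because deviations in the barcode files can cause processing errors, it may help to check a barcode directory before a long run. The sketch below applies only the rules stated above (file names, the three required columns, equal barcode length per file). It is not barQC's own validation: the function names are hypothetical, and it assumes comma-delimited CSV files with a header row.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = {"WellPosition", "Name", "Barcode"}
EXPECTED_FILES = [
    "expected_barcodes_1.csv",
    "expected_barcodes_2.csv",
    "expected_barcodes_3.csv",
    "invariable-linker-sequences.fasta",
]


def check_barcode_rows(rows, fieldnames):
    """Return a list of problems for one parsed barcode table (empty if none)."""
    missing = REQUIRED_COLUMNS - set(fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    # Collect the distinct barcode lengths; more than one means a mismatch.
    lengths = {len((row["Barcode"] or "").strip()) for row in rows}
    if len(lengths) > 1:
        return [f"barcodes have different lengths: {sorted(lengths)}"]
    return []


def check_barcode_dir(bc_dir):
    """Check that the expected files exist and that each CSV is well formed."""
    problems = []
    for name in EXPECTED_FILES:
        path = Path(bc_dir) / name
        if not path.is_file():
            problems.append(f"{name}: file not found")
            continue
        if path.suffix == ".csv":
            with path.open(newline="") as handle:
                reader = csv.DictReader(handle)
                rows = list(reader)
                for problem in check_barcode_rows(rows, reader.fieldnames):
                    problems.append(f"{name}: {problem}")
    return problems


if __name__ == "__main__":
    for problem in check_barcode_dir("barcode_list"):
        print(problem)
```

An empty result means the directory satisfies the documented rules; it does not guarantee that the barcodes themselves match the Split-seq plates used in the experiment.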
This project is licensed under the MIT License. See the LICENSE file for details.
Developed by Maria Rossello.
For support or feedback, email Maria at mariarossello@ub.edu.
Found a bug or have a feature request? Feel free to open an issue on the GitHub repository.