piRNA Workflow Project

A bioinformatics workflow for piRNA and ChIP-seq analysis, using reproducible Snakemake pipelines and shared resources. This project is a work in progress that builds upon and extends the original methodologies from the Peng He Lab.

🚀 Project Overview

This repository is converting the workflows in the Peng-He-Lab/Luo_2025_piRNA repository from shell scripts to Snakemake. Current status of each pipeline:

| Pipeline | Description | Status |
|---|---|---|
| CHIP-seq | ChIP-seq analysis from raw FASTQ to BigWig visualization | ✅ Converted |
| TotalRNA-seq | Total RNA-seq processing with rRNA removal and alignment | ✅ Converted |
| piRNA-seq | Specialized piRNA analysis pipeline | 📋 Next Priority |
| Fusion Reads | Detection and analysis of fusion reads | 📋 Planned |
| RIP-seq | RNA immunoprecipitation sequencing | 📋 Planned |

Shared Resources: Common scripts, genomes, and data files used by all workflows

🎯 Quick Start

# Interactive mode - guided setup with smart resource detection
./run_workflow.sh

# Or use numeric shortcuts
./run_workflow.sh 1    # Run ChIP-seq workflow
./run_workflow.sh 4    # Run totalRNA-seq workflow

Key Features:

✅ Interactive workflow selection and core allocation
✅ Smart resource detection and optimization
✅ Automatic error recovery and lock management
✅ Input validation and overwrite protection

Advanced Users: You can also run workflows directly with Snakemake. See the individual workflow READMEs (CHIP-seq or totalRNA-seq) for direct Snakemake usage examples.
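As a rough illustration of direct usage, the sketch below wraps a Snakemake call for one workflow directory. The function name `run_direct` and the `PRINT_ONLY` switch are hypothetical and not part of this repository; the sketch assumes each workflow directory holds its own Snakefile (as shown in the project structure) and that `snakemake` is on your PATH. The individual workflow READMEs remain the reference for the exact commands.

```shell
# Sketch: run one workflow directory directly with Snakemake.
# run_direct and PRINT_ONLY are hypothetical names used only for illustration.
run_direct() {
    workflow_dir="$1"            # e.g. CHIP-seq or totalRNA-seq
    cores="${2:-4}"              # 4 is an arbitrary default
    # Drop the first two arguments; anything left is passed through to Snakemake
    if [ "$#" -ge 2 ]; then shift 2; else shift "$#"; fi
    set -- snakemake --use-conda --cores "$cores" "$@"
    if [ "${PRINT_ONLY:-0}" = "1" ]; then
        # Show the command instead of executing it
        printf '%s\n' "(cd $workflow_dir && $*)"
    else
        # Run inside the workflow directory so its Snakefile is picked up
        (cd "$workflow_dir" && "$@")
    fi
}
```

For example, `run_direct CHIP-seq 8 -n` would ask Snakemake for a dry run of the ChIP-seq workflow with 8 cores, and `run_direct CHIP-seq 1 --unlock` would clear a stale lock left by an interrupted run.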

📖 For detailed usage and troubleshooting, see WORKFLOW_MANAGER.md

🔄 Relationship to Original Work

This project builds upon and enhances the original work by Luo et al. 2025 and the Peng-He-Lab/Luo_2025_piRNA repository.

Key Improvements

Shell Scripts → Snakemake: Converted original shell-based pipelines to reproducible Snakemake workflows
Manual Dependencies → Conda: Automated environment management with conda/mamba
Hardcoded Paths → Variables: Centralized path management for better maintainability
Single-threaded → Parallel: Added parallel processing capabilities
Flexible Configuration: Easy customization for different datasets
Performance Optimization: Resource-aware execution and monitoring
Documentation: Comprehensive READMEs and setup guides

📁 Project Structure

piRNA_workflow/
├── CHIP-seq/                 # ✅ ChIP-seq analysis pipeline (Production Ready)
│   ├── Snakefile            # Main workflow definition
│   ├── config.yaml          # Configuration file
│   ├── envs/                # Conda environment definitions (13 files)
│   ├── results/             # Analysis outputs
│   └── README.md            # Detailed ChIP-seq documentation
├── totalRNA-seq/            # ✅ Total RNA-seq processing pipeline (Production Ready)
│   ├── Snakefile            # Main workflow definition
│   ├── config.yaml          # Configuration file
│   ├── envs/                # Conda environment definitions (9 files)
│   ├── results/             # Analysis outputs
│   └── README.md            # Detailed RNA-seq documentation
├── Shared/                  # Common resources for all workflows
│   ├── Scripts/             # Shared Python and shell scripts, Mermaid diagrams
│   ├── DataFiles/           # Common genome files and datasets
│   │   ├── genome/          # Reference genomes and indexes
│   │   └── datasets/        # Input FASTQ files
│   └── README.md            # Shared resources documentation
├── run_workflow.sh          # Unified workflow manager script
├── WORKFLOW_MANAGER.md      # Workflow manager documentation
└── README.md                # This file

🎯 Key Features

Reproducibility

Snakemake workflows for reproducible analysis
Conda environments for dependency management
Version-controlled configurations and parameters

Scalability

Parallel processing with configurable core usage
Modular design for easy customization
Resource-aware execution
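One simple way to choose a core count is sketched below: detect the number of online CPUs and leave one free for the rest of the system. The helper names `detect_cpus` and `suggest_cores` are hypothetical, and the "all but one" rule is an assumption made for illustration; `run_workflow.sh` may use a different policy.

```shell
# Sketch: suggest a core count that leaves one CPU free.
# detect_cpus and suggest_cores are hypothetical helpers, not taken from run_workflow.sh.
detect_cpus() {
    # getconf _NPROCESSORS_ONLN works on Linux and macOS; fall back to 1 if it fails
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

suggest_cores() {
    # Use the given CPU count, or detect it when no argument is passed
    total="${1:-$(detect_cpus)}"
    if [ "$total" -gt 1 ]; then
        echo $((total - 1))
    else
        echo 1
    fi
}
```

The result can be passed to Snakemake's `--cores` option, for example `snakemake --use-conda --cores "$(suggest_cores)"`.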

Quality Control

Multi-step QC with FastQC integration
Adapter trimming and quality filtering
Comprehensive reporting at each step

Analysis Capabilities

ChIP-seq: Peak detection, enrichment analysis, BigWig generation
RNA-seq: rRNA removal, transcriptome alignment, vector mapping
Coverage analysis at multiple resolutions
Transposon-specific analysis

Workflow Enhancements

Conversion from shell scripts to Snakemake workflows
Standardized config.yaml files for easy parameter management
Individual conda environments for reliable dependency management
Updated software versions and best practices
Enhanced reproducibility and scalability

🚀 Setup

Prerequisites

Install Miniconda (if not already installed):

# Linux
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

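# macOS (Intel; on Apple Silicon use Miniconda3-latest-MacOSX-arm64.sh instead)
# curl ships with macOS, whereas wget usually has to be installed separately
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh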

Install mamba (recommended for faster dependency resolution):
```
conda install mamba -n base -c conda-forge
```

Getting Started

Clone the repository:

git clone <repository-url>
cd piRNA_workflow

Create the snakemake environment (one-time setup):

conda create -n snakemake_env -c conda-forge -c bioconda snakemake
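Before running a workflow, it can help to confirm that the tools installed above are visible on your PATH. The sketch below is one way to do that; `missing_tools` is a hypothetical helper, not something provided by this repository.

```shell
# Sketch: print the name of every requested tool that is not on PATH.
# missing_tools is a hypothetical helper used only for illustration.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the tool cannot be found
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}
```

Running `missing_tools conda mamba snakemake` after `conda activate snakemake_env` prints nothing when all three are available. Whether `run_workflow.sh` activates the environment for you is not assumed here.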

Run a workflow:
```
./run_workflow.sh
```

The script will guide you through workflow selection, path configuration, and resource allocation.

📖 For platform requirements, detailed commands, and troubleshooting, see WORKFLOW_MANAGER.md

🔧 Configuration

Environment Management

Automatic environment creation with --use-conda
Tool-specific environments for optimal performance
mamba support for faster dependency resolution

Sample Configuration

Flexible sample naming in Snakefiles
Configurable parameters for analysis steps
Easy customization for different datasets

📚 Documentation

CHIP-seq README: Comprehensive ChIP-seq pipeline documentation
TotalRNA-seq README: RNA-seq processing documentation
Shared Resources README: Common resources and scripts
Dataset Recommendations: Data quality guidelines
Workflow Manager Guide: Detailed usage and troubleshooting

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📖 Citation

If you use this workflow in your research, please cite:

Original Research

Luo et al. 2025: Paper Title - Original methodology and findings

Original Repository

Peng-He-Lab/Luo_2025_piRNA: https://github.com/Peng-He-Lab/Luo_2025_piRNA - Source of original scripts and methodology
