tfbpmodeling

A Python package for transcription factor binding and perturbation (TFBP) modeling that analyzes relationships between transcription factor binding and gene expression perturbations using LASSO.

NOTE: this documentation is produced by AI and hasn't had significant human revision. Please, if you find problems or there is anything confusing or missing, open an issue and just explain where the docs stopped being helpful.

Quick Start

Installation

python -m pip install git+https://github.com/BrentLab/tfbpmodeling.git

or, for the development branch dev:

python -m pip install git+https://github.com/BrentLab/tfbpmodeling.git@dev

Basic Usage

Run the main modeling command:

python -m tfbpmodeling linear_perturbation_binding_modeling \
    --response_file response_data.csv \
    --predictors_file binding_data.csv \
    --perturbed_tf YourTF

Command Line Interface

The package provides a single main command with comprehensive options for modeling transcription factor binding and perturbation data.

Main Command

python -m tfbpmodeling linear_perturbation_binding_modeling [OPTIONS]

This command executes a sequential 4-stage workflow:

All Data Modeling: Models perturbation ~ binding on complete dataset using bootstrap resampling
Top-N Modeling: Extracts significant predictors and models on top-performing data subset
Interactor Significance: Evaluates surviving interaction terms against corresponding main effects
Output Generation: Produces comprehensive results with confidence intervals and statistics

Required Arguments

Argument	Description
`--response_file`	Path to response CSV file (gene expression data)
`--predictors_file`	Path to predictors CSV file (binding data)
`--perturbed_tf`	Name of perturbed TF (must match response file column)

Key Options

Input Control

--blacklist_file: Exclude specific features from analysis
--n_bootstraps: Number of bootstrap samples (default: 1000)
--random_state: Set seed for reproducible results
--top_n: Features to retain in second modeling round (default: 600)

Feature Engineering

--row_max: Include row maximum as predictor
--squared_pTF: Include squared perturbed TF term
--cubic_pTF: Include cubic perturbed TF term
--ptf_main_effect: Include perturbed TF main effect
--exclude_interactor_variables: Exclude variables from interaction terms
--add_model_variables: Add custom variables to model

Model Parameters

--all_data_ci_level: Confidence interval for first round (default: 98.0%)
--topn_ci_level: Confidence interval for second round (default: 90.0%)
--max_iter: Maximum LassoCV iterations (default: 10000)
--iterative_dropout: Enable iterative variable dropout
--stage4_lasso: Use LassoCV for Stage 4 significance testing

Data Processing

--normalize_sample_weights: Normalize bootstrap weights to sum to 1
--scale_by_std: Scale model matrix by standard deviation (without centering)
--bins: Bin edges for data stratification (default: "0,8,12,np.inf")

Output Control

--output_dir: Results directory (default: "./linear_perturbation_binding_modeling_results")
--output_suffix: Suffix for output subdirectory naming

System Options

--n_cpus: CPU cores for parallel processing (default: 4)
--log-level: Logging verbosity (DEBUG, INFO, WARNING, ERROR, CRITICAL)
--log-handler: Log output destination (console, file)

Example Commands

Basic Analysis

python -m tfbpmodeling linear_perturbation_binding_modeling \
    --response_file data/expression.csv \
    --predictors_file data/binding.csv \
    --perturbed_tf YPD1

Advanced Analysis with Custom Parameters

python -m tfbpmodeling linear_perturbation_binding_modeling \
    --response_file data/expression.csv \
    --predictors_file data/binding.csv \
    --perturbed_tf YPD1 \
    --n_bootstraps 2000 \
    --top_n 500 \
    --all_data_ci_level 95.0 \
    --topn_ci_level 85.0 \
    --squared_pTF \
    --ptf_main_effect \
    --iterative_dropout \
    --stage4_lasso \
    --output_dir ./results \
    --output_suffix _custom_run \
    --n_cpus 8 \
    --random_state 42

Reproducible Analysis with Feature Engineering

poetry run python -m tfbpmodeling linear_perturbation_binding_modeling \
    --response_file data/expression.csv \
    --predictors_file data/binding.csv \
    --perturbed_tf YPD1 \
    --blacklist_file data/exclude_genes.txt \
    --random_state 12345 \
    --row_max \
    --cubic_pTF \
    --add_model_variables "red_median,green_median" \
    --exclude_interactor_variables "batch_effect" \
    --bins "0,5,10,15,np.inf" \
    --normalize_sample_weights \
    --scale_by_std

Input File Formats

Response File (expression data)

CSV format with genes/features as rows
First column: gene identifiers (matching predictor file)
Subsequent columns: sample expression values
Must contain column matching --perturbed_tf argument

Predictors File (binding data)

CSV format with genes/features as rows
First column: gene identifiers (matching response file)
Subsequent columns: binding measurements for different TFs

Blacklist File (optional)

Plain text file with one feature identifier per line
Features listed will be excluded from analysis

Output

Results are saved in the specified output directory with subdirectories for each analysis run. Output includes:

Model coefficients and confidence intervals
Bootstrap statistics and distributions
Significance testing results
Diagnostic plots and summaries
Log files with detailed execution information

Documentation

For detailed documentation and API reference, see https://brentlab.github.io/tfbpmodeling/

Development

See CLAUDE.md for development setup, testing commands, and contribution guidelines.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github		.github
.vscode		.vscode
docs		docs
tfbpmodeling		tfbpmodeling
tmp		tmp
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
configure_logger.py		configure_logger.py
mkdocs.yml		mkdocs.yml
mkdocs_requirements.txt		mkdocs_requirements.txt
pyproject.toml		pyproject.toml
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

tfbpmodeling

Quick Start

Installation

Basic Usage

Command Line Interface

Main Command

Required Arguments

Key Options

Input Control

Feature Engineering

Model Parameters

Data Processing

Output Control

System Options

Example Commands

Basic Analysis

Advanced Analysis with Custom Parameters

Reproducible Analysis with Feature Engineering

Input File Formats

Response File (expression data)

Predictors File (binding data)

Blacklist File (optional)

Output

Documentation

Development

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 5

Uh oh!

Languages

License

BrentLab/tfbpmodeling

Folders and files

Latest commit

History

Repository files navigation

tfbpmodeling

Quick Start

Installation

Basic Usage

Command Line Interface

Main Command

Required Arguments

Key Options

Input Control

Feature Engineering

Model Parameters

Data Processing

Output Control

System Options

Example Commands

Basic Analysis

Advanced Analysis with Custom Parameters

Reproducible Analysis with Feature Engineering

Input File Formats

Response File (expression data)

Predictors File (binding data)

Blacklist File (optional)

Output

Documentation

Development

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 5

Uh oh!

Languages

Packages