A Python package for transcription factor binding and perturbation (TFBP) modeling that analyzes relationships between transcription factor binding and gene expression perturbations using LASSO.
NOTE: this documentation was produced by AI and has not had significant human revision. If you find problems, or if anything is confusing or missing, please open an issue and explain where the docs stopped being helpful.

Install from GitHub:

    python -m pip install git+https://github.com/BrentLab/tfbpmodeling.git

or, for the development branch `dev`:

    python -m pip install git+https://github.com/BrentLab/tfbpmodeling.git@dev

Run the main modeling command:

    python -m tfbpmodeling linear_perturbation_binding_modeling \
        --response_file response_data.csv \
        --predictors_file binding_data.csv \
        --perturbed_tf YourTF

The package provides a single main command with comprehensive options for modeling transcription factor binding and perturbation data.

    python -m tfbpmodeling linear_perturbation_binding_modeling [OPTIONS]

This command executes a sequential 4-stage workflow:

- All Data Modeling: Models `perturbation ~ binding` on complete dataset using bootstrap resampling
- Top-N Modeling: Extracts significant predictors and models on top-performing data subset
- Interactor Significance: Evaluates surviving interaction terms against corresponding main effects
- Output Generation: Produces comprehensive results with confidence intervals and statistics

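The bootstrap step in the first two stages can be illustrated with a small standard-library sketch. It is not the package's implementation: to stay dependency-free it swaps in an ordinary least-squares slope for a single predictor where the package uses LASSO, and the rule "keep a predictor when its percentile interval excludes zero" is an assumption about how significance is decided. The function names and the toy data are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:  # degenerate resample: every x value identical
        return 0.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_ci(x, y, n_bootstraps=1000, ci_level=98.0, seed=42):
    """Percentile confidence interval for the slope from row resampling."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_bootstraps):
        rows = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        slopes.append(ols_slope([x[i] for i in rows], [y[i] for i in rows]))
    slopes.sort()
    tail = (100.0 - ci_level) / 200.0    # fraction cut from each tail
    k = int(tail * (n_bootstraps - 1))   # offset into the sorted slopes
    return slopes[k], slopes[n_bootstraps - 1 - k]


def is_retained(ci):
    """Assumed rule: keep a predictor only if its interval excludes zero."""
    low, high = ci
    return low > 0.0 or high < 0.0


# Toy data (made up): the response follows binding with slope 2 plus noise.
data_rng = random.Random(0)
binding = [data_rng.uniform(0.0, 10.0) for _ in range(60)]
response = [2.0 * b + data_rng.gauss(0.0, 0.5) for b in binding]

ci = bootstrap_ci(binding, response, n_bootstraps=500, ci_level=98.0)
print(ci, is_retained(ci))
```

Here `n_bootstraps`, `ci_level`, and `seed` play the roles of `--n_bootstraps`, `--all_data_ci_level` (or `--topn_ci_level`), and `--random_state`. A wider confidence level makes the interval wider, so fewer predictors survive.
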
| Argument | Description |
|---|---|
| `--response_file` | Path to response CSV file (gene expression data) |
| `--predictors_file` | Path to predictors CSV file (binding data) |
| `--perturbed_tf` | Name of perturbed TF (must match response file column) |

- `--blacklist_file`: Exclude specific features from analysis
- `--n_bootstraps`: Number of bootstrap samples (default: 1000)
- `--random_state`: Set seed for reproducible results
- `--top_n`: Features to retain in second modeling round (default: 600)

- `--row_max`: Include row maximum as predictor
- `--squared_pTF`: Include squared perturbed TF term
- `--cubic_pTF`: Include cubic perturbed TF term
- `--ptf_main_effect`: Include perturbed TF main effect
- `--exclude_interactor_variables`: Exclude variables from interaction terms
- `--add_model_variables`: Add custom variables to model

- `--all_data_ci_level`: Confidence interval for first round (default: 98.0%)
- `--topn_ci_level`: Confidence interval for second round (default: 90.0%)
- `--max_iter`: Maximum LassoCV iterations (default: 10000)
- `--iterative_dropout`: Enable iterative variable dropout
- `--stage4_lasso`: Use LassoCV for Stage 4 significance testing

- `--normalize_sample_weights`: Normalize bootstrap weights to sum to 1
- `--scale_by_std`: Scale model matrix by standard deviation (without centering)
- `--bins`: Bin edges for data stratification (default: "0,8,12,np.inf")

- `--output_dir`: Results directory (default: "./linear_perturbation_binding_modeling_results")
- `--output_suffix`: Suffix for output subdirectory naming

- `--n_cpus`: CPU cores for parallel processing (default: 4)
- `--log-level`: Logging verbosity (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- `--log-handler`: Log output destination (console, file)

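To make the `--bins` format concrete, the sketch below parses a string such as `"0,8,12,np.inf"` into numeric edges and assigns values to strata. It is only an illustration: which quantity the package bins, and whether its intervals are closed on the left or the right, are not stated here, so the half-open `[low, high)` convention and the helper names `parse_bins` and `assign_bin` are assumptions.

```python
import bisect
import math


def parse_bins(spec):
    """Turn a string such as "0,8,12,np.inf" into a list of float edges."""
    edges = []
    for token in spec.split(","):
        token = token.strip()
        # float() does not understand "np.inf", so map it explicitly.
        edges.append(math.inf if token in ("np.inf", "inf") else float(token))
    if any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError(f"bin edges must be strictly increasing: {spec!r}")
    return edges


def assign_bin(value, edges):
    """Zero-based index of the half-open bin [edges[i], edges[i+1]) holding value."""
    if value < edges[0] or value >= edges[-1]:
        raise ValueError(f"{value} falls outside the bin edges")
    return bisect.bisect_right(edges, value) - 1


edges = parse_bins("0,8,12,np.inf")
values = [0, 3.5, 8, 11.9, 12, 250]  # made-up values to stratify
labels = [assign_bin(v, edges) for v in values]
print(labels)
```

With the default edges this yields three strata: `[0, 8)`, `[8, 12)`, and `[12, inf)`.
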

Basic run with default settings:

    python -m tfbpmodeling linear_perturbation_binding_modeling \
        --response_file data/expression.csv \
        --predictors_file data/binding.csv \
        --perturbed_tf YPD1

Run with custom bootstrap, confidence interval, and output settings:

    python -m tfbpmodeling linear_perturbation_binding_modeling \
        --response_file data/expression.csv \
        --predictors_file data/binding.csv \
        --perturbed_tf YPD1 \
        --n_bootstraps 2000 \
        --top_n 500 \
        --all_data_ci_level 95.0 \
        --topn_ci_level 85.0 \
        --squared_pTF \
        --ptf_main_effect \
        --iterative_dropout \
        --stage4_lasso \
        --output_dir ./results \
        --output_suffix _custom_run \
        --n_cpus 8 \
        --random_state 42

Run through Poetry with a blacklist, additional model terms, and custom bins:

    poetry run python -m tfbpmodeling linear_perturbation_binding_modeling \
        --response_file data/expression.csv \
        --predictors_file data/binding.csv \
        --perturbed_tf YPD1 \
        --blacklist_file data/exclude_genes.txt \
        --random_state 12345 \
        --row_max \
        --cubic_pTF \
        --add_model_variables "red_median,green_median" \
        --exclude_interactor_variables "batch_effect" \
        --bins "0,5,10,15,np.inf" \
        --normalize_sample_weights \
        --scale_by_std

Response file (`--response_file`):

- CSV format with genes/features as rows
- First column: gene identifiers (matching predictor file)
- Subsequent columns: sample expression values
- Must contain a column matching the `--perturbed_tf` argument

Predictors file (`--predictors_file`):

- CSV format with genes/features as rows
- First column: gene identifiers (matching response file)
- Subsequent columns: binding measurements for different TFs

Blacklist file (`--blacklist_file`):

- Plain text file with one feature identifier per line
- Features listed will be excluded from analysis

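As a pre-flight check on the three input formats described above, the sketch below writes tiny example files and verifies them with the standard library only. It is not part of the package: `read_table` and `check_inputs` are hypothetical helpers, the gene and TF names are made up, and taking the intersection of gene identifiers is an assumption about how mismatched rows would be handled.

```python
import csv
import tempfile
from pathlib import Path


def read_table(path):
    """Read a CSV whose first column holds gene identifiers."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = {row[0]: row[1:] for row in reader if row}
    return header[1:], rows


def check_inputs(response_path, predictors_path, perturbed_tf, blacklist_path=None):
    """Return gene identifiers usable for modeling, in response-file order."""
    response_cols, response = read_table(response_path)
    _, predictors = read_table(predictors_path)
    if perturbed_tf not in response_cols:
        raise ValueError(f"{perturbed_tf!r} is not a column of {response_path}")
    blacklist = set()
    if blacklist_path is not None:
        lines = Path(blacklist_path).read_text().splitlines()
        blacklist = {line.strip() for line in lines if line.strip()}
    # Assumption: only genes present in both files and not blacklisted are kept.
    return [g for g in response if g in predictors and g not in blacklist]


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "expression.csv").write_text(
        "gene_id,TF1,TF2\ngeneA,0.5,-1.2\ngeneB,1.1,0.3\ngeneC,-0.4,0.9\n"
    )
    (tmp / "binding.csv").write_text(
        "gene_id,TF1,TF2\ngeneA,3.0,0.1\ngeneB,0.2,2.5\ngeneC,1.7,0.4\n"
    )
    (tmp / "exclude_genes.txt").write_text("geneB\n")
    shared = check_inputs(
        tmp / "expression.csv",
        tmp / "binding.csv",
        perturbed_tf="TF1",
        blacklist_path=tmp / "exclude_genes.txt",
    )

print(shared)
```

Running a check like this before a long bootstrap job catches a misspelled `--perturbed_tf` value or mismatched gene identifiers early.
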
Results are saved in the specified output directory with subdirectories for each analysis run. Output includes:
- Model coefficients and confidence intervals
- Bootstrap statistics and distributions
- Significance testing results
- Diagnostic plots and summaries
- Log files with detailed execution information

For detailed documentation and API reference, see https://brentlab.github.io/tfbpmodeling/

See `CLAUDE.md` for development setup, testing commands, and contribution guidelines.