Python implementation of the algorithms from:
"Matrix Factorization with Binary Components" Martin Slawski, Matthias Hein, Pavlo Lutsik NeurIPS 2013 | arXiv:1401.6024
Given a data matrix D ∈ ℝ^(m×n), find:
- T ∈ {0,1}^(m×r) — binary factor matrix
- A ∈ ℝ^(r×n) — coefficient matrix
such that D = T·A
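Optional constraint: the columns of A sum to 1. Each column of D is then an affine combination of the columns of T, and a convex combination if A is also non-negative.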
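To make the model concrete, here is a minimal sketch that generates synthetic data satisfying it. It assumes T has i.i.d. Bernoulli(1/2) entries and the columns of A are Dirichlet-distributed, so they are non-negative and sum to 1. The helper name make_synthetic is hypothetical and not part of the module.

```python
import numpy as np

def make_synthetic(m, n, r, seed=0):
    """Hypothetical helper: random binary T (m x r), simplex-constrained A (r x n), D = T @ A."""
    rng = np.random.default_rng(seed)
    T = rng.integers(0, 2, size=(m, r)).astype(float)  # entries in {0, 1}
    A = rng.dirichlet(np.ones(r), size=n).T            # each column is non-negative and sums to 1
    return T @ A, T, A

D, T, A = make_synthetic(m=50, n=30, r=4)
print(D.shape)                                   # (50, 30)
print(np.allclose(A.sum(axis=0), 1.0))           # True
# Convex combinations of binary vectors stay inside [0, 1] (up to rounding error).
print(D.min() >= -1e-12 and D.max() <= 1 + 1e-12)  # True
```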
Algorithms:
- Exact factorization (Algorithms 1 & 2): Finds all binary vectors (vertices of the unit hypercube) lying in aff(D), the affine hull of the columns of D, using the Littlewood-Offord lemma
- Approximate factorization (Algorithm 3): Handles noisy data via SVD-based initialization
- Block optimization (Algorithm 4): Alternating refinement with optional non-negativity and simplex constraints
- O(m·2^(r-1)) complexity for the exact case: exponential in the rank r but linear in m, so tractable only for small r
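The following is a minimal sketch of how the binary points of aff(D) can be enumerated in the noiseless case. It assumes that aff(D) has dimension exactly r-1 and that r >= 2. It is not the module's binary_factorization_exact. The function name binary_points_in_affine_hull and the use of SciPy's pivoted QR to choose anchor rows are illustrative assumptions. The sketch fixes r-1 anchor coordinates to each of the 2^(r-1) binary patterns and solves for the unique point of aff(D) with those coordinates. This enumeration is where the exponential dependence on r comes from.

```python
import itertools
import numpy as np
from scipy.linalg import qr

def binary_points_in_affine_hull(D, r, tol=1e-6):
    """Hypothetical sketch: return all {0,1}-vectors in aff(D) as columns.

    Assumes noiseless D whose affine hull has dimension r-1 (r >= 2).
    Work in this sketch grows like m * r * 2**(r-1).
    """
    if r < 2:
        raise ValueError("sketch assumes r >= 2")
    m, _ = D.shape
    p = D.mean(axis=1)                               # a point in aff(D)
    U, _, _ = np.linalg.svd(D - p[:, None], full_matrices=False)
    P = U[:, : r - 1]                                # orthonormal basis of the direction space
    # Pivoted QR on P.T selects r-1 linearly independent rows of P ("anchor" rows).
    _, _, piv = qr(P.T, pivoting=True, mode="economic")
    S = piv[: r - 1]
    found = {}
    for bits in itertools.product((0.0, 1.0), repeat=r - 1):
        # Pin the anchor coordinates to `bits`; solve for the affine coordinates lam.
        lam = np.linalg.solve(P[S], np.asarray(bits) - p[S])
        t = p + P @ lam                              # unique point of aff(D) with those anchors
        t_round = np.round(t)
        if np.all(np.abs(t - t_round) < tol) and np.all((t_round == 0) | (t_round == 1)):
            found[tuple(t_round.astype(int))] = t_round
    return np.column_stack(list(found.values())) if found else np.empty((m, 0))
```

Every binary point of aff(D) has some binary pattern on the anchor rows, and that pattern determines the point uniquely. The loop therefore cannot miss one. Only the rounding tolerance tol depends on noise-free input.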
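For noisy data, one common SVD-based denoising step is to project the columns of D onto the least-squares best (r-1)-dimensional affine subspace (PCA around the column mean). The result can then feed a vertex search with rounding. This is a hedged illustration of that generic technique only. The exact steps of Algorithm 3 may differ, and the name affine_rank_r_projection is hypothetical.

```python
import numpy as np

def affine_rank_r_projection(D, r):
    """Hypothetical sketch: project columns of D onto the best-fitting
    (r-1)-dimensional affine subspace, via a truncated SVD of the centered data."""
    p = D.mean(axis=1, keepdims=True)              # the optimal affine subspace passes through the mean
    U, s, Vt = np.linalg.svd(D - p, full_matrices=False)
    k = r - 1
    return p + (U[:, :k] * s[:k]) @ Vt[:k]         # rank-k reconstruction of the centered data
```

On noiseless data D = T·A with sum-to-one columns, the projection returns D unchanged, because D already lies in an affine subspace of dimension at most r-1.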
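The alternating refinement can be pictured as block coordinate descent on the squared Frobenius error ||D - T·A||². The minimal sketch below is not the module's Algorithm 4. It makes these assumptions:
- A is updated by NNLS (or plain least squares).
- T is updated exactly, row by row, by trying all 2^r binary rows.
- The sum-to-one constraint on A is omitted for brevity.

The name alternating_refinement is hypothetical.

```python
import itertools
import numpy as np
from scipy.optimize import nnls

def alternating_refinement(D, T, n_iter=20, nonnegative_A=True):
    """Hypothetical sketch: alternate A-updates and exact row-wise T-updates."""
    r = T.shape[1]
    patterns = np.array(list(itertools.product((0.0, 1.0), repeat=r)))  # all 2**r binary rows
    T = np.asarray(T, dtype=float).copy()
    history = []
    for _ in range(n_iter):
        # A-step: with T fixed, the columns of A decouple.
        if nonnegative_A:
            A = np.column_stack([nnls(T, D[:, j])[0] for j in range(D.shape[1])])
        else:
            A = np.linalg.lstsq(T, D, rcond=None)[0]
        # T-step: with A fixed, the rows of T decouple; pick the best pattern per row.
        fits = patterns @ A                                            # (2**r, n)
        errs = ((D[:, None, :] - fits[None, :, :]) ** 2).sum(axis=2)   # (m, 2**r)
        T = patterns[np.argmin(errs, axis=1)]
        history.append(np.linalg.norm(D - T @ A) ** 2)
    return T, A, history
```

Each step minimizes the objective exactly over its block, so the recorded error never increases. The T-step costs O(m·n·2^r), which again limits this to small r.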
Example applications:
- DNA methylation unmixing (cell type deconvolution)
- Binary classification ensembles
- Topic modeling with hard assignments
Usage:
from binary_matrix_factorization import binary_factorization_exact, binary_factorization_approximate
# D: data matrix of shape (m, n)
# Exact factorization (noiseless data)
T, A = binary_factorization_exact(D, affine=True, verbose=True)
# Approximate factorization (noisy data)
T, A = binary_factorization_approximate(D, r=4, nonnegative_A=True, sum_to_one=True)
Running the module as a script (python binary_matrix_factorization.py) runs demos for:
- Exact factorization on synthetic data
- Approximate factorization with noise
- Separable case (unique solution)
- DNA methylation unmixing simulation
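To sanity-check a returned pair (T, A), a few diagnostics against the model's constraints are useful. The helper check_factorization below is hypothetical and does not depend on the module's API. It only assumes T and A are NumPy arrays with D ≈ T @ A.

```python
import numpy as np

def check_factorization(D, T, A, atol=1e-8):
    """Hypothetical helper: basic diagnostics for a candidate factorization D ≈ T @ A."""
    return {
        "relative_error": np.linalg.norm(D - T @ A) / np.linalg.norm(D),
        "T_is_binary": bool(np.all(np.isin(T, (0, 1)))),
        "A_nonnegative": bool(np.all(A >= -atol)),
        "A_columns_sum_to_one": bool(np.allclose(A.sum(axis=0), 1.0, atol=atol)),
    }
```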
Requirements:
- NumPy
- SciPy
References:
- Slawski, Hein, Lutsik. "Matrix Factorization with Binary Components". NeurIPS 2013.
- Lutsik et al. "MeDeCom: Discovery and quantification of latent components of heterogeneous methylomes". Genome Biology 2017.