localQTL is a pure-Python library for local-ancestry-aware xQTL mapping that lets researchers run end-to-end analyses on large cohorts without any R/rpy2 dependencies. It preserves the familiar tensorQTL data model with GPU-first execution paths, flexible genotype loaders, and streaming outputs for large-scale workflows, while adding ancestry-aware use cases.
- GPU-accelerated workflows powered by PyTorch, with automatic fallbacks to CPU execution when CUDA is unavailable.
- Modular cis-QTL mapping API exposed through functional helpers such as
map_nominal,map_permutations, andmap_independent, or via the convenience wrapperCisMapper. - Multiple genotype backends including PLINK (BED/BIM/FAM), PLINK 2 (PGEN/PSAM/PVAR), and BED/Parquet inputs, with helpers to stream data in manageable windows.
- Local ancestry integration by pairing genotype windows with haplotype panels
produced by RFMix through
RFMixReaderandInputGeneratorCisWithHaps. - Parquet streaming sinks that make it easy to materialise association statistics without loading the entire result set in memory.
- Pure-Python statistics (no R/rpy2 required): tensorQTL's
rfunccalls have been refactored to scipy.stats for p-values and q-values are computed with the Python port of Storey'sqvalue(py-qvalue).
Install the latest release from PyPI:
pip install localqtlThe project uses Poetry for dependency management. Clone the repository and install the package into a virtual environment:
poetry installIf you prefer pip, you can install the library in editable mode after exporting
Poetry's dependency specification:
pip install -e .Note: GPU acceleration relies on PyTorch, CuPy, and cuDF. Make sure you use versions that match the CUDA toolkit available on your system. The versions in
pyproject.tomltarget CUDA 12.
Below is a minimal example that runs a nominal cis-QTL scan against PLINK-formatted genotypes and BED-formatted phenotypes. The example mirrors the data layout expected by tensorQTL, so existing preprocessing pipelines can be reused.
from localqtl import PlinkReader, read_phenotype_bed
from localqtl.cis import map_nominal
# Load genotypes and variant metadata
plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes()
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]]
# Load phenotypes (BED-style) and their genomic coordinates
phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")
# Optional: load covariates as a DataFrame indexed by sample IDs
covariates_df = None
results = map_nominal(
genotype_df=genotype_df,
variant_df=variant_df,
phenotype_df=phenotype_df,
phenotype_pos_df=phenotype_pos_df,
covariates_df=covariates_df,
window=1_000_000, # ±1 Mb cis window
maf_threshold=0.01, # filter on in-sample MAF
device="auto", # picks CUDA when available, otherwise CPU
out_prefix="cis_nominal", # default prefix
return_df=True # default is False, parquet streamed sink
)
print(results.head())For analyses that combine nominal scans, permutations, and independent signal
calling, the CisMapper class offers a thin object-oriented façade:
from localqtl.cis import CisMapper
mapper = CisMapper(
genotype_df=genotype_df,
variant_df=variant_df,
phenotype_df=phenotype_df,
phenotype_pos_df=phenotype_pos_df,
covariates_df=covariates_df,
window=500_000,
maf_threshold=0.01,
out_dir="./",
out_prefix="cis_nominal",
)
mapper.map_nominal(nperm=0)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)localQTL can incorporate local ancestry (e.g., from RFMix) so that cis-xQTL tests are performed with ancestry-aware genotype inputs. To enable this, pass haplotypes (per-variant × per-sample × per-ancestry dosages) and loci_df (variant positions corresponding to the haplotype tensor) to CisMapper. When these are provided, CisMapper switches to the ancestry-aware input generator under the hood.
from localqtl import PlinkReader, read_phenotype_bed
from localqtl.haplotypeio import RFMixReader
from localqtl.cis import CisMapper
plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes() # samples as columns, variants as rows
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]]
phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")
covariates_df = None # optional
# Local ancestry from RFMix (and align to genotype samples)
select_samples = genotype_df.columns.values
rfmix = RFMixReader(
prefix_path="data/rfmix/prefix", # directory with per-chrom outputs + fb.tsv
binary_path="data/rfmix", # where prebuilt binaries live (if used)
select_samples=select_samples, # optional for automatic sample alignment (recommended)
verbose=True
)
# Materialize ancestry haplotypes into memory (NumPy array)
# Shape: (variants, samples, ancestries) for >2 ancestries.
# For 2 ancestries, the reader exposes the ancestry channel selected internally.
H = rfmix.load_haplotypes()
loci_df = rfmix.loci_df # DataFrame with ['chrom','pos', 'ancestry', 'hap', 'index'] (indexed by 'hap')
# (Optional) Ensure chromosome dtype matches between variant_df and loci_df
variant_df["chrom"] = variant_df["chrom"].astype(str)
loci_df = loci_df.copy()
loci_df["chrom"] = loci_df["chrom"].astype(str)
# Ancestry-aware mapping
mapper = CisMapper(
genotype_df=genotype_df,
variant_df=variant_df,
phenotype_df=phenotype_df,
phenotype_pos_df=phenotype_pos_df,
covariates_df=covariates_df,
haplotypes=H, # <-- enable local ancestry-aware mode
loci_df=loci_df, # <-- positions that align to H
window=1_000_000,
maf_threshold=0.01,
device="auto",
out_dir="./",
out_prefix="cis_nominal",
)
# Run nominal scans and permutations as usual; ancestry-awareness is automatic
mapper.map_nominal(nperm=0)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)
# FDR using the pure-Python q-value port (no R/rpy2 needed)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)
# Identify conditionally independent signals
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)
print(lead_df.head())Input expectations
H(fromRFMixReader.load_haplotypes()) is ancestry dosages with shape(variants, samples, ancestries)for ≥3 ancestries; for 2 ancestries the reader exposes the configured channel.loci_dfrows correspond 1:1 toH's and be joinable (bychrom/pos) tovariant_dfused for genotypes.- Sample order in
genotype_df,phenotype_df, andcovariates_dfshould matchH(reindex as shown). - The cis-mapping helpers default to
preload_haplotypes=Trueso ancestry blocks are staged as contiguous tensors on the requested device (GPU or CPU). Override this flag when working under strict memory constraints.
Run the test suite with:
poetry run pytestThis exercises the core cis-QTL mapping routines using small synthetic datasets.