Lead–Lag Structure Analysis
This repository provides a pure, research-grade diagnostic for identifying and characterising lead–lag structure between two pre-processed time series.
It answers questions of the form:
“Does variation in series A tend to precede variation in series B, and over what lag window?”
The output is descriptive, not predictive. No trading assumptions, feasibility constraints, or execution logic are included.
Scope and Philosophy
This codebase is intentionally narrow in scope.
It is designed to:
Reveal temporal association structure
Characterise directionality (A → B vs B → A)
Describe where in lag space the association lives
Summarise shape, spread, and decay of the effect
It is not designed to:
Construct or validate trading strategies
Choose holding periods
Model execution, fees, or liquidity
Produce forecasts or P&L
Claim causality
Those questions belong downstream, in separate systems.
What This Repository Does
Given four pre-processed series:
signal_ab: signal driving A → B
target_ab: target for A → B
signal_ba: signal driving B → A
target_ba: target for B → A
(all aligned, cleaned, and transformed upstream)
the repository:
Computes Information Coefficients (ICs) across a lag range
Runs the analysis in both directions:
A → B
B → A
Supports two complementary modes:
Exploratory (shape inspection)
HAC-adjusted (inference under serial dependence)
Extracts lag-structure diagnostics:
Peak lag
Peak lag region
Integrated IC
Normalised IC mass
IC centroid
Decay half-life
Optionally produces:
IC plots
Structured JSON output
What This Repository Does Not Do
This repository does not:
Align time series
Handle missing data
Compute returns
Define forward horizons
Decide holding periods
Apply filters or transforms
Optimise parameters
Backtest strategies
All inputs must be prepared upstream.
This is a structural diagnostic, not a pipeline.
Core Methodology
Information Coefficient (IC)
For each lag k:
IC(k) = corr(signal_{t−k}, target_t)
Supported correlation types:
Spearman (rank-based)
Pearson (level-based)
Lag units are assumed to be uniform (e.g. days).
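The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation: the function name `lagged_ic`, the helper `_rank`, and the `method` argument are assumptions, and the inputs are taken to be equal-length, gap-free arrays as the upstream contract requires.

```python
import numpy as np


def _rank(x: np.ndarray) -> np.ndarray:
    """Average ranks (tied values share the mean rank), as Spearman requires."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x), dtype=float)
    # Average the ordinal ranks within each group of tied values.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return sums[inverse] / counts[inverse]


def lagged_ic(signal, target, max_lag, method="spearman"):
    """Return {k: IC(k)} for k = 0..max_lag, with IC(k) = corr(signal[t-k], target[t])."""
    signal = np.asarray(signal, dtype=float)
    target = np.asarray(target, dtype=float)
    if signal.ndim != 1 or signal.shape != target.shape:
        raise ValueError("signal and target must be 1-D arrays of equal length")
    if method not in ("spearman", "pearson"):
        raise ValueError("method must be 'spearman' or 'pearson'")
    if max_lag > len(signal) - 3:
        raise ValueError("max_lag leaves too few overlapping observations")

    ic = {}
    for k in range(max_lag + 1):
        s = signal[: len(signal) - k]  # signal_{t-k}
        y = target[k:]                 # target_t
        if method == "spearman":
            s, y = _rank(s), _rank(y)  # Spearman = Pearson on ranks
        ic[k] = float(np.corrcoef(s, y)[0, 1])
    return ic
```

Each lag uses only the overlapping part of the two series, so the effective sample shrinks by one observation per unit of lag.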
Lag 0 Semantics
Lag 0 is treated explicitly as a contemporaneous baseline:
It represents synchronous movement
It is not interpreted as lead–lag causality
All structural metrics exclude lag 0 by design
This avoids accidental misuse of synchronous correlation as directional evidence.
Exploratory vs HAC Modes
Two modes are intentionally supported:
Exploratory
Computes IC values only
Fast and lightweight
Intended for shape inspection
HAC (Newey–West)
Computes standard errors and confidence intervals
Accounts for serial dependence
Intended for honest inference
Both modes operate on the same inputs and lag grid.
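One way HAC-adjusted inference on a single lag could look is sketched below. It is an assumption about the approach, not a description of the repository's code: both series are z-scored so that the OLS slope equals the Pearson IC, and a Newey–West (Bartlett kernel) long-run variance gives the standard error. The names `newey_west_ic` and `bandwidth` are hypothetical, the 1.96 multiplier assumes a normal approximation, and the standardisation step is treated as fixed, so the interval is approximate.

```python
import numpy as np


def newey_west_ic(signal, target, lag, bandwidth):
    """Pearson IC at one lag with a Newey-West standard error and 95% interval."""
    signal = np.asarray(signal, dtype=float)
    target = np.asarray(target, dtype=float)
    x = signal[: len(signal) - lag]  # signal_{t-lag}
    y = target[lag:]                 # target_t
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)

    ic = float(np.mean(x * y))       # slope of y on x; equals the Pearson correlation
    u = x * (y - ic * x)             # per-observation score of the slope estimate

    # Long-run variance of the scores with Bartlett weights 1 - j / (bandwidth + 1).
    lrv = float(np.mean(u * u))
    for j in range(1, bandwidth + 1):
        weight = 1.0 - j / (bandwidth + 1.0)
        gamma_j = float(np.sum(u[j:] * u[:-j])) / n
        lrv += 2.0 * weight * gamma_j

    # Sandwich variance; the "bread" is mean(x^2) = 1 after z-scoring.
    se = float(np.sqrt(lrv / n))
    return ic, se, (ic - 1.96 * se, ic + 1.96 * se)
```

With `bandwidth=0` this reduces to a heteroskedasticity-robust standard error with no autocorrelation correction; larger bandwidths admit dependence over more lags of the score series.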
Peak Region Detection
Rather than selecting a single lag via argmax, the code identifies contiguous lag regions that satisfy criteria such as:
Fraction of peak IC
Statistical significance
Minimum width
This avoids over-interpreting noisy point estimates and highlights whether the effect is:
Sharp and localised, or
Broad and distributed
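A minimal sketch of the fraction-of-peak criterion follows. The function name `peak_region` and its parameters are hypothetical, the significance criterion is omitted, and the rule that the region must share the sign of the peak is an assumption made for illustration.

```python
def peak_region(ic_by_lag, fraction=0.5, min_width=1):
    """Return (start_lag, end_lag) of the contiguous region around the peak
    where IC has the peak's sign and |IC| >= fraction * |peak IC|.
    Returns None if there are no nonzero lags or the region is too narrow."""
    lags = sorted(k for k in ic_by_lag if k != 0)  # lag 0 is excluded by design
    if not lags:
        return None

    peak = max(lags, key=lambda k: abs(ic_by_lag[k]))
    sign = 1.0 if ic_by_lag[peak] >= 0 else -1.0
    threshold = fraction * abs(ic_by_lag[peak])

    def qualifies(k):
        return sign * ic_by_lag[k] >= threshold

    lo = hi = lags.index(peak)
    # Grow the region outward while neighbouring lags are adjacent and qualify.
    while lo > 0 and lags[lo - 1] == lags[lo] - 1 and qualifies(lags[lo - 1]):
        lo -= 1
    while hi < len(lags) - 1 and lags[hi + 1] == lags[hi] + 1 and qualifies(lags[hi + 1]):
        hi += 1

    if hi - lo + 1 < min_width:
        return None
    return lags[lo], lags[hi]
```

A narrow returned interval suggests a sharp, localised effect; a wide one suggests a broad, distributed effect.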
Integrated IC
Integrated IC measures the total IC mass over a lag window:
Raw integrated IC: sum of ICs
Normalised integrated IC: fraction of total absolute IC mass
This answers:
How much of the overall lead–lag structure lives in the peak region?
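The two quantities can be sketched as below. The name `integrated_ic` is hypothetical, and the normalised form is an assumed reading of "fraction of total absolute IC mass": the sum of |IC| inside the window divided by the sum of |IC| over all nonzero lags.

```python
def integrated_ic(ic_by_lag, start, end):
    """Return (raw, normalised) integrated IC over lags start..end inclusive.

    raw        : signed sum of IC inside the window
    normalised : sum of |IC| inside the window / sum of |IC| over all nonzero lags
                 (None if the total absolute mass is zero)
    """
    nonzero = {k: v for k, v in ic_by_lag.items() if k != 0}  # lag 0 excluded
    window = [v for k, v in nonzero.items() if start <= k <= end]

    raw = sum(window)
    total_abs = sum(abs(v) for v in nonzero.values())
    normalised = sum(abs(v) for v in window) / total_abs if total_abs > 0 else None
    return raw, normalised
```

A normalised value near 1 means almost all of the measured association sits inside the chosen window; a value near 0 means the window captures little of it.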
IC Centroid
The IC centroid provides a centre-of-mass estimate in lag space:
centroid = Σ(k · IC(k)) / Σ(IC(k)), with both sums taken over the nonzero lags
It represents the effective timing of information transmission.
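A direct transcription of the centroid formula is sketched below. The function name `ic_centroid` and the `eps` guard are assumptions: because IC values are signed, the denominator can be close to zero when positive and negative lags cancel, and the sketch returns None in that case rather than an unstable number.

```python
def ic_centroid(ic_by_lag, eps=1e-12):
    """Centre of mass of IC in lag space: sum(k * IC(k)) / sum(IC(k)), lag 0 excluded."""
    nonzero = {k: v for k, v in ic_by_lag.items() if k != 0}
    denom = sum(nonzero.values())
    if abs(denom) < eps:
        return None  # signed ICs cancel out; the centroid is undefined
    return sum(k * v for k, v in nonzero.items()) / denom
```

For a profile that is symmetric around lag 2, such as `{1: 0.1, 2: 0.2, 3: 0.1}`, the centroid falls at lag 2.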
Decay Metrics
IC is treated as a decay curve over lag:
Peak lag
Peak IC
Half-life after the peak
These are descriptive diagnostics only, not forecasts.
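The three decay quantities can be sketched as follows. The definition of half-life used here is an assumption: the distance, in lag units after the peak, at which |IC| first falls to half of the peak |IC|, with linear interpolation between grid points. The name `decay_half_life` is hypothetical.

```python
def decay_half_life(ic_by_lag):
    """Return (peak_lag, peak_ic, half_life).

    half_life is measured in lag units after the peak and is None when |IC|
    never falls to half of its peak value within the lag grid.
    """
    lags = sorted(k for k in ic_by_lag if k != 0)  # lag 0 excluded by design
    if not lags:
        return None, None, None

    peak = max(lags, key=lambda k: abs(ic_by_lag[k]))
    peak_abs = abs(ic_by_lag[peak])
    if peak_abs == 0:
        return peak, ic_by_lag[peak], None

    half = peak_abs / 2.0
    prev_lag, prev_val = peak, peak_abs
    for k in lags[lags.index(peak) + 1:]:
        val = abs(ic_by_lag[k])
        if val <= half:
            # Linear interpolation between the last lag above half and this one.
            frac = (prev_val - half) / (prev_val - val)
            crossing = prev_lag + frac * (k - prev_lag)
            return peak, ic_by_lag[peak], crossing - peak
        prev_lag, prev_val = k, val
    return peak, ic_by_lag[peak], None
```

A None half-life indicates that the lag grid is too short to observe the decay, not that the effect persists indefinitely.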
Directional Asymmetry
The repository explicitly compares:
A → B
B → A
over the same lag window.
Directional asymmetry is summarised as:
ΔIC = IntegratedIC(A → B) − IntegratedIC(B → A)
This highlights dominant directionality without claiming causation.
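As a small sketch, ΔIC can be computed from the two IC profiles like this. The function name `directional_asymmetry` is hypothetical; both profiles are assumed to be dicts mapping lag to IC on the same lag grid, and the raw (signed) integrated IC is used.

```python
def directional_asymmetry(ic_ab, ic_ba, start, end):
    """Delta IC = IntegratedIC(A -> B) - IntegratedIC(B -> A) over lags start..end."""

    def integrated(ic_by_lag):
        # Signed sum over the window; lag 0 is excluded from structural metrics.
        return sum(v for k, v in ic_by_lag.items() if k != 0 and start <= k <= end)

    return integrated(ic_ab) - integrated(ic_ba)
```

A positive value indicates that the A → B profile carries more integrated IC over the window than the B → A profile, and a negative value indicates the reverse.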
Outputs
If an output_dir is provided, the analysis produces:
output_dir/
├── A_to_B_exploratory.png
├── A_to_B_hac.png
├── B_to_A_exploratory.png
├── B_to_A_hac.png
└── results.json
Plots are optional and purely illustrative
results.json is structured and machine-readable
The main function always returns results as a Python dict
File output is optional, enabling clean in-memory pipelines.
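The "always return a dict, write files only on request" pattern could be sketched as below. The helper name `save_results` and the result keys shown in the usage are hypothetical; the repository's actual schema for results.json is not specified here.

```python
import json
from pathlib import Path


def save_results(results, output_dir=None):
    """Write results to output_dir/results.json if output_dir is given.

    Returns the path written, or None when no output_dir was provided.
    """
    if output_dir is None:
        return None  # in-memory use: nothing touches the filesystem
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "results.json"
    # allow_nan=False raises ValueError on NaN/inf, which are not valid JSON.
    path.write_text(json.dumps(results, indent=2, allow_nan=False))
    return path
```

Usage, with illustrative keys only:

```python
results = {"A_to_B": {"ic": {1: 0.10, 2: 0.25}}, "B_to_A": {"ic": {1: 0.02, 2: 0.01}}}
save_results(results)              # returns None, nothing written
save_results(results, "out_dir")   # writes out_dir/results.json
```

Integer lag keys become strings when serialised, since JSON object keys are always strings.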
Intended Usage Pattern
This repository is designed to sit between:
an upstream data / transform layer, and
a downstream modelling or strategy layer
Typical flow:
Raw Data
↓
Question-specific preprocessing
↓
Lead–Lag Structure Analysis ← (this repository)
↓
Interpretation / modelling / trading logic
Summary
This repository answers one precise question:
“Is there evidence of a lead–lag relationship, and what does its structure look like?”
It deliberately avoids answering:
“Can this be traded?”
That separation is intentional and enforced in code.