Research-oriented implementation of DF-Louvain (Dynamic Frontier Louvain) for dynamic community detection, augmented with a random walk graph partition (RWGP) refinement step that can split communities after deletions when modularity improves.
This repo is script-driven: run benchmarks over temporal edge updates (batch updates or time-windowed updates), compare DF vs refined DF, and export plots.
DF-Louvain is efficient for evolving graphs because it updates only an “affected frontier” after edge insertions/deletions. A key limitation (highlighted in the paper) is that pure DF-style updates naturally favor merges / local adjustments, and may fail to split a community when internal connectivity weakens after deletions.
RWGP-DF-Louvain addresses this by adding a lightweight refinement step that proposes binary splits via a short random walk inside candidate communities and accepts a split only if modularity increases.
For each update step (edge insertions + deletions):
- Frontier update (DF-Louvain): apply the batch update and run local Louvain moves restricted to affected nodes.
- Build refinement set: identify communities impacted by intra-community deletions (implementation: communities touched by deleted edges inside the same community / affected frontier).
- Random-walk refinement (bisection): for each candidate community, compute a short random walk distribution and split vertices based on deviation from the stationary distribution; accept the split only if it improves modularity.
Complexity matches the paper’s intent: the refinement cost is proportional to the total number of edges in the refined communities, multiplied by a small walk length.
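A minimal sketch of this per-update-step flow, assuming an undirected `networkx` graph and a `{node: community_id}` partition dict; the function name, the `refine` callback, and the frontier handling are illustrative, not the repo's API:

```python
def update_step(G, partition, deletions, insertions, refine):
    # 1) Frontier update (DF-Louvain): apply the batch, collect affected nodes,
    #    and run local Louvain moves restricted to that frontier.
    G.remove_edges_from(deletions)
    G.add_edges_from(insertions)
    frontier = {u for edge in list(deletions) + list(insertions) for u in edge[:2] if u in G}
    # ... local Louvain moves over `frontier` update `partition` here ...

    # 2) Build the refinement set: communities hit by intra-community deletions.
    candidates = {partition[u] for u, v, *_ in deletions
                  if u in partition and v in partition and partition[u] == partition[v]}

    # 3) Random-walk refinement: propose a bisection for each candidate community
    #    and keep it only if modularity improves (see the split sketch further below).
    for cid in candidates:
        partition = refine(G, partition, cid)
    return G, partition
```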
For a candidate community subgraph with adjacency matrix $A$, let $d_i = \sum_j A_{ij}$ be the (weighted) degrees, $D = \mathrm{diag}(d_1, \dots, d_n)$, and $P = D^{-1} A$ the random-walk transition matrix; its stationary distribution is $\pi_i = d_i / \sum_j d_j$.

Starting from a source node $s$, a walk of small length $t$ yields the visit distribution $p^{(t)} = e_s^{\top} P^{t}$.

The initial bisection is obtained by comparing $p^{(t)}_i$ with $\pi_i$: vertices whose visit probability meets or exceeds their stationary value form one part, and the remaining vertices form the other.

The refinement only accepts a split if it improves modularity. In the fast RW splitter (v5), the modularity change for splitting a community $C$ into parts $C_1$ and $C_2$ is

$$\Delta Q = -\frac{L_{\mathrm{cut}}}{m} + \frac{k_1 k_2}{2 m^2},$$

where $L_{\mathrm{cut}}$ is the weight of edges running between $C_1$ and $C_2$, $k_1$ and $k_2$ are the total degrees of the two parts, and $m$ is the total edge weight of the graph; the split is kept only if $\Delta Q > 0$.
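A standalone sketch of this bisection-and-acceptance step, assuming an unweighted `networkx` subgraph and a dense transition matrix (the repo's v5 splitter works on sparse structures instead, and its exact implementation may differ):

```python
import networkx as nx
import numpy as np

def rw_split(G_sub, degrees, m_total, walk_len=4, seed=0):
    """Propose a bisection of a community subgraph and return (part1, part2)
    only if the split increases global modularity (Delta Q > 0)."""
    nodes = list(G_sub.nodes())
    A = nx.to_numpy_array(G_sub, nodelist=nodes)
    deg = A.sum(axis=1)
    if deg.sum() == 0 or len(nodes) < 2:
        return None
    P = A / np.maximum(deg, 1e-12)[:, None]   # row-stochastic transition matrix P = D^-1 A
    pi = deg / deg.sum()                      # stationary distribution

    rng = np.random.default_rng(seed)
    p = np.zeros(len(nodes))
    p[rng.integers(len(nodes))] = 1.0         # one-hot start at a random source node s
    for _ in range(walk_len):
        p = p @ P                             # p^(t) = e_s^T P^t

    part1 = {n for n, flag in zip(nodes, p >= pi) if flag}   # bisect by deviation from pi
    part2 = set(nodes) - part1
    if not part1 or not part2:
        return None

    # Delta Q for splitting C into (C1, C2):  -L_cut / m + k1 * k2 / (2 m^2)
    k1 = sum(degrees[n] for n in part1)       # total degree of part 1 in the full graph
    k2 = sum(degrees[n] for n in part2)
    l_cut = sum(1 for u, v in G_sub.edges() if (u in part1) != (v in part1))
    delta_q = -l_cut / m_total + (k1 * k2) / (2 * m_total ** 2)
    return (part1, part2) if delta_q > 0 else None
```

Here `degrees` maps each node to its degree in the full graph and `m_total` is the total number of edges; both are needed because $\Delta Q$ is defined against global modularity, not the subgraph alone.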
Names in the repo reflect iteration history:
- DF-Louvain implementation: `DynamicFrontierLouvain` in `src/models/df_louvain.py`.
- RWGP-DF-Louvain implementation: `GPDynamicFrontierLouvain` in `src/models/gp_df_louvain.py`.
- Despite the “GP” name, the refinement implementations in `src/gp_df/` include random-walk-based splitters.
- The refinement implementation is selected by `refine_version`.
Refinement variants (see `src/gp_df/__init__.py`):

- `refine_version="v2-full"`: dense-matrix random walk split + modularity check.
- `refine_version="v5"`: sparse random walk proposer + fast modularity gain test (recommended default in the Optuna scripts).
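For illustration, both variants are selected the same way at construction time (`G` and `initial_partition` as in the usage example further below):

```python
from src.models.gp_df_louvain import GPDynamicFrontierLouvain

# Dense-matrix splitter with an explicit modularity check
rwgp_v2 = GPDynamicFrontierLouvain(graph=G, initial_communities=initial_partition,
                                   refine_version="v2-full")

# Sparse proposer + fast modularity-gain test (recommended default)
rwgp_v5 = GPDynamicFrontierLouvain(graph=G, initial_communities=initial_partition,
                                   refine_version="v5")
```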
Baselines in `src/models/`:

- `StaticLouvain`: recompute baseline
- `NaiveDynamicLouvain`: naive dynamic baseline
- `DeltaScreeningLouvain`: DF-style update with delta screening
- Python 3.10+
Install core dependencies:
```bash
pip install -r requirements.txt
```

For running the benchmark scripts and plots:

```bash
pip install pyyaml tqdm seaborn plotly wandb
```

The Optuna/MLflow experiment scripts (`run.py`, `run_bitcoin_*.py`, etc.) additionally need `optuna`, `mlflow`, and `python-dotenv` (for `load_dotenv`) if they are not already covered by `requirements.txt`.
## Datasets
Create a local `dataset/` directory (gitignored) and place dataset files there. The benchmark config expects paths like:
- `dataset/soc-sign-bitcoinalpha.csv`
- `dataset/soc-sign-bitcoinotc.csv`
- `dataset/sx-mathoverflow.txt`
The loaders live in `src/data_loader/` and support both **batch updates** and **window-frame** updates.
Benchmarks are driven by `config/default.yaml`.
1) Edit `config/default.yaml`:
- Set `mode`: `batch` or `window_frame`
- Choose `target_datasets`
- Verify dataset file paths + column indices (`source_idx`, `target_idx`, and for window mode also `timestamp_idx`)
2) Run:
```bash
python run_benchmarks.py
```

Outputs:
- Plots are written under `results/<mode>_benchmark/<dataset_name>/...` (the `results/` directory is gitignored).
The dataset scripts (`run_bitcoin_alpha.py`, `run_bitcoin_otc.py`, `run_college_msg_graph.py`, `run_sx_mathoverflow.py`) run Optuna sweeps and log to MLflow.
Notes:
- The scripts call `load_dotenv(".env")`. If you want a custom MLflow backend, create `.env` and set `MLFLOW_TRACKING_URI` (or adjust the constants in `consts/`).
- Most scripts instantiate `GPDynamicFrontierLouvain(..., refine_version="v5")`.
Minimal programmatic usage (batch updates on CollegeMsg):

```python
import networkx as nx

from src.data_loader import DatasetBatchManager  # adjust if the loader lives in a submodule
from src.models.df_louvain import DynamicFrontierLouvain
from src.models.gp_df_louvain import GPDynamicFrontierLouvain

data_manager = DatasetBatchManager()
G, temporal_changes = data_manager.get_dataset(
    dataset_path="dataset/CollegeMsg.txt",
    dataset_type="college_msg",
    source_idx=0,
    target_idx=1,
    batch_range=0.005,
    max_steps=10,
    load_full_nodes=True,
)

# Seed both models with the same static Louvain partition.
initial = nx.algorithms.community.louvain_communities(G, seed=42)
initial_partition = {node: cid for cid, comm in enumerate(initial) for node in comm}

df = DynamicFrontierLouvain(graph=G, initial_communities=initial_partition, verbose=False)
rwgp_df = GPDynamicFrontierLouvain(
    graph=G,
    initial_communities=initial_partition,
    refine_version="v5",  # RW-based refinement
    verbose=False,
)

# Replay each batch of temporal changes through both models and compare modularity.
for change in temporal_changes:
    df_metrics = df.run(change.deletions, change.insertions)["DF Louvain"]
    rwgp_metrics = rwgp_df.run(change.deletions, change.insertions)["GP - Dynamic Frontier Louvain"]
    print(df_metrics.modularity, rwgp_metrics.modularity)
```
```
├── config/ # Benchmark + synthesis configs
├── consts/ # Dataset-/experiment-specific constants
├── docs/ # Architecture notes and refactor history
├── src/
│ ├── benchmarks.py # Benchmark runner
│ ├── components/ # Result schemas + temporal change objects
│ ├── data_loader/ # Batch + window-frame dataset loaders
│ ├── gp_df/ # RWGP refinement implementations (v1..v5)
│ ├── models/ # DF + RWGP-DF + baselines
│ └── utils/ # Plotting + helpers + MLflow logging
├── run_benchmarks.py # YAML-driven benchmark entrypoint
├── run.py # Synthetic Optuna/MLflow experiment
├── run_*.py # Dataset-specific Optuna/MLflow experiments
└── requirements.txt
```
This is research / experimental code. Expect rapid iteration (especially in the refinement variants) and favor `refine_version="v5"` for the most paper-aligned RW split criterion.
- `docs/ARCHITECTURE.md`: module-level overview
- `docs/REFACTORING_SUMMARY.md`: historical refactor notes