# OCFLSuite

OCFLSuite is a research-oriented Python package and collection of scripts for designing, generating, and running experiments in Clustered Federated Learning (CFL). The repository contains dataset generation code, simulation drivers, utilities for aggregation and clustering, model templates, and experiment explanation tools used in related research.
This README documents how to install, run, and extend the project, and describes the main repository layout.
- Language: Python
- Supported Python: as declared in `pyproject.toml` (currently `>=3.11, <4.0`). Use a Python interpreter matching the project metadata, or update `pyproject.toml` if you require a different minor version.
- Main components: dataset generation, simulation drivers, clustering/aggregation utilities, explanation generation
- Installation and environment setup (Python version, virtual env)
- Generating datasets (location and format notes)
- Running simulations (example commands and where to find scripts)
- Explanation / postprocessing framework
- Project layout (what each folder contains)
- Contributing, testing and license
## Prerequisites
- Python: the project was tested with Python 3.10.x, but `pyproject.toml` currently declares `>=3.11, <4.0` (see the version note in the Poetry section below)
- Git
Recommended: create and use a dedicated virtual environment (venv, conda, or pyenv) or use Poetry to manage the project environment and dependencies.
### Using a plain venv (optional)

```powershell
# create and activate a venv (PowerShell)
python -m venv .venv
.\.venv\Scripts\Activate.ps1

# confirm Python version
python --version
```

## Poetry-based installation (recommended)
This project includes a `pyproject.toml`, so the recommended way to manage the environment is via Poetry. Poetry will create an isolated virtual environment and install the dependencies declared in `pyproject.toml`.

- Install Poetry (follow https://python-poetry.org/docs/ for the latest instructions). Example install command (PowerShell):

```powershell
# install Poetry (using pipx)
pipx install poetry
```

- Ensure Poetry uses a compatible Python interpreter (matching `pyproject.toml`). You can point Poetry to a specific local Python executable, or to a supported minor version if available:

```powershell
# point poetry to an existing Python interpreter (example)
poetry env use C:\Python311\python.exe

# or use a version specifier if installed (example)
poetry env use 3.11
```

- Install dependencies and create the virtual environment:

```powershell
poetry install
```

- Run commands inside the Poetry environment:

```powershell
poetry run python -c "import src; print('src import OK')"
poetry run python experiments\clustering_simulation.py
```

Note: if you deliberately want to use Python 3.10 but `pyproject.toml` requires >=3.11, either update `pyproject.toml` to accept 3.10, or install and use a Python interpreter matching the current `pyproject.toml` constraint.
## Generating datasets

Datasets generated by this project are stored under `experiments/datasets/` and follow a consistent layout per dataset (MNIST, FMNIST, CIFAR10, PATHMNIST, BLOODMNIST, ...). Each dataset has subfolders for split type (`nonoverlaping` / `overlaping`), balancing (`balanced` / `imbalanced`), and client count (`15` / `30`).
Generation scripts are located inside the corresponding dataset folders. Example path for a split generator:
`experiments/datasets/<DATASET>/<split_type>/<balance>/<num_clients>/data_generation_split.py`
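The placeholders in the path above can be expanded programmatically. A minimal sketch (`split_script_path` is a helper name invented here; the folder names follow the layout described above):

```python
from pathlib import Path

def split_script_path(dataset: str, split_type: str, balance: str, num_clients: int) -> Path:
    """Build the path to a dataset's generation script, following the
    experiments/datasets/<DATASET>/<split_type>/<balance>/<num_clients>/ layout."""
    return (Path("experiments") / "datasets" / dataset / split_type
            / balance / str(num_clients) / "data_generation_split.py")

# Example: the FMNIST generator for a non-overlapping, balanced, 15-client split
print(split_script_path("FMNIST", "nonoverlaping", "balanced", 15).as_posix())
```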
When you run a data generation script it creates:
- A full dataset in HuggingFace dataset format
- A cached (Apache Arrow) format for faster loading during simulations
- A blueprint CSV file (used to inspect client partitions)
### Important notes
- Simulations use the cached (Arrow) format for performance. If you move a cached dataset between machines (or generate on a different OS), the cached format may not be portable — in such cases transfer the full HuggingFace dataset instead.
- Blueprints are CSVs that list the partition information and are useful for quick inspection and reproducibility.
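Since blueprints are plain CSVs, they can be inspected with the standard library alone. A minimal sketch, assuming an illustrative `client_id` column (the actual blueprint schema is defined by the generation scripts):

```python
import csv
import io
from collections import Counter

def summarise_blueprint(csv_file, client_column: str = "client_id") -> Counter:
    """Count how many samples each client holds according to a blueprint CSV.
    `client_column` is an assumed column name; adjust to the real schema."""
    reader = csv.DictReader(csv_file)
    return Counter(row[client_column] for row in reader)

# Tiny in-memory example standing in for a real blueprint file
demo = io.StringIO("client_id,sample_id,label\n0,17,3\n0,42,1\n1,7,9\n")
counts = summarise_blueprint(demo)
print(counts)  # Counter({'0': 2, '1': 1})
```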
### Example: generate a dataset (PowerShell)

```powershell
cd experiments/datasets/FMNIST/nonoverlaping/balanced/15
python data_generation_split.py
```

## Running simulations

Simulation drivers are located under `experiments/` (for example: clustering, temperature, centralised tests). Each dataset folder also contains example scripts for centralised experiments.
### Common simulation scripts (examples)

- `experiments/clustering_simulation.py`: runs the federated clustering experiments used in the paper
- `experiments/temperature/simulation_script.py`: temperature-based simulations (see folder for specifics)
- `experiments/centralised_tests/<DATASET>/simulation_script.py`: centralised baselines and tests
### Typical usage (PowerShell)

```powershell
cd experiments
python clustering_simulation.py
```

### Configuration

- Most simulation scripts contain an `if __name__ == '__main__':` block listing datasets, number of clients, and other hyper-parameters. Edit those lists to control which experiments are run, or import the main functions and call them programmatically.
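Calling a driver programmatically might look like the sketch below. `run_experiment` and its parameters are hypothetical names chosen for illustration, not the actual API of `clustering_simulation.py`; check each script's `if __name__ == '__main__':` block for the real entry points.

```python
# Hypothetical driver function standing in for whatever
# clustering_simulation.py actually exposes in its __main__ block.
def run_experiment(dataset: str, num_clients: int, rounds: int = 10) -> dict:
    """Illustrative stand-in: a real driver would load data, run the
    federated loop, and write outputs under experiments/results/."""
    return {"dataset": dataset, "clients": num_clients, "rounds": rounds}

# Sweep over the same kinds of lists the scripts keep in __main__
results = [run_experiment(ds, n)
           for ds in ("MNIST", "FMNIST")
           for n in (15, 30)]
print(len(results))  # 4 configurations
```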
### Logs and outputs

- Numerical results and intermediate outputs are saved under `experiments/results/` (subfolders for clustering, temperature, etc.).
## Explanation / postprocessing framework

The explanation framework is located at `experiments/model_explanations/scripted_experiments_framework.py`, with helper utilities in `experiments/model_explanations/`.
Before running the explanation scripts, set the following global mounts inside `scripted_experiments_framework.py`:

- `DATASET_MOUNT`: root directory containing datasets (default `experiments/datasets`)
- `MODEL_MOUNT`: root directory containing stored models used for explanations
- `RESULTS_MOUNT`: directory with numerical results (e.g., client-cluster attribution)
- `OUTPUT_MOUNT`: where the generated explanations and plots will be written
This decoupling allows you to store large datasets/models on another drive or network mount and still run the framework locally.
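Inside the script, the mounts are plain module-level paths. A minimal sketch (the variable names follow the list above; every example path here is a placeholder for your own layout):

```python
from pathlib import Path

# Defaults can keep everything inside the repository ...
DATASET_MOUNT = Path("experiments/datasets")
RESULTS_MOUNT = Path("experiments/results")

# ... while large artefacts live on another drive or network mount
MODEL_MOUNT = Path("D:/cfl_models")     # placeholder external location
OUTPUT_MOUNT = Path("D:/cfl_outputs")   # placeholder external location

# The framework can then derive per-dataset paths from the mounts, e.g.:
fmnist_root = DATASET_MOUNT / "FMNIST"
print(fmnist_root.as_posix())
```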
### Run the framework (example)

```powershell
cd experiments/model_explanations
python scripted_experiments_framework.py
```

## Project layout

- `experiments/`: dataset generation scripts, simulation drivers, and result folders
  - `datasets/`: scripts and generated datasets (by dataset name and split)
  - `model_explanations/`: explanation generation utilities and frameworks
  - `centralised_tests/`: centralised experiment scripts for baselines
  - `results/`: generated numeric and visual results
- `src/`: main library source code
  - `aggregators/`: aggregation strategies and implementations
  - `data_structures/`: data structures used by the simulations
  - `files/`: IO utilities, handlers, and logging helpers
  - `model/`: federated model wrapper
  - `net_templates/`: model templates (MNIST, FMNIST, ResNet adjustments)
  - `node/`: federated node implementation
  - `operations/`: evaluation routines and orchestration helpers
  - `simulation/`: core simulation loop and helpers
  - `utils/`: misc utilities (computations, splitters, animation)
- `pyproject.toml`: project metadata and dependency declarations
- `README.md`: this file
### File examples

- Blueprint CSVs: `experiments/datasets/<DATASET>/.../<N>/<DATASET>_<N>_dataset_blueprint.csv`
- Dataset pointers (pickled caches or arrows): `*_dataset_pointers`
The `src/` package contains reusable components used by the simulation drivers and experiment scripts. Brief descriptions of the subpackages and key files:
- `src/aggregators/`
  - `aggregator.py`: base aggregator interfaces and shared utilities
  - `fedopt_aggregator.py`: FedOpt-style aggregation implementations
  - `distances.py`: distance functions used by clustering or similarity measures
  - `temperature.py`: utilities and algorithms for temperature-based aggregation
- `src/data_structures/`
  - `cluster_sturcutre.py`: data structures for storing cluster metadata and client-cluster attributions
- `src/files/`
  - `archive.py`: archival helpers for moving or compressing outputs
  - `handlers.py`: file handling utilities for datasets, pointers, and blueprints
  - `loggers.py`: logging wrappers used across experiments
- `src/model/`
  - `federated_model.py`: model wrapper used by nodes and orchestrators (training / evaluation helpers)
- `src/net_templates/`
  - `mnist_model.py`, `fmnist_model.py`: lightweight model templates for MNIST/FMNIST experiments
  - `resnet_adjusted.py`: ResNet modifications for CIFAR / larger experiments
- `src/node/`
  - `federated_node.py`: node-level logic: local training, gradient computation, and communication hooks
- `src/operations/`
  - `evaluations.py`: evaluation metrics and reporting utilities
  - `orchestrations.py`: high-level orchestration helpers used by simulation drivers
- `src/simulation/`
  - `simulation.py`: core simulation loop, experiment harness, and utilities used by drivers
- `src/utils/`
  - `computations.py`: numeric helpers and small math utilities
  - `splitters.py`: data splitting utilities used for creating client partitions
  - `select_gradients.py`: helpers for selecting/filtering gradients for privacy or compression experiments
  - `animation.py`: small utilities for visualising results or producing animations
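To make the splitter role concrete, here is a generic sketch of a non-overlapping, balanced client partitioner. It is not the implementation in `src/utils/splitters.py`; it only illustrates the kind of partition the generation scripts produce.

```python
import random

def split_nonoverlapping_balanced(num_samples: int, num_clients: int, seed: int = 0):
    """Generic sketch of a non-overlapping, balanced client split:
    shuffle all sample indices, then deal them out round-robin so each
    index belongs to exactly one client and counts differ by at most 1."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[c::num_clients] for c in range(num_clients)]

parts = split_nonoverlapping_balanced(100, 15)
print([len(p) for p in parts])  # sizes differ by at most one (7s and 6s)
```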
This structure keeps experiment drivers lean and reusable. For most research adaptations, modify either a simulation script under `experiments/` or extend/replace components in `src/`.
There are two main ways to modify experiments:

1. Direct substitution (recommended for quick experiments): edit the simulation scripts under `experiments/` (e.g., `clustering_simulation.py`) to change datasets, client counts, models, or hyperparameters.
2. Source modification (advanced): modify the library code under `src/` to implement new aggregation methods, clustering algorithms, or model templates. This path favours reusability and is intended for advanced customization.
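Extending the aggregators is a typical source modification. The sketch below implements plain coordinate-wise averaging (FedAvg-style); the class and method names are illustrative and do not match the actual interfaces in `src/aggregators/aggregator.py`, which new aggregators should subclass instead.

```python
from typing import Dict, List

# Illustrative aggregator: the real base class lives in
# src/aggregators/aggregator.py and defines its own interface.
class MeanAggregator:
    """Coordinate-wise (FedAvg-style) averaging of client model weights,
    where each model is a dict mapping parameter name -> list of floats."""

    def aggregate(self, client_weights: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
        n = len(client_weights)
        return {
            key: [sum(w[key][i] for w in client_weights) / n
                  for i in range(len(client_weights[0][key]))]
            for key in client_weights[0]
        }

agg = MeanAggregator()
merged = agg.aggregate([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}])
print(merged)  # {'w': [2.0, 3.0]}
```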
## Contributing

Contributions are welcome. Please follow these guidelines:
- Fork the repository and create a feature branch.
- Keep changes focused and add small tests where appropriate.
- Open a pull request with a clear description and the motivation for the change.
If you are planning to add large datasets or pre-trained models, consider adding them via an external mount and documenting the mount locations in `scripted_experiments_framework.py` rather than committing large binary files to git.
## License

This project includes a LICENSE file in the repository root. Refer to it for license terms.
If you have questions about running experiments, dataset generation, or using the explanation framework, open an issue in the repository with reproduction steps and relevant logs.
## Reproducing paper experiments

To reproduce experiments from the paper:

- Use Python 3.10 and install the dependencies used during development.
- Generate datasets using the provided generation scripts under `experiments/datasets/`.
- Run the simulation drivers (for example `clustering_simulation.py`) with the lists of datasets and client cardinalities configured in the script.
- Point the explanation framework's mounts at the dataset/model/result folders (see `experiments/model_explanations/scripted_experiments_framework.py`).
Good luck, and feel free to request any targeted examples or helper scripts to speed up reproducible runs.