A C++ library and toolset for modeling transmission networks using genetic data from Plasmodium falciparum. The core model implementation is located in src/impl/model/Model/Model.h.
Version: 1.0.0
Citation information will be added upon preprint release. Please check back for the full citation once the manuscript is published.
If you use Plasmotrack in your research, please cite the associated manuscript (details to be added).
# Configure and build (release)
cmake --preset=release
cmake --build --preset=release
# Run tests
./build/release/test/transmission_networks_tests
# Run CLI tool
./build/release/tools/cli/Model/transmission_networks_model

- Citation
- Quick Start
- Prerequisites
- Repository Setup
- Conan Dependency Management
- CMake Build Process
- Building the Model
- Running Models
- Model Overview
- Troubleshooting
- Development Workflow
- CMake: Version 4.2.1 or higher
- C++ Compiler: C++20 compatible compiler (GCC 10+, Clang 12+, or MSVC 2019+)
- Python: Version 3.12 or higher (for Python tools)
- Build Tools: Ninja (recommended) or Make
- Conan: Version 2.0.5 or higher (for dependency management)
- CMake: For build configuration
- Git: For cloning the repository
- OpenMP: For parallel computation support
Conan 2.x can be installed via pip:
pip install conan
Verify the installation:
conan --version

git clone <repository-url>
cd transmission_nets
Key directories:
- src/ - Core library source code
  - src/impl/model/Model/ - Main model implementation (Model.h, Model.cpp)
  - src/core/ - Core utilities and data structures
  - src/model/ - Model components and processes
- test/ - Test suite (GoogleTest)
- tools/cli/Model/ - Command-line interface tool
- python/ - Python simulation tools
- cmake/ - CMake modules and utilities
- lib/ - Third-party libraries (spline, imgui)
This project uses Conan 2.x for dependency management. Dependencies are automatically installed during the CMake configuration phase via the conan_provider.cmake mechanism.
The following dependencies are managed by Conan (defined in conandata.yml):
- eigen/3.4.0 - Linear algebra library
- fmt/9.1.0 - Formatting library
- nlohmann_json/3.11.3 - JSON library
- boost/1.83.0 - Boost C++ libraries
- zlib/1.3.1 - Compression library
The project includes conan_provider.cmake, which can install dependencies automatically when CMake's find_package() is called. This relies on CMake's dependency provider mechanism, which was introduced in CMake 3.24.
Note: The project itself requires CMake 4.2.1+ (see Prerequisites), which satisfies this requirement.
To enable automatic dependency installation, add this line near the top of your CMakeLists.txt (before any find_package() calls):
include(conan_provider.cmake)
The conanfile.py defines:
- Package type: application
- Generator: CMakeDeps
- Layout: cmake_layout (Conan 2.x standard layout)
If automatic dependency installation fails, you can manually install dependencies:
conan install . --output-folder=build/conan --build=missing
Then configure CMake to use the Conan-generated files:
cmake -DCMAKE_TOOLCHAIN_FILE=build/conan/build/Release/generators/conan_toolchain.cmake -S . -B build
📖 For comprehensive build instructions, see BUILD.md
The project uses CMake presets for standardized build configurations. All builds are out-of-source and located in the build/ directory:
- build/debug/ - Debug build with symbols
- build/release/ - Optimized release build
- build/release-coverage/ - Release build with coverage instrumentation
- build/debug-clang/ - Debug build with Clang compiler
- build/release-clang/ - Release build with Clang compiler
The project uses CMake presets for consistent builds. This is the recommended approach.
For a Debug build:
cmake --preset=debug
For a Release build:
cmake --preset=release
For a Release build with coverage:
cmake --preset=release-coverage
For Clang builds:
cmake --preset=debug-clang
cmake --preset=release-clang
After configuring, build using:
cmake --build --preset=debug
cmake --build --preset=release
Or combine configure and build:
cmake --preset=release && cmake --build --preset=release
If you prefer manual configuration without presets:
mkdir -p build/release
cd build/release
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ../..
The project has a hierarchical CMake structure with centralized dependency management:
- Root CMakeLists.txt:
  - Sets C++20 standard and compiler options
  - Integrates Conan for dependency management
  - Includes cmake/Dependencies.cmake to find all dependencies
  - Defines project-wide variables and paths
  - Includes subdirectories (src, test, tools)
- cmake/Dependencies.cmake:
  - Centralized dependency management
  - Finds all required dependencies (Boost, Eigen3, nlohmann_json, fmt, ZLIB)
  - Finds optional dependencies (OpenMP)
  - Sets up imported targets for use throughout the project
- src/CMakeLists.txt:
  - Builds the transmission_networks static library
  - Organizes source files by component (core, model, impl)
  - Links dependencies using imported targets from centralized function
  - No duplicate find_package() calls
- test/CMakeLists.txt:
  - Downloads and builds GoogleTest
  - Creates test executable transmission_networks_tests
  - Links against the transmission_networks library and dependencies
  - Uses imported targets, no find_package() calls
- tools/cli/Model/CMakeLists.txt:
  - Builds the transmission_networks_model CLI executable
  - Links against the transmission_networks library and dependencies
  - Uses imported targets, no find_package() calls
After building with presets, outputs are located in:
- Libraries: build/<preset>/lib/libtransmission_networks.a (or .so on Linux)
- Executables:
  - build/<preset>/test/transmission_networks_tests - Test suite
  - build/<preset>/tools/cli/Model/transmission_networks_model - CLI tool
- Install targets: bin/ and lib/ in the project root (if installed)
- Configure CMake using presets (from project root):
cmake --preset=release
During configuration, Conan will automatically:
  - Detect your system profile
  - Install missing dependencies
  - Generate CMake configuration files
- Build the library and executables:
cmake --build --preset=release
Or build specific targets:
# Build only the library
cmake --build --preset=release --target transmission_networks
# Build only the tests
cmake --build --preset=release --target transmission_networks_tests
# Build only the CLI tool
cmake --build --preset=release --target transmission_networks_model
Optimized for debugging with symbols:
cmake --preset=debug
cmake --build --preset=debug
Optimized for performance:
cmake --preset=release
cmake --build --preset=release
For code coverage analysis:
cmake --preset=release-coverage
cmake --build --preset=release-coverage
The project uses the following compiler flags (defined in root CMakeLists.txt):
- Common (all builds): -Wall -Wextra -Wno-unused-function -Wno-unused-parameter
- GCC-specific: -Wno-maybe-uninitialized (GCC only)
- Conditional flags:
  - -Werror: Only if TRANSMISSION_NETWORKS_WERROR=ON (default: OFF)
  - -fno-omit-frame-pointer: Only for Debug and RelWithDebInfo builds
  - -g: Added automatically by CMake for Debug builds
- Release builds:
  - Default optimization: -O2 -DNDEBUG (both GCC and Clang)
  - Aggressive optimization: -O3 available via TRANSMISSION_NETWORKS_AGGRESSIVE_OPTIMIZATION=ON
  - Native CPU optimization: -march=native available via TRANSMISSION_NETWORKS_NATIVE_OPTIMIZATION=ON
- OpenMP: -fopenmp added automatically if OpenMP is found (linked per-target)
The transmission_networks_model executable is the main CLI tool for running transmission network inference models using MCMC (Markov Chain Monte Carlo) with replica exchange.
./build/release/tools/cli/Model/transmission_networks_model \
--input data/input.json \
--output-dir results/ \
--symptomatic-idp data/symptomatic_idp.txt \
--asymptomatic-idp data/asymptomatic_idp.txt
Required Options:
- --input, -i <file>: Input JSON file containing network data (see Input Format below)
- --output-dir, -o <directory>: Output directory where results will be written
- --symptomatic-idp <file>: File path to Symptomatic IDP (Infection Duration Probability) distribution
- --asymptomatic-idp <file>: File path to Asymptomatic IDP (Infection Duration Probability) distribution
MCMC Parameters:
- --burnin, -b <int>: Number of burn-in steps (default: 5000)
- --sample, -s <int>: Total number of sampling steps (default: 10000)
- --thin, -t <int>: Thinning interval, i.e. the number of steps between logged samples (default: 1000)
Replica Exchange Parameters:
- --numchains, -n <int>: Number of chains to run in replica exchange algorithm (default: 1)
- --numcores, -c <int>: Number of CPU cores to use (default: 1)
- --gradient, -g <float>: Lower temperature of gradient to use in replica exchange (default: 0.0)
Other Options:
- --seed <long>: Random seed for reproducibility. Use -1 to generate a random seed (default: -1)
- --hotload, -h: Hotload (resume) parameters from the output directory
- --null-model: Run the null model (ignores genetic data, uses only temporal constraints)
- --version: Display version information
- --help: Display help message
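
When runs are launched from a script, the options listed above can be assembled programmatically. The following is a minimal sketch, not part of the project: the function name build_model_command and its keyword names are hypothetical, and only the long flags and defaults documented above are used.

```python
def build_model_command(
    binary,
    input_json,
    output_dir,
    symptomatic_idp,
    asymptomatic_idp,
    *,
    burnin=5000,
    sample=10000,
    thin=1000,
    numchains=1,
    numcores=1,
    gradient=0.0,
    seed=-1,
    hotload=False,
    null_model=False,
):
    """Return an argv list for the CLI, using only the documented long flags."""
    cmd = [
        binary,
        "--input", str(input_json),
        "--output-dir", str(output_dir),
        "--symptomatic-idp", str(symptomatic_idp),
        "--asymptomatic-idp", str(asymptomatic_idp),
        "--burnin", str(burnin),
        "--sample", str(sample),
        "--thin", str(thin),
        "--numchains", str(numchains),
        "--numcores", str(numcores),
        "--gradient", str(gradient),
        "--seed", str(seed),
    ]
    # Boolean switches are appended only when requested.
    if hotload:
        cmd.append("--hotload")
    if null_model:
        cmd.append("--null-model")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True); passing a list rather than a shell string avoids quoting problems with paths that contain spaces.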
Basic run with default parameters:
./build/release/tools/cli/Model/transmission_networks_model \
--input data/network.json \
--output-dir results/run1/ \
--symptomatic-idp data/symptomatic_idp.txt \
--asymptomatic-idp data/asymptomatic_idp.txt
Extended run with custom MCMC parameters:
./build/release/tools/cli/Model/transmission_networks_model \
--input data/network.json \
--output-dir results/run2/ \
--symptomatic-idp data/symptomatic_idp.txt \
--asymptomatic-idp data/asymptomatic_idp.txt \
--burnin 10000 \
--sample 50000 \
--thin 100 \
--seed 12345
Parallel run with replica exchange:
./build/release/tools/cli/Model/transmission_networks_model \
--input data/network.json \
--output-dir results/run3/ \
--symptomatic-idp data/symptomatic_idp.txt \
--asymptomatic-idp data/asymptomatic_idp.txt \
--numchains 4 \
--numcores 4 \
--gradient 0.1
Resume from previous run (hotload):
./build/release/tools/cli/Model/transmission_networks_model \
--input data/network.json \
--output-dir results/run1/ \
--symptomatic-idp data/symptomatic_idp.txt \
--asymptomatic-idp data/asymptomatic_idp.txt \
--hotload
Null model (no genetics):
./build/release/tools/cli/Model/transmission_networks_model \
--input data/network.json \
--output-dir results/null_model/ \
--symptomatic-idp data/symptomatic_idp.txt \
--asymptomatic-idp data/asymptomatic_idp.txt \
--null-model
The input file must be a JSON file (optionally gzip-compressed with .gz extension) containing network data.
The input JSON file must contain two main sections: loci and nodes.
{
"loci": [
{
"locus": "AS1",
"num_alleles": 8,
"allele_freqs": [0.0013, 0.0017, 0.001, 0.7533, 0.2027, 0.0389, 0.0006, 0.0005]
},
{
"locus": "AS11",
"num_alleles": 12,
"allele_freqs": [0.0006, 0.0001, 0.0001, 0.0199, 0.1868, 0.0365, 0.4397, 0.2626, 0.0352, 0.0081, 0.007, 0.0034]
}
],
"nodes": [
{
"id": "1",
"observation_time": 234.0737,
"symptomatic": true,
"observed_genotype": [
{
"locus": "AS1",
"genotype": "00001000"
},
{
"locus": "AS11",
"genotype": "000000010000"
},
{
"locus": "AS12",
"genotype": "00000101"
}
],
"allowed_parents": ["0"]
},
{
"id": "2",
"observation_time": 306.2297,
"symptomatic": true,
"observed_genotype": [
{
"locus": "AS1",
"genotype": "00010000"
}
],
"allowed_parents": ["1", "0"]
}
]
}
Top-level fields:
- loci (required): Array of locus definitions
- nodes (required): Array of infection event/node definitions
Locus object:
- locus (required): String identifier for the locus (e.g., "AS1", "AS11")
- num_alleles (required): Integer number of alleles at this locus
- allele_freqs (optional): Array of allele frequencies (floats, should sum to approximately 1.0). If not provided, frequencies are estimated from the observed data. Each frequency corresponds to one allele at the locus.
Node object:
- id (required): String identifier for the node/infection event (e.g., "1", "node1")
- observation_time (required): Float representing the time when the infection was observed (must be non-negative). Units are arbitrary but should be consistent across all nodes.
- symptomatic (optional): Boolean indicating if the infection is symptomatic (default: true). Used to select the appropriate infection duration probability distribution.
- observed_genotype (required): Array of observed genetic data objects
- allowed_parents (required): Array of node ID strings that can be parents of this node. All parent IDs must correspond to existing nodes in the input. An empty array [] indicates the node has no allowed parents (i.e., it is a root node). This field constrains the possible transmission network structure based on epidemiological or other prior knowledge.
Observed genotype object:
- locus (required): String matching a locus label from the loci array
- genotype (required): String of binary characters ('0' or '1') representing presence/absence of each allele.
  - Length must match num_alleles for the corresponding locus
  - '1' indicates the allele is present, '0' indicates absence
  - Missing data can be represented as all zeros ("000...") or an empty string
Additional Notes:
- Genotype format: Genotypes are strings, not arrays. Each character position corresponds to one allele.
- Missing data: Missing genotypes are handled as latent variables and will be inferred during MCMC sampling.
- Network constraints: The allowed_parents field is critical for constraining the transmission network. Nodes can only have parents from this list, which allows incorporation of temporal, spatial, or other epidemiological constraints.
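
The field rules above can be checked before starting a long run. The sketch below is an illustration only and is not the tool's own validation: the function names load_input and validate_input, the 0.01 frequency tolerance, and the message wording are assumptions. It applies the documented rules literally, so the tool itself may be more permissive in places.

```python
import gzip
import json


def load_input(path):
    """Load a plain or gzip-compressed (.gz) JSON input file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def validate_input(data, freq_tolerance=0.01):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Map each locus label to its declared number of alleles.
    num_alleles = {}
    for locus in data.get("loci", []):
        name, n = locus["locus"], locus["num_alleles"]
        num_alleles[name] = n
        freqs = locus.get("allele_freqs")  # optional field
        if freqs is not None:
            if len(freqs) != n:
                problems.append(f"locus {name}: {len(freqs)} frequencies for {n} alleles")
            if abs(sum(freqs) - 1.0) > freq_tolerance:
                problems.append(f"locus {name}: frequencies sum to {sum(freqs):.4f}")

    nodes = data.get("nodes", [])
    node_ids = {node["id"] for node in nodes}
    for node in nodes:
        nid = node["id"]
        if node["observation_time"] < 0:
            problems.append(f"node {nid}: negative observation_time")
        for obs in node["observed_genotype"]:
            genotype = obs["genotype"]
            expected = num_alleles.get(obs["locus"])
            if expected is None:
                problems.append(f"node {nid}: unknown locus {obs['locus']}")
            elif genotype and len(genotype) != expected:
                # An empty string is allowed and denotes missing data.
                problems.append(f"node {nid}: genotype length mismatch at {obs['locus']}")
            if set(genotype) - {"0", "1"}:
                problems.append(f"node {nid}: non-binary genotype at {obs['locus']}")
        for parent in node["allowed_parents"]:
            if parent not in node_ids:
                problems.append(f"node {nid}: allowed parent {parent} is not a node")
    return problems
```

A typical use would be problems = validate_input(load_input("data/network.json")), printing each entry and aborting if the list is non-empty.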
IDP (Infection Duration Probability) files specify the probability distribution for the time between infection and detection. These are simple text files containing comma-separated probability values, one per line.
Format:
- Each line contains comma-separated probability values
- Values should be non-negative and typically sum to 1.0 (though normalization may be applied)
- Example for a 10-day distribution:
0.05
0.1
0.15
0.2
0.2
0.15
0.1
0.05
0.0
0.0
This represents probabilities for days 0-9, where day 0 has probability 0.05, day 1 has 0.1, etc.
Creating IDP files: You can create IDP files from epidemiological data or use theoretical distributions. The length of the distribution should cover the expected range of infection durations in your study.
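
As a sketch of one way to produce such a file, the helper below normalizes a list of non-negative weights and formats them one value per line, matching the example above. The function name format_idp and the six-decimal precision are assumptions, not part of the project.

```python
def format_idp(weights):
    """Normalize non-negative weights to sum to 1 and format one value per line."""
    if not weights:
        raise ValueError("at least one weight is required")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")
    # Entry k is the probability assigned to day k.
    return "".join(f"{w / total:.6f}\n" for w in weights)
```

The resulting text can be written with, for example, pathlib.Path("data/symptomatic_idp.txt").write_text(format_idp(weights)), where the output path is only an example.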
The model writes output to the specified output directory with the following structure:
output_dir/
├── parameters/
│ ├── mean_coi.csv.gz # Mean complexity of infection
│ ├── mean_strains_tx.csv.gz # Mean number of strains transmitted
│ ├── parent_set_size_prob.csv.gz # Parent set size probability
│ ├── infection_order.csv.gz # Infection event ordering
│ ├── eps_pos/ # False positive rates per node
│ │ ├── node1.csv.gz
│ │ ├── node2.csv.gz
│ │ └── ...
│ ├── eps_neg/ # False negative rates per node
│ │ ├── node1.csv.gz
│ │ ├── node2.csv.gz
│ │ └── ...
│ ├── infection_duration/ # Infection duration per node
│ │ ├── node1.csv.gz
│ │ ├── node2.csv.gz
│ │ └── ...
│ ├── allele_frequencies/ # Allele frequencies per locus
│ │ ├── L1.csv.gz
│ │ ├── L2.csv.gz
│ │ └── ...
│ ├── genotypes/ # Latent genotypes per node
│ │ ├── node1/
│ │ │ ├── L1.csv.gz
│ │ │ ├── L2.csv.gz
│ │ │ └── ...
│ │ └── ...
│ └── latent_parents/ # Latent parent genotypes
│ └── ...
├── stats/
│ └── likelihood.csv.gz # Log-likelihood values
└── parent_sets/ # Parent set distributions
├── node1_ps.csv.gz
├── node2_ps.csv.gz
└── ...
Parameter files (.csv.gz):
- Compressed CSV files with one value per line
- Each line represents a sampled value from the MCMC chain
- Files are gzip-compressed to save space
Parent set files (*_ps.csv.gz):
- CSV format with columns: parent_set,prob,iter
- Each row represents a parent set configuration and its probability
- Used to infer transmission network structure
Likelihood file (likelihood.csv.gz):
- Contains log-likelihood values for each iteration
- Format: llik,<value>
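
For post-processing, a scalar parameter file such as parameters/mean_coi.csv.gz can be read with the standard library alone. This is a minimal sketch that assumes exactly what is described above (one numeric value per line, no header row); the function names are hypothetical, and a file with a header would raise a ValueError rather than be silently misread.

```python
import gzip
import statistics


def parse_samples(lines):
    """Convert an iterable of text lines into floats, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def read_parameter_file(path):
    """Read all sampled values from a gzip-compressed parameter file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return parse_samples(fh)


def summarize(samples):
    """Return simple posterior summaries of the sampled values."""
    return {
        "n": len(samples),
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
    }
```

For example, summarize(read_parameter_file("results/run1/parameters/mean_coi.csv.gz")) gives the sample count, mean, and median for that parameter.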
- Transmission Networks: Check the parent_sets/ directory for inferred parent-child relationships
- Model Parameters: Review parameters/ for estimated epidemiological parameters
- Convergence: Monitor stats/likelihood.csv.gz to assess MCMC convergence
- Genotype Inference: Examine parameters/genotypes/ for inferred latent genotypes
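
One rough way to monitor convergence is to compare the early and late parts of the log-likelihood trace. The sketch below is a crude Geweke-style check and is not part of the project: it assumes every line has the literal form llik,<value>, the function names are hypothetical, and it ignores autocorrelation (a proper Geweke diagnostic uses spectral variance estimates), so treat the score only as a quick screen.

```python
import math
import statistics


def parse_llik(lines):
    """Extract log-likelihood values from lines of the form 'llik,<value>'."""
    values = []
    for line in lines:
        label, _, value = line.strip().partition(",")
        if label == "llik" and value:
            values.append(float(value))
    return values


def drift_z(values, first=0.1, last=0.5):
    """Z-like score comparing the mean of the first 10% and the last 50% of a trace."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 samples")
    n_a = max(2, int(n * first))
    n_b = max(2, int(n * last))
    early, late = values[:n_a], values[n - n_b:]
    diff = statistics.fmean(early) - statistics.fmean(late)
    # Standard error of the difference, treating samples as independent.
    se = math.sqrt(statistics.variance(early) / n_a + statistics.variance(late) / n_b)
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se
```

The lines can come from gzip.open("results/run1/stats/likelihood.csv.gz", "rt"). A score with a large absolute value (say, above 2) suggests the chain was still drifting and that a longer burn-in may be needed.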
Execute the test suite:
./build/release/test/transmission_networks_tests
Run and write the results to an XML report:
./build/release/test/transmission_networks_tests --gtest_output=xml:test_results.xml
The model handles interrupt signals gracefully:
- SIGINT (Ctrl+C), SIGTERM: Gracefully finalizes output and exits
- SIGUSR1, SIGUSR2: Finalizes current output without exiting
- SIGQUIT, SIGABRT: Attempts graceful shutdown
When interrupted, the model will:
- Complete the current iteration
- Write all buffered output to disk
- Exit cleanly (output remains consistent)
The transmission network inference model uses Bayesian MCMC (Markov Chain Monte Carlo) with replica exchange (parallel tempering) to infer:
- Transmission Network Structure: Parent-child relationships between infection events
- Model Parameters: Epidemiological parameters (mean COI, transmission probabilities, etc.)
- Latent Variables: Unobserved genotypes, infection times, and parent assignments
- Observation Process: Models genetic data observation with false positive/negative rates
- Transmission Process: Models how infections are transmitted through the network
- Source Transmission: Models transmission from source population
- Node Transmission: Models transmission between nodes
- Prior Distributions: Bayesian priors for model parameters
When using multiple chains (--numchains > 1), the model employs replica exchange (parallel tempering):
- Multiple chains run at different "temperatures" (probability scales)
- Chains periodically exchange states to improve mixing
- Higher temperature chains explore the parameter space more freely
- Lower temperature chains focus on high-probability regions
- Temperature gradient is controlled by the --gradient parameter
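
The exchange step can be illustrated with the textbook parallel tempering rule. This is a generic sketch and not the project's implementation: the linear ladder, the reading of the lower bound as the hottest chain's inverse temperature, and all function names are assumptions. Chain k targets the likelihood raised to the power beta_k, and a swap between chains i and j is accepted with probability min(1, exp((beta_i - beta_j) * (llik_j - llik_i))).

```python
import math
import random


def linear_ladder(num_chains, lowest):
    """Evenly spaced inverse temperatures from 1.0 (cold chain) down to `lowest`."""
    if num_chains == 1:
        return [1.0]
    step = (1.0 - lowest) / (num_chains - 1)
    return [1.0 - k * step for k in range(num_chains)]


def swap_log_ratio(beta_i, beta_j, llik_i, llik_j):
    """Log acceptance ratio for exchanging the states of chains i and j."""
    return (beta_i - beta_j) * (llik_j - llik_i)


def maybe_swap(betas, lliks, i, rng):
    """Propose swapping the states of chains i and i+1; return True if accepted."""
    log_r = swap_log_ratio(betas[i], betas[i + 1], lliks[i], lliks[i + 1])
    if log_r >= 0 or rng.random() < math.exp(log_r):
        # Only log-likelihoods are tracked here; a real sampler swaps full states.
        lliks[i], lliks[i + 1] = lliks[i + 1], lliks[i]
        return True
    return False
```

With rng = random.Random(12345), calling maybe_swap for each adjacent pair after a block of within-chain updates lets a state with higher likelihood migrate toward the cold chain, which is the mixing benefit described above.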
The --null-model flag runs a simplified version that:
- Ignores all genetic/genotype data
- Uses only temporal constraints (observation times)
- Useful for baseline comparisons and testing temporal-only inference
- Start with a single chain (--numchains 1) for faster initial runs
- Use multiple cores (--numcores) when running multiple chains
- Adjust thinning (--thin): Lower values = more samples but larger output files
- Monitor convergence: Check that likelihood values stabilize before trusting results
- Use hotload (--hotload) to resume long runs that were interrupted
Error: Conan install failed or conan: command not found
Solution:
- Install Conan: pip install conan
- Verify installation: conan --version
- Ensure Conan is in your PATH
Error: CMake 4.2.1 or higher is required
Solution: Upgrade CMake to version 4.2.1 or higher. On Linux:
# Using package manager
sudo apt-get install cmake # Ubuntu/Debian
sudo yum install cmake # CentOS/RHEL
# Or download from cmake.org
Error: C++20 standard requested but compiler does not support it
Solution:
- Use GCC 10+, Clang 12+, or MSVC 2019+
- Specify compiler explicitly:
cmake -DCMAKE_CXX_COMPILER=g++-10 ..
Error: Package 'eigen/3.4.0' not found or similar
Solution:
- Update the ConanCenter remote (Conan 2.x uses https://center2.conan.io):
conan remote update conancenter --url="https://center2.conan.io"
- Clear Conan cache and retry:
conan remove "*" -c
- Manually install dependencies:
conan install . --build=missing
Error: OpenMP not found (warning, not fatal)
Solution: OpenMP is optional but recommended. Install:
- Linux: sudo apt-get install libomp-dev (Ubuntu/Debian)
- macOS: brew install libomp
- Windows: Included with Visual Studio
Error: Could not find a package configuration file provided by "eigen3"
Solution: This usually means Conan dependencies weren't installed.
- If using CMake 3.24+: The conan_provider.cmake should handle this automatically. Ensure it's included in CMakeLists.txt:
include(conan_provider.cmake)
- If using CMake 4.2.1+: The dependency provider should work automatically. If not, manually run Conan install:
conan install . --output-folder=build/conan --build=missing
cmake -DCMAKE_TOOLCHAIN_FILE=build/conan/build/Release/generators/conan_toolchain.cmake -S . -B build
- Alternative: Use the manual Conan setup method described in the Conan section above
Warning: CMAKE_BUILD_TYPE is not set
Solution: Always specify build type:
cmake -DCMAKE_BUILD_TYPE=Release ..
Error: C++20 features not recognized
Solution: Use GCC 10 or newer. Check version:
g++ --version
If using Clang, ensure it's configured correctly:
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ..
Error: undefined reference to 'boost::...'
Solution: Ensure all dependencies are properly linked. Check src/CMakeLists.txt for required libraries.
Error: cannot find -ltransmission_networks
Solution: Build the library first:
cmake --build --preset=release --target transmission_networks
For development, use Debug builds for easier debugging:
cmake --preset=debug
cmake --build --preset=debug
For performance testing, use Release builds:
cmake --preset=release
cmake --build --preset=release
The project includes CMakePresets.json with predefined configurations. List available presets:
cmake --list-presets
Configure and build using presets:
cmake --preset=release
cmake --build --preset=release
- Open the project in CLion
- CLion will automatically detect CMakeLists.txt
- Select build configuration from the toolbar
- Build and run from the IDE
- Install the CMake Tools extension
- Open the project folder
- Select a kit (compiler) and build type
- Use the CMake Tools panel to configure and build
The project sets CMAKE_EXPORT_COMPILE_COMMANDS=ON, so compile_commands.json is automatically generated in the build directory (e.g., build/release/compile_commands.json). This enables:
- Better IDE code completion
- clangd support
- Other language server features
For CI/CD, typical workflow:
# Install dependencies
pip install conan
conan --version
# Configure and build using presets
cmake --preset=release
cmake --build --preset=release
# Run tests
./build/release/test/transmission_networks_tests
- Model Implementation: src/impl/model/Model/Model.h - Main model class
- State Management: src/impl/model/Model/State.h - Model state
- Configuration: src/impl/model/Model/config.h - Model configuration
- CLI Tool: tools/cli/Model/Model.cpp - Command-line interface
- Test Suite: test/src/ - Unit tests
- Conan Documentation: https://docs.conan.io/
- CMake Documentation: https://cmake.org/documentation/
- C++20 Reference: https://en.cppreference.com/w/cpp/20