Quick reference for common EdgeVolution commands. See the README for setup instructions.
Build the ML/NAS image (default):

```bash
docker build -t edgevolution .
```

Build the embedded image (includes nRF tools, J-Link, Zephyr SDK):

```bash
docker build --target embedded -t edgevolution-embedded .
```

Run the ML container (GPU-accelerated):

```bash
docker run -it --rm --gpus all -v $(pwd):/EdgeVolution edgevolution
```

Run the embedded container (with USB passthrough for J-Link):

```bash
docker run -it --rm --privileged --gpus all -v $(pwd):/EdgeVolution edgevolution-embedded
```

EdgeVolution uses Hydra for configuration. Experiments require three config groups:
| Group | Flag | Available configs |
|---|---|---|
| Hyperparameters | `+hyperparameters=` | `speech_commands`, `cifar10`, `daliac`, `emg_airob` |
| Search space | `+search_space=` | `speech_commands`, `cifar10`, `daliac`, `emg_airob`, `complete` |
| Boards | `+boards=` | `none`, `nrf52840dk`, `nrf5340dk`, `nrf52833dk` |
Example runs:

```bash
# Speech Commands (no hardware evaluation)
python main.py +hyperparameters=speech_commands +search_space=speech_commands +boards=none

# Speech Commands with nRF52840-DK hardware evaluation
python main.py +hyperparameters=speech_commands +search_space=speech_commands +boards=nrf52840dk

# CIFAR-10
python main.py +hyperparameters=cifar10 +search_space=cifar10 +boards=none

# DaLiAc
python main.py +hyperparameters=daliac +search_space=daliac +boards=none
```

Hydra lets you override any config value from the command line:
```bash
python main.py \
  +hyperparameters=speech_commands \
  +search_space=speech_commands \
  +boards=none \
  hyperparameters.num_epochs.value=10 \
  hyperparameters.num_generations.value=5
```

To resume an interrupted run from a saved generation:

```bash
python main.py \
  continue_path=Results/speech_commands/<run_folder> \
  continue_generation=5
```

EdgeVolution includes an optional surrogate model that predicts validation accuracy from architecture encodings. It pre-screens individuals each generation and skips training for those confidently predicted to perform poorly, reducing overall search time.
Two model backends are available:
| Backend | Flag value | Strengths |
|---|---|---|
| Random Forest | `random_forest` (default) | Fast, robust, good default. Tree variance provides uncertainty. |
| Gaussian Process | `gaussian_process` | Calibrated Bayesian uncertainty. Best for small datasets. O(n³) scaling. |
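The table above names two different uncertainty mechanisms. As a minimal sketch only, assuming scikit-learn as a stand-in backend (the source does not show EdgeVolution's actual implementation, and the data here is made up): the Random Forest's uncertainty comes from the spread of per-tree predictions, while the Gaussian Process returns a posterior standard deviation directly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 6))                # toy architecture encodings
y = X.sum(axis=1) / 6 + rng.normal(0, 0.02, 40)    # toy "validation accuracy"

# Random Forest: the mean over trees is the prediction; the standard
# deviation across per-tree predictions is a cheap uncertainty proxy.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
per_tree = np.stack([t.predict(X[:5]) for t in rf.estimators_])
rf_mean, rf_std = per_tree.mean(axis=0), per_tree.std(axis=0)

# Gaussian Process: the predictive std comes directly from the posterior.
gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
gp_mean, gp_std = gp.predict(X[:5], return_std=True)
```

The O(n³) cost mentioned in the table comes from the kernel-matrix inversion inside `GaussianProcessRegressor.fit`, which is why the GP backend suits small datasets.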
Enable the accuracy surrogate:

```bash
python main.py \
  +hyperparameters=speech_commands +search_space=speech_commands +boards=none \
  surrogate_accuracy.enabled.value=true
```

To use the Gaussian Process backend instead of the default Random Forest:

```bash
python main.py \
  +hyperparameters=speech_commands +search_space=speech_commands +boards=none \
  surrogate_accuracy.enabled.value=true surrogate_accuracy.model_type.value=gaussian_process
```

In evaluation mode the surrogate predicts accuracy for every individual but never skips any — all individuals are still fully trained. This produces ground-truth predicted-vs-actual data useful for paper figures (scatter plots, error distributions, per-generation correlation).
```bash
python main.py \
  +hyperparameters=speech_commands +search_space=speech_commands +boards=none \
  surrogate_accuracy.enabled.value=true surrogate_accuracy.evaluation_mode.value=true
```

A second surrogate can predict hardware metrics (energy, inference time) to skip MCU evaluation:
```bash
python main.py \
  +hyperparameters=speech_commands +search_space=speech_commands +boards=nrf52840dk \
  surrogate_hardware.enabled.value=true
```

The surrogate skips training for individuals it confidently predicts to perform poorly. This saves time but means those architectures never get a real evaluation — if the surrogate is wrong, good candidates may be discarded. Two parameters control this trade-off directly:

- `confidence_threshold` (default `0.5`) — Only individuals predicted below this accuracy are skip candidates. Lowering it makes the surrogate more conservative: it skips fewer individuals and discards less potential. Raising it skips more aggressively and saves more time, but increases the risk of discarding promising architectures.
- `exploration_ratio` (default `0.2`) — Fraction of the population that is always trained, regardless of predictions. This prevents the surrogate from reinforcing its own biases. A higher ratio is safer but reduces the time savings; a lower ratio maximizes speedup at the cost of exploration.
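The interaction of these two parameters can be sketched as a single skip rule. This is a hypothetical illustration, not EdgeVolution's actual code; `decide_skip` and its inputs are invented for the example:

```python
import random

def decide_skip(predicted_acc, rng, confidence_threshold=0.5,
                exploration_ratio=0.2):
    """Hypothetical sketch of the skip rule described above."""
    # The exploration slice is always trained, regardless of prediction.
    if rng.random() < exploration_ratio:
        return False
    # Only confidently-poor predictions are skip candidates.
    return predicted_acc < confidence_threshold

rng = random.Random(0)
# Only the low-predicted individuals can be skipped, and even those
# are occasionally trained anyway via the exploration slice.
decisions = [decide_skip(p, rng) for p in [0.91, 0.42, 0.77, 0.30]]
```

Note that an individual predicted above the threshold is never skipped, so raising `confidence_threshold` widens the pool of skip candidates rather than tightening it.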
As a rule of thumb: if your per-individual training time is short (a few seconds to minutes), the surrogate overhead may not be worth the risk — train everything. If training is expensive (tens of minutes to hours per individual), even a moderately accurate surrogate pays for itself by cutting the population that needs full training.
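A back-of-envelope version of this rule of thumb, with every number hypothetical:

```python
# All values below are made up for illustration.
pop = 50                     # individuals per generation
minutes_per_individual = 30  # full training cost of one individual
skip_fraction = 0.4          # share the surrogate confidently flags as poor
exploration_ratio = 0.2      # always-trained slice, which reduces the skips

# Roughly, the exploration slice claws back that fraction of the skips.
skipped = pop * skip_fraction * (1 - exploration_ratio)
saved_hours = skipped * minutes_per_individual / 60
```

Under these assumptions, 16 of 50 individuals skip training, saving about 8 hours per generation; with a few seconds per individual, the same fractions save almost nothing.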
Start with evaluation mode (`surrogate_accuracy.evaluation_mode.value=true`) to measure the surrogate's accuracy on your specific search space before relying on it to skip training. Check `surrogate_evaluation.png` in the results folder — if the correlation is low or the MAE is large relative to accuracy differences in your population, keep the confidence threshold conservative or increase the exploration ratio.
When the accuracy surrogate is enabled, two CSV files are written to `{results_dir}/surrogate_accuracy/`:

| File | Contents |
|---|---|
| `surrogate_log.csv` | Per-individual records: generation, individual, predicted_acc, uncertainty, actual_acc, skipped |
| `surrogate_summary.csv` | Per-generation aggregates: generation, n_total, n_skipped, n_trained, mae, correlation, r_squared |

When the hardware surrogate is enabled, the same files are written to `{results_dir}/surrogate_hardware/`.
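The per-individual log can be aggregated for figures. A minimal sketch using pandas and the documented column names; the values here are fabricated, and in a real log skipped individuals may lack an actual accuracy:

```python
import io
import pandas as pd

# Toy stand-in for surrogate_log.csv, using the columns documented above.
csv = io.StringIO("""generation,individual,predicted_acc,uncertainty,actual_acc,skipped
1,0,0.62,0.05,0.60,False
1,1,0.41,0.09,0.48,True
2,0,0.70,0.04,0.71,False
2,1,0.55,0.06,0.52,False
""")
log = pd.read_csv(csv)

# Per-generation MAE between predicted and actual accuracy,
# mirroring one of the aggregates in surrogate_summary.csv.
mae = (log.predicted_acc - log.actual_acc).abs().groupby(log.generation).mean()
```

The same pattern extends to the correlation and r_squared aggregates, and to the hardware surrogate's log under `{results_dir}/surrogate_hardware/`.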
Run the test suite:

```bash
python3 -m pytest tests/ -v -p no:dash
```

All config files live under `conf/`. See the READMEs in each subdirectory for details:

- Hyperparameters — training parameters, population sizes, fitness weights
- Search space — layer types, parameter ranges, topology rules
- Boards — MCU target definitions (`none` disables hardware evaluation)
- Surrogate — surrogate model parameters, backend options, output files