Paper title: Website Fingerprinting on Nym: Attacks and Defenses
Authors: Eric Jollès, Simon Wicky, Ania M. Piotrowska, Harry Halpin, Carmela Troncoso
Requested Badge(s):
- Available
- Functional
- Reproduced
This artifact accompanies the paper "Website Fingerprinting on Nym: Attacks and Defenses" accepted at the Privacy Enhancing Technologies Symposium (PETS) 2025.
The artifact provides:
- Traffic capture pipeline: Scripts for capturing website traffic through Tor and Nym networks
- WTF4NYM defense implementation: Defense with configurable parameters
- Flow correlation attack: Deep learning model for correlating traffic flows at the proxy and network requester observation points, based on the model from "MixMatch: Flow Matching for Mixnet Traffic" (Oldenburg et al.)
- Feature importance analysis: Tools for analyzing which traffic features contribute to website fingerprinting attacks
- Datasets: Pre-captured traffic traces for monitored and unmonitored websites under various network configurations
No security or privacy risks. The artifact analyzes publicly accessible websites and does not include any personally identifiable information, vulnerable code, exploits, or security-disabling mechanisms. All website visits were automated and did not involve human subjects.
The experiments reported in the paper were performed on:
- CPU: Intel Xeon processors with 16 cores
- RAM: 32GB
- GPU: NVIDIA Tesla V100 (16GB VRAM) or similar
- OS: Ubuntu 22.04 LTS
The artifact can run on commodity hardware without GPU (using CPU), but training will take significantly longer (~4-5x).
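As a quick sanity check before launching long runs, a minimal Python sketch (assuming the PyTorch stack from requirements.txt) reports which device training will use:

# Minimal sketch: report whether training will run on GPU or CPU.
# Assumes the PyTorch stack from requirements.txt is installed.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training device: {device}")
if device.type == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected; expect ~4-5x longer training times.")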
Operating System:
- Tested on: Ubuntu 20.04 and 22.04 LTS
- Should work on: Any modern Linux distribution
- May work on: macOS with appropriate dependencies
Python Environment:
- Python 3.9
Python Dependencies:
See requirements.txt
Datasets:
The artifact includes pre-captured traffic datasets available on Zenodo and organized in the data/ directory:
- data/full_list/: Complete dataset with monitored and unmonitored websites for Tor and Nym (labnet and mainnet)
- data/reduced_list/: Multiple configurations testing different WTF4NYM defense parameters
- data/traffic_captures/: Raw traffic captures organized by configuration
- data/train_test_WF/: Pre-processed pickle files ready for WF attack training (33 configurations, generated from traffic_captures/ using the transform_to_ml.ipynb notebook)
- data/overheads/: Overhead analysis data including latency and bandwidth measurements
See data/README.md for complete dataset documentation.
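For orientation, the following minimal Python sketch inspects one of the pre-processed pickle files; the exact file names and internal schema are documented in data/README.md, so the code only prints the top-level structure:

import glob
import pickle

# Pick any one pre-processed pickle file; exact names and schema are in data/README.md.
paths = sorted(glob.glob("data/train_test_WF/**/*.pkl", recursive=True))
assert paths, "No pickle files found; run ./download_data.sh first."

with open(paths[0], "rb") as f:
    dataset = pickle.load(f)

# Print only the top-level structure, without assuming a particular schema.
print(paths[0], type(dataset))
if isinstance(dataset, dict):
    print(list(dataset.keys()))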
Time Estimates:
Setup and Environment:
- Initial setup: 20-30 minutes
Individual Experiments:
- WF Attack training (per configuration, 5-fold CV): 4-8 hours with GPU, 20-40 hours CPU-only
- Flow correlation training: 12-16 hours with GPU
- Feature importance analysis: 2-3 hours
- Defense overhead analysis: 30-45 minutes
Artifact Repository: https://github.com/spring-epfl/WF4NYM-artifacts
Datasets: Zenodo DOI: https://doi.org/10.5281/zenodo.17840656
The Zenodo repository contains the complete datasets as separate ZIP archives:
- full_list.zip - Complete dataset with monitored and unmonitored websites
- reduced_list.zip - Various Nym mixnet configurations and defense mechanisms
- traffic_captures.zip - Individual trace files for all configurations
- train_test_WF.zip - Pre-processed pickle files ready for WF attack training
- overheads.zip - Overhead analysis data (latency, bandwidth, traffic volume)
Permanent Archive: The artifact is permanently archived on Zenodo with a specific DOI for long-term accessibility.
There are two ways to set up the environment. If you want to verify the functionality of the code on a subset of the data, we recommend our Docker setup in Testing the Environment; otherwise, the full pipeline setup follows below.
Step 1: Clone the repository
git clone https://github.com/spring-epfl/WF4NYM-artifacts.git
cd WF4NYM-artifacts

Step 2: Set up Python virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

Step 3: Install dependencies
pip install -r requirements.txt

Expected output: All packages install successfully without errors.
Step 4: Download datasets from Zenodo
Download the dataset ZIP files from Zenodo (https://doi.org/10.5281/zenodo.17840656) and extract them into the data/ directory.
Option 1: Automated download script (recommended):
./download_data.sh

Option 2: Manual download:
cd data/
# Download and extract each dataset
wget https://zenodo.org/records/17840656/files/full_list.zip
wget https://zenodo.org/records/17840656/files/reduced_list.zip
wget https://zenodo.org/records/17840656/files/traffic_captures.zip
wget https://zenodo.org/records/17840656/files/train_test_WF.zip
wget https://zenodo.org/records/17840656/files/overheads.zip
# Extract all datasets
unzip full_list.zip
unzip reduced_list.zip
unzip traffic_captures.zip
unzip train_test_WF.zip
unzip overheads.zip
cd ..

Expected output: All ZIP files are downloaded and extracted successfully. The data/ directory should contain approximately 100GB of data across all subdirectories.
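As an optional check after extraction, the following minimal Python sketch verifies that all five dataset directories are present (directory names as listed above):

from pathlib import Path

# Directory names as listed in the dataset description above.
expected = ["full_list", "reduced_list", "traffic_captures", "train_test_WF", "overheads"]
missing = [d for d in expected if not (Path("data") / d).is_dir()]
print("All datasets present." if not missing else f"Missing: {missing}")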
The artifact can be tested using Docker to ensure a reproducible environment. The following steps demonstrate the complete workflow: processing raw packet captures (PCAP files) into machine learning format, then using this processed data for feature importance analysis and website fingerprinting attacks.
Step 1: Build and launch the Docker container (~10 minutes)
git clone https://github.com/spring-epfl/WF4NYM-artifacts.git
cd WF4NYM-artifacts
docker compose up --build

Expected output: Container builds successfully and Jupyter Lab starts on http://localhost:8888
Step 2: Open Jupyter Lab and start a terminal
- Navigate to http://localhost:8888/lab in your web browser
- Click "Terminal" to open a terminal session inside the container
Step 3: Test traffic processing pipeline (~1 minute)
In the Jupyter Lab terminal, run:
python3 /workspace/captures/process_raw_packets/pipeline.py \
--pcap-folder /workspace/data/data_test \
--datasets data-normal \
--output-folder /workspace/data

Expected output: The pipeline processes the test dataset and creates the following populated directories under /workspace/data (visible in the Jupyter Lab file browser):
- 1_extracted_pcaps - Extracted PCAP files
- 2_aggregated_websites - Aggregated website traffic
- 3_ml_format - Machine learning format files (including data.pkl)
- 4_individual_traces - Individual trace files
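To confirm the pipeline run succeeded, a minimal Python check of the four output directories (paths as above) can be run from the Jupyter Lab terminal:

from pathlib import Path

# Output directory names as listed above; each should exist and be non-empty.
for d in ["1_extracted_pcaps", "2_aggregated_websites", "3_ml_format", "4_individual_traces"]:
    p = Path("/workspace/data") / d
    status = "ok" if p.is_dir() and any(p.iterdir()) else "MISSING or empty"
    print(f"{p}: {status}")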
Step 4: Test feature importance analysis (~2 minutes)
- Open feature_importance/feature_importance.ipynb in the Jupyter Lab file browser
- Run all cells in the notebook (Run -> Run all cells)
Expected output: The final cells display feature importance results with values close to 100%, since the test dataset contains two highly separable website classes.
Step 5: Test website fingerprinting attacks (~45 minutes)
In the Jupyter Lab terminal, run:
cd /workspace/WF_attacks/explainwf-popets2023.github.io/ml/code
python3 -m venv venv
source venv/bin/activate
pip install -r ../requirements.txt # ~3 minutes
python train_test.py /workspace/data/3_ml_format/data.pkl /workspace/output # ~40 minutesExpected output:
- 5-fold cross-validation results are saved to /workspace/output
- Accuracy metrics should be close to 1.0 (100%) since the test dataset has two highly separable classes
Website fingerprinting attacks (k-FP, DF, Tik-Tok) achieve high accuracy on traffic captured through Tor and Nym networks. This claim is reproducible by executing Experiment 1. We report these results in Table 2, Table 4, Table 6, Table 7 and Table 9 of our paper.
The MixMatch-based flow correlation attack can correlate traffic flows between different observation points in the Nym mixnet. This claim is reproducible by executing Experiment 2. We report these results in Figure 7 of our paper.
Packet counts and timing features are the most important features for website fingerprinting attacks. This claim is reproducible by executing Experiment 3. We report these results in Table 3, Table 5, Table 8 and Table 9 of our paper.
WTF4NYM defense reduces attack accuracy while introducing acceptable bandwidth and time overhead. This claim is reproducible by executing Experiment 4. We report these results in Table 2, Table 4, Table 6 and Table 7 of our paper.
The artifact is organized into modular components, each with detailed instructions in their respective README files:
Experiment 1: Website Fingerprinting Attacks
- Time: 4-8 hours with GPU per configuration (5-fold CV), 20-40 hours CPU-only
- Storage: ~20GB per configuration
This experiment reproduces Main Result 1: it evaluates WF attacks (k-FP, DF, Tik-Tok, SVM) on captured traffic under different defense configurations.
Location: WF_attacks/
Documentation: See WF_attacks/README.md for detailed instructions on:
- Setup (cloning ExplainWF framework and applying patches)
- Training models with 5-fold cross-validation
- Dataset format (pickle files in data/train_test_WF/)
- Expected outputs and evaluation metrics
Key Command:
cd WF_attacks/explainwf-popets2023.github.io/ml/code
python train_test.py <output_dir> <pickle_files...>

Experiment 2: Flow Correlation Attack
- Time: 12-16 hours with GPU
- Storage: ~15GB
This experiment reproduces Main Result 2: it trains and evaluates MixMatch-based flow correlation models to match traffic at different observation points.
Location: correlation/
Documentation: See correlation/README.md for detailed instructions on:
- Data preparation pipeline
- Training correlation models with/without defense
- ROC curve generation and comparison
- Evaluation metrics
Key Script: correlation/launch_training.sh
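For orientation only (not the artifact's evaluation code), ROC computation over correlation scores with scikit-learn looks like the following sketch; scores and labels are synthetic placeholders for the model's flow-pair similarity scores and the ground-truth matched/unmatched labels:

import numpy as np
from sklearn.metrics import auc, roc_curve

# Synthetic placeholders: 1 = matched flow pair, 0 = unmatched.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = labels + rng.normal(scale=0.8, size=1000)  # stand-in for model similarity scores

fpr, tpr, _ = roc_curve(labels, scores)
print(f"AUC: {auc(fpr, tpr):.3f}")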
Experiment 3: Feature Importance Analysis
- Time: 2-3 hours
- Storage: ~5GB
This experiment reproduces Main Result 3: it analyzes which traffic features are most important for WF attacks using Random Forest feature importance.
Location: feature_importance/
Documentation: See feature_importance/README.md for detailed instructions on:
- Feature extraction methodology
- Random Forest training and feature importance computation
- Visualization of results
Key Notebook: feature_importance/feature_importance.ipynb
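As a rough illustration of the methodology (a sketch with synthetic placeholder features and hypothetical feature names, not the notebook's code), Random Forest feature importance with scikit-learn:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder data; the notebook extracts real traffic features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # placeholder feature matrix
y = (X[:, 0] + X[:, 2] > 0).astype(int)    # placeholder website labels

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Hypothetical feature names, for illustration only.
names = ["pkt_count_in", "pkt_count_out", "iat_mean", "burst_len", "total_bytes"]
for name, imp in zip(names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")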
Experiment 4: Defense Overhead Analysis
- Time: 30-45 minutes (using pre-captured data)
- Storage: ~10GB
This experiment reproduces Main Result 4: it analyzes pre-captured traffic with the WTF4NYM defense applied and measures bandwidth/time overhead.
Location: captures/
Documentation: See captures/README.md and captures/analysis/README.md for detailed instructions on:
- Traffic capture pipeline
- Defense implementation and parameter configuration
- Overhead computation methodology
- Data transformation for ML experiments
Key Notebooks:
- captures/analysis/overheads.ipynb - Defense overhead analysis
- captures/analysis/transform_to_ml.ipynb - Convert captures to ML format
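For reference, the overhead metrics follow the usual relative definition; the sketch below (with made-up example numbers, not our measurements) illustrates it, while the exact computation lives in captures/analysis/overheads.ipynb:

# Relative overhead: 0.25 means 25% more than the no-defense baseline.
def overhead(defended: float, baseline: float) -> float:
    return (defended - baseline) / baseline

# Made-up example numbers (bytes and seconds), not our measurements.
print(f"Bandwidth overhead: {overhead(defended=1.8e6, baseline=1.2e6):.0%}")
print(f"Time overhead: {overhead(defended=6.5, baseline=5.0):.0%}")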
Datasets
Location: data/
Documentation: See data/README.md for:
- Complete dataset structure and organization
- Mapping between configurations and defense parameters
- Dataset sizes and descriptions
- Pre-processed pickle files for WF attacks
Traffic Collection:
- The original traffic collection scripts in captures/ are provided but are not reproducible for artifact evaluation
- Requires a specific network setup and Nym API keys
- Traffic capture is a long-running process (days to weeks depending on the dataset size)
- Network conditions, Tor circuit selection, and Nym network state vary over time, so recaptured traffic will differ from our datasets
- Mitigation: We provide complete pre-captured datasets for all experiments, so reviewers do not need to run the capture scripts
Despite this limitation, the artifact is Functional (all components can be executed) and Reproduced (main results can be validated within acceptable variance).
Modularity: Each component can be used independently:
- captures/: Traffic collection pipeline and defense implementations
- WF_attacks/: Website fingerprinting attack evaluation framework
- correlation/: Flow correlation attack implementation
- feature_importance/: Feature analysis tools
- data/: Organized datasets with clear directory structure
Documentation: Each component directory contains a detailed README:
- captures/README.md - Traffic capture and processing pipeline
- WF_attacks/README.md - WF attack evaluation and ExplainWF integration
- correlation/README.md - Flow correlation experiments
- feature_importance/README.md - Feature analysis methodology
- data/README.md - Dataset organization and descriptions
This artifact is released under the MIT License. See the LICENSE file in the repository root for full license text.
If you have questions about our tools or paper, please file an issue or contact eric.jolles (AT) epfl.ch
You can cite our work with the following BibTeX entry:
@inproceedings{jolles2026WFonNym,
author = {Jollès, Eric and Wicky, Simon and Piotrowska, Ania M. and Halpin, Harry and Troncoso, Carmela},
booktitle = {},
title = {{Website Fingerprinting on Nym: Attacks and Defenses}},
year = {2026},
}