Dual-Detector Reoptimization for Federated Weakly Supervised Video Anomaly Detection via Adaptive Dynamic Recursive Mapping
If you find our code useful, please consider starring this repository and citing our paper!
BibTeX Citation (click to expand)
```bibtex
@ARTICLE{11036561,
  author={Su, Yong and Li, Jiahang and An, Simin and Xu, Hengpeng and Peng, Weilong},
  journal={IEEE Transactions on Industrial Informatics},
  title={Dual-Detector Reoptimization for Federated Weakly Supervised Video Anomaly Detection via Adaptive Dynamic Recursive Mapping},
  year={2025},
  volume={21},
  number={9},
  pages={7046-7056},
  keywords={Adaptation models;Training;Anomaly detection;Feature extraction;Surveillance;Optimization;Accuracy;Privacy;Detectors;Semantics;Adaptive dynamic recursive mapping;adaptive local aggregation;federated;scene-similarity;video anomaly detection (VAD);weakly supervised},
  doi={10.1109/TII.2025.3574406}}
```
- 2025.10 Guide Released: We have uploaded the first complete guide for deploying federated learning on the NVIDIA Jetson AGX Xavier. View the Full Deployment Guide.
- 2025.09 Supplementary Appendix Released: We have submitted our Supplementary Appendix. You can view it here.
- 2025.08 Model Release: We are excited to announce the release of our best-performing model weights! The weights are available here.
- 2025.08 Video Feature: The ShanghaiTech and UBnormal video features, extracted using VideoMAEv2, are now available for download. You can find them here.
Federated weakly supervised video anomaly detection represents a significant advancement in privacy-preserving collaborative learning, enabling distributed clients to train anomaly detectors using only video-level annotations. However, the inherent challenges of optimizing noisy representation with coarse-grained labels often result in substantial local model errors, which are exacerbated during federated aggregation, particularly in heterogeneous scenarios. To address these limitations, we propose a novel dual-detector framework incorporating adaptive dynamic recursive mapping, which significantly enhances local model accuracy and robustness against representation noise. Our framework integrates two complementary components: a channel-averaged anomaly detector and a channel-statistical anomaly detector, which interact through cross-detector adaptive decision parameters to enable iterative optimization and stable anomaly scoring across all instances. Furthermore, we introduce the scene similarity adaptive local aggregation algorithm, which dynamically aggregates and learns private models based on scene similarity, thereby enhancing generalization capabilities across diverse scenarios. Extensive experiments conducted on the NVIDIA Jetson AGX Xavier platform using the ShanghaiTech and UBnormal datasets demonstrate the superior performance of our approach in both centralized and federated settings. Notably, in federated environments, our method achieves remarkable improvements of 6.2% and 12.3% in AUC compared to state-of-the-art methods, underscoring its effectiveness in resource-constrained scenarios and its potential for real-world applications in distributed video surveillance systems.
Figure 1: Overview of the proposed dual-detector re-optimization framework featuring adaptive dynamic recursive mapping for federated weakly supervised video anomaly detection (Fed-WSVAD).
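As a rough illustration of the dual-detector idea described above, the toy sketch below scores each video snippet with a channel-averaged head and a channel-statistical head, then blends the two scores with a single weight. The function names, the linear heads, and the scalar `alpha` are illustrative assumptions; the paper's detectors and their cross-detector adaptive decision parameters are considerably richer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def caad_scores(feats, w, b):
    # Channel-averaged head: collapse the channel axis by its mean,
    # then score each snippet with a scalar linear head.
    pooled = feats.mean(axis=1)                    # (T,)
    return sigmoid(w * pooled + b)

def csad_scores(feats, w_mu, w_sigma, b):
    # Channel-statistical head: score each snippet from the mean and
    # standard deviation of its channel activations.
    mu = feats.mean(axis=1)
    sigma = feats.std(axis=1)
    return sigmoid(w_mu * mu + w_sigma * sigma + b)

def fused_scores(feats, alpha=0.5):
    # Cross-detector interaction reduced here to one fixed blending weight;
    # in the paper the decision parameters adapt and are refined iteratively.
    s_avg = caad_scores(feats, w=1.0, b=0.0)
    s_stat = csad_scores(feats, w_mu=1.0, w_sigma=1.0, b=0.0)
    return alpha * s_avg + (1.0 - alpha) * s_stat

feats = rng.normal(size=(32, 710))                 # 32 snippets, 710-dim features
scores = fused_scores(feats)
print(scores.shape)                                # (32,)
```

Because each head passes through a sigmoid and the fusion is a convex combination, every snippet score stays in (0, 1).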
- We introduce a dual-detector framework that leverages adaptive dynamic recursive mapping and decision parameter interaction to generate more stable anomaly scores, thereby enhancing detection accuracy.
- We introduce the scene-similarity adaptive local aggregation (SSALA) algorithm to learn private local models, enabling effective parameter aggregation across clients and mitigating the effects of scene heterogeneity.
- We demonstrate superior detection performance and robustness through experiments on two benchmark datasets, validating the effectiveness of the proposed framework in both federated and centralized settings.
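The scene-similarity aggregation idea can be sketched as follows: instead of the uniform average used by FedAvg, each client blends all clients' parameters with weights derived from scene-embedding similarity. The cosine similarity, softmax normalisation, and temperature below are illustrative assumptions; SSALA itself additionally learns the aggregation locally, which this sketch omits.

```python
import numpy as np

def ssala_like_aggregate(client_params, scene_embeds, k, temp=1.0):
    # Blend every client's parameter vector for client k, weighting each
    # client by the cosine similarity between its scene embedding and
    # client k's, softmax-normalised. FedAvg would use uniform weights.
    anchor = scene_embeds[k]
    sims = np.array([
        e @ anchor / (np.linalg.norm(e) * np.linalg.norm(anchor) + 1e-12)
        for e in scene_embeds
    ])
    w = np.exp(sims / temp)
    w /= w.sum()
    stacked = np.stack(client_params)              # (num_clients, num_params)
    return (w[:, None] * stacked).sum(axis=0)      # similarity-weighted average

# Three toy clients: 4-parameter models, 2-D scene embeddings.
params = [np.full(4, float(i)) for i in range(3)]  # client i's parameters all equal i
embeds = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
agg = ssala_like_aggregate(params, embeds, k=0)
print(agg)  # every entry below 1.0: the dissimilar third client is downweighted
```

A uniform FedAvg of these toy parameters would give 1.0 everywhere; the similarity weighting pulls the result toward the two clients whose scenes resemble client 0's.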
- WSVAD (Centralized Training)
- Federated Setup (Fed-WSVAD)
- Jetson AGX Xavier Deployment
- Feature Extraction Guide (VideoMAE V2 Backbone)
- Clone the repository:

  ```shell
  git clone https://github.com/rekkles2/Fed_WSVAD.git
  cd Fed_WSVAD
  ```

- Create and activate the Conda environment (check the `VAD/environment.yml` file for the specific environment name if it's defined there, e.g., `vad_env`):

  ```shell
  conda env create -f VAD/environment.yml
  conda activate <your_environment_name>  # e.g., conda activate vad_env
  ```
To train the model, run:

```shell
python main.py
```

To evaluate the pretrained model (e.g., on ShanghaiTech), run the following command:

```shell
# Ensure the path to the .pkl model file is correct.
python VAD/inference.py --inference_model='shanghaitech.pkl'
```

Performance comparison (AUC / FAR) on standard benchmarks.
| Dataset | Centralized (AUC / FAR) | FedSSALA (AUC / FAR) | Pretrained Models |
|---|---|---|---|
| ShanghaiTech | 97.91% / 0.04% | 97.86% / 0.03% | Download |
| UBnormal | 70.91% / 0.00% | 76.51% / 0.00% | Download |
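For reference, the AUC and FAR columns are frame-level metrics computed from per-frame anomaly scores. Below is a minimal pure-NumPy sketch of both; the 0.5 FAR threshold and the toy data are illustrative assumptions, not necessarily the paper's exact evaluation protocol.

```python
import numpy as np

def frame_auc(scores, labels):
    # ROC AUC via the rank-sum (Mann-Whitney U) formulation;
    # tie handling is omitted for brevity.
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def false_alarm_rate(scores, labels, thr=0.5):
    # FAR: fraction of normal (label 0) frames scored above the threshold.
    return float((scores[labels == 0] > thr).mean())

scores = np.array([0.9, 0.8, 0.3, 0.2, 0.7, 0.1])
labels = np.array([1, 1, 0, 0, 1, 0])
print(frame_auc(scores, labels), false_alarm_rate(scores, labels))  # 1.0 0.0
```

On this perfectly separated toy data the AUC is 1.0 and no normal frame exceeds the threshold, so the FAR is 0.0.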
Comparison with state-of-the-art methods on the ShanghaiTech dataset. (* = utilizes ten-crop augmentation during testing, † = centralized training baseline. Bold = best result, italic = second best result.)
| Method | Year | Feature | AUC (%) | FAR (%) |
|---|---|---|---|---|
| MIL-Rank | 2018 CVPR | C3D | 85.33 | 0.15 |
| AR-Net | 2020 ICME | I3D*/MAE | 91.24 / 96.87 | 0.10 / 0.12 |
| RTFM | 2021 ICCV | I3D*/MAE | 97.21 / 96.89 | 1.06 / 0.05 |
| MIST | 2021 CVPR | I3D | 94.83 | 0.05 |
| MSL | 2022 AAAI | I3D* | 96.08 | - |
| UML | 2023 CVPR | I3D | 96.78 | - |
| CLAV-CoMo | 2023 CVPR | I3D* | 97.59 | - |
| RTFM-BERT | 2024 WACV | I3D* | 97.54 | - |
| Ours † | - | MAE | 97.91 | 0.04 |
| Fed-AR-Net (FedAvg) | - | I3D | 85.63 | - |
| Fed-RTFM (FedAvg) | - | I3D | 92.17 | - |
| CAAD (FedAvg) | - | I3D | 95.78 | - |
| CAAD (SSALA) | - | I3D | 96.13 | - |
| Ours (SSALA) | - | MAE | 97.86 | 0.03 |
Comparison with state-of-the-art methods on the UBnormal dataset. († = centralized training baseline. Bold = best result, italic = second best result.)
| Method | Year | Feature | AUC (%) | FAR (%) |
|---|---|---|---|---|
| MIL-Rank | 2018 CVPR | C3D | 54.12 | - |
| AR-Net | 2020 ICME | I3D* | 62.30 | - |
| RTFM | 2021 ICCV | I3D* | 66.83 | - |
| MIST | 2021 CVPR | I3D* | 65.32 | - |
| OPVAD | 2024 CVPR | CLIP | 62.94 | - |
| VadCLIP | 2024 AAAI | CLIP | 62.32 | - |
| STPrompt | 2024 ArXiv | CLIP | 63.98 | - |
| OCC-WS | 2024 ECCV | I3D | 67.42 | - |
| Ours † | - | MAE | 70.91 | 0.00 |
| Fed-AR-Net (FedAvg) | - | I3D | 65.74 | - |
| Fed-RTFM (FedAvg) | - | I3D | 68.12 | - |
| CAAD (FedAvg) | - | I3D | 67.18 | - |
| CAAD (SSALA) | - | I3D | 71.33 | - |
| Ours (SSALA) | - | MAE | 76.51 | 0.00 |
Ablation analysis of key components in our proposed federated framework. (CAAD = channel-averaged anomaly detector; CSAD = channel-statistical anomaly detector; TA and EN denote the remaining components, defined in the paper. 'W/O SSALA' reports performance when standard FedAvg is used instead of our SSALA. 'Size' indicates the number of model parameters.)
| CAAD | CSAD | TA | EN | AUC (%) | AUC (%) W/O SSALA | Size (Millions) |
|---|---|---|---|---|---|---|
| β | β | β | β | 95.08 | 88.29 | 1.9M |
| β | β | β | β | 96.37 | 94.39 | 1.9M |
| β | β | β | β | 95.68 | 93.28 | 1.9M |
| β | β | β | β | 96.59 | 91.67 | 1.9M |
| ✓ | ✓ | ✓ | ✓ | 97.86 | 95.90 | 9.9M |
Scene-specific performance comparison on the ShanghaiTech dataset. This analysis evaluates robustness when the definition of 'normal' changes for certain scenes. († = scene where the 'normal' activity definition was redefined (e.g., bicycles counted as normal), △ = scene with an unchanged anomaly definition. 'Baseline' refers to the centralized baseline; 'Revised' refers to the redefined-label scenario. 'Δ Change' shows the difference.)
| Scene | 1△ | 2△ | 3 | 4† | 5 | 6† | 7 | 8 | 9 | 10† | 11† | 12† | 13 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 89.77 | 87.35 | 78.00 | 92.28 | 98.31 | 98.08 | 93.58 | 92.02 | 100.00 | 86.58 | 100.00 | 97.84 | 100.00 | 93.31 |
| Revised | 87.16 | 81.91 | 77.12 | 94.65 | 98.47 | 96.59 | 92.03 | 91.13 | 100.00 | 86.09 | 100.00 | 89.13 | 100.00 | 91.39 |
| Δ Change | -2.61 | -5.44 | -0.88 | +2.37 | +0.16 | -1.49 | -1.55 | -0.89 | 0.00 | -0.49 | 0.00 | -8.71 | 0.00 | -1.92 |
| Ours | 97.63 | 99.22 | 88.52 | 96.28 | 98.51 | 99.55 | 97.54 | 97.31 | 100.00 | 99.69 | 100.00 | 98.94 | 100.00 | 97.86 |
| Revised | 97.75 | 98.29 | 88.43 | 90.82 | 98.81 | 99.97 | 98.86 | 96.89 | 100.00 | 99.41 | 100.00 | 100.00 | 100.00 | 97.62 |
| Δ Change | +0.12 | -0.93 | -0.09 | -5.46 | +0.30 | +0.42 | +1.32 | -0.42 | 0.00 | -0.28 | 0.00 | +1.06 | 0.00 | -0.24 |
We acknowledge and thank the authors of the following repositories for their valuable open-source contributions:
- Flower Framework: https://github.com/adap/flower
- AR-Net: https://github.com/wanboyang/Anomaly_AR_Net_ICME_2020
- FedALA: https://github.com/TsingZ0/FedALA