This is a temporary codebase for "DUDA: Distilled Unsupervised Domain Adaptation for Lightweight Semantic Segmentation". Most of the code is built on top of MIC (https://github.com/lhoyer/MIC/tree/master/seg) and DAFormer (https://github.com/lhoyer/DAFormer/tree/master). The full codebase will be uploaded after the final decision. This code mainly covers DUDA_MIC for MiT-B0.
We mainly modify three files relative to the originals: /mmseg/models/uda/dacs.py, /mmseg/models/uda/uda_decorator.py, and /mmseg/models/builder.py. Since this code is largely based on DAFormer/MIC, the original copyright notices (with the authors' names and affiliations) are kept in the code.
Please refer to the DAFormer setup (datasets, ImageNet-pretrained checkpoints, libraries): https://github.com/lhoyer/DAFormer/tree/master.
There are several config files in /configs/duda/:
- gtaHR2csHR_mic_hrda_s0.py: baseline MiT-B5 model (no DUDA)
- gtaHR2csHR_mic_hrda_s0_mitb0.py: baseline MiT-B0 model (no DUDA)
- gtaHR2csHR_mic_hrda_s2_kd_mitb0.py: MiT-B0 with DUDA (pre-adaptation)
- gtaHR2csHR_mic_hrda_s0_two_step_kd_mitb0_pretrained_inconsistency.py: MiT-B0 with DUDA (fine-tuning)
We use the following command to train the models: python run_experiments.py --config configs/duda/filename.py, where filename.py is one of the configs above.
There are two steps to train with DUDA.
The first step is pre-adaptation. We can directly run the experiment using the command above; it will automatically create the log and checkpoint in the "work_dirs/local-basic" directory.
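For example, pre-adaptation with the MiT-B0 student uses the config listed above: "python run_experiments.py --config configs/duda/gtaHR2csHR_mic_hrda_s2_kd_mitb0.py"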
The second step is fine-tuning. Here, we need to change a few lines in the config and the codebase. First, we measure the inconsistency between the large teacher and the small student from the log file; please take a look at "inconsistency.py". Running "python inconsistency.py" prints the unnormalized inconsistency values (see Eq. 11 for the normalization). The normalized inconsistency values are then inserted into /mmseg/models/uda/dacs.py (line 639).
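As a rough illustration of this step, the sketch below turns the printed values into a list that can be pasted into dacs.py; the numbers are placeholders and the rescaling shown is only an assumption, so the actual normalization must follow Eq. 11 of the paper.

```python
# Hypothetical sketch only: the values below are placeholders, and the rescaling
# is an assumption; replace it with the exact normalization of Eq. 11.
unnormalized = [0.21, 0.05, 0.13]  # placeholder per-class outputs of inconsistency.py
normalized = [v / max(unnormalized) for v in unnormalized]  # assumed rescaling, not Eq. 11
print([round(v, 4) for v in normalized])  # paste the resulting list into dacs.py (around line 639)
```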
We also need to set the checkpoints of the pretrained teacher and student networks in the fine-tuning configs. The pretrained teacher (pretrained_ema) comes from the open-source MIC release, and the pretrained student (pretrained_kd) is the model saved during our pre-adaptation. For example (see also the config sketch after this list):
- pretrained_ema="work_dirs/gtaHR2csHR_mic_hrda_650a8/iter_40000_relevant.pth"
- pretrained_kd="work_dirs/local-basic/gtaHR2csHR_mic_hrda_s2_kd_mitb0_b2bda/iter_40000.pth"
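In the fine-tuning config, these entries sit roughly as in the sketch below; whether they live in the "uda" dict and their exact placement are assumptions here, so please follow the provided config in configs/duda/.

```python
# Sketch only: the surrounding 'uda' dict is an assumption; the paths are the
# example checkpoints mentioned above.
uda = dict(
    pretrained_ema='work_dirs/gtaHR2csHR_mic_hrda_650a8/iter_40000_relevant.pth',
    pretrained_kd='work_dirs/local-basic/gtaHR2csHR_mic_hrda_s2_kd_mitb0_b2bda/iter_40000.pth',
)
```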
Now, we can run the command to fine-tune the model.
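For example: "python run_experiments.py --config configs/duda/gtaHR2csHR_mic_hrda_s0_two_step_kd_mitb0_pretrained_inconsistency.py"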
We can evaluate a trained model with the following command: bash test.sh work_dirs/directory_name
For example, "bash test.sh work_dirs/gtaHR2csHR_mic_hrda_s0_two_step_kd_mitb0_pretrained_inconsistency_a2ec4" gives the results of Table 1 (last row), as below:
+---------------+-------+-------+
| Class | IoU | Acc |
+---------------+-------+-------+
| road | 97.12 | 98.87 |
| sidewalk | 78.31 | 89.01 |
| building | 90.61 | 95.84 |
| wall | 56.28 | 65.25 |
| fence | 50.99 | 62.96 |
| pole | 56.78 | 64.23 |
| traffic light | 58.28 | 75.55 |
| traffic sign | 68.05 | 72.52 |
| vegetation | 91.24 | 95.91 |
| terrain | 49.15 | 60.23 |
| sky | 93.7 | 98.32 |
| person | 76.4 | 85.05 |
| rider | 49.95 | 68.15 |
| car | 93.31 | 95.67 |
| truck | 76.66 | 83.84 |
| bus | 82.37 | 86.12 |
| train | 71.16 | 83.59 |
| motorcycle | 56.69 | 78.56 |
| bicycle | 65.49 | 80.09 |
+---------------+-------+-------+
Summary:
+-------+-------+-------+
| aAcc | mIoU | mAcc |
+-------+-------+-------+
| 94.83 | 71.71 | 81.04 |
+-------+-------+-------+
Note that in test.sh (line 12) we currently set the checkpoint file name to "iter_80000.pth", since fine-tuning runs for 80k iterations.
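(In a DAFormer/MIC-style test.sh, this corresponds to the line that builds the checkpoint path from the work directory, e.g. something like CHECKPOINT_FILE="${TEST_ROOT}/iter_80000.pth"; the exact variable names in the released script may differ.)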
Memory (parameters) and FLOPs of the MiT-B5 and MiT-B0 models for a single input image can be measured with the following commands:
python efficiency.py --config configs/duda/gtaHR2csHR_mic_hrda_s0.py # for MiT-B5
python efficiency.py --config configs/duda/gtaHR2csHR_mic_hrda_s0_mitb0.py # for MiT-B0
MiT-B5 should print:
Input shape: (3, 512, 1024)
Flops: 450.93 GFLOPs
Params: 85.69 M

MiT-B0 should print:
Input shape: (3, 512, 1024)
Flops: 213.08 GFLOPs
Params: 7.3 M
The model size (MB) is computed as Params * 4 bytes per parameter (for MiT-B0: 7.3 M * 4 = 29.2 MB). The "Flops" reported above is the total FLOPs (backbone + decoder head). The backbone FLOPs are obtained as the total FLOPs minus the decoder head's FLOPs; the FLOPs of each module (including the decoder head) are also printed.
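A minimal sketch of this bookkeeping, using the MiT-B0 numbers printed above (the decoder-head FLOPs below is a placeholder to be replaced with the value read from the per-module printout):

```python
def summarize_efficiency(params_m, total_gflops, decode_head_gflops):
    """Derive model size (MB) and backbone FLOPs from efficiency.py's printout."""
    model_size_mb = params_m * 4                          # 4 bytes per FP32 parameter
    backbone_gflops = total_gflops - decode_head_gflops   # total minus decoder head
    return model_size_mb, backbone_gflops

# MiT-B0 example: 7.3 M params -> 29.2 MB; 0.0 is a placeholder for the decoder-head FLOPs.
size_mb, backbone_gflops = summarize_efficiency(7.3, 213.08, decode_head_gflops=0.0)
print(size_mb, backbone_gflops)
```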
efficiency.py is based on the SegFormer FLOPs script (https://github.com/NVlabs/SegFormer/blob/master/tools/get_flops.py).
We provide the checkpoint of MiT-B0 (DUDA MIC). Please download the directory from https://drive.google.com/file/d/1HDijEICBmmuIWOO3OCvLkkaPOnXJuODR/view?usp=sharing and place it in /work_dirs/local-basic/. Other checkpoints will also be provided after the final decision. Hopefully, readers can reproduce the last row of Table 1 with it.
DUDA is based on the following open-source projects. We thank the authors for sharing their code publicly.