This repository is the official implementation of EEGMixer in PyTorch Lightning style:
D.-H. Lee, S.-J. Kim, H. Kong, and S.-W. Lee, "Train Once, Transfer Anywhere: Toward Device-Homogeneous MI-EEG Decoding," 2025. (Under Review)
| EEGMixer | EEGMixer Block |
|---|---|
| ![]() | ![]() |
Electroencephalogram (EEG) has emerged as a key modality for developing brain-computer interfaces (BCIs). Motor imagery (MI), one of the BCI paradigms, has garnered significant attention due to its dual role in motor rehabilitation and daily activity augmentation. Generalizing the decoding of MI-based EEG signals is essential for deploying BCI systems in real-world environments. While transfer learning facilitates generalization by bridging structural differences across datasets, its deployment is hindered by device heterogeneity. Recent studies have attempted to address this limitation, but they often require additional preprocessing or dataset-specific architectural modifications.

To address these limitations, we propose EEGMixer, which eliminates the need for dataset-specific adaptation. EEGMixer comprises three key innovations: i) a dynamic spatial hypernetwork (DSH) that addresses device heterogeneity by generating temporally conditioned spatial weights, ii) a mosaic positional encoding that applies absolute and relative encodings along the spatial and temporal domains to focus on domain-relevant information, and iii) an orchestration of domain information that extracts informative features by orchestrating EEG representations in the spatial and temporal domains and subsequently integrates them into a unified representation.

EEGMixer achieved competitive performance on each dataset and was extensively validated under six cross-dataset transfer settings. These results demonstrate that EEGMixer is the first model to enable effective cross-dataset generalization without dataset-specific architectural modifications. Notably, this is the first attempt to validate that a unified architecture can achieve consistent transferability without dataset-specific adaptation. Hence, we demonstrate that EEGMixer can address the challenge of device heterogeneity and enable generalizable decoding across multiple datasets.
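The dynamic spatial hypernetwork idea above can be sketched as follows: a small network generates a channels-to-embedding projection from temporal statistics of the input, so recordings with different electrode montages map into one fixed embedding dimension. This is an illustrative reading only, not the paper's implementation; the class name, the mean/std conditioning statistics, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn


class DynamicSpatialHypernetwork(nn.Module):
    """Sketch of a DSH: generate a (channels -> embed_dim) spatial projection
    from per-channel temporal statistics, so any electrode montage maps into
    a fixed embedding dimension. Names and dimensions are hypothetical."""

    def __init__(self, embed_dim: int = 40, hidden_dim: int = 64):
        super().__init__()
        # Assumed conditioning: two temporal statistics per channel
        # (mean and standard deviation) are mapped to one spatial weight row.
        self.weight_generator = nn.Sequential(
            nn.Linear(2, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); the channel count may differ per dataset.
        stats = torch.stack([x.mean(dim=-1), x.std(dim=-1)], dim=-1)  # (B, C, 2)
        w = self.weight_generator(stats)                              # (B, C, D)
        # Temporally conditioned spatial mixing: sum over the channel axis.
        return torch.einsum("bcd,bct->bdt", w, x)                     # (B, D, T)


dsh = DynamicSpatialHypernetwork(embed_dim=40)
out_a = dsh(torch.randn(8, 22, 1000))  # 22-channel input (BCIC IV-2a style)
out_b = dsh(torch.randn(8, 3, 1000))   # 3-channel input (BCIC IV-2b style)
print(out_a.shape, out_b.shape)        # both torch.Size([8, 40, 1000])
```

Because the projection is produced at run time from the input itself, the same module accepts 22-channel and 3-channel recordings without any architectural change, which is the property the cross-dataset experiments below rely on.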
| Model | BCIC IV-2a (Brunner et al. 2008) | | | BCIC IV-2b (Leeb et al. 2008) | | | Zhou (Zhou et al. 2016) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Acc | Kappa | F1-score | Acc | Kappa | F1-score | Acc | Kappa | F1-score |
| ShallowConvNet | 0.5976 | 0.4630 | 0.5752 | 0.7558 | 0.5167 | 0.7475 | 0.6660 | 0.4998 | 0.6510 |
| DeepConvNet | 0.5756 | 0.4338 | 0.5640 | 0.7657 | 0.5235 | 0.7656 | 0.5135 | 0.2710 | 0.4911 |
| EEGNet | 0.6069 | 0.4755 | 0.5912 | 0.7457 | 0.5098 | 0.7457 | 0.6532 | 0.4806 | 0.6314 |
| EEGConformer | 0.5532 | 0.4039 | 0.5375 | 0.7391 | 0.4766 | 0.7333 | 0.7162 | 0.5910 | 0.7162 |
| DFformer | 0.5841 | 0.4455 | 0.5837 | 0.7618 | 0.5208 | 0.7552 | 0.7546 | 0.6323 | 0.7433 |
| Proposed | 0.6231 | 0.4971 | 0.6143 | 0.7467 | 0.4925 | 0.7416 | 0.7561 | 0.6343 | 0.7443 |
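The Acc, Kappa, and F1-score columns follow their standard definitions (overall accuracy, Cohen's kappa, and macro-averaged F1). A minimal pure-Python sketch of these metrics, independent of this repository's evaluation code:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Expected agreement from the marginal label distributions.
    p_e = sum(true_counts[c] * pred_counts[c]
              for c in set(y_true) | set(y_pred)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred), macro_f1(y_true, y_pred))
```

Equivalent results come from `sklearn.metrics` (`accuracy_score`, `cohen_kappa_score`, `f1_score(average="macro")`).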
| Model | BCIC IV-2a (Brunner et al. 2008) | | | BCIC IV-2b (Leeb et al. 2008) | | | Zhou (Zhou et al. 2016) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Acc | Kappa | F1-score | Acc | Kappa | F1-score | Acc | Kappa | F1-score |
| Baseline | 0.6231 | 0.4971 | 0.6143 | 0.7467 | 0.4933 | 0.7418 | 0.7561 | 0.6343 | 0.7443 |
| Fine-tuning Strategy | BCIC IV-2b → BCIC IV-2a | | | BCIC IV-2a → BCIC IV-2b | | | BCIC IV-2a → Zhou | | |
| + DSH | 0.2635 | 0.0183 | 0.1463 | 0.7358 | 0.4718 | 0.7165 | 0.6882 | 0.5339 | 0.6315 |
| + DSH + Classification head | 0.5214 | 0.3613 | 0.5062 | 0.7609 | 0.5213 | 0.7567 | 0.6952 | 0.5428 | 0.6815 |
| Full fine-tuning | 0.6175 | 0.4896 | 0.6098 | 0.7540 | 0.5072 | 0.7488 | 0.7538 | 0.6308 | 0.7416 |
| Fine-tuning Strategy | Zhou → BCIC IV-2a | | | Zhou → BCIC IV-2b | | | BCIC IV-2b → Zhou | | |
| + DSH | 0.3532 | 0.1370 | 0.3061 | 0.6890 | 0.3770 | 0.6838 | 0.6472 | 0.4826 | 0.4960 |
| + DSH + Classification head | 0.4959 | 0.3274 | 0.4809 | 0.7222 | 0.4432 | 0.7155 | 0.6970 | 0.5455 | 0.6883 |
| Full fine-tuning | 0.6055 | 0.4736 | 0.5979 | 0.7496 | 0.4984 | 0.7437 | 0.7191 | 0.5789 | 0.7066 |
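The three transfer strategies compared above reduce to parameter-freezing rules: adapt only the DSH, adapt the DSH plus the classification head, or unfreeze everything. A hedged sketch in PyTorch, assuming a model exposing `dsh` and `classifier` submodules (these attribute names are hypothetical, not the repository's actual API):

```python
import torch.nn as nn


def configure_finetuning(model: nn.Module, strategy: str) -> None:
    """Freeze/unfreeze parameters according to the transfer strategy."""
    for p in model.parameters():
        p.requires_grad = False          # start fully frozen
    if strategy == "dsh":                # "+ DSH": adapt spatial weights only
        for p in model.dsh.parameters():
            p.requires_grad = True
    elif strategy == "dsh_head":         # "+ DSH + Classification head"
        for p in model.dsh.parameters():
            p.requires_grad = True
        for p in model.classifier.parameters():
            p.requires_grad = True
    elif strategy == "full":             # "Full fine-tuning"
        for p in model.parameters():
            p.requires_grad = True
    else:
        raise ValueError(f"unknown strategy: {strategy}")


class Dummy(nn.Module):
    """Stand-in model with the assumed submodule layout."""

    def __init__(self):
        super().__init__()
        self.dsh = nn.Linear(3, 8)
        self.backbone = nn.Linear(8, 8)
        self.classifier = nn.Linear(8, 4)


model = Dummy()
configure_finetuning(model, "dsh_head")
print([n for n, p in model.named_parameters() if p.requires_grad])
```

Passing only the trainable parameters (`filter(lambda p: p.requires_grad, model.parameters())`) to the optimizer then realizes each row of the table.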
- (a) Class-wise spatial weights for four MI tasks across four representative indices of the DSS, #7, #19, #39, and #40
- (b) Temporal dynamics of applying spatial weights across a randomly selected index of the DSS, #19
3.2 Comparison of the attention entropy-based analysis and the attention map visualizations between the EEGMixer and the DFformer
- Distribution of the attention entropy across the (a) spatial and (b) temporal domains. Visualization of the attention maps from (c) the EEGMixer and (d) the DFformer
- (a) Class-conditional temporal contribution of the temporal experts in the MOTE across different MI tasks
- (b) Class-wise spatial contribution of the spatial experts in the MOSE across EEG channels
- All visualizations are extracted from the output of Block #1 in the EEGMixer
3.4 Expected calibration error (ECE) across different MI tasks and fine-tuning strategies under two cross-dataset settings
| Model | Left | Right | Feet | Tongue | Avg. |
|---|---|---|---|---|---|
| Baseline | 0.0800 | 0.1230 | 0.0900 | 0.0540 | 0.0868 |
| Fine-tuning Strategy | BCIC IV-2b → BCIC IV-2a | | | | |
| + DSH | 0.0780 | 0.2480 | 0.1470 | 0.1040 | 0.1444 |
| + DSH + Classification head | 0.0780 | 0.0750 | 0.0630 | 0.0400 | 0.0640 |
| Full fine-tuning | 0.0800 | 0.1140 | 0.0710 | 0.0430 | 0.0771 |
| Fine-tuning Strategy | Zhou → BCIC IV-2a | | | | |
| + DSH | 0.0770 | 0.0750 | 0.1570 | 0.1390 | 0.1119 |
| + DSH + Classification head | 0.0630 | 0.0440 | 0.0930 | 0.1660 | 0.0916 |
| Full fine-tuning | 0.0820 | 0.1150 | 0.0670 | 0.0280 | 0.0728 |
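The ECE values above come from the standard binned estimator: predictions are grouped by confidence, and the gap between each bin's accuracy and its mean confidence is averaged, weighted by bin size. A minimal sketch (the bin count is an assumption; the paper's binning is not stated here):

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: sum over bins of (bin size / N) * |bin accuracy - bin confidence|.

    confidences: per-sample max softmax probability in (0, 1].
    correct: per-sample 0/1 indicator of a correct prediction.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not in_bin:
            continue
        bin_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        bin_acc = sum(correct[i] for i in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(bin_acc - bin_conf)
    return ece


# Well calibrated: 80% confidence, 80% of predictions correct -> ECE near 0.
print(expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2))
# Overconfident: 80% confidence but always correct -> ECE near 0.2.
print(expected_calibration_error([0.8] * 10, [1] * 10))
```

Lower ECE means the model's confidence tracks its actual accuracy, which is how the fine-tuning strategies in the table are compared.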