Delineating neural contributions to electroencephalogram-based speech decoding

Public repository for Gmail interface papers using uhd-EEG
Authors: Motoshige Sato¹, Yasuo Kabe¹, Sensho Nobe¹, Akito Yoshida¹, Masakazu Inoue¹, Mayumi Shimizu¹, Kenichi Tomeoka¹, Shuntaro Sasai¹*
¹ Araya Inc.

Data

The dataset is hosted on OpenNeuro (ds007591): 128-channel EEG recorded during overt, minimally overt, and covert speech production of five color words (green, magenta, orange, violet, yellow).

Download and setup

  1. Install the OpenNeuro CLI:

    npm install -g @openneuro/cli

    Or using Deno (recommended):

    deno install -Agf jsr:@openneuro/cli
  2. Download the dataset into data/:

    openneuro download ds007591 data
  3. Extract per-trial epochs from the BIDS EDF files:

    uv run python bids/extract_from_bids.py

    This creates the per-trial npy files, word lists, and metadata that the preprocessing and training pipelines expect.
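Each downloaded run is identified by its BIDS entities (subject, session, task, acquisition, run). As a quick sanity check after the download, you can enumerate and parse the EDF filenames with only the standard library. This is a hedged sketch, not part of the repository: the helper names (`parse_edf_name`, `list_runs`) are hypothetical, and the filename pattern is taken from the example path used in the quick start below.

```python
import re
from pathlib import Path

# BIDS entity pattern as seen in this dataset's EDF filenames, e.g.
# sub-1_ses-20230511_task-minimallyovert_acq-calibration_run-01_eeg.edf
BIDS_EDF = re.compile(
    r"sub-(?P<sub>[^_]+)_ses-(?P<ses>[^_]+)_task-(?P<task>[^_]+)"
    r"_acq-(?P<acq>[^_]+)_run-(?P<run>\d+)_eeg\.edf"
)

def parse_edf_name(path: str) -> dict:
    """Return the BIDS entities of an EDF filename, or raise ValueError."""
    m = BIDS_EDF.fullmatch(Path(path).name)
    if m is None:
        raise ValueError(f"not a recognized BIDS EDF name: {path}")
    return m.groupdict()

def list_runs(bids_root: str = "data"):
    """Yield (entities, path) for every EDF run under the BIDS root."""
    for edf in sorted(Path(bids_root).glob("sub-*/ses-*/eeg/*_eeg.edf")):
        yield parse_edf_name(str(edf)), edf
```

For example, `parse_edf_name("sub-1_ses-20230511_task-minimallyovert_acq-calibration_run-01_eeg.edf")` yields the subject, session, task, acquisition, and run as a dict, which makes it easy to filter runs before loading anything with MNE.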

Quick start: loading and visualizing data

import mne
import numpy as np
import pandas as pd

# --- Load a single run from BIDS ---
edf_path = "data/sub-1/ses-20230511/eeg/sub-1_ses-20230511_task-minimallyovert_acq-calibration_run-01_eeg.edf"
events_path = edf_path.replace("_eeg.edf", "_events.tsv")

raw = mne.io.read_raw_edf(edf_path, preload=True, verbose=False)
events_df = pd.read_csv(events_path, sep="\t")
data = raw.get_data()  # (139 channels, n_samples) in Volts

print(f"Channels: {data.shape[0]}, Samples: {data.shape[1]}, Sfreq: {raw.info['sfreq']} Hz")
print(f"Trials: {len(events_df)}")
print(events_df.head())

# --- Extract trial 0 using the trigger channel (the last of the 139) ---
SFREQ = 256
N_CH_TOTAL = 139
EPOCH_SAMPLES = 2880  # 360 packets of 8 samples (6.25 sec * 256 Hz + margin)

trigger = data[N_CH_TOTAL - 1]
onsets = np.where(np.diff(trigger) > 0.5)[0] + 1  # rising edges of the trigger
onset_sample = onsets[0]
start = (onset_sample // 8 - 359) * 8  # snap to the 8-sample packet grid
epoch = data[:, start:start + EPOCH_SAMPLES]  # (139, 2880) in Volts

# --- Split into 5 repetitions and average ---
DURA_UNIT = 1.25  # seconds per repetition
samples_per_rep = int(DURA_UNIT * SFREQ)  # 320 samples
eeg_128 = epoch[:128]  # EEG channels only

# Extract the last 1600 samples (5 reps x 320 samples)
eeg_trial = eeg_128[:, -samples_per_rep * 5:]  # (128, 1600)
reps = eeg_trial.reshape(128, 5, samples_per_rep)  # (128, 5, 320)
trial_avg = reps.mean(axis=1)  # (128, 320) - trial-averaged EEG

# --- Show label info ---
WORD_LABELS = {0: "green", 1: "magenta", 2: "orange", 3: "violet", 4: "yellow"}
label = events_df.iloc[0]["value"]
print(f"\nTrial 0: label={label}, color={WORD_LABELS[label]}")
print(f"Trial-averaged EEG shape: {trial_avg.shape}")  # (128, 320)
print(f"  Mean amplitude: {trial_avg.mean():.6e} V")
print(f"  Std amplitude:  {trial_avg.std():.6e} V")
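To actually visualize the result, one option is a channel-mean waveform with a spread band. This is a minimal matplotlib sketch, not part of the repository's plotting pipeline; `plot_trial_avg` is a hypothetical helper that works on any (n_channels, n_samples) array such as `trial_avg` above (synthetic data is used here so the snippet runs stand-alone).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

def plot_trial_avg(trial_avg: np.ndarray, sfreq: int = 256):
    """Plot the mean waveform across channels with a +/- 1 SD band."""
    t = np.arange(trial_avg.shape[1]) / sfreq
    mean = trial_avg.mean(axis=0)
    sd = trial_avg.std(axis=0)
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(t, mean, lw=1.0, label="mean over channels")
    ax.fill_between(t, mean - sd, mean + sd, alpha=0.3, label="+/- 1 SD")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude (V)")
    ax.legend(loc="upper right")
    fig.tight_layout()
    return fig

# Synthetic data shaped like trial_avg (128 channels, 320 samples):
fig = plot_trial_avg(np.random.default_rng(0).normal(size=(128, 320)) * 1e-6)
```

Call `fig.savefig("trial_avg.png")` to save the figure, or pass the real `trial_avg` from the snippet above instead of the synthetic array.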

Preparation

  1. Install requirements:
    uv sync

Usage

  1. Save preprocessed EEG/EMG:
    uv run python plot_figures/make_preproc_files.py
  2. Visualization of preprocessing pipeline (Fig. 1):
    uv run python plot_figures/plot_preprocesssing.py
  3. Visualization of volume of speech (Fig. 1) and RMS of EMGs (Fig. 2):
    uv run python plot_figures/plot_rms.py
  4. Quantify the contamination level of EMG to EEG (mutual information, Fig. 2):
    uv run python plot_figures/plot_mis.py
  5. Train decoders. Use parallel_sets to specify which subjects' and which sessions' data to train on:
    uv run python uhd_eeg/trainers/trainer.py -m hydra/launcher=joblib parallel_sets=subject1-1,subject1-2,subject1-3
  6. Copy the trained models and metrics to data/
  7. Run the inference for online data and evaluate metrics (Table 1, 2, Fig. S1):
    uv run python plot_figures/evaluate_accs.py
  8. Visualization of electrodes used when hypothetically reducing electrode density (Fig. S1):
    uv run python plot_figures/show_montage_decimation.py
  9. Analysis on decoding contributions (integrated gradients, Fig.3-5, Fig.S2):
    uv run python plot_figures/plot_contribution.py
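Step 3 above quantifies speech volume and muscle activity via RMS. The exact windowing lives in plot_figures/plot_rms.py; purely as an illustration of the idea, a sliding-window RMS can be computed as below (the 64-sample window is an assumption for the example, not the paper's value).

```python
import numpy as np

def sliding_rms(x: np.ndarray, win: int = 64) -> np.ndarray:
    """RMS of a 1-D signal over non-overlapping windows of `win` samples."""
    n = (len(x) // win) * win          # drop the trailing partial window
    frames = x[:n].reshape(-1, win)    # (n_windows, win)
    return np.sqrt((frames ** 2).mean(axis=1))

# A unit-amplitude sine has RMS 1/sqrt(2) over every full-cycle window:
t = np.arange(2048)
rms = sliding_rms(np.sin(2 * np.pi * t / 64), win=64)
print(rms[:3])  # each value close to 0.707
```

Applied per EMG channel, such a trace shows how muscle activity drops from overt to minimally overt to covert speech, which is what the RMS figure in the paper compares.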

For developers

Internal scripts (BIDS conversion, integration tests) require access to raw data on a NAS. Configure paths by creating a .env file in the project root:

# .env
PARTICIPAT_MAPPING_PATH=/path/to/participant_mapping_gmail.json
RAW_ROOT=/path/to/nas/raw_data
BIDS_ROOT=/path/to/nas/bids_output
OPENNEURO_API_KEY=your_api_key
  • RAW_ROOT: Root directory of the raw EEG data on NAS (contains subject directories)
  • BIDS_ROOT: Output directory for BIDS conversion on NAS
  • PARTICIPAT_MAPPING_PATH: Path to the participant name→BIDS ID mapping JSON
  • OPENNEURO_API_KEY: API key for uploading to OpenNeuro (issued via your OpenNeuro account)
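The repository presumably reads these variables with a dotenv-style loader; a stdlib-only sketch of the same idea is below. The parsing rules are simplified (KEY=VALUE lines, blank lines and # comments ignored) and `load_env` is a hypothetical helper, not the project's actual loader.

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Export KEY=VALUE pairs from a dotenv-style file into os.environ.

    Existing environment variables are left untouched (setdefault), so
    values set in the shell take precedence over the .env file.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

# Usage: load_env(); then e.g. os.environ["RAW_ROOT"]
```

After calling `load_env()`, the NAS paths and API key are available to the internal scripts via `os.environ`.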
