PGSFormer is a transformer-based model for alcohol use disorder prediction from polygenic scores and covariates. In this implementation, each PGS value is treated as a token, projected into a latent embedding space, and passed through a stack of transformer encoder layers. The encoded token sequence is aggregated by attention pooling to produce a global PGS representation.
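The per-token projection and attention pooling described above can be sketched as follows. This is a minimal illustration, not the repository code: the class names `PGSTokenizer` and `AttentionPool`, the learned positional embedding, and the single-linear scoring function are assumptions.

```python
import torch
import torch.nn as nn

class PGSTokenizer(nn.Module):
    """Turn each scalar PGS value into a token embedding (sketch; names assumed)."""
    def __init__(self, n_pgs: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(1, d_model)                 # shared scalar-to-embedding projection
        self.pos = nn.Parameter(torch.zeros(n_pgs, d_model))  # per-PGS learned position embedding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_pgs) -> (batch, n_pgs, d_model)
        return self.proj(x.unsqueeze(-1)) + self.pos

class AttentionPool(nn.Module):
    """Aggregate a token sequence into one vector via learned attention weights."""
    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_tokens, d_model)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over tokens
        return (w * h).sum(dim=1)                # (batch, d_model)
```

In the full model, the tokenized sequence would pass through the transformer encoder stack between these two modules.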
To preserve direct signal from the original PGS vector, PGSFormer includes a residual shortcut branch. The raw PGS input is mapped by a linear layer and added to the pooled transformer representation through a learnable positive scaling gate. An auxiliary head on the pooled PGS representation produces an auxiliary PGS score, while the final AUD prediction is made after concatenating the PGS representation with non-EEG covariates.
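The shortcut branch, gate, and heads might look like the sketch below. The class name `PGSFormerHead`, the use of softplus to keep the gate positive, and the exact inputs to each head are assumptions drawn from the description above, not the repository implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PGSFormerHead(nn.Module):
    """Sketch of the residual shortcut, gate, and prediction heads (names assumed)."""
    def __init__(self, n_pgs: int, n_cov: int, d_model: int):
        super().__init__()
        self.shortcut = nn.Linear(n_pgs, d_model)      # linear map of the raw PGS vector
        self.gate = nn.Parameter(torch.zeros(1))       # learnable scalar scaling gate
        self.aux_head = nn.Linear(d_model, 1)          # auxiliary PGS score
        self.cls_head = nn.Linear(d_model + n_cov, 1)  # final AUD logit

    def forward(self, pooled, raw_pgs, covariates):
        # softplus keeps the learned gate strictly positive (assumption)
        rep = pooled + F.softplus(self.gate) * self.shortcut(raw_pgs)
        aux_logit = self.aux_head(rep)  # auxiliary score from the PGS representation
        logit = self.cls_head(torch.cat([rep, covariates], dim=-1))
        return logit, aux_logit
```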
This repository contains the final no-EEG configuration used for the main PGSFormer experiment:
- Inputs: 11 PGS features plus the SEX and CONTROL covariates
- Backbone: transformer encoder with attention pooling
- Residual shortcut: enabled
- Auxiliary PGS head: enabled
- Reconstruction decoder: disabled
Hyperparameters: `d_model=64`, `n_layers=1`, `n_heads=4`, `dropout=0.25`, `lr=1e-4`, `weight_decay=1e-5`, `batch_size=512`, `epochs=200`, `patience=50`.
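The hyperparameters can be collected in a config dict, and `patience` implies early stopping that halts after that many epochs without a new best validation loss. The `early_stop_epoch` helper below is an illustrative sketch, not the repository's implementation:

```python
# Final hyperparameters, copied from the configuration above.
HPARAMS = dict(d_model=64, n_layers=1, n_heads=4, dropout=0.25,
               lr=1e-4, weight_decay=1e-5, batch_size=512,
               epochs=200, patience=50)

def early_stop_epoch(val_losses, patience):
    """Epoch at which patience-based early stopping would halt:
    stop after `patience` consecutive epochs without a new best loss."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1
```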
The main training objective is focal loss on the AUD classifier output. The final model also includes an auxiliary binary cross-entropy loss on the auxiliary PGS head and a residual binary cross-entropy loss on the shortcut branch.
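A sketch of how these three terms could combine is shown below. The focal-loss `gamma`/`alpha` defaults and the auxiliary/residual loss weights `w_aux`/`w_res` are placeholders, not values from the repository:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss on logits; gamma/alpha defaults are assumptions."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability assigned to the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

def total_loss(main_logit, aux_logit, res_logit, y, w_aux=0.5, w_res=0.5):
    """Focal loss on the AUD classifier plus auxiliary and residual BCE terms.
    The weights are illustrative placeholders."""
    return (focal_loss(main_logit, y)
            + w_aux * F.binary_cross_entropy_with_logits(aux_logit, y)
            + w_res * F.binary_cross_entropy_with_logits(res_logit, y))
```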
- `train_pgsformer_no_eeg.py`: main training script
- `model/pgsformer.py`: PGSFormer model definition
- `data/coga.py`: dataset loader
- `utils/config.py`: fold-aware path expansion
- `utils/misc.py`: random seed helper
- `run_train.sh`: five-fold launcher for the final configuration
The training script expects fold-specific data under a root directory:
```
DATA_ROOT/fold_0/learning_set.csv
DATA_ROOT/fold_0/validation_set.csv
DATA_ROOT/fold_1/learning_set.csv
DATA_ROOT/fold_1/validation_set.csv
...
```
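Fold-aware path expansion of this layout (as `utils/config.py` provides) might look like the following sketch; the function name `fold_paths` is an assumption:

```python
from pathlib import Path

def fold_paths(data_root: str, fold: int):
    """Expand DATA_ROOT/fold_<k>/{learning_set,validation_set}.csv (sketch)."""
    root = Path(data_root) / f"fold_{fold}"
    return root / "learning_set.csv", root / "validation_set.csv"
```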
Within `learning_set.csv`, rows labeled `subset=train` are used for training and rows labeled `subset=test` are used for validation. The external evaluation split is read from `validation_set.csv`.
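The split logic described above can be sketched with pandas; the helper name `load_splits` is an assumption, while the `subset` column and file names come from this README:

```python
import pandas as pd

def load_splits(fold_dir):
    """Split learning_set.csv by its `subset` column and read
    validation_set.csv as the external evaluation split (sketch)."""
    learning = pd.read_csv(f"{fold_dir}/learning_set.csv")
    train = learning[learning["subset"] == "train"]
    val = learning[learning["subset"] == "test"]  # internal validation rows
    external = pd.read_csv(f"{fold_dir}/validation_set.csv")
    return train, val, external
```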
Set DATA_ROOT in run_train.sh, then run:
```
bash run_train.sh
```