Su-informatics-lab/PGSFormer-AUD
PGSFormer: A Transformer-Based Framework for AUD Prediction from Polygenic Scores

PGSFormer is a transformer-based model for alcohol use disorder prediction from polygenic scores and covariates. In this implementation, each PGS value is treated as a token, projected into a latent embedding space, and passed through a stack of transformer encoder layers. The encoded token sequence is aggregated by attention pooling to produce a global PGS representation.

To preserve direct signal from the original PGS vector, PGSFormer includes a residual shortcut branch. The raw PGS input is mapped by a linear layer and added to the pooled transformer representation through a learnable positive scaling gate. An auxiliary head on the pooled PGS representation produces an auxiliary PGS score, while the final AUD prediction is made after concatenating the PGS representation with non-EEG covariates.
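The shortcut combination described above can be illustrated in a few lines. This is a minimal numpy sketch, not the repository's implementation: the softplus parameterization of the positive gate and the random stand-in tensors are assumptions.

```python
import numpy as np

def softplus(x):
    # smooth, strictly positive reparameterization; an assumption for
    # how the "learnable positive scaling gate" might be constrained
    return np.log1p(np.exp(x))

rng = np.random.default_rng(0)
n_pgs, d_model = 11, 64

pgs = rng.normal(size=(1, n_pgs))        # raw PGS vector for one subject
pooled = rng.normal(size=(1, d_model))   # stand-in for encoder + attention pooling

W_short = rng.normal(size=(n_pgs, d_model)) * 0.1  # linear shortcut mapping
gate = softplus(0.0)                     # learnable scalar, kept positive

# residual shortcut: raw PGS signal re-enters after the transformer
fused = pooled + gate * (pgs @ W_short)
print(fused.shape)  # (1, 64)
```

The gated sum keeps the pooled transformer representation dominant while letting the model learn how much raw PGS signal to mix back in.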

This repository contains the final no-EEG configuration used for the main PGSFormer experiment:

  • Inputs: 11 PGS features plus SEX and CONTROL
  • Backbone: transformer encoder with attention pooling
  • Residual shortcut: enabled
  • Auxiliary PGS head: enabled
  • Reconstruction decoder: disabled
  • d_model=64
  • n_layers=1
  • n_heads=4
  • dropout=0.25
  • lr=1e-4
  • weight_decay=1e-5
  • batch_size=512
  • epochs=200
  • patience=50

The main training objective is focal loss on the AUD classifier output. The final model also includes an auxiliary binary cross-entropy loss on the auxiliary PGS head and a residual binary cross-entropy loss on the shortcut branch.
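Focal loss down-weights well-classified examples so training focuses on hard cases. A minimal binary focal loss sketch follows; the `alpha` and `gamma` defaults here are the common illustrative values, not necessarily those used in this repository:

```python
import numpy as np

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-8):
    """p: predicted probability of the positive class; y: 0/1 labels."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balance weight
    # (1 - p_t)^gamma shrinks the loss of confident, correct predictions
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t + eps)))

easy = binary_focal_loss([0.95], [1])  # confident correct: tiny contribution
hard = binary_focal_loss([0.55], [1])  # uncertain: much larger contribution
```

The auxiliary and residual BCE terms would be added to this objective with their own weights.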

Repository Structure

  • train_pgsformer_no_eeg.py: main training script
  • model/pgsformer.py: PGSFormer model definition
  • data/coga.py: dataset loader
  • utils/config.py: fold-aware path expansion
  • utils/misc.py: random seed helper
  • run_train.sh: five-fold launcher for the final configuration

Data Layout

The training script expects fold-specific data under a root directory:

  • DATA_ROOT/fold_0/learning_set.csv
  • DATA_ROOT/fold_0/validation_set.csv
  • DATA_ROOT/fold_1/learning_set.csv
  • DATA_ROOT/fold_1/validation_set.csv
  • ...

Within learning_set.csv, rows labeled subset=train are used for training and rows labeled subset=test are used for validation. The external evaluation split is read from validation_set.csv.
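The subset split above can be reproduced with a few lines of pandas. The DataFrame below is a hypothetical in-memory stand-in for one fold's learning_set.csv (in practice it would be read with `pd.read_csv`):

```python
import pandas as pd

# hypothetical stand-in for DATA_ROOT/fold_0/learning_set.csv
learning = pd.DataFrame({
    "subset": ["train", "train", "train", "test", "test"],
    "SEX":    [0, 1, 0, 1, 0],
})

train_df = learning[learning["subset"] == "train"]  # rows used for training
val_df = learning[learning["subset"] == "test"]     # rows used for validation
# the external evaluation split is read separately from validation_set.csv
```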

Run

Set DATA_ROOT in run_train.sh, then run:

bash run_train.sh

About

AI-based Integration of Polygenic Scores for Alcohol Use Disorder Risk Prediction
