dakshshah03/SLT

VideoMAE SLT (README in progress)

This is an end-to-end ASL sign language translation (SLT) model designed to be deployable and production-ready for inference. The model is fine-tuned from VideoMAE, starting from the original weights pretrained on Something-Something V2 (SSv2), using the ASL-Citizen dataset.

In its current state, the model performs single-word translation: it generates a sequence of words that may or may not form a grammatically correct sentence (I will update this after testing). Future goals include adding temporally-aware translation, possibly via large language models. I'll be looking for more research in this field to find better foundation models for gesture translation.
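As a rough illustration of how single-word predictions could be turned into a word sequence, here is a minimal sketch. It is not the repository's implementation: the function names, the 16-frame window, the stride, and the `<none>` "no sign" label are all assumptions made for the example, and `classify_clip` stands in for whatever callable wraps the fine-tuned model.

```python
from itertools import groupby
from typing import Callable, List, Sequence, Tuple


def sliding_windows(num_frames: int, window: int = 16, stride: int = 8) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges that slide over the video."""
    if num_frames < window:
        # Video shorter than one window: use it as a single (short) clip.
        return [(0, num_frames)]
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]


def collapse_repeats(labels: Sequence[str]) -> List[str]:
    """Merge runs of identical consecutive labels into one label."""
    return [label for label, _ in groupby(labels)]


def translate(
    frames: Sequence,
    classify_clip: Callable[[Sequence], str],  # hypothetical: clip -> predicted word
    window: int = 16,
    stride: int = 8,
    blank: str = "<none>",  # hypothetical "no sign detected" label
) -> List[str]:
    """Classify each window, collapse repeats, and drop blank predictions."""
    labels = [classify_clip(frames[s:e]) for s, e in sliding_windows(len(frames), window, stride)]
    return [word for word in collapse_repeats(labels) if word != blank]


if __name__ == "__main__":
    # Stand-in classifier: the first 16 frames "sign" HELLO, the rest WORLD.
    fake_frames = list(range(48))
    fake_classifier = lambda clip: "HELLO" if clip[0] < 16 else "WORLD"
    print(translate(fake_frames, fake_classifier))
```

Because each window is classified independently, the output is an ordered list of words with no grammar model behind it, which is why the result may not form a well-formed sentence.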

Currently, the project is using:

  • PyTorch, PyTorch Lightning, Hugging Face Transformers
    • model architecture, training loop, initial model weights
    • distributed training (multi-node and multi-gpu)
  • Kubernetes/Kubeflow
    • Distributed multi-node training
  • MLflow (Databricks hosted)
    • experiment analytics and tracking
    • artifact store
  • ONNX and TensorRT
    • Deployment and production-ready inference
  • GitHub Actions, Docker
    • CI/CD triggering on new production weights (marked in MLflow)
    • building inference container
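The CI/CD trigger on new production weights can be pictured as a small gating check. The sketch below is only an assumption about how that decision might look: the record shape (`version` and `stage` keys) and the function names are hypothetical, and in practice the records would come from the MLflow model registry rather than a hard-coded list.

```python
from typing import Iterable, Optional


def latest_production_version(versions: Iterable[dict]) -> Optional[int]:
    """Return the highest version number marked as production, if any."""
    production = [int(v["version"]) for v in versions if v.get("stage") == "Production"]
    return max(production) if production else None


def should_build(versions: Iterable[dict], deployed_version: Optional[int]) -> bool:
    """Build a new inference container only when newer production weights exist."""
    latest = latest_production_version(versions)
    if latest is None:
        return False  # nothing has been marked for production yet
    return deployed_version is None or latest > deployed_version


if __name__ == "__main__":
    # Hypothetical registry snapshot.
    registry = [
        {"version": "1", "stage": "Archived"},
        {"version": "2", "stage": "Production"},
        {"version": "3", "stage": "Staging"},
    ]
    print(should_build(registry, deployed_version=1))
    print(should_build(registry, deployed_version=2))
```

Keeping the check as a pure function makes it easy to unit test separately from the workflow that calls it.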

Setup

TODO:

  • secrets setup instructions
  • github actions workflows/secrets
  • mlflow instructions
  • dataset directory setup instructions
  • dockerhub setup instructions
  • dockerfile (inference)

Training

There are two options for training:

  1. Kubernetes (using the Kubeflow operator)
  2. Local training

Kubernetes

TODO:

  • kubernetes manifests
    • training
    • pvc

Local training

Requirements

First, install uv if you haven't already. Run the following command to install it:

wget -qO- https://astral.sh/uv/install.sh | sh

Then, run

uv sync

to install the required packages to train the model.
