
RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models

This repository contains the code for training and evaluating the RoSA model. It includes environment setup instructions, training and inference scripts, key configuration parameters, and dataset usage.

1. Environment Setup

To reproduce our results, we recommend setting up the environment using Conda.

Option 1: Using rosa_environment.yml (Recommended)

You can restore the full Conda environment in one step:

conda env create -f rosa_environment.yml
conda activate rosa

This YAML file includes exact versions of Python, PyTorch, DeepSpeed, PEFT, Transformers, and other necessary dependencies.

Option 2: Manual Setup via requirements.txt

Alternatively, you can create the environment manually:

conda create -n rosa python=3.10
conda activate rosa
pip install -r requirements.txt

The requirements.txt file lists the core Python packages required for RoSA training and evaluation.

Core Library Versions

This project has been tested with the following key dependencies:

  • torch==2.1.2
  • transformers==4.47.1
  • peft==0.16.0
  • deepspeed==0.12.6
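If you prefer to pin these versions in an existing environment rather than using the provided environment files, they can be installed directly; note that the torch wheel you need may differ depending on your CUDA version:

```shell
# Install the tested versions directly (alternative to rosa_environment.yml
# or requirements.txt); pick the torch build matching your CUDA toolkit
pip install torch==2.1.2 transformers==4.47.1 peft==0.16.0 deepspeed==0.12.6
```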

CUDA and DeepSpeed Compatibility

DeepSpeed compiles CUDA extensions at runtime. Please ensure:

  • Your local CUDA Toolkit version matches the version used to compile your installed torch package.
  • The required CUDA runtime libraries and compiler are installed and available in your environment.
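As a quick sanity check (assuming torch is already installed in the active environment), you can compare the CUDA version torch was compiled against with the toolkit DeepSpeed will use:

```shell
# torch.version.cuda is the CUDA version torch was built with;
# nvcc reports the local toolkit that DeepSpeed's JIT compiler will use.
python -c "import torch; print(torch.version.cuda)"
nvcc --version | grep release
# DeepSpeed also ships its own environment report:
# ds_report
```

If the two versions disagree, DeepSpeed's extension builds will typically fail at runtime.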

2. Training and Inference

Training and evaluation scripts are provided in the script/ directory:

  • run_rosa.sh: Script for model training
  • run_eval.sh: Script for model evaluation

Each script contains inline comments to help you modify key training arguments such as the learning rate and batch size.

Example usage:

bash script/run_rosa.sh
bash script/run_eval.sh

3. Key Model Parameters

Below are the custom arguments introduced by RoSA that control model behavior:

LOW_RATIO="0.25"         # Ratio of low-frequency attention dimensions
DYNA_RATIO="0.5"         # Proportion of layers to train
USE_LORA_GATE="yes"      # Use LoRA-style gated linear projections
LORA_DIM="128"           # Dimension of LoRA-based Linear Layer
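A minimal sketch of adjusting these settings: edit the corresponding assignments near the top of script/run_rosa.sh, then relaunch training. The exact values below are illustrative, not recommendations; check the script's inline comments for the mechanism it actually uses.

```shell
# Hypothetical override: a lighter configuration for a quick trial run
LOW_RATIO="0.25"      # keep a quarter of attention dims as low-frequency
DYNA_RATIO="0.5"      # train half of the layers
USE_LORA_GATE="yes"   # enable LoRA-style gated projections
LORA_DIM="64"         # smaller LoRA dimension trades capacity for memory

# then relaunch training:
# bash script/run_rosa.sh
```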

4. Data

Training and test data are placed in the following directories:

data/
├── train/   # Training samples
└── test/    # Testing samples
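Before launching training, you can verify this layout is in place with a small check like the following:

```shell
# Sanity-check the expected data layout; prints a line for each missing directory
for d in data/train data/test; do
  [ -d "$d" ] || echo "missing directory: $d"
done
```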