T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Objective evaluation on Linux is now wrapped by two top-level scripts: setup_objective.sh and run_objective_batch.sh.

Objective Reproduction on Linux

This flow was validated on a Linux server with a single NVIDIA RTX 4090 GPU. By default, every path the scripts use is relative to the repository root; set the environment variables described below to move the cache or Conda location elsewhere.

Default relative layout:

  • Repository root: T2AV-Compass/
  • Input videos: input/
  • Prompts file: t2av-compass/Data/prompts.json
  • Output directory: Output/
  • Cache root: .cache/t2av-cache
  • Conda envs: .cache/conda/envs

1. Clone the repository

Clone with submodules:

git clone --recurse-submodules https://github.com/NJU-LINK/T2AV-Compass.git
cd T2AV-Compass

If GitHub is slow in your region, you can optionally clone through a mirror instead. Keep the checked-out repository layout unchanged.

Optional environment overrides before setup:

# Optional: relocate the cache and Conda roots
export T2AV_CACHE_ROOT=/path/to/cache-root
export T2AV_CONDA_ROOT=/path/to/conda-root
# Hugging Face endpoint (the official one is the default)
export HF_ENDPOINT=https://huggingface.co
# In mainland China, set this explicitly instead:
# export HF_ENDPOINT=https://hf-mirror.com
# Optional: set only when GitHub downloads need a mirror
# export T2AV_GITHUB_MIRROR_PREFIX=https://your-mirror.example
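
To see which values would apply before running setup, a small helper like the one below can print them. This is a minimal sketch, not part of the repository: the function name t2av_effective_settings is hypothetical, and the fallback values are assumptions read off the default layout above (in particular, .cache/conda as the Conda root is inferred from the .cache/conda/envs entry). The real scripts may resolve these differently.

```shell
# Hypothetical helper: print the settings that would apply for a repository root.
# Assumption: an unset variable falls back to the documented default layout.
t2av_effective_settings() {
  local repo_root="${1:-$PWD}"
  printf 'T2AV_CACHE_ROOT=%s\n' "${T2AV_CACHE_ROOT:-$repo_root/.cache/t2av-cache}"
  # Assumed default: the parent of the documented .cache/conda/envs directory.
  printf 'T2AV_CONDA_ROOT=%s\n' "${T2AV_CONDA_ROOT:-$repo_root/.cache/conda}"
  printf 'HF_ENDPOINT=%s\n' "${HF_ENDPOINT:-https://huggingface.co}"
  printf 'T2AV_GITHUB_MIRROR_PREFIX=%s\n' "${T2AV_GITHUB_MIRROR_PREFIX:-(not set)}"
}
```

Calling t2av_effective_settings "$PWD" from the repository root lists one NAME=value pair per line, which makes it easy to spot a stale export left over from an earlier shell session.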

2. Install all objective environments and checkpoints

bash setup_objective.sh

What this script does:

  • installs system packages such as ffmpeg
  • creates all required conda environments
  • downloads checkpoints for DOVER, AudioBox, ImageBind, Synchformer, and LatentSync
  • pre-creates cache directories under .cache/ by default

The script is safe to re-run.
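
After setup finishes, a quick sanity check can confirm the two things that are easiest to verify from the shell: ffmpeg is on PATH and the cache root exists. The sketch below is illustrative only; check_setup is a hypothetical name, and the default cache location is assumed from the layout listed earlier. It does not inspect the Conda environments or the downloaded checkpoints.

```shell
# Hypothetical post-setup check: report obvious problems, one per line.
# Assumption: the cache root defaults to <repo>/.cache/t2av-cache.
check_setup() {
  local repo_root="${1:-.}" status=0
  local cache_root="${T2AV_CACHE_ROOT:-$repo_root/.cache/t2av-cache}"
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not on PATH"
    status=1
  fi
  if [ ! -d "$cache_root" ]; then
    echo "cache root missing: $cache_root"
    status=1
  fi
  return "$status"
}
```

If check_setup prints anything, re-running bash setup_objective.sh is the first thing to try, since the script is safe to re-run.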

3. Prepare videos and prompts

Put videos into input/.

Prompt-linked metrics (T-V, text-video alignment, and T-A, text-audio alignment) find each video's prompt through its file name. Supported naming conventions include:

  • sample_0001.mp4
  • sample_0002.mp4
  • 1.mp4
  • 0001.mp4
  • video_0001.mp4

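The index field in prompts.json must match the numeric index in the video file name. Judging by the supported patterns, leading zeros are ignored, so index 1 pairs with sample_0001.mp4, 1.mp4, or 0001.mp4.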
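
To make the matching rule concrete, the function below derives an index from a file name. It is a sketch under an assumption: the index is taken to be the trailing run of digits in the file stem with leading zeros dropped, which is consistent with every pattern listed above but is not confirmed against the repository's own matching code. The name video_index is hypothetical.

```shell
# Hypothetical helper: derive the prompt index from a video file name.
# Assumption: index = trailing digits of the stem, leading zeros ignored.
video_index() {
  local stem idx
  stem="${1##*/}"        # drop any directory part
  stem="${stem%.*}"      # drop the extension
  # First expression removes everything up to the last non-digit;
  # second expression strips leading zeros but keeps a lone "0".
  idx="$(printf '%s\n' "$stem" | sed -e 's/^.*[^0-9]//' -e 's/^0*\([0-9]\)/\1/')"
  [ -n "$idx" ] || return 1   # no trailing digits: no index
  printf '%s\n' "$idx"
}
```

Under this rule, video_index input/sample_0001.mp4 and video_index 1.mp4 both print 1, while a name without trailing digits, such as clip.mp4, yields a non-zero exit status.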

Example layout:

T2AV-Compass/
├── input/
│   ├── sample_0001.mp4
│   └── sample_0002.mp4
├── Output/
├── setup_objective.sh
├── run_objective_batch.sh
└── t2av-compass/
    └── Data/
        └── prompts.json

Minimal t2av-compass/Data/prompts.json example:

[
  {
    "index": 1,
    "prompt": "A person speaking directly to the camera.",
    "video_prompt": "A person speaking directly to the camera.",
    "audio_prompt": "clean speech from a person speaking indoors",
    "speech_prompt": []
  },
  {
    "index": 2,
    "prompt": "A person speaking directly to the camera.",
    "video_prompt": "A person speaking directly to the camera.",
    "audio_prompt": "clean speech from a person speaking indoors",
    "speech_prompt": []
  }
]
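
Before launching a long batch, it can help to confirm that every prompt has a video. The pre-flight sketch below lists prompt indices with no matching .mp4 file. It rests on two assumptions: prompts.json is laid out like the minimal example, so a plain grep can pull out each "index": N pair (a real JSON parser would be more robust), and a video matches when the trailing digits of its stem equal N with leading zeros dropped. The name missing_videos is hypothetical.

```shell
# Hypothetical pre-flight check: print prompt indices that have no video.
# Assumptions: grep-friendly prompts.json; index = trailing digits of the stem.
missing_videos() {
  local input_dir="$1" prompts="$2" have idx f
  # Collect the index of every .mp4 file in the input directory.
  have="$(for f in "$input_dir"/*.mp4; do
            [ -e "$f" ] || continue
            f="${f##*/}"; f="${f%.*}"
            printf '%s\n' "$f" | sed -e 's/^.*[^0-9]//' -e 's/^0*\([0-9]\)/\1/'
          done)"
  # Extract each index value from the prompts file and look it up.
  grep -oE '"index"[[:space:]]*:[[:space:]]*[0-9]+' "$prompts" |
    grep -oE '[0-9]+$' |
    while read -r idx; do
      printf '%s\n' "$have" | grep -qxF "$idx" || printf '%s\n' "$idx"
    done
}
```

Running missing_videos input t2av-compass/Data/prompts.json from the repository root prints nothing when every prompt is covered; any number it does print is a candidate for the "video file not found" error during the T-V and T-A metrics.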

4. Run the full objective batch

Default paths:

bash run_objective_batch.sh

Custom paths:

bash run_objective_batch.sh /abs/path/to/input /abs/path/to/prompts.json /abs/path/to/output

The batch runs all objective metrics:

  • VT: video technical quality
  • VA: video aesthetic quality
  • AA: audio aesthetic quality
  • SQ: speech quality
  • T-V: text-video alignment
  • T-A: text-audio alignment
  • A-V: audio-video alignment
  • DeSync: audio-video synchronization error
  • LS: lip-sync quality

5. Check outputs

After a successful run, Output/ contains:

  • video_technical.json
  • video_aesthetic.json
  • audio_aesthetic.json
  • speech_quality.json
  • text_video_alignment.json
  • text_audio_alignment.json
  • audio_video_alignment.json
  • av_sync.json
  • lipsync.json
  • evaluation_summary.json
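
A short check can confirm that a run produced every file in this list. The sketch below uses the file names exactly as listed above; the function name missing_outputs is hypothetical, and treating an empty file as missing is a choice made for this example rather than documented behavior.

```shell
# Hypothetical helper: print expected result files that are missing or empty.
missing_outputs() {
  local out_dir="${1:-Output}" name status=0
  for name in video_technical video_aesthetic audio_aesthetic speech_quality \
              text_video_alignment text_audio_alignment audio_video_alignment \
              av_sync lipsync evaluation_summary; do
    if [ ! -s "$out_dir/$name.json" ]; then   # -s: exists and is non-empty
      printf '%s.json\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```

For example, missing_outputs Output prints nothing and returns 0 after a complete run; a printed name points at the metric whose single-metric script is worth re-running.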

Single-Metric Debug Commands

Run these from the repository root.

bash t2av-compass/scripts/eval_video_technical.sh input Output
bash t2av-compass/scripts/eval_video_aesthetic.sh input Output
bash t2av-compass/scripts/eval_audio_aesthetic.sh input Output
bash t2av-compass/scripts/eval_speech_quality.sh input Output
bash t2av-compass/scripts/eval_text_video_alignment.sh input t2av-compass/Data/prompts.json Output
bash t2av-compass/scripts/eval_text_audio_alignment.sh input t2av-compass/Data/prompts.json Output
bash t2av-compass/scripts/eval_audio_video_alignment.sh input Output
bash t2av-compass/scripts/eval_av_sync.sh input Output
bash t2av-compass/scripts/eval_lipsync.sh input Output
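
When debugging by metric abbreviation, a lookup like the one below saves retyping the script names. It only prints the command, so nothing runs until you choose to execute it. The function name metric_command is hypothetical, and the pairing of abbreviations with scripts is inferred from the script and output file names (for instance, DeSync with eval_av_sync.sh), not stated explicitly in the repository.

```shell
# Hypothetical helper: print the single-metric debug command for an abbreviation.
# Assumption: abbreviation-to-script mapping inferred from the file names.
metric_command() {
  local metric="$1" input="${2:-input}" output="${3:-Output}"
  local prompts="${4:-t2av-compass/Data/prompts.json}" script
  case "$metric" in
    VT)     script=eval_video_technical.sh ;;
    VA)     script=eval_video_aesthetic.sh ;;
    AA)     script=eval_audio_aesthetic.sh ;;
    SQ)     script=eval_speech_quality.sh ;;
    T-V)    script=eval_text_video_alignment.sh ;;
    T-A)    script=eval_text_audio_alignment.sh ;;
    A-V)    script=eval_audio_video_alignment.sh ;;
    DeSync) script=eval_av_sync.sh ;;
    LS)     script=eval_lipsync.sh ;;
    *)      echo "unknown metric: $metric" >&2; return 1 ;;
  esac
  case "$metric" in
    # Only the prompt-linked metrics take the prompts file as an argument.
    T-V|T-A) printf 'bash t2av-compass/scripts/%s %s %s %s\n' \
               "$script" "$input" "$prompts" "$output" ;;
    *)       printf 'bash t2av-compass/scripts/%s %s %s\n' \
               "$script" "$input" "$output" ;;
  esac
}
```

From the repository root, metric_command T-V prints the text-video alignment command with the default paths; pass input, output, and prompts paths as the second to fourth arguments to override them.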

Notes

  • No manual Hugging Face login is required for the checkpoints used in the validated objective flow.
  • The default Hugging Face endpoint is the official https://huggingface.co. In mainland China, explicitly set HF_ENDPOINT=https://hf-mirror.com; in other regions this is usually unnecessary.
  • You can override cache and Conda locations with T2AV_CACHE_ROOT, T2AV_CONDA_ROOT, or T2AV_CONDA_EXE.
  • The first DeSync run downloads an additional large MotionFormer checkpoint.
  • LS is intended for talking-face videos. For non-talking-face content, the score is not meaningful even if the script finishes.
  • Re-running setup_objective.sh or run_objective_batch.sh is supported.

Troubleshooting

  • 视频文件未找到 (index: N), which means "video file not found": rename the file to match one of the supported index patterns, or fix the index field in prompts.json.
  • ffmpeg not found: run bash setup_objective.sh again on a Debian/Ubuntu-like system with package manager access.
  • mirror/network failures during checkpoint download: retry first; if needed, set HF_ENDPOINT or T2AV_GITHUB_MIRROR_PREFIX before running setup.
  • LS fails on a batch with no visible speaking face: use talking-face videos for this metric.

Project Overview

T2AV-Compass is a unified benchmark for evaluating Text-to-Audio-Video generation across:

  • unimodal quality
  • cross-modal alignment and synchronization
  • checklist-based subjective evaluation

The benchmark includes 500 prompts and associated checklist annotations. For subjective evaluation and repository internals, see t2av-compass/README.md.
