eval/main.py is the unified entry point. The current code supports:
- Models: qwen2.5vl, qwen3vl
- Datasets: hcstvg, vidstg, doro-stvg
The default script is eval/run_eval.sh. You can edit it directly to change model paths, annotation paths, video paths, and output paths.
Typical outputs:
- results.json: per-sample predictions, parsed outputs, ground truth, and metrics
- status.json: overall summary and averaged metrics
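As a quick sanity check, the per-sample metrics in results.json can be averaged with a few lines of Python. This is a hedged sketch: the list-of-dicts layout and the metric field name `viou` are assumptions, so check the actual results.json schema before relying on it.

```python
import json

def average_metric(results_path: str, field: str = "viou") -> float:
    """Average one per-sample metric from a results.json-style file.

    The list-of-dicts layout and the field name are assumptions;
    adjust them to match the real eval output schema.
    """
    with open(results_path) as f:
        samples = json.load(f)
    values = [s[field] for s in samples if field in s]
    return sum(values) / len(values) if values else 0.0
```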
graph_generator/ generates structured data from raw videos. Based on the current code, the main pipeline includes:
- Scene splitting
- Object detection and tracking
- Attribute generation
- Action detection
- Relation generation
- Cross-shot reference edge generation (optional)
- STVG query generation from scene graphs
- Formatting query outputs into training-friendly JSONL
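The stages above run sequentially, each enriching a shared set of intermediate artifacts. A toy sketch of that composition follows; it is illustrative only, not the actual graph_generator/main.py code, and the stage functions are placeholders:

```python
def run_pipeline(video_path, stages):
    """Run stand-in pipeline stages in order, accumulating their outputs."""
    state = {"video": video_path}
    for stage in stages:
        # each stage reads what it needs from state and returns new artifacts
        state.update(stage(state))
    return state

# placeholder stages standing in for scene splitting, detection/tracking, etc.
def split_scenes(state):
    return {"scenes": [(0, 100)]}

def detect_and_track(state):
    return {"tracks": {"person_0": [(0, (10, 10, 50, 120))]}}
```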
Relevant entry points:
- graph_generator/main.py: main scene graph generation entry
- graph_generator/modules/query_generator_cpsat.py: generate queries from scene graphs
- graph_generator/utils/format_train.py: convert query outputs into training format
- graph_generator/scripts/run_generator.sh: current command collection used in practice
This repository does not currently use a single root-level setup script. The actual setup should follow the module-specific pyproject.toml files under envs/.
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc

cd /home/wangxingjian/DORO-STVG/envs/eval
uv sync

If uv sync times out on files.pythonhosted.org in this environment, refresh the lock and sync against the configured mirror:
cd /home/wangxingjian/DORO-STVG/envs/eval
uv lock --refresh
uv sync --refresh

cd /home/wangxingjian/DORO-STVG/envs/graph_generator/main
uv sync

This environment is used for:
- graph_generator/main.py
- the main pipeline modules for attributes, relations, reference edges, and query generation
cd /home/wangxingjian/DORO-STVG/envs/graph_generator/action_detector
uv sync

This separate environment is used mainly by the action detection module, to avoid dependency conflicts with the main environment.
The evaluation script currently defaults to decord:
export FORCE_QWENVL_VIDEO_READER=decord

You can switch to torchvision or torchcodec if needed.
graph_generator depends on both model checkpoints and API-related environment variables. The repository already contains graph_generator/.env, and the scripts load it automatically.
The most important variables are:
API_KEYS=your_key_1,your_key_2
MM_API_BASE_URL=https://your-compatible-endpoint

You also need to prepare:
- YOLO weights
- SAM2 / Grounded-SAM2 checkpoints
- VideoMAE action detection checkpoints
- DAM or other attribute-description models
For those details, refer to graph_generator/README.md.
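The comma-separated API_KEYS value lends itself to client-side round-robin rotation. A minimal sketch, assuming only the two variable names shown above; the rotation logic is illustrative, not the repository's actual implementation:

```python
import itertools

def parse_api_env(env: dict):
    """Split API_KEYS into a list and return it with the base URL."""
    keys = [k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()]
    return keys, env.get("MM_API_BASE_URL", "")

def key_rotator(keys):
    """Yield keys round-robin so request load spreads across them."""
    if not keys:
        raise ValueError("API_KEYS is empty")
    return itertools.cycle(keys)
```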
cd /home/wangxingjian/DORO-STVG/eval
bash run_eval.sh

If you prefer not to use the shell script, you can call the entry point directly:
cd /home/wangxingjian/DORO-STVG/eval
python main.py run \
--model_name=qwen3vl \
--model_path=/path/to/model \
--data_name=hcstvg2 \
--annotation_path=/path/to/test.json \
--video_dir=/path/to/videos \
--output_dir=./res

The current run_generator.sh contains the full pipeline command examples, and the bottom part of the script keeps the active query-generation example.
A typical workflow is:
- Generate scene_graphs.jsonl
- Generate query.jsonl
- Convert it into query_train.jsonl
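The last step of this workflow is a line-by-line JSONL conversion. Here is a hedged sketch of what utils/format_train.py does; the field mapping in to_train_record is an assumption, so consult the script for the real schema:

```python
import json

def to_train_record(q: dict) -> dict:
    # assumed mapping -- the real field names live in utils/format_train.py
    return {"video": q.get("video"), "query": q.get("query")}

def convert_jsonl(in_path: str, out_path: str) -> int:
    """Rewrite each query.jsonl line as a training-format line."""
    count = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            fout.write(json.dumps(to_train_record(json.loads(line))) + "\n")
            count += 1
    return count
```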
This is the training-friendly formatted output generated from query.jsonl by utils/format_train.py. The main fields include:
video, path, queryid, query, Difficulty, Width/Height, box
box is a trajectory string in the following format:
target description: <frame_idx, time_sec, x1, y1, x2, y2; ... />
Here the coordinates are already normalized to [0, 1] using the video width and height, which makes this format easier to use for training and annotation consumption.
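For downstream consumers, the trajectory string can be parsed back into structured boxes. The following is a sketch that assumes exactly the delimiters documented above (`:`, `<`, `;`, `/>`); verify against real query_train.jsonl lines before use:

```python
import re

def parse_trajectory(s: str):
    """Parse 'desc: <frame, time, x1, y1, x2, y2; ... />' into structured boxes."""
    desc, _, body = s.partition(":")
    m = re.search(r"<(.*?)/?>", body, re.S)
    entries = []
    if m:
        for seg in m.group(1).split(";"):
            seg = seg.strip().rstrip("/").strip()
            if not seg:
                continue
            f, t, x1, y1, x2, y2 = (float(v) for v in seg.split(","))
            # coordinates are already normalized to [0, 1]
            entries.append({"frame": int(f), "time": t, "box": (x1, y1, x2, y2)})
    return desc.strip(), entries
```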