RefAV: Mining Referred Scenarios in Autonomous Vehicle Datasets using LLMs

[Figure: RefAV method overview]

A single autonomous vehicle with a full stack of camera and lidar sensors streams roughly 4 TB of data per hour. The vast majority of this data comes from uninteresting scenarios -- the ego vehicle driving straight down a lane, possibly with another car in front of it. Retrieving and labeling specific scenarios for ego-behavior evaluation, safety testing, or active learning at scale can be prohibitively expensive.

RefAV serves as the baseline for the 2025 Argoverse 2 Scenario Mining Challenge. It uses an LLM to construct composable function calls from a set of hand-crafted atomic functions such as "turning" or "has_objects_in_front". Given a prompt, the LLM outputs a composed function that narrows a set of bounding box track predictions down to the subset that best corresponds to the prompt. Our paper describing the dataset and baseline in detail is available on arXiv.
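
For intuition, here is a minimal, self-contained sketch of what such a composed function might look like. The atomic helpers below are illustrative stand-ins written for this example only; they are not the repository's actual API, and the track representation (a list of dicts) is assumed purely for the sketch.

def get_objects_of_category(tracks, category):
    # Illustrative stand-in: keep tracks whose predicted class matches the category.
    return [t for t in tracks if t.get("name") == category]

def turning(tracks):
    # Illustrative stand-in: keep tracks flagged as turning by some motion heuristic.
    return [t for t in tracks if t.get("is_turning", False)]

def has_objects_in_front(tracks):
    # Illustrative stand-in: keep tracks with another object directly ahead of them.
    return [t for t in tracks if t.get("has_lead_object", False)]

def scenario_vehicle_turning_with_lead_vehicle(all_tracks):
    # The kind of composition the LLM might emit for
    # "vehicle turning with another car in front of it".
    vehicles = get_objects_of_category(all_tracks, category="VEHICLE")
    turning_vehicles = turning(vehicles)
    return has_objects_in_front(turning_vehicles)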

Installation

Using Conda is recommended for environment management:

conda create -n refAV python=3.10
conda activate refAV

All of the required libraries and packages can be installed with:

pip install -r requirements.txt
export PYTHONPATH=.

Running this code requires downloading the Argoverse 2 test and val splits. Run the commands below to download the entire sensor dataset. More information can be found in the Argoverse User Guide.

conda install s5cmd -c conda-forge

export DATASET_NAME="sensor"  # sensor, lidar, motion_forecasting or tbv.
export TARGET_DIR="$HOME/data/datasets"  # Target directory on your machine.

s5cmd --no-sign-request cp "s3://argoverse/datasets/av2/$DATASET_NAME/*" $TARGET_DIR

It also requires downloading the scenario mining add-on.

export TARGET_DIR="$(pwd)/av2_sm_downloads"
s5cmd --no-sign-request cp "s3://argoverse/tasks/scenario_mining/*" $TARGET_DIR

Generating Detections and Tracks

See the LT3D repository for information on training a baseline detector and tracker on the Argoverse 2 dataset.

We provide additional tracking outputs from previous winning Argoverse submissions on the test set here.

Running the Code

All of the code necessary for unpacking the dataset, generating referred track predictions, and evaluating the predictions against the ground truth can be found in the tutorial.ipynb file. It also includes some basic tutorials about how to define and visualize a scenario.

Our experimental results and test/val submissions can be reproduced directly by running python run_experiment.py --exp_name <exp_name>. All experiments are found in the experiments.yml file.
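
For example, a small driver script could loop over every experiment in experiments.yml. This sketch assumes the experiment names appear as top-level keys in that file, which may not match its actual structure:

import subprocess
import yaml

# Assumption: experiment names are the top-level keys of experiments.yml.
with open("experiments.yml") as f:
    experiments = yaml.safe_load(f)

for exp_name in experiments:
    # Reproduce each experiment exactly as described above.
    subprocess.run(["python", "run_experiment.py", "--exp_name", exp_name], check=True)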

Benchmark Evaluation

Metric                        Description
HOTA-Temporal                 HOTA on temporally localized tracks
HOTA-Track                    HOTA on the full length of a track
Timestamp Balanced Accuracy   Timestamp-level classification metric
Log Balanced Accuracy         Data log/scenario-level classification metric
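
For reference, balanced accuracy is the mean of the recall on positive and negative samples. Below is a minimal sketch of the timestamp-level variant; the official evaluation code may differ in detail.

import numpy as np

def balanced_accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    # pred, gt: boolean arrays marking whether the prompt is active at each timestamp.
    tpr = (pred & gt).sum() / max(gt.sum(), 1)        # recall on positive timestamps
    tnr = (~pred & ~gt).sum() / max((~gt).sum(), 1)   # recall on negative timestamps
    return 0.5 * (tpr + tnr)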

Submission Format

Submissions can be made through the EvalAI CLI. To submit to the validation and test sets respectively, create an EvalAI profile and run:

pip install evalai
evalai set_token <EvalAI_account_token>
evalai challenge 2469 phase 4899 submit --file /path/to/submission_val.pkl --large
evalai challenge 2469 phase 4898 submit --file /path/to/submission_test.pkl --large

The evaluation expects a dictionary that maps each (log_id, prompt) pair to a list of dictionaries:

{
      <(log_id,prompt)>: [
            {
                  "timestamp_ns": <timestamp_ns>,
                  "track_id": <track_id>
                  "score": <score>,
                  "label": <label>,
                  "name": <name>,
                  "translation_m": <translation_m>,
                  "size": <size>,
                  "yaw": <yaw>,
            }
      ]
}
  • log_id: Log id associated with the track, also called seq_id.
  • prompt: The prompt/description string that describes the scenario associated with the log.
  • timestamp_ns: Timestamp associated with the detections.
  • track_id: Unique id assigned to each track; this is produced by your tracker.
  • score: Track confidence.
  • label: Integer index of the object class. This is 0 for REFERRED_OBJECTs, 1 for RELATED_OBJECTs, and 2 for OTHER_OBJECTs.
  • name: Object class name.
  • translation_m: xyz-components of the object translation in the city reference frame, in meters.
  • size: Object extent along the x,y,z axes in meters.
  • yaw: Object heading rotation about the z axis.

Example Submission

example_tracks = {
  ('02678d04-cc9f-3148-9f95-1ba66347dff9','vehicle turning left at stop sign'): [
    {
       'timestamp_ns': 315969904359876000,
       'translation_m': array([[6759.51786422, 1596.42662849,   57.90987307],
             [6757.01580393, 1601.80434654,   58.06088218],
             [6761.8232099 , 1591.6432147 ,   57.66341136],
             ...,
             [6735.5776378 , 1626.72694938,   59.12224152],
             [6790.59603472, 1558.0159741 ,   55.68706682],
             [6774.78130127, 1547.73853494,   56.55294184]]),
       'size': array([[4.315736  , 1.7214599 , 1.4757565 ],
             [4.3870926 , 1.7566483 , 1.4416479 ],
             [4.4788623 , 1.7604711 , 1.4735452 ],
             ...,
             [1.6218852 , 0.82648355, 1.6104599 ],
             [1.4323177 , 0.79862624, 1.5229694 ],
             [0.7979312 , 0.6317313 , 1.4602867 ]], dtype=float32),
      'yaw': array([-1.1205611 , ... , -1.1305285 , -1.1272993], dtype=float32),
      'name': array(['REFERRED_OBJECT', ..., 'REFERRED_OBJECT', 'RELATED_OBJECT'], dtype='<U31'),
      'label': array([ 0, 0, ... 0,  1], dtype=int32),
      'score': array([0.54183, ..., 0.47720736, 0.4853499], dtype=float32),
      'track_id': array([0, ... , 11, 12], dtype=int32),
    },
    ...
  ],
  ...
}
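
Below is a minimal sketch of assembling and serializing a submission in this format; all field values are placeholders, and the output filename is arbitrary.

import pickle
import numpy as np

# Placeholder values -- in practice these come from your tracker and the LLM-filtered tracks.
submission = {
    ('02678d04-cc9f-3148-9f95-1ba66347dff9', 'vehicle turning left at stop sign'): [
        {
            'timestamp_ns': 315969904359876000,
            'track_id': np.array([0], dtype=np.int32),
            'score': np.array([0.54], dtype=np.float32),
            'label': np.array([0], dtype=np.int32),            # 0 = REFERRED_OBJECT
            'name': np.array(['REFERRED_OBJECT'], dtype='<U31'),
            'translation_m': np.array([[6759.5, 1596.4, 57.9]]),
            'size': np.array([[4.3, 1.7, 1.5]], dtype=np.float32),
            'yaw': np.array([-1.12], dtype=np.float32),
        },
    ],
}

with open('submission_val.pkl', 'wb') as f:
    pickle.dump(submission, f)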

Additional Competition Details

  • Language queries are object-centric -- all correspond to some set of objects.
  • Most language queries are given from the third-person perspective (such as "ego vehicle turning left"). Language queries given from the first-person perspective (such as "the pedestrian on the right") describe objects from the point of view of the ego vehicle.
  • If the language query does not refer to an object (such as "raining"), the track bounding boxes should be drawn around the ego vehicle.
  • Scenarios only involve objects within 50 meters from the ego vehicle and within 5 meters of a mapped road.
  • Interacting objects within a scenario are at most 50 meters away from each other.
  • All referred object tracks persist for at least 3 evaluation timestamps (1.5s).

The ego vehicle has the following bounding box across all logs and timestamps: 'translation_m': [1.422, 0, 0.25], 'size': [4.877, 2, 1.473], 'yaw': [0].
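
For prompts that refer to no object, here is a minimal sketch of a single ego-vehicle entry built from these constants. The timestamp and score are placeholders, and because translation_m is expected in the city frame, the ego-frame constant would in practice be transformed by the log's ego pose (omitted here).

import numpy as np

# Placeholder ego-vehicle entry for a prompt with no referred object (e.g. "raining").
# Note: translation_m should be in the city frame; transforming the ego-frame constant
# [1.422, 0, 0.25] with the log's ego pose is left out of this sketch.
ego_entry = {
    'timestamp_ns': 315969904359876000,               # placeholder timestamp
    'track_id': np.array([0], dtype=np.int32),
    'score': np.array([1.0], dtype=np.float32),       # placeholder confidence
    'label': np.array([0], dtype=np.int32),           # REFERRED_OBJECT
    'name': np.array(['REFERRED_OBJECT'], dtype='<U31'),
    'translation_m': np.array([[1.422, 0.0, 0.25]]),
    'size': np.array([[4.877, 2.0, 1.473]], dtype=np.float32),
    'yaw': np.array([0.0], dtype=np.float32),
}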

Contact

Any questions or discussion are welcome! Please raise an issue (preferred), or send me an email.

Cainan Davidson [crdavids@andrew.cmu.edu]

Citation

If you find our paper and code repository useful, please cite us:

@article{davidson2025refav,
  title={RefAV: Towards Planning-Centric Scenario Mining},
  author={Davidson, Cainan and Ramanan, Deva and Peri, Neehar},
  journal={arXiv preprint arXiv:2505.20981},
  year={2025}
}
