Question about the video frame input #3

Open

Hi, thanks for your great work! For the grounding task, is the entire video sequence fed into the LLM, or just a single image?
