Does VITA-1.5 currently support audio-video understanding tasks?
I see that in the code, audio is a dummy variable:
`audio = torch.zeros(1, 400, 80).half().to(device)`
The VLMEvalKit implementation is similar:
`audio = torch.zeros(400, 80)`