Minor Typo in Figure Reference
The paper "VITA: Towards Open-Source Interactive Omni Multimodal LLM" contains a minor typo in a figure reference: Section 3.4.2 cites Fig. 1 where Fig. 2 is meant. This issue proposes correcting the reference for clarity and accuracy.
Paper: VITA: Towards Open-Source Interactive Omni Multimodal LLM
ArXiv Version: arXiv:2408.05211v3 (30 May 2025)
Issue Details
Section: 3.4.2 Audio Interrupt Interaction
Original Text:
To achieve this, we propose the duplex deployment framework... As illustrated in Fig.1, two VITA models are deployed concurrently.
Proposed Correction:
To achieve this, we propose the duplex deployment framework... As illustrated in Fig. 2, two VITA models are deployed concurrently.
Reasoning
- The duplex deployment scheme, which involves two concurrently deployed VITA models, is explicitly shown in Figure 2.
- The paper's "Introduction" section correctly references this architecture, stating: "As shown in Fig. 2, two VITA models are deployed simultaneously: one is responsible for generating responses to user queries, and the other continuously tracks environmental inputs...".
- Figure 1 is titled "Interaction of VITA" and demonstrates the user interaction flow, not the underlying two-model architecture.