Minor Typo in Figure Reference #129

@YUCHENYUXI

Section 3.4.2 of the paper "VITA: Towards Open-Source Interactive Omni Multimodal LLM" cites the wrong figure: it refers to Fig. 1 where Fig. 2 is meant. This issue proposes correcting the reference.

Paper: VITA: Towards Open-Source Interactive Omni Multimodal LLM
ArXiv Version: arXiv:2408.05211v3 (30 May 2025)


Issue Details

Section: 3.4.2 Audio Interrupt Interaction

Original Text:

To achieve this, we propose the duplex deployment framework... As illustrated in Fig.1, two VITA models are deployed concurrently.

Proposed Correction:

To achieve this, we propose the duplex deployment framework... As illustrated in Fig. 2, two VITA models are deployed concurrently.


Reasoning

  • The duplex deployment scheme, which involves two concurrently deployed VITA models, is explicitly shown in Figure 2.
  • The paper's "Introduction" section correctly references this architecture, stating: "As shown in Fig. 2, two VITA models are deployed simultaneously: one is responsible for generating responses to user queries, and the other continuously tracks environmental inputs...".
  • Figure 1 is titled "Interaction of VITA" and demonstrates the user interaction flow, not the underlying two-model architecture.
