Skip to content

Latest commit

 

History

History
205 lines (147 loc) · 7.95 KB

File metadata and controls

205 lines (147 loc) · 7.95 KB

Stage II: Tool-Enhanced Large Language Models

This paradigm typically involves two key components: generating actions from the current state, and transitioning between states by executing those actions.


  • Title: ViperGPT: Visual Inference via Python Execution for Reasoning
  • Venue: ICCV 2023
  • GitHub: Link

ViperGPT Demo


  • Title: Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
  • Venue: NeurIPS 2023
  • GitHub: Link

Chameleon Demo


📄 Visprog

  • Title: Visual Programming: Compositional visual reasoning without training
  • Venue: CVPR 2023
  • GitHub: Link

Visprog Demo


  • Title: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
  • Venue: arXiv 2023
  • GitHub: Link

Visual ChatGPT Demo


  • Title: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
  • Venue: NeurIPS 2023
  • GitHub: Link

HuggingGPT Demo


  • Title: GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
  • Venue: NeurIPS 2023
  • GitHub: Link

GPT4Tools Demo


  • Title: InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
  • Venue: arXiv 2023
  • GitHub: Link

InternGPT Demo


📄 ViotGPT

  • Title: VIoTGPT: Learning to Schedule Vision Tools Towards Intelligent Video Internet of Things
  • Venue: AAAI 2025
  • GitHub: Link

  • Title: MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
  • Venue: arXiv 2023
  • GitHub: Link

MM-REACT Demo


📄 VisRep

  • Title: Self-training large language models for improved visual program synthesis with visual reinforcement
  • Venue: CVPR 2024

VisRep Demo


📄 CRAFT

  • Title: CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
  • Venue: ICLR 2024
  • GitHub: Link

CRAFT Demo


📄 CLOVA

  • Title: CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
  • Venue: CVPR 2024
  • GitHub: Link

CLOVA Demo


📄 HYDRA

  • Title: HYDRA
  • Venue: ECCV 2024
  • GitHub: Link

HYDRA Demo


  • Title: ContextualCoder: Adaptive In-context Prompting for Programmatic Visual Question Answering
  • Venue: TMM 2025

📄 ViUniT

  • Title: Visual Unit Tests for More Robust Visual Programming
  • Venue: CVPR 2025
  • GitHub: Link

ViUniT Demo


📄 SYNAPSE

  • Title: SYNAPSE: SYmbolic Neural-Aided Preference Synthesis Engine
  • Venue: AAAI 2025
  • GitHub: Link

SYNAPSE Demo


📄 Naver

  • Title: NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
  • Venue: ICCV 2025
  • GitHub: Link

Naver Demo


📄 DWIM

  • Title: DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
  • Venue: ICCV 2025
  • GitHub: Link
  • Website: Link

DWIM Demo


📄 LEFT

  • Title: What’s Left? Concept Grounding with Logic-Enhanced Foundation Models
  • Venue: NeurIPS 2023
  • GitHub: Link

LEFT Demo