
[FEATURE] Add GGUF Model Support via llama.cpp #1

@maocide

Description

User Story: As a user with a VRAM-limited GPU (e.g., 8GB-12GB), I want to be able to use quantized GGUF Vision-Language Models so that I can run the application's core features with reasonable performance and without running out of memory.

Problem: The current application relies exclusively on the transformers library, which loads unquantized PyTorch/SafeTensors models. These models are very large and require significant VRAM (16GB+ recommended), making the application slow or unusable for a large portion of the potential user base.

Proposed Solution: Integrate the llama-cpp-python library as an alternative backend for model loading and inference.

  • Modify vlm_profiles.py to include new loader and generation functions specifically for GGUF models.

  • The UI should allow users to select GGUF models, potentially by pointing to a local file.

  • The application will need to detect the model type and route it to the correct backend (transformers or llama.cpp); see the sketch after this list.
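
A minimal sketch of what the GGUF path could look like, assuming llama-cpp-python and a LLaVA-style model that ships a separate CLIP projector GGUF. The helper names (`is_gguf_model`, `load_gguf_model`, `generate_gguf`) and their signatures are illustrative, not existing PlotCaption code:

```python
# Illustrative sketch only — none of these helpers exist in PlotCaption yet.
from pathlib import Path

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler  # assumes a LLaVA-style VLM


def is_gguf_model(model_path: str) -> bool:
    """Detect GGUF checkpoints by extension so the app can pick a backend."""
    return Path(model_path).suffix.lower() == ".gguf"


def load_gguf_model(model_path: str, clip_model_path: str, n_gpu_layers: int = -1) -> Llama:
    """Load a quantized GGUF VLM through llama-cpp-python."""
    chat_handler = Llava15ChatHandler(clip_model_path=clip_model_path)
    return Llama(
        model_path=model_path,
        chat_handler=chat_handler,
        n_ctx=4096,                  # context window; tune per model
        n_gpu_layers=n_gpu_layers,   # -1 offloads every layer to the GPU
    )


def generate_gguf(llm: Llama, prompt: str, image_url: str, max_tokens: int = 512) -> str:
    """Run a single vision-language generation on the llama.cpp backend."""
    result = llm.create_chat_completion(
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        max_tokens=max_tokens,
    )
    return result["choices"][0]["message"]["content"]
```

Routing on the `.gguf` file extension keeps the backend decision in one place; the existing transformers loaders in vlm_profiles.py would stay untouched and be used whenever `is_gguf_model` returns False.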

Goal: Dramatically lower the VRAM and system RAM requirements, making PlotCaption accessible and performant for a much wider range of hardware. This is the top priority for improving accessibility.

Source: This feature was suggested and shown to be viable by Reddit user willdone, who successfully patched in a Q8_0 GGUF for testing.
