
PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting

Linqing Wang · Ximing Xing · Yiji Cheng · Zhiyuan Zhao · Donghao Li · Tiankai Hang · Jiale Tao · QiXun Wang · Ruihuang Li · Comi Chen · Xin Li · Mingrui Wu · Xinchi Deng · Shuyang Gu · Chunyu Wang† · Qinglin Lu*

Tencent Hunyuan

†Project Lead · *Corresponding Author

Links: arXiv · Zhihu · HuggingFace Model · T2I-Keypoints-Eval Dataset · Homepage · HunyuanImage2.1 Code · HunyuanImage2.1 Model


[Figure: PromptEnhancer teaser]

Overview

Hunyuan-PromptEnhancer is a prompt rewriting utility. It restructures an input prompt while preserving the original intent, producing clearer, layered, and logically consistent prompts suitable for downstream image generation or similar tasks.

  • Preserves intent across key elements (subject/action/quantity/style/layout/relations/attributes/text, etc.).
  • Encourages a "global–details–summary" narrative: primary elements first, then secondary/background elements, ending with a concise style/type summary.
  • Robust output parsing with graceful fallback: it prioritizes <answer>...</answer>; if that is missing, it strips <think>...</think> and extracts the clean text; otherwise it falls back to the original input (see the sketch after this list).
  • Configurable inference parameters (temperature, top_p, max_new_tokens) for balancing determinism and diversity.
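
A minimal sketch of that fallback chain (the function name and regular expressions here are illustrative, not necessarily the repository's exact implementation):

import re

def extract_enhanced_prompt(raw_output: str, original_prompt: str) -> str:
    # 1) Prefer an explicit <answer>...</answer> block.
    m = re.search(r"<answer>(.*?)</answer>", raw_output, re.DOTALL)
    if m and m.group(1).strip():
        return m.group(1).strip()
    # 2) Otherwise strip any <think>...</think> reasoning and keep the rest.
    cleaned = re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()
    if cleaned:
        return cleaned
    # 3) Nothing usable: fall back to the original input.
    return original_prompt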

Prerequisites

  • Python: 3.8 or higher
  • CUDA: 11.8+ (recommended for GPU acceleration)
  • Storage: At least 20GB free space for models
  • Memory: 8GB+ RAM (16GB+ recommended for 32B models)

Installation

Option 1: Standard Installation (Recommended for most users)

pip install -r requirements.txt

Option 2: GGUF Installation (For quantized models with CUDA support)

chmod +x script/install_gguf.sh && ./script/install_gguf.sh

💡 Tip: Choose the GGUF installation if you want faster inference with lower memory usage, especially for the 32B model.

Model Download

🎯 Quick Start (Recommended)

For most users, we recommend starting with the PromptEnhancer-7B model:

# Download PromptEnhancer-7B (13GB) - Best balance of quality and efficiency
huggingface-cli download tencent/HunyuanImage-2.1/reprompt --local-dir ./models/promptenhancer-7b

📊 Model Comparison & Selection Guide

Model              | Size | Quality   | Memory | Best For
PromptEnhancer-7B  | 13GB | High      | 8GB+   | Most users, balanced performance
PromptEnhancer-32B | 64GB | Highest   | 32GB+  | Research, highest quality needs
32B-Q8_0 (GGUF)    | 35GB | Highest   | 35GB+  | High-end GPUs (H100, A100)
32B-Q6_K (GGUF)    | 27GB | Excellent | 27GB+  | RTX 4090, RTX 5090
32B-Q4_K_M (GGUF)  | 20GB | Good      | 20GB+  | RTX 3090, RTX 4080
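
If you want to script this choice, a tiny helper can map free VRAM to a row of the table (a hypothetical convenience function; the thresholds are taken from the table above):

def pick_gguf_file(vram_gb: float) -> str:
    # Thresholds follow the comparison table above.
    if vram_gb >= 35:
        return "PromptEnhancer-32B.Q8_0.gguf"
    if vram_gb >= 27:
        return "PromptEnhancer-32B.Q6_K.gguf"
    if vram_gb >= 20:
        return "PromptEnhancer-32B.Q4_K_M.gguf"
    raise ValueError("Under 20GB VRAM: use PromptEnhancer-7B instead.")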

Standard Models (Full Precision)

# PromptEnhancer-7B (recommended for most users)
huggingface-cli download tencent/HunyuanImage-2.1/reprompt --local-dir ./models/promptenhancer-7b

# PromptEnhancer-32B (for highest quality)
huggingface-cli download PromptEnhancer/PromptEnhancer-32B --local-dir ./models/promptenhancer-32b

GGUF Models (Quantized - Memory Efficient)

# Create models directory
mkdir -p ./models

# Choose one based on your GPU memory:
# Q8_0: Highest quality (35GB)
huggingface-cli download mradermacher/PromptEnhancer-32B-GGUF PromptEnhancer-32B.Q8_0.gguf --local-dir ./models

# Q6_K: Excellent quality (27GB) - Recommended for RTX 4090
huggingface-cli download mradermacher/PromptEnhancer-32B-GGUF PromptEnhancer-32B.Q6_K.gguf --local-dir ./models

# Q4_K_M: Good quality (20GB) - Recommended for RTX 3090/4080
huggingface-cli download mradermacher/PromptEnhancer-32B-GGUF PromptEnhancer-32B.Q4_K_M.gguf --local-dir ./models

🚀 Performance Tip: GGUF models offer 50-75% memory reduction with minimal quality loss. Use Q6_K for the best quality/memory trade-off.

Quickstart

Using HunyuanPromptEnhancer

from inference.prompt_enhancer import HunyuanPromptEnhancer

models_root_path = "./models/promptenhancer-7b"

enhancer = HunyuanPromptEnhancer(models_root_path=models_root_path, device_map="auto")

# Enhance a prompt (Chinese or English)
user_prompt = "Third-person view, a race car speeding on a city track..."
new_prompt = enhancer.predict(
    prompt_cot=user_prompt,
    # Default system prompt is tailored for image prompt rewriting; override if needed
    temperature=0.7,   # >0 enables sampling; 0 uses deterministic generation
    top_p=0.9,
    max_new_tokens=256,
)

print("Enhanced:", new_prompt)

Using GGUF Models (Quantized, Faster)

from inference.prompt_enhancer_gguf import PromptEnhancerGGUF

# Auto-detects Q8_0 model in models/ folder
enhancer = PromptEnhancerGGUF(
    model_path="./models/PromptEnhancer-32B.Q8_0.gguf",  # Optional: auto-detected
    n_ctx=1024,        # Context window size
    n_gpu_layers=-1,   # Use all GPU layers
)

# Enhance a prompt
user_prompt = "woman in jungle"
enhanced_prompt = enhancer.predict(
    user_prompt,
    temperature=0.3,
    top_p=0.9,
    max_new_tokens=512,
)

print("Enhanced:", enhanced_prompt)

Command Line Usage (GGUF)

# Simple usage - auto-detects model in models/ folder
python inference/prompt_enhancer_gguf.py

# Or specify model path
GGUF_MODEL_PATH="./models/PromptEnhancer-32B.Q8_0.gguf" python inference/prompt_enhancer_gguf.py

GGUF Model Benefits

🚀 Why use GGUF models?

  • Memory Efficient: 50-75% less VRAM usage compared to full precision models
  • Faster Inference: Optimized for CPU and GPU acceleration with llama.cpp
  • Quality Preserved: Q8_0 and Q6_K maintain excellent output quality
  • Easy Deployment: Single file format, no complex dependencies
  • GPU Acceleration: Full CUDA support for high-performance inference

Model  | Size | Quality   | VRAM Usage | Best For
Q8_0   | 35GB | Highest   | ~35GB      | High-end GPUs (H100, A100)
Q6_K   | 27GB | Excellent | ~27GB      | RTX 4090, RTX 5090
Q4_K_M | 20GB | Good      | ~20GB      | RTX 3090, RTX 4080

Parameters

Standard Models (Transformers)

  • models_root_path: Local path or repo id; supports trust_remote_code models.
  • device_map: Device mapping (default auto).
  • predict(...):
    • prompt_cot (str): Input prompt to rewrite.
    • sys_prompt (str): Optional system prompt; a default is provided for image prompt rewriting.
    • temperature (float): >0 enables sampling; 0 for deterministic generation (see the example after this list).
    • top_p (float): Nucleus sampling threshold (effective when sampling).
    • max_new_tokens (int): Maximum number of new tokens to generate.
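
For fully reproducible output, sampling can be disabled. A usage sketch, reusing the enhancer object from the Quickstart above:

# temperature=0 selects deterministic (greedy) generation, per the parameter notes above
new_prompt = enhancer.predict(
    prompt_cot="a red bicycle leaning against a brick wall",
    temperature=0,
    max_new_tokens=256,
)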

GGUF Models

  • model_path (str): Path to the GGUF model file (auto-detected if one is present in the models/ folder; see the example after this list).
  • n_ctx (int): Context window size (default: 8192, recommended: 1024 for short prompts).
  • n_gpu_layers (int): Number of layers to offload to GPU (-1 for all layers).
  • verbose (bool): Enable verbose logging from llama.cpp.
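
Because model_path can be auto-detected, the minimal construction takes no model argument. A sketch, assuming a single .gguf file sits in the models/ folder as in the download steps above:

from inference.prompt_enhancer_gguf import PromptEnhancerGGUF

# No model_path given: the class falls back to auto-detection in models/
enhancer = PromptEnhancerGGUF(n_ctx=1024, n_gpu_layers=-1)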

Citation

If you find this project useful, please consider citing:

@article{promptenhancer,
  title={PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting},
  author={Wang, Linqing and Xing, Ximing and Cheng, Yiji and Zhao, Zhiyuan and Tao, Jiale and Wang, QiXun and Li, Ruihuang and Chen, Comi and Li, Xin and Wu, Mingrui and Deng, Xinchi and Wang, Chunyu and Lu, Qinglin},
  journal={arXiv preprint arXiv:2509.04545},
  year={2025}
}

Acknowledgements

We would like to thank the following open-source projects and communities for their contributions to open research and exploration: Transformers and HuggingFace.

Contact

If you would like to leave a message for our R&D and product teams, you are welcome to contact our open-source team. You can also reach us via email (hunyuan_opensource@tencent.com).
