Edge-optimized OpenCUA-7B computer-use agent evaluated on OSWorld, exploring systematic vLLM inference optimizations across CPU and GPU, including precision tuning, image history management, speculative decoding, and prefix caching.
quantization agents multimodal inference-optimization edge-ai vllm speculative-decoding gui-agents prefix-caching osworld opencua
Updated Dec 18, 2025 - Python