Performance Guide

Model: qwen2.5-coder:1.5b

Hardware

GPU: NVIDIA GTX 1650 (4GB)
All 29 of 29 layers offloaded to GPU
GPU utilization: 65-100% during inference

Performance Metrics

Mode	Response Time	Use Case
Simple (streaming)	2-5 seconds	Questions, explanations, code generation
Agent (optimized)	5-20 seconds	Simple code analysis
Agent (complex)	20-60 seconds	Multi-step tasks with tools
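
To reproduce timings like these on other hardware, a small helper can record the time to the first streamed chunk and the total response time. This is a minimal sketch, not the assistant's own instrumentation: `time_stream` is a hypothetical name, and it assumes the response arrives as an iterable of text chunks.

```python
import time
from typing import Iterable, Tuple


def time_stream(chunks: Iterable[str]) -> Tuple[str, float, float]:
    """Consume a stream of text chunks and time it.

    Returns (full_text, seconds_to_first_chunk, total_seconds).
    If the stream is empty, seconds_to_first_chunk equals total_seconds.
    """
    start = time.perf_counter()
    first_chunk_at = None
    parts = []
    for chunk in chunks:
        if first_chunk_at is None:
            # Latency the user perceives before any output appears.
            first_chunk_at = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    if first_chunk_at is None:
        first_chunk_at = total
    return "".join(parts), first_chunk_at, total
```

In streaming mode the time to the first chunk is usually the number that matters for perceived responsiveness, which is why it is reported separately from the total.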

Optimizations Applied

Skip Planning - Plan only for queries that mention code elements (saves 25s)
Skip Reflection - Reflect only when files have been modified (saves 25-50s)
Cache Repo Map - Cache the generated file tree instead of rebuilding it (saves 2-5s)
Streaming Output - Display the response as it is generated (better UX)
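
The first three optimizations can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `should_plan`, `should_reflect`, and `repo_map` are hypothetical names, and the regular expression is only one possible way to decide that a query "mentions code elements".

```python
import os
import re
from functools import lru_cache
from typing import List, Tuple

# Assumed heuristic: a query mentions code if it contains a file name,
# a call like foo(), a snake_case name, or a CamelCase name.
CODE_HINT = re.compile(
    r"\b[\w/]+\.(?:py|js|ts|go|rs|java|c|cpp|md)\b"  # file names
    r"|\b\w+\(\)"                                     # foo()
    r"|\b[a-z]+_[a-z_]+\b"                            # snake_case
    r"|\b[A-Z][a-z]+[A-Z]\w*\b"                       # CamelCase
)


def should_plan(query: str) -> bool:
    """Skip the planning step unless the query mentions code elements."""
    return bool(CODE_HINT.search(query))


def should_reflect(modified_files: List[str]) -> bool:
    """Skip the reflection step unless the run modified at least one file."""
    return len(modified_files) > 0


@lru_cache(maxsize=8)
def repo_map(root: str) -> Tuple[str, ...]:
    """Build the file tree once per root and reuse it on later calls."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git and keep a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return tuple(paths)
```

A cache like this does not notice files created or deleted after the first call, so a real implementation would need to invalidate it (for example with `repo_map.cache_clear()`) whenever the agent writes to disk.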

Usage Recommendations

✅ Use Simple Mode For:

python -m assistant --simple "your question"

Code generation
Explanations and tutorials
Quick questions
Debugging help

⚠️ Agent Mode Limitations:

The 1.5b model struggles with:

Complex JSON formatting required by agent loop
Multi-step reasoning with tools
Following strict agent instructions

Recommendation: Stick to --simple mode for the best experience.

Future Improvements

To make agent mode viable:

Use a larger model (3b or 7b) for better instruction following
Simplify agent protocol (less strict JSON requirements)
Add few-shot examples in system prompt
Implement tool-use fine-tuning
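
As one possible reading of "less strict JSON requirements", the agent loop could accept any reply that contains a decodable JSON object, even when the model wraps it in extra prose or a code fence. The sketch below assumes this approach; `extract_json` is a hypothetical helper, and the tool name in the usage example is made up for illustration.

```python
import json
import re
from typing import Any, Optional


def extract_json(text: str) -> Optional[Any]:
    """Return the first JSON object that decodes cleanly from model output.

    Tries every '{' in the text as a possible start, so leading chatter,
    trailing chatter, and earlier malformed fragments are tolerated.
    Returns None if no object can be decoded.
    """
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        return obj
    return None


# Usage with a made-up tool call surrounded by prose:
reply = 'Sure! Here is the call:\n{"tool": "read_file", "args": {"path": "a.py"}}\nDone.'
call = extract_json(reply)
```

This only relaxes the parsing side. It does not help when the model emits no valid object at all, which is where a larger model or few-shot examples would still be needed.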

Comparison: Simple vs Agent Mode

# Simple mode (recommended)
$ python -m assistant --simple "write a function to reverse a string"
# Response: 3 seconds, streaming output, works perfectly

# Agent mode (experimental)
$ python -m assistant "write a function to reverse a string"
# Response: 8-15 seconds, often hits iteration limit, inconsistent

Conclusion

The 1.5b model works well in simple mode but is not suitable for agent mode. For agent features, use a larger model (3b or bigger); otherwise, stay in simple mode.
