Home
Bruno Scaglione edited this page Jan 9, 2025
Be a great issue-resolver for LLM-powered projects (e.g., building LLMs, RAG pipelines, agents).
The avenues below proceed in parallel:
1. Do well on SWE-Bench, using the standard SWE-bench metric and the Konwinski Prize's evaluation metric.
- Unrestricted/paid: can use LM providers [langgraph, aisuite, litellm, usearch, neo4j, SWE-Bench, SWE Knowledge Bases]
- Restricted/free: must run inside free GitHub Actions runners (16 GB RAM / 4-core CPU, approx. a 5B-parameter float16 model fully loaded into RAM) [langgraph, Ollama, usearch, neo4j, SWE-Bench, SWE Knowledge Bases]
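The ~5B-parameter/float16 limit follows from a back-of-envelope memory estimate: each float16 weight takes 2 bytes, so the raw weights alone must fit under the runner's 16 GB. A minimal sketch of that arithmetic (the helper name and headroom reasoning are ours, not from the roadmap):

```python
def model_ram_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate RAM needed to hold raw model weights, in GiB.

    float16 uses 2 bytes per parameter; real usage is higher once the
    KV cache, activations, and the OS itself are accounted for.
    """
    return n_params * bytes_per_param / 1024**3

weights_gib = model_ram_gib(5e9)  # 5B params in float16 -> ~9.3 GiB
fits_on_runner = weights_gib < 16  # leaves some headroom on a 16 GB runner
```

This is why a ~5B model is roughly the ceiling: the weights consume ~9.3 GiB, leaving the remaining memory for inference overhead and the resolver process itself.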
Conditions to move to step 2:
- Beat the Konwinski Prize's top submission (Unrestricted/paid)
- Have at least 2 AI teams using our issue-resolver for 2 weeks (Unrestricted/paid or Restricted/free).
2. Do well on LME-Bench.
Adapt the SWE-Bench solution to LME-Bench and try to improve evaluation results on LME-Bench.
- Unrestricted/paid: can use LM providers [langgraph, aisuite, litellm, usearch, neo4j, LME-Bench, LME Knowledge Bases]
- Restricted/free: must run inside free GitHub Actions runners (16 GB RAM / 4-core CPU, approx. a 5B-parameter float16 model fully loaded into RAM) [langgraph, Ollama, usearch, neo4j, LME-Bench, LME Knowledge Bases]
- Add pre-commit routines [pre-commit]
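A pre-commit routine is driven by a `.pre-commit-config.yaml` at the repo root; a minimal sketch (the specific hooks and pinned revisions are illustrative choices, not decisions from this roadmap):

```yaml
# .pre-commit-config.yaml -- run `pre-commit install` once to activate
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff
```

After `pre-commit install`, the hooks run automatically on every `git commit`; `pre-commit run --all-files` applies them to the whole repo.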
- Add experiment tracking, visualization & picking [dvc]
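With DVC, experiments are tracked through a `dvc.yaml` pipeline whose declared params and metrics make runs comparable (`dvc exp show`) and pickable. A minimal sketch, where the script name, param keys, and metrics file are hypothetical placeholders:

```yaml
# dvc.yaml -- one evaluation stage; run with `dvc exp run`
stages:
  evaluate:
    cmd: python evaluate.py  # hypothetical entry point
    deps:
      - evaluate.py
    params:
      - model.name           # read from params.yaml
    metrics:
      - metrics.json:
          cache: false       # keep metrics in git, not the DVC cache
```

Each `dvc exp run` records the params and resulting metrics, so candidate configurations can be visualized and the best one promoted.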
- Add end-to-end test evals to CI [playwright]
- Add performance evals (latency, throughput, and memory profiling) to CI
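All three numbers named above can be collected with the standard library alone, which keeps the CI job dependency-free. A minimal sketch, assuming a generic callable under test (the `profile` helper is ours):

```python
import time
import tracemalloc

def profile(fn, runs: int = 10) -> dict:
    """Measure mean latency, throughput, and peak memory of fn()."""
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # peak bytes allocated
    tracemalloc.stop()
    return {
        "latency_s": elapsed / runs,       # mean seconds per call
        "throughput_rps": runs / elapsed,  # calls per second
        "peak_mem_bytes": peak,            # peak allocation during the runs
    }

stats = profile(lambda: sum(range(10_000)))
```

A CI step can then assert these stay under agreed thresholds, failing the build on a performance regression.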
- Build it as an API [kubernetes, docker, fastapi, grpc]
- Fix security warnings