Home
Bruno Scaglione edited this page Jan 9, 2025
Be a great issue-resolver for LLM-powered projects (e.g., building LLMs, RAG pipelines, agents).
The avenues below proceed in parallel:
1. Do well on SWE-Bench, using the standard SWE-bench metric and the Konwinski Prize's evaluation metric.
- Unrestricted/paid: can use LM providers [langgraph, aisuite, litellm, usearch, neo4j, SWE-Bench, SWE Knowledge Bases]
- Restricted/free: must run inside free GitHub Actions runners (16 GB RAM / 4-core CPU, approx. a 5B-parameter float16 model fully loaded into RAM) [langgraph, Ollama, usearch, neo4j, SWE-Bench, SWE Knowledge Bases]
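The ~5B-parameter/float16 limit follows from a back-of-envelope memory estimate: each float16 weight takes 2 bytes, so the raw weights alone must fit under the runner's 16 GB. A minimal sketch of that arithmetic (the helper name and headroom reasoning are ours, not from the roadmap):

```python
def model_ram_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate RAM needed to hold raw model weights, in GiB.

    float16 uses 2 bytes per parameter; real usage is higher once the
    KV cache, activations, and the OS itself are accounted for.
    """
    return n_params * bytes_per_param / 1024**3

weights_gib = model_ram_gib(5e9)  # 5B params in float16 -> ~9.3 GiB
fits_on_runner = weights_gib < 16  # leaves some headroom on a 16 GB runner
```

This is why a ~5B model is roughly the ceiling: the weights consume ~9.3 GiB, leaving the remaining memory for inference overhead and the resolver process itself.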
Conditions to move to step 2:
- Beat the Konwinski Prize's top submission (Unrestricted/paid)
- Have at least 2 AI teams using our issue-resolver for 2 weeks (Unrestricted/paid or Restricted/free).
2. Do well on LME-Bench.
Adapt the SWE-Bench solution to LME-Bench and try to improve evaluation results on LME-Bench.
- Unrestricted/paid: can use LM providers [langgraph, aisuite, litellm, usearch, neo4j, LME-Bench, LME Knowledge Bases]
- Restricted/free: must run inside free GitHub Actions runners (16 GB RAM / 4-core CPU, approx. a 5B-parameter float16 model fully loaded into RAM) [langgraph, Ollama, usearch, neo4j, LME-Bench, LME Knowledge Bases]
- Add pre-commit routines [pre-commit]
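A pre-commit routine is driven by a `.pre-commit-config.yaml` at the repo root; a minimal sketch (the specific hooks and pinned revisions are illustrative choices, not decisions from this roadmap):

```yaml
# .pre-commit-config.yaml -- run `pre-commit install` once to activate
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff
```

After `pre-commit install`, the hooks run automatically on every `git commit`; `pre-commit run --all-files` applies them to the whole repo.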
- Add experiment tracking, visualization & picking [dvc]
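With DVC, experiments are tracked through a `dvc.yaml` pipeline whose declared params and metrics make runs comparable (`dvc exp show`) and pickable. A minimal sketch, where the script name, param keys, and metrics file are hypothetical placeholders:

```yaml
# dvc.yaml -- one evaluation stage; run with `dvc exp run`
stages:
  evaluate:
    cmd: python evaluate.py  # hypothetical entry point
    deps:
      - evaluate.py
    params:
      - model.name           # read from params.yaml
    metrics:
      - metrics.json:
          cache: false       # keep metrics in git, not the DVC cache
```

Each `dvc exp run` records the params and resulting metrics, so candidate configurations can be visualized and the best one promoted.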
- Add end-to-end test evals to CI [playwright]
- Add performance evals (latency, throughput, and memory profiling) to CI
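All three numbers named above can be collected with the standard library alone, which keeps the CI job dependency-free. A minimal sketch, assuming a generic callable under test (the `profile` helper is ours):

```python
import time
import tracemalloc

def profile(fn, runs: int = 10) -> dict:
    """Measure mean latency, throughput, and peak memory of fn()."""
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # peak bytes allocated
    tracemalloc.stop()
    return {
        "latency_s": elapsed / runs,       # mean seconds per call
        "throughput_rps": runs / elapsed,  # calls per second
        "peak_mem_bytes": peak,            # peak allocation during the runs
    }

stats = profile(lambda: sum(range(10_000)))
```

A CI step can then assert these stay under agreed thresholds, failing the build on a performance regression.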
- Build it as an API [kubernetes, docker, fastapi, grpc]
- Fix security warnings