Bruno Scaglione edited this page Jan 9, 2025 · 26 revisions

Vision

Be a great issue resolver for LLM-powered projects (e.g., building LLMs, RAG, agents).

Roadmap (next few months)

The avenues below run in parallel.

1. Core Avenue

1. Do well on SWE-bench, using the standard SWE-bench metric and the Konwinski Prize's evaluation metric.

  • Unrestricted/paid: can use LM providers [langgraph, aisuite, litellm, usearch, neo4j, SWE-bench, SWE Knowledge Bases]
  • Restricted/free: needs to run inside free GitHub Actions runners (16 GB RAM / 4-core CPU, approx. a 5B-parameter float16 model fully loaded into RAM) [langgraph, Ollama, usearch, neo4j, SWE-bench, SWE Knowledge Bases]
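
The "approx. 5B/float16" figure for the restricted tier can be sanity-checked with a back-of-the-envelope estimate. The sketch below is illustrative only: `model_weight_gib` and `RUNNER_RAM_GIB` are hypothetical names, the 16 GB runner figure is treated as 16 GiB, and the estimate counts weights only (no KV cache, activations, or OS overhead), so the real headroom would be smaller.

```python
def model_weight_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate RAM needed to hold the model weights alone, in GiB.

    float16 stores each parameter in 2 bytes, hence the default.
    """
    return num_params * bytes_per_param / 2**30


# Assumption: runner RAM taken from the roadmap and treated as GiB.
RUNNER_RAM_GIB = 16

weights_gib = model_weight_gib(5e9)          # 5B parameters in float16
headroom_gib = RUNNER_RAM_GIB - weights_gib  # left for everything else

print(f"weights: {weights_gib:.1f} GiB, headroom: {headroom_gib:.1f} GiB")
```

Under these assumptions the weights take a bit over 9 GiB, leaving several GiB for the runtime, the retrieval index, and the rest of the pipeline, which is roughly why 5B parameters is a plausible ceiling for this tier.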

Conditions to move to step 2:

  • Beat the Konwinski Prize's top submission (Unrestricted/paid)
  • Have at least 2 AI teams using our issue-resolver for 2 weeks (Unrestricted/paid or Restricted/free).

2. Do well on LME-Bench.

Adapt the SWE-bench solution to LME-Bench and try to improve its eval results there.

  • Unrestricted/paid: can use LM providers [langgraph, aisuite, litellm, usearch, neo4j, LME-Bench, LME Knowledge Bases]
  • Restricted/free: needs to run inside free GitHub Actions runners (16 GB RAM / 4-core CPU, approx. a 5B-parameter float16 model fully loaded into RAM) [langgraph, Ollama, usearch, neo4j, LME-Bench, LME Knowledge Bases]

2. Infrastructure Avenue

  • Add pre-commit routines [pre-commit]

  • Add experiment tracking, visualization & picking [dvc]

  • Add end-to-end test evals to CI [playwright]

  • Add performance evals (latency, throughput, and memory profiling) to CI

  • Build it as an API [kubernetes, docker, fastapi, grpc]

3. Security Avenue

  1. Fix security warnings
