
Zanwen (Ryan) Fu

ML Engineer · Founder
Building agentic systems — from reasoning loops to production backends.

Website LinkedIn Email


Incoming MLE at Robinhood (Agentic AI) · M.S. CS (AI/ML) at Duke · B.Comp. CS at NUS

Currently interested in: agent evaluation harnesses, context engineering for long-running workflows, and what it actually takes to benchmark agents that make real-world decisions over time.


What I've shipped

VYNN AI — Agentic financial analyst platform (sole engineer, ~500 users, 50K+ LOC)

LangGraph supervisor orchestrates 5 specialized agents for end-to-end equity research: data scraping → DCF modeling (6 sector strategies) → news intelligence → report generation — all in under 7 minutes. The hard part wasn't the LLM calls; it was making the numbers trustworthy. The recommendation engine uses a 3-layer architecture: deterministic math (RecommendationCalculator) → LLM narrative → regex-based validator that blocks publication if citation coverage drops below 95%. Built a custom 1,293-line Excel formula evaluator so the DCF workbook and downstream JSON stay perfectly consistent without requiring Excel. Nightly CI runs a golden-dataset regression suite across 100 QQQ companies and blocks deployment if valuations drift beyond threshold.

stock-analyst (agent backend) · vynnai-web (platform frontend) · api-runner (API layer)
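The citation-coverage gate above can be sketched as a deterministic check that refuses to publish a draft whose coverage falls below the threshold. This is a minimal illustration, not VYNN's actual implementation: the names (`check_coverage`, `CITATION_RE`) and the bracket-style citation format are assumptions.

```python
import re

# Assumed citation format: numbered markers like "[3]" at the end of claims.
CITATION_RE = re.compile(r"\[\d+\]")
COVERAGE_THRESHOLD = 0.95  # matches the 95% gate described above

def check_coverage(report: str) -> tuple[float, bool]:
    """Return (citation coverage, publishable?) for a draft report."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
    if not sentences:
        return 0.0, False
    cited = sum(1 for s in sentences if CITATION_RE.search(s))
    coverage = cited / len(sentences)
    return coverage, coverage >= COVERAGE_THRESHOLD

draft = "Revenue grew 12% [1]. Margins expanded [2]. The outlook is strong."
coverage, ok = check_coverage(draft)
print(f"coverage={coverage:.2f} publishable={ok}")  # → coverage=0.67 publishable=False
```

The point of keeping this layer regex-based rather than LLM-based is that the gate itself can never hallucinate: a claim either carries a citation marker or it does not.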


AutoCodeRover — Autonomous code repair agent · Core technology acquired by Sonar

Designed the Self-Fix Agent: when a patch fails to apply, an LLM-as-a-Judge diagnoses which pipeline stage (Context Retrieval or Patch Generation) caused the failure, generates corrective feedback, and replays from that stage — preserving upstream state via UUID-targeted responses. Also built a stateful replay mechanism so developers can inject feedback on any intermediate LLM response and trigger selective re-execution downstream. Result: 51.6% on SWE-bench Verified (up from 38.4%), 1.8× patch precision over next-best open-source agent.

auto-code-rover (agent backend) · Jetbrains-IDE-Plugin (Kotlin, end-to-end)


ACR JetBrains Plugin — IDE-integrated autonomous repair

Built end-to-end in Kotlin. Three things I'm most proud of: (1) GumTree 3-way AST merge — when you've edited code while the agent is patching the same file remotely, the plugin reconciles baseline → your edits → agent's patch at the AST level, not text level. (2) PSI-based context enrichment — before sending a task to ACR, the plugin extracts symbol references, cursor history (last 10 positions), and open files to narrow the agent's search scope. (3) Embedded SonarLint — runs static analysis locally, then lets you one-click send any issue to ACR for autonomous fixing.
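The context-enrichment step, item (2), reduces to collecting cursor history (capped at the last 10 positions), open files, and symbol references into one payload before dispatch. A rough sketch of that shape, with field names that are assumptions rather than the plugin's actual schema (the real implementation is Kotlin/PSI; this is a language-agnostic illustration):

```python
from collections import deque

class EditorContext:
    def __init__(self) -> None:
        self.cursor_history: deque = deque(maxlen=10)  # (file, line) pairs
        self.open_files: set[str] = set()

    def record_cursor(self, file: str, line: int) -> None:
        self.open_files.add(file)
        self.cursor_history.append((file, line))

    def build_payload(self, symbol_refs: list[str]) -> dict:
        # Everything the agent needs to narrow its search scope, in one dict.
        return {
            "cursor_history": list(self.cursor_history),
            "open_files": sorted(self.open_files),
            "symbol_refs": symbol_refs,
        }

ctx = EditorContext()
for i in range(12):  # only the most recent 10 positions survive the cap
    ctx.record_cursor("Main.java", i)
payload = ctx.build_payload(["Main.run", "Config.load"])
assert len(payload["cursor_history"]) == 10
assert payload["cursor_history"][0] == ("Main.java", 2)
```

The bounded deque is the design choice worth noting: cursor history is a recency signal, so older positions are dropped rather than weighted.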


LUMINA — Multi-agent citation screening for medical systematic reviews (first author)

Four-agent pipeline: classifier triage → PICOS-guided Chain-of-Thought screening → LLM-as-a-Judge reviewer → self-correction agent. Evaluated across 15 SRMAs (~150K citations from BMJ, JAMA, Lancet). 98.2% sensitivity (10 of 15 at perfect 100%) with 35× fewer missed studies vs. prior baselines, at $0.007/article.
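For reference, the headline numbers reduce to two simple formulas: per-pipeline sensitivity is TP / (TP + FN) pooled across reviews, and total cost is articles × cost-per-article. The toy values below are made up for illustration, not LUMINA's actual data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Recall over truly relevant citations: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Pooled sensitivity across reviews = total TP / total relevant (toy values).
reviews = [(40, 0), (55, 1), (30, 0)]  # (tp, fn) per review
pooled = sensitivity(sum(tp for tp, _ in reviews),
                     sum(fn for _, fn in reviews))
print(f"pooled sensitivity = {pooled:.3f}")  # → pooled sensitivity = 0.992

cost_per_article = 0.007  # the $0.007/article figure from above
print(f"150,000 citations ≈ ${150_000 * cost_per_article:,.0f}")  # → ≈ $1,050
```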


What I think about

  • Agent harness design — VYNN's golden-dataset regression suite and ACR's SWE-bench eval loop taught me the same lesson: the harness that catches agent regressions matters more than the agent itself. I'm interested in building eval infrastructure that can score long-running, multi-step agents where "correct" isn't a single number.
  • Context engineering — Most agent failures I've debugged trace back to what the agent didn't know, not what it reasoned poorly about. PSI-based enrichment in the ACR plugin, MCP self-retrieval in VYNN, 33 externalized prompt templates — these are all different bets on the same problem: giving agents the right context at the right time.
  • The gap between demo benchmarks and production trust — An agent that scores 51.6% on SWE-bench still fails half the time. VYNN's 3-layer recommendation validator exists because "usually right" isn't good enough for financial decisions. I'm drawn to the engineering that makes agents trustworthy enough to run unsupervised.

Last updated: Mar 2026
