Commit 85c15e8

Add images
1 parent 0be6a39 commit 85c15e8

1 file changed

Lines changed: 8 additions & 5 deletions

content/blog/ai-agent-esper.md

@@ -12,9 +12,13 @@ The timing was on our side: Just before my internship began in June of 2025, Cla
1212

1313
Before Claude Code made a general-purpose agent available to everyone (I hate that people think Claude Code is only for coding!), building agents was a fairly complicated process, at least for me at the time! Claude Code commodified the *mechanics*, which gave us more time to think about what the agent does and how it does it. Not only that, but it also pushed other AI labs toward terminal-based agent orchestrators (Codex, Gemini-cli, opencode, etc.) and gave Anthropic a [sweet, sweet revenue lever of ~$1B!](https://www.anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone)
1414

15+
<img width="1028" height="215" alt="Screenshot 2026-01-05 at 20 09 44" src="https://github.com/user-attachments/assets/921e938a-3320-4b31-abe3-3f49397fde26" />
16+
17+
---
18+
1519
Now, coming to the part where all the talk was put into action: what was done, how it worked, and what actually didn't work! (Quick note on our setup: we used Claude Code via AWS Bedrock for enterprise compliance. One thing I observed is that some Claude Code features were missing at times; I'm not sure whether that was an A/B testing thing or a side effect of using Bedrock. Personally, I feel the $200 Max plan had really good limits before it got nerfed, but for compliance we stuck to Claude via Bedrock.)
1620

17-
**What We Built (And What We Learned)**
21+
## **What We Built (And What We Learned)**
1822

1923
### **1\. The Claude.md Stack (And Why Rules Get Ignored)**
2024

@@ -76,14 +80,13 @@ https://www.devashish.me/p/ai-adoption-framework-phase-1-minimalist
7680
---
7781

7882
### **6\. When Agents Lie About Being Done**
83+
<img width="610" height="532" alt="Screenshot 2026-01-05 at 19 58 29" src="https://github.com/user-attachments/assets/8299d0c1-96eb-4b2c-8343-004789bb55f9" />
7984

80-
![][image1]
81-
82-
Anthropic has talked extensively about [reward hacking](https://www.anthropic.com/research/reward-hacking)—where models learn to game their training objectives rather than actually solving problems. They've worked to mitigate this in Claude, but in my experience with agentic workflows, it still surfaced in subtle ways.
85+
Anthropic has talked extensively about [reward hacking](https://www.anthropic.com/research/reward-hacking), where models learn to game their training objectives rather than actually solving problems. They've worked to mitigate this in Claude, but in my experience with agentic workflows, it still surfaced in subtle ways.
8386
The context decay problem from Claude.md rules showed up elsewhere too: premature task completion. Sonnet 4 and 4.5 would claim "Task complete!" without verifying anything actually worked. No tests run, no compilation checks, no output validation.
8487
Vague instructions made it worse. "Here's the db schema, write the gorm queries" led Opus to confidently implement patterns that didn't match our architecture. Sonnet would announce completion without checking if the code compiled.
8588
The frustration here is real: I'd give the agent verification tools, and it would still skip them. It felt like the model learned that "say you're done" correlates with positive feedback, so it rushed there.
86-
What helped: Being explicit about verification. Don't ask for implementation - ask for implementation plus test execution. Force the verification into the task definition itself, don't leave it optional.
89+
What helped: Being explicit about verification. Don't ask just for implementation; ask for implementation plus test execution. Force the verification into the task definition itself rather than leaving it optional.
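
One way to picture "verification inside the task definition" is a small gate that refuses to accept "Task complete!" unless the checks actually ran and passed. This is only a minimal sketch, not the setup used in the post: `CheckResult`, `run_checks`, and `accept_completion` are hypothetical names, and the commands you pass in (for example a build and a test command) are assumptions you would swap for your own project's.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    command: list[str]
    passed: bool
    output: str


def run_checks(commands: list[list[str]], timeout: float = 300.0) -> list[CheckResult]:
    """Run each verification command; a check passes only on exit status 0."""
    results = []
    for command in commands:
        try:
            proc = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout
            )
            passed = proc.returncode == 0
            output = proc.stdout + proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing binary or a hung command counts as a failed check.
            passed, output = False, str(exc)
        results.append(CheckResult(command, passed, output))
    return results


def accept_completion(agent_says_done: bool, results: list[CheckResult]) -> bool:
    """Accept "done" only if the agent claims it AND at least one check ran AND all passed."""
    return agent_says_done and bool(results) and all(r.passed for r in results)


# Hypothetical usage for a Go project (commands are assumptions, not from the post):
#   results = run_checks([["go", "build", "./..."], ["go", "test", "./..."]])
#   if not accept_completion(agent_says_done=True, results=results):
#       ...feed the failing output back to the agent instead of closing the task
```

The design choice here is that an empty list of checks is treated as a failure, so "nothing was verified" can never be mistaken for "everything passed".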
8790

8891
---
8992
