# Context Discipline Beats Tool Sprawl for Coding Agents

*By Coding Agents Alpha Tracker • June 27, 2026*

monday.com’s Sidekick rebuild and Geoffrey Huntley’s RALPH loop point to the same lesson: coding agents get better when you reduce always-on context and expose tools more selectively. Also in today’s brief: Codex session rituals, LangSmith spend controls, T3’s browser-first remote setup, and fresh model signals from Theo and Simon Willison.

## 🔥 TOP SIGNAL

monday.com’s Sidekick rebuild is the clearest lesson today: one agent with 200+ tools created context pollution, confused the model, and pushed up cost, so the team moved to one orchestrator on a Deep Agent backbone with layered tool discovery, sub-agents, middleware, and a sandboxed write-code tool [^1].

Geoffrey Huntley is arguing for the same principle from the opposite end: larger context windows often make agents worse, so his RALPH loop keeps the model deliberately forgetful—pick the most important item from a backlog, work it, reset context, repeat [^2]. Theo’s repeated complaints about Gemini getting stuck in bad loops, reading the wrong files, and misusing tools are a reminder that agent quality is still heavily harness-dependent, not just model-dependent [^3].

## ⚡ TRY THIS

- **Shrink the active tool surface before adding more agents.** Omri Bruchim’s monday.com pattern: keep a small base set for the 50-60% case, inject only screen/entity-specific tools, and keep the long tail in a deferred catalog with short descriptions that the agent loads only when needed. Put middleware in front for context injection, tool-name repair, model selection, and retries/resource scaling; monday.com says those self-healing paths hit a 94% recovery rate [^1].

- **Use code execution as the universal fallback tool.** Instead of shipping a separate sum/filter/search/etc. tool for every edge case, let the agent write and run Python in a sandbox. monday.com says this replaced hundreds of tools and enabled composite tasks like combining mapping APIs to search for London office locations under commute constraints [^1].

- **Run a forgetful backlog loop.** Huntley’s RALPH recipe: hand the model a task list, rough specs, and the current codebase; tell it to choose the most important next item; let it finish that slice; then reissue a fresh goal with fresh backing context instead of keeping one giant chat alive. For language ports, first reduce the old codebase into PM-style specs plus a test lookup table, then generate the target implementation from that distilled artifact [^2].

- **Make planning and completion explicit.** Huntley uses voice-first prompting to tell the agent which libraries/components to study first, and withholds build permission until he says `move` [^2]. Pietro Schirano adds text-expansion rituals in Codex: `spawn` to launch multiple agents, `agent` to inject a persistent goal-completion instruction, and a closing regression sweep asking for bugs, regressions, and edge cases [^4].

## 📡 WHAT SHIPPED

- **LangSmith LLM Gateway** — before broader ship, LangChain rolled it out internally for real-time spend visibility and budgets at the org, workspace, user, and API-key level. If coding-agent adoption inside your team is running into surprise bills, this is the clearest cost-control pattern in today’s sources. [Blog](https://www.langchain.com/blog/how-we-made-coding-agent-spend-predictable) [^5]

- **T3 Code’s current wedge is remote dev, not Codex.** Theo says `npx t3@nightly serve` works over Tailscale, that he now does 90% of his coding from the browser, and that the demo machine was set up the same day [^6][^7][^8].

- **Gemini still looks weak on long-horizon agent work, per Theo.** His firsthand issues: dumb reasoning loops, wrong-file reads, weak tool use, and worse coherence as tasks run longer. He also says Cursor had to heavily reshape prompts to make Google models use tools correctly, though `/thinking` traces are at least somewhat better than before [^3].

- **Frontend prompt behavior may be shifting.** Simon Willison says he used to add `don't use React` to most frontend prompts; now most models don’t default there, and a fresh-project test returned vanilla HTML+JS. He isn’t sure whether that’s changed model preference or better awareness of existing project context [^9][^10].

- **Training signal to watch:** Theo argues that real human/LLM interaction histories plus before/after code changes matter more for agent RL than raw codebase size, which helps explain how a smaller player like Cursor could improve quickly [^3].

## 🎬 GO DEEPER

- **11:52-14:14 — Huntley explains RALPH.** Best timeless clip today if you’re building your own loop: it’s about memory management, not vibes, and the example makes the case for deliberate forgetting [^2].


[![Hot-takes at a fireside chat during AI:Engineer Miami](https://img.youtube.com/vi/fbqAh46eMkc/hqdefault.jpg)](https://youtube.com/watch?v=fbqAh46eMkc&t=711)
*Hot-takes at a fireside chat during AI:Engineer Miami (11:51)*


- **12:24-13:32 — monday.com on why a write-code tool beats hundreds of micro-tools.** The London-office example makes the payoff obvious: code execution is a general-purpose escape hatch, not just a fancy calculator [^1].


[![How Monday.com Built Sidekick on Deep Agents | Interrupt 2026](https://img.youtube.com/vi/c2fLLS7np3Y/hqdefault.jpg)](https://youtube.com/watch?v=c2fLLS7np3Y&t=744)
*How Monday.com Built Sidekick on Deep Agents | Interrupt 2026 (12:24)*


- **10:22-11:19 — Theo audits Gemini reasoning traces.** Good watch before you pick a default coding-agent model: the traces are more coherent than before, but he still catches low-focus planning and nonsense [^3].


[![Dear Google, we need to talk.](https://img.youtube.com/vi/23BtT8P7rCA/hqdefault.jpg)](https://youtube.com/watch?v=23BtT8P7rCA&t=622)
*Dear Google, we need to talk. (10:22)*


- **26:17-28:54 — Huntley’s voice-first planning workflow.** Useful prompt-engineering clip if your agent thrashes during implementation: assign reading context first, talk through the plan, then unlock build mode with `move` [^2].


[![Hot-takes at a fireside chat during AI:Engineer Miami](https://img.youtube.com/vi/fbqAh46eMkc/hqdefault.jpg)](https://youtube.com/watch?v=fbqAh46eMkc&t=1577)
*Hot-takes at a fireside chat during AI:Engineer Miami (26:17)*


*Editorial take: the highest-alpha work right now is not adding more agent autonomy—it’s deciding what the agent sees, when it sees it, and when the loop should reset.* [^1][^2][^3]

---

### Sources

[^1]: [How Monday.com Built Sidekick on Deep Agents | Interrupt 2026](https://www.youtube.com/watch?v=c2fLLS7np3Y)
[^2]: [Hot-takes at a fireside chat during AI:Engineer Miami](https://www.youtube.com/watch?v=fbqAh46eMkc)
[^3]: [Dear Google, we need to talk.](https://www.youtube.com/watch?v=23BtT8P7rCA)
[^4]: [Builders Unscripted: Ep. 4 - Pietro Schirano](https://www.youtube.com/watch?v=SPC_yCe1cUw)
[^5]: [𝕏 post by @LangChain](https://x.com/LangChain/status/2070493501473554540)
[^6]: [𝕏 post by @theo](https://x.com/theo/status/2070436985576300554)
[^7]: [𝕏 post by @theo](https://x.com/theo/status/2070437920188858876)
[^8]: [𝕏 post by @theo](https://x.com/theo/status/2070438009368150095)
[^9]: [𝕏 post by @simonw](https://x.com/simonw/status/2070610501630042599)
[^10]: [𝕏 post by @simonw](https://x.com/simonw/status/2070610733763903845)