ZeroNoise
Parallel agent triage for PR firehoses + programmatic tool calling to cut token waste
Feb 23
5 min read
98 docs
Parallel-agent triage is getting real: Steinberger’s 50-Codex PR analysis pipeline shows how to turn a PR firehose into structured, queryable reports (without a vector DB). Also: concrete token/accuracy wins from Anthropic’s programmatic tool calling + dynamic filtering, plus fresh tool and repo drops (OpenClaw beta, Lion Reader).

🔥 TOP SIGNAL

Peter Steinberger (@steipete) shared a high-volume PR triage pattern that’s actually runnable at OSS scale: spin up 50 parallel Codex instances, have each one emit a JSON PR report (vision/intent/risk/etc.), then ingest all reports into a single session to query, de-dupe, auto-close, and merge. Notably, he says you don’t even need a vector DB to make it work.

🛠️ TOOLS & MODELS

  • OpenAI Codex (terminology + why “harness” matters)

    • Gabriel Chua (OpenAI DevEx, APAC) frames Codex as: Codex = Model + Harness + Surfaces.
    • He defines the harness as “the collection of instructions and tools,” and notes it’s open source in openai/codex.
    • Key detail (via Chua): OpenAI acknowledges that Codex models are trained in the presence of the harness, so tool use, execution loops, compaction, and iterative verification are not “bolted on.”
  • GPT-5.3-Codex speed + depth via subagents (practitioner report + team clarification)

    • @rafaelobitten reports a “massive speed jump” in gpt-5.3-codex xhigh, and says deep multi-agent setups (subagents calling subagents) now feel viable for shipping larger features, at the cost of burning through limits faster (he’s running seven Pro accounts and mentions 2× limits until April).
    • @thsottiaux (Codex team) pushes back on the Cerebras attribution: says only the spark model is served through Cerebras and that GPT-5.3-Codex speed optimizations are “something different,” with more coming.
  • OpenClaw beta release

  • Anthropic tool-calling: concrete token/accuracy levers (thread + video coverage)

    • @jasonzhou1993 highlights “advanced tool calling” features: programmatic tool calling, dynamic filtering, tool search, and tool use examples.
    • Reported benchmark deltas:
      • Programmatic tool calling: ~37% token reduction.
      • Dynamic filtering: average 24% fewer input tokens.
      • Tool search + deferred loading: ~85% reduction in tool-definition tokens (77K → 8.7K).
      • Tool use examples: 72% → 90% accuracy on complex parameter handling.
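The tool search + deferred loading lever above can be sketched in a few lines. This is a conceptual illustration, not Anthropic’s actual API: the registry, function names, and the naive keyword scorer are all assumptions standing in for a real index (BM25, embeddings, etc.). The point is that only matched tool definitions enter the model’s context; the rest stay out until requested.

```python
# Hypothetical tool registry; in practice these entries would carry full
# JSON schemas, which is where the token cost comes from.
TOOL_REGISTRY = {
    "fetch_url": {"description": "Fetch a web page and return its text"},
    "query_db": {"description": "Run a read-only SQL query against the analytics database"},
    "send_email": {"description": "Send an email to a teammate"},
    "resize_image": {"description": "Resize an image to given dimensions"},
}

def search_tools(query: str, registry: dict, limit: int = 3) -> list[str]:
    """Naive keyword match over tool names + descriptions (illustrative only)."""
    terms = query.lower().split()
    scored = []
    for name, spec in registry.items():
        haystack = (name.replace("_", " ") + " " + spec["description"]).lower()
        score = sum(t in haystack for t in terms)
        if score:
            scored.append((score, name))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:limit]]

def build_prompt_tools(query: str, registry: dict) -> list[dict]:
    # Only matched definitions are serialized into the request;
    # deferred tools load later if the model asks for them.
    return [{"name": n, **registry[n]} for n in search_tools(query, registry)]
```

With four tools this saves little, but the reported 77K → 8.7K delta comes from the same move applied to large tool catalogs.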

💡 WORKFLOWS & TRICKS

  • “PR firehose” triage with parallel agents (Steinberger’s pattern)

    1. Run many Codex workers in parallel (he used 50) to analyze each PR.
    2. Require structured output: each worker emits a JSON report with signals like vision alignment, intent (a higher-signal field than the raw PR text), and risk.
    3. Ingest all reports into one session and do the actual maintainer work there: query across the set, de-dupe, auto-close, or merge.
    4. Don’t overbuild: he says you don’t need a vector DB (he’d been thinking too complex).
    5. Extend the same machinery to Issues: “Prompt Requests” are “just issues with additional metadata.”
    • Real scale note: he’s ingesting ~3k PRs (1k done so far); saw “like 8 PRs for auto-update in the last 2 days alone.”
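The steps above can be sketched as a fan-out/fan-in pipeline. This is a minimal sketch of the shape, not Steinberger’s actual tooling: `analyze_pr` is a stub where the real pattern shells out to a Codex instance, and the report fields simply echo the signals he describes (vision alignment, intent, risk).

```python
import json
from concurrent.futures import ThreadPoolExecutor

def analyze_pr(pr_number: int) -> str:
    # Stand-in for one Codex worker; real workers would read the PR and
    # emit this JSON report themselves.
    report = {
        "pr": pr_number,
        "vision_alignment": "high" if pr_number % 2 else "low",
        "intent": f"fix-issue-{pr_number % 3}",  # fabricated intent label
        "risk": "low",
    }
    return json.dumps(report)

def triage(pr_numbers, workers=50):
    # Fan out: many workers analyze PRs in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reports = [json.loads(r) for r in pool.map(analyze_pr, pr_numbers)]
    # Fan in ("ingest into one session"): de-dupe by intent, keeping the
    # first PR per intent; duplicates become auto-close candidates.
    seen, keep, dupes = set(), [], []
    for rep in reports:
        (keep if rep["intent"] not in seen else dupes).append(rep)
        seen.add(rep["intent"])
    return keep, dupes
```

The structured reports are what make the fan-in step queryable without a vector DB: every question becomes a filter over plain JSON.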
  • Context hygiene that actually moves agent quality (Kent C. Dodds)

    • Treat your existing code, tests, and docs as part of the “prompt” for autonomous agents; if they’re miserable, results will be miserable.
    • Practical takeaway: cleaning them up makes both agents and humans more effective.
  • Mobile-first delegation loop (Kent C. Dodds’ production anecdote)

    • Kent says Cursor “cloud agents” let him tackle “really ambitious projects,” including shipping password-based auth (replacing magic links) from his phone.
    • He also merged 23 PRs in a day while at his son’s gymnastics meet by prompting remote review bots (Bugbot, CodeRabbit) from his phone + doing a cursory glance himself.
  • Programmatic tool calling: stop making the LLM be the glue (Jason Zhou’s framing)

    • Instead of forcing the model to emit tool-call JSON every step, give it an environment with tool access and let it write code to chain calls—reported as ~37% token reduction.
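A toy version of that contrast, with illustrative names only (no real SDK is being shown): the tools are plain functions, and the “model-written” script chains them with a loop and a conditional, so the whole sequence costs one execution instead of one model round-trip per tool call.

```python
def get_open_prs():
    # Stubbed tool 1: in reality, a GitHub API call.
    return [{"number": 101, "risk": "high"}, {"number": 102, "risk": "low"}]

def add_label(pr_number, label):
    # Stubbed tool 2: would mutate the PR; here it just reports.
    return f"labeled #{pr_number} with {label}"

# In the JSON-tool-call style, each call below would be a separate model
# turn. Here the model emits one script that does the glue itself:
MODEL_SCRIPT = """
results = []
for pr in get_open_prs():
    if pr["risk"] == "high":
        results.append(add_label(pr["number"], "needs-review"))
"""

def run_model_script(script, tools):
    env = {**tools, "results": None}
    exec(script, env)  # sandbox this in any real deployment
    return env["results"]

out = run_model_script(
    MODEL_SCRIPT,
    {"get_open_prs": get_open_prs, "add_label": add_label},
)
```

The token savings come from intermediate results (the full PR list) never entering the model’s context; only the final `results` would.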
  • MCP integration reality check (Armin Ronacher’s experiments)

    • Ronacher’s takeaway: “MCP server using code works, the other way round not yet”—because MCP servers today return mostly markdown or barely structured text.
    • He built:

👤 PEOPLE TO WATCH

  • Peter Steinberger (@steipete) — repeatedly high-signal on operationalizing coding agents: parallel Codex PR analysis at real PR volume, plus ongoing OpenClaw releases.
  • Kent C. Dodds (@kentcdodds) — concrete “agents in the loop” habits (phone-driven PR merges) + a timeless reminder that repo quality is part of your agent prompt.
  • Armin Ronacher (@mitsuhiko) — doing hands-on MCP experiments and calling out the blocker: tool outputs that aren’t structured enough to compose.
  • Brendan Long — shipping a “vibe-coded” project to completion (Lion Reader), and wiring it up to Claude Code workflows + MCP.
  • Chris Lattner (via Simon Willison’s write-up) — careful technical read on AI-generated systems: CCC looks like a competent “textbook implementation,” and the flaws are informative (tests vs abstractions; generalization limits).

🎬 WATCH & LISTEN

1) Programmatic tool calling: use code (loops/conditionals) to chain tools deterministically (~4:09)

Hook: A crisp explanation of why “LLM emits JSON tool calls” is often the wrong abstraction—let the model write executable code that passes results between tools.

2) Dynamic filtering for web fetch: stop dumping raw HTML into context (~9:10)

Hook: Shows the token-waste failure mode (raw HTML + noise) and the fix: execute code to extract only the relevant content before it ever hits the model’s context.
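The fix described in the hook can be sketched with the stdlib alone. This is an assumed minimal implementation, not the video’s code: a real pipeline might use readability heuristics or CSS selectors, but the principle is the same, extraction code runs over the raw HTML and only the relevant text reaches the model’s context.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text from <p> tags, dropping script/style/markup noise."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []
    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")
    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False
    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

def filter_page(html: str, keyword: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    relevant = [p.strip() for p in parser.paragraphs if keyword.lower() in p.lower()]
    return "\n".join(relevant)  # this, not the raw HTML, goes into context

page = "<html><script>var x=1</script><p>Pricing: $10/mo</p><p>Cookie banner text</p></html>"
print(filter_page(page, "pricing"))  # → "Pricing: $10/mo"
```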

Editorial take: Today’s recurring edge is structure over vibes: structured PR reports, structured tool outputs, and structured repo context (tests/docs) are what make “many-agents at once” actually hold together.