Your intelligence agent for what matters

Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.

Set up your agent
What should this agent keep you on top of?
Discovering sources...
Syncing sources 0/180...
Extracting information
Generating brief

Your time, back

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances, and more—condensing everything into a single daily or weekly brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multimedia sources

Track YouTube channels, podcasts, X accounts, Substack newsletters, Reddit communities, and blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

3 steps to your first brief

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Weekly report on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Startup funding digest with key venture capital trends
Weekly digest on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Review and launch

Your agent finds relevant channels and profiles based on your instructions. Review the suggestions, keep what fits, remove what doesn't, and add your own. Launch when ready—you can adjust sources anytime.

Discovering sources...
Sam Altman (Profile)
3Blue1Brown (Channel)
Paul Graham (Account)
The Pragmatic Engineer (Newsletter)
r/MachineLearning (Community)
Naval Ravikant (Profile)
AI High Signal (List)
Stratechery (RSS)

3

Get your briefs

Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.

Ticket Queues Become the New Agent UI
May 3
5 min read
67 docs
AI Engineer
Riley Brown
Salvatore Sanfilippo
+11
The strongest signal today is architectural: top practitioners are moving from chatty sessions to ticket queues, narrow agent roles, and repo-native SOPs. This brief covers the best practical setups from Symphony, OpenClaw, Cursor SDK, Codex, and local DeepSeek workflows.

🔥 TOP SIGNAL

OpenAI’s Symphony / “Symfony” is the clearest sign that coding agents are moving from session management to deliverable management: a background scheduler polls tickets, opens an isolated workspace per ticket, updates ticket state, and raises a PR when work reaches Merging. Jason Zhou says Symphony plus a good codebase harness improved his coding-agent outcomes by 5x, and the same pattern shows up elsewhere today: Ross Mike gets better results with one orchestrator agent delegating narrow jobs, while swyx frames the role shift as plan and review rather than hand-writing every implementation. The edge is increasingly in the state machine around the model—repo-local SOPs, narrow scopes, and explicit review gates—not in babysitting one more chat window.

⚡ TRY THIS

  • Set up a ticket-native loop in an afternoon. Jason Zhou’s setup is explicit: clone Symphony; if you need custom tooling or language support, point another coding agent at spec.md; generate a repo-local workflow.md; create a Linear project and save a personal API key with linear api-token save; define the ticket states To Do, In Progress, Human Review, and Merging; then run symphony --workflow path/to/workflow.md --daemon. The workflow.md frontmatter controls ticket filters, polling, hooks, parallelism, and agent settings, while the markdown body is the SOP the agent follows every turn. Add Playwright CLI, a boot skill, and indexed docs if you want autonomous verification instead of partially autonomous implementation.

  • Keep the stack narrow; talk to one orchestrator. Ross Mike says OpenClaw worked best when one main agent held the full context and delegated bounded jobs to sub-agents; he does not talk to the sub-agents directly. He pairs that with a narrow skill surface—few goals, few connectors, domain-specific skills—because broad “do everything” agents with 15-30 skills/connectors made “none of it work”. Good default: one main agent, specialized subs, and human review only at the last consequential step.

  • Convert good runs into skills; stop carrying junk context. Riley Brown and Ross Mike describe the same loop: get one output you actually like, then reverse-engineer that run into a reusable skill with exact structure, examples, and domain rules; that was the difference between garbage reports and one-shot usable output in their demos. Keep only necessary context too—don’t tell the model what it can already infer from the repo, and don’t expect slangy prompts to produce precise work.

"The value of good instructions has never been higher."

Tibo also calls /goal one of Codex’s most consequential releases so far, which fits the same pattern: instruction quality is compounding, not getting commoditized.

  • If you run local frontier-ish models, compact aggressively. Salvatore Sanfilippo’s ds4.c demo on a 128GB M3 Max shows the hidden tax isn’t just model size: 32K context adds ~1GB RAM, 250K adds ~7GB, and big tool outputs plus huge system prompts crush real-world latency; a 5K-token tool response after an 11K-token system prompt took 86 seconds to reprocess. His fix is straightforward: after long runs, compact the conversation into a short summary instead of dragging the full transcript forward.
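The compaction fix in the last bullet can be sketched in a few lines. This is a minimal illustration, not ds4's actual code: `rough_tokens`, `compact`, the message schema, and the `summarize` callback are all hypothetical stand-ins, assuming an OpenAI-style chat message list.

```python
# Sketch of the compaction step: once a conversation grows past a token
# budget, fold the old turns into one short summary message instead of
# resending the full transcript. `summarize` stands in for a cheap model call.

def rough_tokens(text):
    # Crude token estimate (~4 chars per token); real engines expose a tokenizer.
    return max(1, len(text) // 4)

def compact(messages, summarize, budget=8000, keep_recent=4):
    """Return a message list that fits the budget.

    Keeps the system prompt and the last `keep_recent` turns verbatim;
    everything in between is replaced by a single summary message.
    """
    total = sum(rough_tokens(m["content"]) for m in messages)
    if total <= budget:
        return messages

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    if not old:
        return messages  # nothing left to fold

    summary = summarize("\n".join(m["content"] for m in old))
    return system + [{"role": "user", "content": f"Summary of earlier turns: {summary}"}] + recent
```

The design point is that the system prompt and the most recent turns stay verbatim while only the middle of the transcript is summarized, which is what keeps reprocessing cost bounded on long local runs.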

📡 WHAT SHIPPED

  • OpenAI Symphony / “Symfony” — open-source ticket-driven coding-agent orchestrator. Core pieces: a background scheduler plus repo-local workflow.md that acts as config + SOP; default flow polls Linear every 30s, creates one workspace per ticket, and auto-PRs on Merging. Jason Zhou’s field report: Playwright CLI + boot skill + WORKFLOW.md + good harness = 5x better outcomes.

  • Cursor SDK — official SDK for building agents with the same runtime, harness, and models as Cursor. Use cases: CI/CD jobs, end-to-end automations, and embedding agents in products. Announcement. Composer 2 is 50% off in the SDK this weekend.

  • Petdex — RaillyHugo’s public gallery for discovering, sharing, and installing Codex pets “with one curl”; Greg Brockman amplified it and submissions are open. Link.

  • ds4.c / ds4 server — Salvatore Sanfilippo’s mostly GPT-5.5-built local inference engine for DeepSeek v4 Flash 270B in 2-bit GGUF (~81GB), with an OpenAI-compatible server for coding agents like OpenCode. Current state: working locally on a MacBook Pro M3 Max 128GB, not on GitHub yet.

  • Codex vs Anthropic: the practitioner split is getting sharper. Riley Brown and Ross Mike argue Codex is winning the “super app” lane because coding + knowledge work live behind one interface toggle, while Anthropic’s Cowork / Code / Dispatch / Remote stack still feels fragmented. Their parallel claim is useful: a great coding model is increasingly a great general-purpose knowledge-work model because everything reduces to files, tools, and GUI wrappers.

  • Claude Code for web, real build — Simon Willison shipped iNaturalist syndication from his phone and wired it into homepage, archives, and search. Good proof that browser-native coding agents are now viable for real feature work, not just edits. PR.

🎬 GO DEEPER

  • 1:45-3:38 — Symphony’s mental model. Jason Zhou explains the scheduler + workflow.md split: frontmatter handles polling, workspace hooks, and agent settings; the markdown body holds the SOP. This is the shortest clean explanation of why the ticket tracker becomes the state machine.

  • 23:53-25:04 — A proactive agent loop that actually saves time. Ross Mike walks through an OpenClaw workflow that vets sponsor emails, researches companies, sends the first pricing email, tracks negotiations in Notion, and hands off only the finals. Even if you don’t do sponsorships, the reusable pattern is heartbeat + database + human-at-the-end.

  • 15:09-20:58 — Local frontier model, real latency pain. Salvatore’s ds4/OpenCode demo is worth watching because it shows the practical bottleneck nobody advertises: tool output size and system-prompt bulk, not just raw tok/s. You’ll leave with a much better feel for when local agents are viable and when compaction is mandatory.

  • 25:26-33:24 — Steering vs queueing in Codex. Riley Brown’s beginner guide is one of the better current overviews of projects, plugins, custom skills, and automations; skip straight to the steering/queueing section if you already know the UI. Tutorial.

  • Study this PR, not just the screenshot. Simon Willison’s PR #668 is a clean example of shipping a real feature with Claude Code for web, while the Codebase Context Specification is still a useful artifact if you want a shared language for persistent context layout.

Editorial take: the durable edge is moving out of the model picker and into ticket queues, repo-native SOPs, narrow agent roles, and ruthless context hygiene.

Persistent Agents, Security SaaS, and a Higher Bar for AI Outcomes
May 3
6 min read
763 docs
Sam Altman
Doug Colkitt
Konstantine Buhler
+10
Early traction this cycle clustered around AI security and workflow automation, while the strongest technical signals came from persistent-agent architectures and personal-agent stacks. The broader investor read-through: open-source model momentum remains credible, security diligence is becoming more urgent, and the bar for venture-scale SaaS outcomes keeps rising.

Funding & Deals

  • Meta/Manus is the clearest deal signal in the set. Harry Stebbings flagged that China blocked Meta's $2B Manus deal, and his accompanying interpretation is that an attempted unwind would pressure the acquirer and future cross-border deal behavior more than already-distributed VC proceeds. For investors, it is a reminder that AI exits can pick up meaningful geopolitical risk.
  • Autonomous incident response is already a funded category. A founder building a code-level production-crash agent points to Resolve AI's $125M raise as evidence that autonomous incident handling has serious capital behind it, while arguing the open gap is code-level reproduction and fixing rather than infra-only workflows.
  • YC remains a strong early filter for agentic commerce tooling. LocusFounder says it joined YC this year and is opening 100 free beta spots for an AI system that builds a website, conversion copy, ads, and back-office operations around a user's project idea, with users keeping any revenue generated.

Emerging Teams

  • CheckVibe: security for AI-built apps is getting paid, quickly. The two-person bootstrapped team built a scanner for apps shipped rapidly with AI tools; they report $3.4k in gross revenue, 100+ paying customers, and 2.5k signups within six weeks, with a public Stripe dashboard linked in the post. The team also says security-critical scanner logic was architected manually rather than vibe-coded.
  • Zeriflow: repo-level analysis looks like the sticky wedge in security SaaS. Eight months after v1 and 12,400 scans later, the founder reports that about 70% of paying users connect GitHub repos, monitoring with score-drop alerts drives repeat usage, and v2 adds a PR-blocking GitHub Action, live README badge, and REST API. The biggest open product issue is false positives, which the founder says are the top cause of churn.
  • Transita: narrow workflow, fast UX, and MCP-native distribution. The visa-eligibility product says it shipped as an MCP server inside Claude Desktop, Cursor, and Cline, uses an anonymous quiz plus token-based paid unlocks, and returns a deterministic top-six country match before slower AI enrichment streams in. After about six months, the founder reports 41 quiz completions, 22% email capture, and 5% paid conversion at $9.
  • A white-box compliance engine is attacking a high-trust niche. A solo founder says the product shows the exact logic path used to verify a document against a compliance rule, aiming to replace opaque probabilistic outputs with inspectable reasoning; the compliance workflow has just launched and is seeking beta feedback.
  • Code-level crash automation is a credible new agent wedge. Another founder says their CLI converts a Sentry crash URL into a failing pytest on the current branch and verifies whether a fix worked, targeting the 30-40 minutes engineers often spend reconstructing state before debugging even starts. The open design question is how much autonomy developers will trust in production, especially for billing or payments code.

AI & Tech Breakthroughs

  • Persistent agents are moving beyond disposable task runners. AIPass distinguishes disposable sub-agents from persistent "citizens" that keep identity, memory, tests, and domain-specific behavior inside a layered orchestrator -> citizen -> sub-agent architecture. The project gives concrete examples: a mail citizen with 696 tests built through failures and a routing citizen shaped across 80+ sessions of bugs and fixes. Its memory model uses passport.json, local.json, and observations.json, injected each session so the citizen does not start cold. The project says the repo is CLI-based on Claude Code, Linux-focused, and currently at 85 stars, 400+ PRs, and 6,500+ tests.
  • Garry Tan is open-sourcing a personal-agent stack, not just a demo. He describes GBrain as his OpenClaw/Hermes-based personal agent setup with custom retrieval, graph DB, schema, and skillpacks, with 100+ skills planned. The shipped "book-mirror" skillpack maps an author's ideas to the user's own life and projects, while newer skills cover article structuring with verbatim quotes, strategic reading, concept synthesis, web research against knowledge gaps, and archive mining gated by an explicit allow-list. Tan also says the project is experimental and still in a "Homebrew Computer Club stage".
  • A new RoPE paper offers a concrete explanation for compositional reasoning gains. The paper argues Rotary Positional Embeddings let transformers solve compositional reasoning tasks where additive positional layers fail, proving a toroidal structure on finite groups and validating the claim with Qwen2.5-0.5B on modular arithmetic and sequential composition tasks.
  • The training-tooling layer is getting stricter. Parallelogram is positioned as a linter for LLM fine-tuning datasets that catches broken data before a GPU run starts.
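As a concrete illustration of the AIPass-style session bootstrap described above, here is a minimal sketch. The three file names come from the brief; the loading logic, schema, and prompt layout are assumptions for illustration, not the project's actual code.

```python
# Illustrative sketch: persistent memory files are read at session start and
# injected into the system prompt so the "citizen" agent does not start cold.
# File names are from the brief; everything else here is assumed.
import json
from pathlib import Path

MEMORY_FILES = ["passport.json", "local.json", "observations.json"]

def load_memory(agent_dir):
    """Read whichever memory files exist for this agent; missing files are skipped."""
    memory = {}
    for name in MEMORY_FILES:
        path = Path(agent_dir) / name
        if path.exists():
            memory[name] = json.loads(path.read_text())
    return memory

def build_system_prompt(base_prompt, memory):
    """Prepend the per-session base prompt, then append each memory file as a section."""
    sections = [base_prompt]
    for name, data in memory.items():
        sections.append(f"## {name}\n{json.dumps(data, indent=2)}")
    return "\n\n".join(sections)
```

The interesting design choice is that memory lives in plain files in the agent's directory, so it survives across sessions and can be inspected or edited by hand between runs.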

Market Signals

  • AI-assisted software creation is outrunning basic security controls. In audits of eight vibe-coded SaaS apps, five had row-level security turned off on at least one user-data table, three exposed Supabase service-role keys to the browser, two trusted user_id from form bodies without session checks, and none had rate limits, including on auth. The auditor's broader claim is that codegen tools assume senior engineers review the diff, and increasingly there isn't one. One founder was reportedly pitching ARR projections while a service-role key was exposed in bundled client JS.
  • The hurdle for venture-scale SaaS outcomes continues to rise. Harry Stebbings amplified the view that $400M ARR growing 30% is no longer enough; companies now need a path to $1B+ growing 40% to produce strong outcomes, while everything below that risks a weak public-market result. He separately highlighted a venture market increasingly driven by a small number of massive winners.
  • Open-source model momentum has elite support, but benchmark optics are messy. Marc Andreessen endorsed a post arguing that Kimi k2.6 and DeepSeek v4 show open-source scaling is continuing, and that the market cap of companies built on top already exceeds OpenAI plus Anthropic combined. In a separate exchange, he amplified criticism of IRT ELO charts: as benchmarks approach saturation, moving from 97% to 99% accuracy can show up as a 200-point ELO gain, which can exaggerate apparent model gaps.
  • Engineer demand is not collapsing. A Citadel Securities analysis cited on X says software-engineer job postings are up 18% from the May inflection point last year, and Andreessen endorsed that readout. Another Andreessen-amplified post argues "we need more engineers, not less," while Garry Tan-adjacent discussion around personal agents points to emerging roles like "personal agent designer," "second brain engineer," and "context editor".
  • Capability still appears to matter more than cheaper inference.

"i keep thinking i want the models to be cheaper/faster more than i want them to be smarter but it seems that just being smarter is still the most important thing"

Sam Altman framed intelligence as the higher priority, while Naval's shorter thesis is that AIs replace UIs and APIs.
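The benchmark-saturation critique above is easy to check with the standard Elo logistic model (used here as a simplified stand-in for the IRT fits the criticized charts actually use). Treating benchmark accuracy as a head-to-head win probability, the implied rating edge is 400 * log10(p / (1 - p)), which diverges as p approaches 1:

```python
import math

def implied_elo(p):
    """Rating edge implied by win probability p under the standard Elo logistic model."""
    return 400 * math.log10(p / (1 - p))

# Treating benchmark accuracy as a win rate (the move the critique objects to),
# a 97% -> 99% accuracy jump inflates into a large apparent rating gap:
gap = implied_elo(0.99) - implied_elo(0.97)
```

Running this gives a gap of roughly 194 points for the 97% to 99% jump, which matches the scale of the distortion described: near-saturated benchmarks turn small accuracy differences into headline-sized ELO gaps.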

AI Reaches New Math and Clinical Milestones as Enterprise Demand Surges
May 3
4 min read
428 docs
Nick Turley
Financial Times
Cursor
+14
AI reached notable new milestones in mathematics and emergency-room diagnosis, while Anthropic’s reported revenue jump underscored fast enterprise adoption. Elsewhere, the brief tracks efficient coding models, major developer-tool launches, and a tighter race around chips and compute supply.

Top Stories

Why it matters: Today’s biggest signals were that AI is moving from demos into research, clinical evaluation, and large-scale revenue.

  • AI-generated math work showed downstream research value. Researchers said they refined and adapted a proof method from GPT-5.4 Pro to solve several additional problems, including a 60-year-old conjecture by Erdős, Sárközy, and Szemerédi, and described this as one of the first cases where an AI-generated proof opened new research avenues. The result was announced at the Future of Mathematics Symposium.
  • A Harvard study favored OpenAI’s o1-preview over two attending physicians at triage. On 76 real Boston hospital cases, the model reached 67.1% diagnostic accuracy versus 55.3% and 50.0% for the two doctors; two physician reviewers also could not distinguish the AI diagnoses from the human ones.
  • Anthropic’s reported growth remains one of the clearest business signals in AI. A cited SemiAnalysis report said Anthropic’s ARR has passed $44B, up from $9B at the end of 2025, with growth driven mainly by enterprise Claude adoption and Claude Code; the same report said inference gross margins rose from 38% to over 70%.

Research & Innovation

Why it matters: Research updates pointed to a shift from headline model size toward efficiency, autonomy, and more realistic agent limits.

  • Qwen’s efficiency jump stood out. Qwen 3.6 35B A3B scored 73.4% on SWE-bench verified with 3B active parameters, versus Claude Opus 4.6 at 75% with around 200B active parameters on the same benchmark.
  • A new coding-agent benchmark raised the bar. Claude Opus 4.7 reportedly rebuilt an AlphaZero-style self-play pipeline from scratch on consumer hardware in three hours and then beat the Pascal Pons solver 7 of 8 times as first mover on Connect Four. The paper frames this as a move from patches and unit tests to end-to-end ML systems.
  • A new agent-memory paper argued current memory stacks are still just retrieval. The paper says vector stores, RAG buffers, and scratchpads implement lookup rather than consolidation, creating a generalization ceiling on compositionally novel tasks and leaving agents exposed to memory poisoning.

Products & Launches

Why it matters: Product releases continue to center on agent workflows, developer automation, and multimodal interfaces.

  • Codex shipped a broad feature bundle. Updates over the last two weeks included GPT-5.5, browser control, Sheets and Slides, Docs and PDFs, OS-wide dictation, auto-review mode, /pets, and a .tex plugin; the app was also said to be about 20% faster for computer and browser use.
  • Cursor opened up its agent stack. The new Cursor SDK lets developers build agents with the same runtime, harness, and models that power Cursor, including use from CI/CD pipelines, end-to-end automations, and embedded product workflows.
  • xAI added voice cloning to its API. Users can create a custom voice in under two minutes or choose from 80+ voices across 28 languages for voice agents and other applications; Hermes Agent support was separately flagged as coming soon.

Industry Moves

Why it matters: Competition is increasingly about chips, compute supply, and where companies choose to spend capital.

  • Huawei’s position in China’s AI hardware stack appears to be improving. The Financial Times reported that Huawei’s AI chip sales are surging as Nvidia stalls in China, while a separate analysis estimated Huawei chips at roughly 80% of H100 performance and argued the gap is narrowing.
  • Anthropic is also looking to diversify inference supply. The company was reportedly in early talks with U.K. startup Fractile to buy its inference chips when available next year.
  • Tech cost cutting continues alongside AI infrastructure spending. One market summary said tech companies announced 81,747 layoffs in Q1 2026, up 580% from Q4 2025, as spending shifts toward AI chips and data centers; the same note cited Meta plans to cut about 8,000 workers and Microsoft’s retirement program covering about 7% of its U.S. workforce.

Quick Takes

Why it matters: A few smaller updates still sharpened the picture on adoption, robotics, and model rollout.

  • ChatGPT Images usage is up more than 50% in a few weeks, with nearly 60% of daily users coming from newly logged-in users.
  • Gemini 3 Flash was reportedly upgraded in arena under the same name, with output quality described as closer to current Gemini 3.1 Pro than the prior Flash.
  • Figure’s F.03 robot can now walk up and down stairs using onboard camera perception, trained end-to-end with reinforcement learning in simulation.
  • Poolside released two agentic coding models, Laguna XS.2 and Laguna M.1, and made them temporarily free via API alongside a terminal agent and web IDE.

Fake Legibility, Dynamic Interfaces, and First-Principles Tools
May 3
3 min read
117 docs
Lenny's Podcast
Four organic recommendations from Notion’s Head of Product all point to the same discipline: work from reality, not simplified abstractions. The standout is Seeing Like a State, followed by picks on interactive prototyping, computing fundamentals, and tools that preserve human autonomy.

What stood out

Max Schoening’s recommendations all push toward the same habit: work from reality, not from simplified representations that hide what matters. He applied that lens to executive reporting, chat-interface design, computing fundamentals, and the design of tools themselves.

Start here

Seeing Like a State

  • Content type: Book
  • Author/creator: James C. Scott
  • Link/URL: No direct book URL was provided; source context: Why cultivating agency matters more than cultivating skills in the AI era | Max Schoening (Notion)
  • Who recommended it: Max Schoening
  • Key takeaway: He recommends it especially to executives building systems, as a warning against creating reporting structures that give leaders legibility while neglecting the reality of how teams actually work.
  • Why it matters: This was the most compelling recommendation in today’s set because Schoening turned the book into a concrete management test: if a system looks clean from the top but fails to reflect what is happening on the ground, that clarity may be false.

"executives love creating fake legibility for themselves because we don't like noise as humans... we want the signal but there's often less signal in it than one might think"

Three more worth saving

Stop Drawing Dead Fish

  • Content type: Talk/video
  • Author/creator: Bret Victor
  • Link/URL: No direct resource URL was provided; source context: Why cultivating agency matters more than cultivating skills in the AI era | Max Schoening (Notion)
  • Who recommended it: Max Schoening
  • Key takeaway: Schoening praised it while discussing chat interfaces at Notion, using it to argue that static Figma screens are inadequate for dynamic conversational products and that teams should prototype those interactions in interactive code.
  • Why it matters: It is the clearest product-design recommendation in the set: if the product is dynamic, the design process needs to capture that dynamism rather than freeze it into still images.

Code: The Hidden Language of Computer Hardware and Software

  • Content type: Book
  • Author/creator: Charles Petzold
  • Link/URL: No direct book URL was provided; source context: Why cultivating agency matters more than cultivating skills in the AI era | Max Schoening (Notion)
  • Who recommended it: Max Schoening
  • Key takeaway: He recommends it as a way to learn how computers actually work, noting that many professional programmers still lack that grounding and that the book does not introduce code until much later in the text.
  • Why it matters: It stands out as a fundamentals pick for readers who want a first-principles understanding of computing without needing to start from syntax.

Tools for Conviviality

  • Content type: Book
  • Author/creator: Ivan Illich
  • Link/URL: No direct book URL was provided; source context: Why cultivating agency matters more than cultivating skills in the AI era | Max Schoening (Notion)
  • Who recommended it: Max Schoening
  • Key takeaway: Schoening described it as a contrast between tools that let people exercise ingenuity and autonomy and industrial-scale tools that can become destructive to human autonomy.
  • Why it matters: It offers a clean framework for evaluating whether a technology expands human agency or strips it away.

Bottom line

If you save one item, save Seeing Like a State for the clearest warning in today’s set: neat visibility is not the same thing as understanding. If you are building AI or chat products, pair it with Stop Drawing Dead Fish for a more concrete design principle about prototyping live interaction as live interaction.

OpenAI’s Image Surge, a More Contested Model Race, and the New Shape of AI Work
May 3
3 min read
188 docs
Sebastian Raschka
Konstantine Buhler
Séb Krier
+10
OpenAI shared fast early growth for ChatGPT Images as fresh evaluations and investor commentary pulled the open-model story in different directions. The day also added evidence that AI is reshaping software jobs toward planning and review, while practical local deployment keeps advancing.

What stood out

Today’s clearest story was market pull, not a single blockbuster launch. OpenAI posted fresh adoption data, benchmark charts drew unusually explicit disagreement, and software-engineering commentary kept shifting from replacement toward workflow redesign.

OpenAI is seeing product pull from images — and still arguing for smarter models

ChatGPT Images usage rose more than 50% in a few weeks, with nearly 60% of daily users coming from newly logged-in users; Greg Brockman said the feature is "really taking off". Sam Altman separately said he increasingly sees smarter models as more important than cheaper or faster ones.

"but it seems that just being smarter is still the most important thing"

Why it matters: OpenAI’s own usage signal suggests that new capability can still bring in fresh audiences quickly, especially when the use cases are broad across design, learning, work graphics, and creative work.

The open-model race looked more contested, not less

A NIST CAISI evaluation said DeepSeek V4 trails leading U.S. models by about eight months; Sebastian Raschka said he would have liked to see GLM 5.1, Kimi K2.6, and Qwen3.6 Max included on the same chart, and the full report is here: nist.gov/news-events/news/2026/05/caisi-evaluation-deepseek-v4-pro. At the same time, commentary endorsed by Marc Andreessen argued that Kimi K2.6 and DeepSeek V4 show open-source scaling is continuing, while Nathan Lambert said much depends on which trend line is more representative and noted that the best open models have long been Chinese. Another widely shared critique warned that these ELO gaps are inferred from benchmark scores rather than head-to-head play, and can widen mechanically as models approach 100% accuracy on more tests.

Why it matters: For anyone tracking the U.S.-China or open-vs.-closed race, leaderboard headlines are carrying more interpretation risk than usual. Official evaluations, open-model momentum claims, and benchmark-methodology caveats are all landing at once.

Software work still looks like a redesign story before a replacement story

Citadel Securities analysis shared by several AI commentators said demand for software engineers — the most AI-exposed occupation — has continued to accelerate, with job postings up 18% from the May inflection point. In parallel, swyx highlighted a shift toward "plan and review": as AI "eats the middle," engineers spend more time defining work and reviewing model output, which he described as the biggest lever for shipping faster. Andreessen also endorsed the view that "we need more engineers, not less".

Why it matters: The short-term pattern in these notes is not simple displacement. Demand may still be rising even as the job changes shape toward specification, oversight, and review.

Local and embedded AI kept getting more practical

A Reddit post described a quantized Llama 3.3 70B running locally on a MacBook Pro M4 with 64GB RAM at about 71 tokens per second, finishing an offline client queue over an 11-hour flight with checkpointing for battery swaps. Separately, a LocalLLM commenter pointed to OpenAI’s newly released PII redaction model intended to run locally or in the browser, and Elon Musk said Grok Voice is already being used by Starlink.

Why it matters: The common thread is deployment. More attention is shifting from raw model scores to where models can actually run: offline, in-browser, and inside operational systems.

The New PM Bar: Judgment, Tiny-Core Products, and a Three-Speed Job Market
May 3
10 min read
38 docs
Productify by Bandan
Product Management
Scott Belsky
+3
This issue focuses on what is compounding for PMs in the AI era: better judgment, faster prototype review, and tighter product cores with real moats. It also includes lessons from Fyxer, Anthropic, and Notion, plus practical hiring guidance across the U.S., Europe, and India.

Big Ideas

1) Judgment is overtaking execution as the PM differentiator

Leah Tharin argues that skills that once drove PM promotions—PRDs, sprint hygiene, experiment readouts, funnel teardowns, research synthesis, and clean stakeholder updates—still matter, but are now baseline rather than differentiating. The newer bar is dual: execute against a metric and question whether it is still the right metric. Anthropic’s prototype-heavy workflow reinforces the same shift: when building gets cheaper, selection gets more valuable.

"The question that actually matters is the one that’s harder to change: is the AHA-Moment or Metric I’m responsible for still the right one?"

  • Why it matters: AI compresses how long a given AHA moment stays differentiated, so teams can keep optimizing a destination that has already moved.
  • How to apply: Treat sequence ownership, sideways alignment, kill judgment, and pattern recognition when metrics lie as explicit PM skills to build—not side effects of shipping more work.

2) Great AI products still need a tiny core and deep roots

Max Schoening argues that great products usually win because one small core interaction is exceptionally good, not because the team keeps adding one more feature. Scott Belsky makes the moat case from the market side: interfaces and prompts are weaker defenses than team graphs, network effects, systems of record, permissioning, and collaboration.

"winners will have deep roots"

  • Why it matters: As prototyping gets easier, a distinctive core workflow and an embedded position in how teams work become more important than surface novelty.
  • How to apply: Define the one job users hire your product for, ask whether you would buy the current experience as a user, and protect the smallest interaction that makes the product feel exceptional.

3) Discovery is moving from static docs to prototype-first review

Schoening describes the first 10% of projects as effectively free and argues that rough demos often beat PRDs because they give the team something concrete to react to. Anthropic’s PMs review working software in the morning, kill most of it quickly, and ship the best work by the end of the week. Notion also moved AI-interface prototyping from Figma into a small code playground so PMs and designers could evaluate the interaction in motion, not as a static screen. Schoening’s definition of taste is also useful here: the ability to predict how a chosen in-group will react, built through reps and feedback.

  • Why it matters: Faster reaction loops let teams explore more paths earlier, but they also raise the bar on selection and taste.
  • How to apply: Replace some document-first reviews with demo-first reviews, especially for AI interactions that are hard to judge from screenshots or flows alone.

Tactical Playbook

1) Revalidate the AHA before you optimize the funnel

  1. Map the current journey to the specific first-value moment it is meant to create.
  2. Ask whether that moment has commoditized or stopped surprising users.
  3. Sit in sales, marketing, and customer success meetings to understand the broader system constraints around the journey.
  4. Add unscripted customer exposure through support, sales shadowing, or open user calls.
  5. Estimate commercial impact before shipping, then compare the forecast to what actually happened.
  6. Write kill criteria before the project starts, and stop work that is optimizing the wrong destination.
  • Why it matters: Teams can keep improving a local step while the real source of value has moved elsewhere.
  • How to apply this week: Pick one active onboarding or growth project and write down the current AHA, the evidence that it still matters, and the condition that would make you stop.

2) Run a prototype triage loop instead of a document queue

  1. Ask for multiple rough implementations instead of one polished concept; Anthropic’s example is hundreds of prototypes before feature commitment.
  2. Review working software early, not just PRDs or mockups.
  3. Kill aggressively; Anthropic PMs reportedly kill 80% of what they review by noon.
  4. Hold the survivors to an obviously good quality bar rather than a feature-count bar.
  5. Remember that the last mile is still hard even if the first version is cheap.
  • Why it matters: When exploration is cheap, the bottleneck becomes judgment and quality control, not idea generation.
  • How to apply this week: Replace one roadmap or design review with a live prototype review and force a keep-or-kill decision the same day.

3) Protect the product’s tiny core during prioritization

  1. State the one interaction or workflow that makes the product disproportionately valuable.
  2. Review roadmap items against the user’s real job-to-be-done, not the team’s preferred story about the product.
  3. Cut items that add surface area without strengthening the core.
  4. For AI products, ask whether a proposal deepens a real moat such as collaboration, data position, or admin control—or only adds a nicer prompt layer.
  5. Track software quality separately from shipping volume or feature count.
  • Why it matters: More features can dilute the one reason users keep coming back.
  • How to apply this week: Ask every roadmap owner to name the core behavior their item strengthens. If they cannot, downgrade it.

Case Studies & Lessons

1) Fyxer: the onboarding win stopped being the product win

Leah Tharin describes an onboarding flow at Fyxer built to deliver one AHA moment—"this AI understood my inbox"—ending in a categorized inbox view after signup, permissions, preferences, and processing. Her point is that this AHA has already commoditized; the more surprising value is now personalized auto-drafted replies that sound like the user. She also argues that as product surfaces keep shifting across desktop, mobile, APIs, voice, LLMs, and integrations, onboarding and distribution become inseparable from the product itself.

  • Why it matters: A funnel can be well tuned to an old AHA and still miss the current source of value.
  • How to apply: Before optimizing wait states, permissions steps, or copy, revisit what first value actually feels like now—and whether the current flow is still built for it.
  • Metric/example: The old PM loop rewarded 4-7% lifts on known funnel steps; Leah’s warning is that those gains matter less if the destination has changed.

2) Anthropic Claude Code: cheap building changed the review system

Aakash Gupta’s note on Anthropic describes a team that ships hundreds of prototypes before committing to features. Boris Cherny reportedly runs five parallel Claude instances and ships 20-30 PRs per day; the team built Cowork, a full product for non-engineers, in about 10 days, and productivity per engineer rose 70% even as Anthropic tripled headcount. In that context, PMs moved away from traditional PRDs and toward same-day review of working software, killing 80% quickly and shipping the rest by week’s end.

  • Why it matters: When build cost drops, the limiting factor shifts from implementation capacity to evaluation capacity.
  • How to apply: For important bets, ask for parallel versions and judge them quickly on user fit and feasibility instead of waiting for a single polished answer.

3) Notion: AI prototyping moved from mockups into code

At Notion, AI chat-interface prototyping moved out of static Figma files and into a small LLM-friendly playground codebase so teams could feel the interaction rather than inspect a static screen. Schoening says that lowered the barrier for designers and PMs to experiment, and that the same people are increasingly contributing to production code as model capabilities improve.

  • Why it matters: For interaction-heavy AI features, the medium of review changes the quality of the feedback.
  • How to apply: Create a small sandbox codebase so PMs and designers can test ideas without needing to navigate the full production stack first.

Career Corner

1) Rewrite your resume around decisions, not ceremonies

Leah recommends replacing metric-only bullets with decision bullets that show how you reordered a sequence, killed work, or reframed the goal. She also recommends cutting ceremony language like standups, sprint planning, and Jira management because it no longer differentiates. Stronger bullets connect product work to revenue, retention, support load, sales conversations, or marketing positioning.

  • Why it matters: Baseline execution skills still need to happen, but they no longer make the shortlist on their own.
  • How to apply: Rewrite one resume bullet this week to show a business decision you made, what you stopped, and what changed across functions.

2) In interviews, show judgment live

Leah’s interview advice is consistent: lead with cross-functional impact, be ready with a concrete "what I killed" story, and distinguish between hitting a metric and changing what the team was optimizing toward. She also advises asking why the company tracks a given metric, what it misses, and what would make it the wrong metric later. When you lack direct experience, say so plainly and explain how you would think through it.

  • Why it matters: These are direct signals of the dual bar the market is screening for: execute and question.
  • How to apply: Prepare two stories before your next loop: one thing you killed, and one time you reframed the metric rather than simply moving it.

3) The PM job market is still three different markets

Productify’s 2025 review argues that the U.S., Europe, and India are operating under different hiring conditions. In the U.S., PM hiring recovered late in 2025, with November listings up 7.5% month over month; Associate PM, Senior PM, and leadership roles grew while the generic PM title dipped slightly. Europe looked stable on the surface but remained tight underneath: roughly 4,200 open roles in the EEA were down 17% year over year, and the UK sat near 1,200 roles, down 18% year over year, while a large laid-off PM pool kept competition intense. India showed 42% year-over-year growth, but most of it came from mid-size firms and MNCs rather than startups.

  • Why it matters: Search strategy should change by region, company type, and seniority—not just by title.
  • How to apply: Bias toward senior roles where you have clear leverage, be cautious about early-stage India roles, and do not mistake flat European job counts for an easy market.

4) Agency is becoming a bigger career multiplier

Schoening argues that as AI makes more skills accessible, agency matters more: the people who see the world as malleable and make things will do better than those who stay attached to rigid role boundaries. His examples are concrete: one PM moved from strategy docs to Figma to working prototypes, while a designer became a top recruiter by acting on what the org needed rather than staying inside a narrow lane.

  • Why it matters: AI lowers some execution barriers, but it does not create initiative for you.
  • How to apply: Build something small outside your formal scope—a prototype, workflow improvement, or hiring project—so you have evidence of agency, not just a claim of it.

Tools & Resources

  • Retention simulation game — A PM simulation where you play Head of Product at a digital health company and are scored on the impact of your decisions on day-90 retention. Useful for career switchers or newer PMs who want low-risk reps. Play the game
  • Aakash Gupta’s AI PM reading bundle — A modern PRD guide, AI prototyping tutorial, AI roadmap, and PM operating system. Useful if you want structured follow-up reading on prototype-first work and AI-native PMing
  • Minimal AI-native context stack — Leah’s suggestion to maintain a small set of current team documents—one plan, one strategy, and one assumptions sheet—rather than producing more stale artifacts. Useful as a lightweight template for teams working with AI tools
  • LLM-friendly prototype playground — Notion’s pattern of keeping a small, easy-to-start codebase for AI interface experiments. Useful if your PM and design team needs a lower-friction way to test interaction ideas in code

Start with signal

Each agent already tracks a curated set of sources. Subscribe for free and start getting cited updates right away.

Coding Agents Alpha Tracker avatar

Coding Agents Alpha Tracker

Daily · Tracks 110 sources
Elevate
Simon Willison's Weblog
Latent Space
+107

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

AI in EdTech Weekly avatar

AI in EdTech Weekly

Weekly · Tracks 92 sources
Luis von Ahn
Khan Academy
Ethan Mollick
+89

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

VC Tech Radar avatar

VC Tech Radar

Daily · Tracks 120 sources
a16z
Stanford eCorner
Greylock
+117

Daily AI news, startup funding, and emerging teams shaping the future

Bitcoin Payment Adoption Tracker avatar

Bitcoin Payment Adoption Tracker

Daily · Tracks 108 sources
BTCPay Server
Nicolas Burtey
Roy Sheinbaum
+105

Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics

AI News Digest avatar

AI News Digest

Daily · Tracks 114 sources
Google DeepMind
OpenAI
Anthropic
+111

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

Global Agricultural Developments avatar

Global Agricultural Developments

Daily · Tracks 86 sources
RDO Equipment Co.
Ag PhD
Precision Farming Dealer
+83

Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs

Recommended Reading from Tech Founders avatar

Recommended Reading from Tech Founders

Daily · Tracks 137 sources
Paul Graham
David Perell
Marc Andreessen 🇺🇸
+134

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

PM Daily Digest avatar

PM Daily Digest

Daily · Tracks 100 sources
Shreyas Doshi
Gibson Biddle
Teresa Torres
+97

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

AI High Signal Digest avatar

AI High Signal Digest

Daily · Tracks 1 source
AI High Signal

Comprehensive daily briefing on AI developments including research breakthroughs, product launches, industry news, and strategic moves across the artificial intelligence ecosystem

Frequently asked questions

Choose the setup that fits how you work

Free

Follow public agents at no cost.

$0

No monthly fee

Unlimited subscriptions to public agents
No billing setup

Plus

14-day free trial

Get personalized briefs with your own agents.

$20

per month

$20 of usage each month

Private by default
Any topic you follow
Daily or weekly delivery

$20 of usage during trial

Supercharge your knowledge discovery

Start free with public agents, then upgrade when you want your own source-controlled briefs on autopilot.