Your intelligence agent for what matters
Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.
Your time, back
An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.
Save hours
AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.
Full control over the agent
Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.
Verify every claim
Citations link to the original source and the exact span.
Discover sources on autopilot
Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.
Multi-media sources
Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.
Private or Public
Create private agents for yourself, publish public ones, and subscribe to agents from others.
3 steps to your first brief
Describe your goal
Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.
Review and launch
Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready; you can adjust sources at any time.
Sam Altman
3Blue1Brown
Paul Graham
The Pragmatic Engineer
r/MachineLearning
Naval Ravikant
AI High Signal
Stratechery
Get your briefs
Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.
Logan Kilpatrick
Romain Huet
Cat Wu
🔥 TOP SIGNAL
Anthropic's internal Claude Code/Cowork playbook is the highest-signal drop today. Cat Wu says AI has pulled many Claude Code feature timelines from six months to one month, one week, or even one day, and that the team ships almost all Claude Code features first as Research Preview to lower commitment and get feedback fast. The interesting part is the operating system around it: weekly metrics readouts, explicit team principles, engineers dropping ready features into a launch room, and docs/PMM turning those into launches the next day; PRDs are mostly reserved for infra-heavy or ambiguous work.
"We want to remove every single barrier to shipping things."
🛠️ TOOLS & MODELS
- OpenAI GPT-5.5 — live in Codex and ChatGPT today; API is coming soon. Reported evals: 82.7% Terminal-Bench 2.0, 73.1% Expert-SWE, 58.6% SWE-Bench Pro, 78.7% OSWorld-Verified. Codex gets 400K context, with 1M in the API.
- Codex app update — full browser use, computer use, in-app docs/PDF viewer, non-dev mode, global dictation, and a new auto-review mode. Auto-review uses a guardian agent to vet higher-risk actions so Codex can keep running tests, builds, long tasks, and automations with fewer manual approvals; Alexander Embiricos says it is now his default mode.
- Early GPT-5.5 read from actual users — Aaron Friel says the new Codex harness caused a "tidal wave" of PRs and has engineers running single tasks for 40+ hours; Will Koh says it handles ambiguous tasks with less prompting, finds the right code paths, and uses DB + telemetry tools in novel ways inside Ramp's Inspect harness. On frontend, Tylernotfound and Thibault Sottiaux both say 5.5 is the first OpenAI model that feels like a real programming partner.
- But the caveats are real — Theo calls GPT-5.5 the best code-writing model he has used, but also says it needs stricter instructions, explores when under-specified, is hard to steer back once it goes off track, and is expensive at $5/M input and $30/M output. Matthew Berman and Riley Brown both argue the right evaluation metric is task quality + time + total tokens/cost, not price-per-token alone.
- Claude Code quality fix — Anthropic says the recent quality regressions were three harness bugs, fixed in v2.1.116+, with subscriber usage limits reset. The standout bug repeatedly cleared older thinking after sessions sat idle for more than an hour, which made Claude look forgetful and repetitive; Simon Willison says "stale" sessions are a huge part of his real usage.
- Support-ticket-to-PR automation is already showing up in the wild — Jason Zhou says an AI agent read a support ticket, checked the database, found the root cause of a broken customer credits issue, and submitted the fix PR in 10 minutes with no human intervention; he says the usual path takes about two weeks through sprint prioritization.
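As a back-of-the-envelope illustration of the per-task metric Berman and Brown argue for, total cost is just token counts times the quoted prices. A minimal sketch (the token counts below are made-up examples, not measurements):

```python
def task_cost(input_tokens: int, output_tokens: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of a single task at per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

# GPT-5.5 prices quoted above: $5/M input, $30/M output.
# A hypothetical task consuming 200K input and 50K output tokens:
print(task_cost(200_000, 50_000, 5.0, 30.0))  # 2.5
```

Ranking models on this per-task number, plus wall-clock time and output quality, can flip a pure price-per-token comparison when one model finishes the same task in far fewer tokens.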
💡 WORKFLOWS & TRICKS
- Choose the Anthropic surface by output type. Cat Wu's split is clean: CLI for one-off coding and newest features; Desktop for frontend work, live preview, and a graphical control plane across sessions; web/mobile to dispatch work on the go; Cowork for non-code outputs like docs, inbox/slack cleanup, and decks.
- Steal Cat Wu's async deck workflow. First connect the data sources relevant to your role (Slack, Gmail, Calendar, Drive). Then give Cowork the narrative, draft links, and constraints; ask it to propose an outline first; lock the outline; let it run for an hour or a few hours; then do the last-mile editing yourself—mainly trimming text and picking the final story.
- Build narrow internal apps for repetitive work, not demo-ware. Anthropic's sales team used Claude Code to build a web app that pulls Salesforce + Gong + customer notes and spits out tailored decks in seconds instead of 20-30 minutes. Cat Wu's explicit advice: pick something you do constantly, push the last 5-10% to 100% reliability, and build apps you will actually use every day.
- Simon Willison's plan-first build loop is worth copying verbatim. He started by probing the repo/problem in regular Claude chat, pasted the findings into notes.md, had Claude Code write plan.md, iterated the plan, then used "build it." plus Playwright red/green TDD, queued prompts, and small commits. He ran npx vite for live preview, used a separate Claude session for GitHub Actions + Pages deploy, and finally used GPT-5.5/Codex as a second-model verification pass.
- For large unstructured datasets, hardcode the expensive parallel ops. Listen models research data as a table (rows = responses, columns = extracted features), exposes classification as a hardcoded map-reduce tool, and can spawn ~500 constrained subagents with small models for quantitative passes. They keep a sandboxed Python fallback for the 20% long tail, and they split live vs. async runs: faster/smaller models with minimal thinking live, fuller scans with more thinking asynchronously.
- Improve agents from traces, not vibes. Cat Wu asks the model to explain its own failures so the team can fix prompts or harnesses; she says even 10 good evals can be enough to make progress visible. Listen goes deep on one or two traces at a time and runs a reviewer agent that checks reports for unsupported claims, while LangChain summarizes the broader loop as: traces -> evals/feedback -> reusable datasets -> iteration.
- Route reasoning effort by task. Multiple testers are landing on the same pattern: use low/medium reasoning for most work, save high/xhigh for long builds, and compare models on quality, tokens, time, and total cost on the exact task you care about.
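Listen's table + map-reduce setup (rows = responses, columns = extracted features, classification fanned out to constrained subagents) can be sketched roughly as below. This is an illustration, not their implementation: `classify` is a hypothetical keyword stand-in for what would really be a small-model call with a constrained prompt, and a real system would fan out far wider than a local thread pool.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def classify(response: str, labels: list[str]) -> str:
    # Hypothetical stand-in: a real system would send the response to a
    # small model with a constrained prompt that must return one label.
    return max(labels, key=lambda label: response.lower().count(label))

def map_reduce_classify(rows: list[dict], column: str, labels: list[str],
                        max_workers: int = 8):
    # Map: fan each row out to its own constrained "subagent".
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda row: classify(row[column], labels), rows))
    # Reduce: aggregate labels for the quantitative pass.
    return results, Counter(results)

rows = [{"text": "Pricing is too high"},
        {"text": "pricing page is confusing"},
        {"text": "love the speed of the app"}]
per_row, counts = map_reduce_classify(rows, "text", ["pricing", "speed"])
print(counts)  # Counter({'pricing': 2, 'speed': 1})
```

The key design point from the clip survives even in this toy: the expensive parallel operation is a hardcoded tool the agent calls once, not something the agent improvises row by row.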
👤 PEOPLE TO WATCH
- Cat Wu — rare operator-level walkthrough of how a frontier lab is actually using Claude Code and Cowork across PM, sales, applied AI prep, and launch operations.
- Simon Willison — still the cleanest source for reproducible agent workflows: today he published the LiteParse build notes and a Codex-subscription plugin for llm, both with exact commands and verification steps.
- Romain Huet — useful if you want the shortest path from launch post to practitioner feedback; his GPT-5.5 posts pair official claims with concrete reactions from OpenAI and Ramp builders.
- Logan Kilpatrick — one of the few people talking concretely about how agent-written code gets into a production codebase: CI green, test pass, then handoff to engineers who steward the final merge and infrastructure.
- Theo — high-signal skeptic on GPT-5.5: he calls it the best code-writing model he has used, while documenting the practical annoyances around context contamination, steering, and cost.
🎬 WATCH & LISTEN
- 36:08-40:50 — Cat Wu on Cowork as an async PMM assistant. Best concrete non-code agent workflow today: connect the right sources, ask for an outline first, let it research launches and internal channels, then do the last-mile edit yourself.
- 15:08-18:04 — Riley Brown stress-tests GPT-5.5 browser use by making it build a canvas app and draw a flowchart. Good visual proof of spatial reasoning plus the new browser loop: build, inspect, click, correct, repeat.
- 06:55-08:03 — Florian Jungermann on the table + map-reduce pattern. If you are building agents over lots of messy qualitative data, this is the clip: rows as responses, columns as extracted features, and one tool call fanning out to hundreds of constrained subagents.
- 12:03-13:49 — Logan Kilpatrick on getting agent-written code into production. The important bit is not the coding demo; it is the handoff model: technical staff get the change green, engineering owns the final merge, and the same team improves the next loop.
📊 PROJECTS & REPOS
- llm-openai-via-codex — Simon Willison had Claude Code reverse-engineer openai/codex auth so his llm CLI can use a Codex subscription. Setup: install the Codex CLI, uv tool install llm, llm install llm-openai-via-codex, then llm -m openai-codex/gpt-5.5 '...'; it also supports images, chat, logs, and tools.
- openai/codex — the open-source CLI + app server behind the Codex ecosystem. OpenAI's Romain Huet says the point is to let ChatGPT subscriptions work in the app, terminal, JetBrains, Xcode, OpenCode, Pi, and even Claude Code.
- run-llama/liteparse + web demo — LiteParse is a fast, heuristics-based PDF parser that can be used from the CLI or inside a coding agent; Simon Willison used Claude Code to build a browser version in about an hour.
- Agentic red/green TDD guide — Simon's red/green TDD write-up is still one of the best concrete pattern docs to hand an engineer who is moving from "prompting" to repeatable agent workflows.
Editorial take: the real edge today is not "which model won"—it is whether you can wrap a strong model in plans, evals, trace review, reviewer loops, and narrow internal apps that people use every day.
Paul Graham
Garry Tan
Aravind Srinivas
1) Funding & Deals
PetualAI — $20M total. Petual raised $20M total, including a $17M round led by a16z and a $3.2M round led by First Round, with participation from Cowboy Ventures, Elad Gil, and founders from Lyft and Opendoor. Founder Snir Kodesh previously led engineering at Retool and held senior engineering roles at Lyft. The company applies agentic AI to SOX testing and internal audit, autonomously gathering evidence and generating auditor-ready workpapers in minutes rather than hours; it says S&P 500 and NASDAQ 100 customers see 68–80% efficiency gains. a16z’s thesis is that SOX is the entry point to a broader AI-powered control system for audit and compliance.
Glif — $17.5M seed. Glif announced a $17.5M seed led by a16z and USV. It positions itself as a creative super agent that uses virtually every available AI model to create ads, marketing content, films, voiceovers, music, and more inside one conversation. a16z’s angle is workflow consolidation: marketers often touch multiple gen-AI products in a single session, while Glif tries to collapse that sprawl into one agent; the founding team is described as strong across both technical and creative domains.
Mindfort — $3M seed. Mindfort raised a $3M seed to build autonomous security agents that run pentests on every CI/CD push, chain vulnerabilities into working proofs of exploit, and ship fixes as pull requests.
Ulysses / The Ocean Company — $46M. Ulysses raised $46M led by a16z American Dynamism to build ocean infrastructure and treat the ocean as a permanent economic fixture. Its stack combines $50,000 Mako AUVs, which the company says are 10x to 100x cheaper than incumbent models, with the Leviathan surface craft and Kraken launch/recovery platform for persistent subsea operations without crewed ships. Management says demand is already appearing at fleet scale, with one commercial customer requesting 10,000 vehicles and another at least 1,000.
2) Emerging Teams
Dayjob. YC says Dayjob is building AI scheduling for waste trucks and is already at $496K ARR with 12 customers.
Huscarl. Huscarl is pitching an AI-native advisory model for corporate insurance buyers, with a claim of 30% savings on annual premiums and zero downside.
Asendia AI. Asendia AI builds AI recruiters for staffing agencies and enterprises by cloning top recruiters into agents that match, screen, and submit candidates 10x faster. YC highlighted founders @LajmiRihab and @zormati_ba at launch.
Kinect. Kinect is turning e-commerce stores into AI-powered storefronts that adapt to each visitor in real time and capture new buying-intent data for merchants.
3) AI & Tech Breakthroughs
- GStack is turning into a fast-growing open-source agent-coding toolkit. Garry Tan’s toolkit turns Claude Code into an AI engineering team with specialist skills such as Office Hours, adversarial review, design-shotgun, browser QA, and parallel PR workflows. Tan says the scaffolding should stay thin, describes the result as a level-seven software factory rather than full autonomy, and says the repo was built three weeks earlier and had already crossed 70,000 GitHub stars. In practice, he says he runs 10 to 15 parallel Claude sessions and can land 10 to 50 PRs in a day across projects.
“Basically, I’ve written a lot of code in my career and I’m here to tell you we are in a completely new era of building software, the agent era.”
- DeepSeek is making a new long-context efficiency push. DeepSeek says V4 introduces token-wise compression plus DSA (DeepSeek Sparse Attention), delivering world-leading long-context efficiency with sharply lower compute and memory costs and making 1M context the default across official services. Adoption signals were immediate: DeepSeek-V4-Pro cleared 500+ likes on Hugging Face in 28 minutes and reached #1 trending after 43 minutes. Early outside commentary described the first benchmark numbers as “astounding” and comparable to top frontier models, but verification was still underway.
- Replit is targeting the post-codegen security gap. Replit argues AI has already automated most of the software development lifecycle, leaving DevSecOps as the next bottleneck, and launched Auto-Protect as a 24x7 vulnerability scanner for live apps. Replit frames it as the next step after Replit Agent: extending AI from building software into monitoring, security, and upkeep.
- Model-native interfaces are being prototyped. Flipbook streams every pixel on screen directly from a model, with no HTML, layout engine, or code, and applies the same idea to video by generating each frame live without timelines, compositors, or render farms. The prototype was built by @zan2434, @eddiejiao_obj, and @drewocarr.
4) Market Signals
AI-written code has already crossed the 75% line in important startup and big-tech cohorts. Paul Graham says YC startups passed 75% AI-written code at least one or two years ago. A separate data point cited this week says Google went from 0% to 75% AI-written code in roughly two years.
More of the commercial logic is shifting to the application layer. Latent.Space describes an agent-lab playbook: start with frontier models, specialize for a domain, then train or distill your own model once workload and user data justify the cost and latency gains. Aravind Srinivas makes the market version of the same argument: consumers buy products, pure model/API businesses are hard to defend as model gaps compress, and value accrues in the application layer and its harnesses.
Efficiency heuristics are tightening for AI-native SaaS. Team8 managing partner Alon Huri argues that AI-native companies are already hitting $2M to $5M in ARR per employee, and that headcount growing linearly with MRR is often a sign the company is acting more like an agency than software. His pre-PMF template is a four-person core team and an agent-first model where humans judge while agents execute tasks in sales, customer success, and ops.
LLMs are becoming both a distribution surface and a monitoring surface. ReqRes says it has 48,000 registered users and 300 daily signups with no paid marketing; ChatGPT is already its third-largest traffic source, and it says 333 universities teach with the product while 100+ engineers at one Big 4 IT services firm signed up on their own. Lima is building around the inverse problem: tracking how brands are mentioned across ChatGPT, Claude, Grok, Google AI, and Perplexity, with prompt suggestions plus prompt and citation breakdowns.
Founders are still distinguishing task automation from AGI. Garry Tan called openclaw “highly effective task-automation” and “genuinely impressive and useful,” but said AGI would require zero-shot identification and solution of novel, unscoped problems without human setup.
5) Worth Your Time
- GStack walkthrough: How to Make Claude Code Your AI Engineering Team shows Office Hours, adversarial review, design-shotgun, browser QA, and parallel PR workflows in one system.
- Ocean thesis: The Great Blue Frontier lays out Ulysses’ thesis, the Mako/Leviathan/Kraken stack, and the early demand signal for 1,000–10,000 vehicle fleets.
- Agent-labs framework: AIE Europe Debrief + Agent Labs Thesis covers the frontier-model to domain-specialized to in-house-model playbook, coding-market scale, and the idea of zero-human-review “dark factories”.
- Security agents: Mindfort’s seed announcement outlines a product that moves from autonomous pentesting to shipping fixes as pull requests.
- Production agent architecture: Max Agency with ListenLabs CTO Florian Jue discusses self-reviewing subagents, sandboxes, abstractions, and response analysis at scale.
Unsloth AI
Sam Altman
Greg Brockman
Top Stories
Why it matters: The day’s biggest signal was a two-front competition: OpenAI pushed a more efficient frontier model into products, while DeepSeek answered with a large open-weight release built around million-token context.
- OpenAI launched GPT-5.5 in ChatGPT and Codex, with API access coming soon. OpenAI says the model is built for real work and agents, matches GPT-5.4 per-token latency, and uses significantly fewer tokens on Codex tasks. API pricing is $5 / 1M input and $30 / 1M output with a 1M context window. Official results highlighted 82.7% on Terminal-Bench 2.0 and 81.8% on CyberGym. Artificial Analysis says GPT-5.5 now leads its Intelligence Index by 3 points, though its AA-Omniscience hallucination rate remains high at 86%.
- DeepSeek open-sourced V4 Pro and Flash. The new family ships with V4-Pro at 1.6T total / 49B active and V4-Flash at 284B / 13B active, both with 1M context and live API/web availability. DeepSeek says V4 uses token-wise compression plus DeepSeek Sparse Attention to cut long-context compute and memory costs. Pricing is aggressive: $1.74/$3.48 for Pro and $0.14/$0.28 for Flash per 1M input/output tokens. vLLM says the V4 design reduces per-layer KV state at 1M context by about 8.7x, and Artificial Analysis already ranks V4 Pro as the top open-weights model on GDPval-AA.
Research & Innovation
Why it matters: The most useful research today focused on making training and agent systems more scalable, while clarifying where multi-agent hype breaks down.
- Google DeepMind’s Decoupled DiLoCo enables training across multiple data centers without stalling on failures; Google says it trained a 12B Gemma model across four U.S. regions and mixed TPU v6e/v5p hardware without performance loss.
- Neural Garbage Collection (NGC) trains models to manage their own KV cache with RL/GRPO, aiming to stabilize memory for long-horizon reasoning, agents, and tool use.
- A new paper on diversity collapse found that multi-agent LLM systems can converge toward near-identical outputs over time because shared context and mutual feedback homogenize the group.
Products & Launches
Why it matters: New launches are less about one-off demos and more about dependable autonomy, memory, and orchestration.
- Codex added broader browser/computer support plus auto-review, letting the agent keep moving through tests, builds, files, and UI tasks while a separate checker reviews higher-risk actions.
- Sakana Fugu entered beta as a multi-agent orchestration system; Sakana says it has hit SOTA on SWE-Pro, GPQA-D, and ALE-Bench and ships as an OpenAI-compatible API with Mini and Ultra modes.
- Claude Managed Agents Memory is now in public beta, giving Anthropic-managed agents a memory layer that learns from prior sessions.
Industry Moves
Why it matters: Competition is shifting toward distribution, enterprise rollout, and on-device deployment.
- OpenAI and NVIDIA piloted a company-wide Codex rollout, and OpenAI says it is now offering whole-company deployments to other enterprises.
- Liquid AI signed a multi-year Mercedes-Benz partnership to bring embedded speech, language understanding, and reasoning into future MBUX systems.
- Glif V2 launched alongside a $17.5M seed led by a16z and USV, positioning itself as a creative super agent for ads, films, voiceovers, and more.
Policy & Regulation
Why it matters: The clearest government action today was around model theft and distillation.
- A U.S. memo said foreign entities, primarily in China, are running industrial-scale distillation campaigns against American AI and said the government will act to protect domestic innovation.
Quick Takes
Why it matters: These are smaller updates, but each points to where competition is moving next.
- Kimi K2.6 is now the #1 open model in both Vision Arena and Document Arena.
- Qwen3.6-27B can run locally on 18GB RAM and beats the much larger Qwen3.5-397B-A17B on major coding benchmarks.
- Kling 3.0 is live with native 4K video generation, without upscaling.
- Anthropic says recent Claude Code quality issues were traced to three problems, fixed in v2.1.116+, with usage limits reset for subscribers.
Simon Willison
Satya Nadella
Cat Wu
What stood out
Most of today's recommendations focused on large-system change rather than narrow tactics: AI capability scaling, historical technology transitions, economic development, civilizational rise and fall, and nation-building.
Most compelling recommendation
- Title: the scaling laws paper
Content type: Research paper
Author/creator: Not specified in the provided excerpt
Link/URL: Not specified in the provided excerpt
Source context: Microsoft CEO Talks AI, Australia and Upskilling workers
Who recommended it: Satya Nadella
Key takeaway: Nadella said that when he first read the paper, he thought that if its prediction held, it would be very interesting. He now says it has held that line as capability jumps arrive
Why it matters: This was the strongest example today of a leader pointing to a resource that directly shaped his model of AI progress
"When I first read even the scaling laws paper, I felt like, wow, if this happens, this would be really interesting. And it's turned out that it's held that line."
Books for thinking about the AI era through history and institutions
Source context for Cat Wu's recommendations: How Anthropic’s product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code)
- Title: The Technology Trap
Content type: Book
Author/creator: Not specified in the provided excerpt
Who recommended it: Cat Wu
Key takeaway: Wu said it studies the industrial and computer revolutions and how they affected workers
Why it matters: She recommended it specifically because history can help make the AI transition go well
- Title: How Asia Works
Content type: Book
Author/creator: Not specified in the provided excerpt
Who recommended it: Cat Wu
Key takeaway: Wu described it as a story about economic development and the policies and governments that create long-lasting successful economies
Why it matters: It is a direct recommendation for readers trying to understand how durable economic systems get built
- Title: Paper Menagerie
Content type: Book
Author/creator: Not specified in the provided excerpt
Who recommended it: Cat Wu
Key takeaway: Wu called it a book of short stories about coming of age, AI, and self-discovery
Why it matters: It broadens the set beyond policy and economic history into narrative work that still touches AI
- Title: Atlas Shrugged
Content type: Book
Author/creator: Ayn Rand
Who recommended it: Brian Armstrong
Key takeaway: Armstrong called it a classic that celebrates builders and said readers will start noticing the same characters and events playing out today
Why it matters: He framed it as a way to think about builders and recurring patterns
- Title: The Changing World Order
Content type: Book
Author/creator: Ray Dalio
Who recommended it: Brian Armstrong
Key takeaway: Armstrong recommended it for understanding how civilizations rise and fall and how crypto can help create better countries
Why it matters: It ties macro history to Armstrong's view of crypto and institutional design
- Title: From Third World to First
Content type: Book
Author/creator: Lee Kuan Yew
Who recommended it: Brian Armstrong
Key takeaway: Armstrong said it is worth reading for understanding nation-building
Why it matters: It was the clearest state-building recommendation in today's book set
One writer surfaced twice
Drew Breunig was the only author to appear more than once in today's recommendations: once through a direct Matt Mullenweg endorsement, and once through Simon Willison's security note that Mullenweg amplified.
- Title: Cybersecurity is Proof of Work Now
Content type: Blog post / article
Author/creator: Drew Breunig
Link/URL: https://www.dbreunig.com/2026/04/14/cybersecurity-is-proof-of-work-now.html
Who recommended it: Simon Willison; Matt Mullenweg endorsed the recommendation
Key takeaway: Willison highlighted Breunig's point that the ease of finding exploits makes proven open source libraries more valuable
Why it matters: Mullenweg's follow-on comment showed why the piece resonated with open-source-minded builders
"the ease with which exploits can be found makes proven open source libraries more valuable"
- Title: The Cathedral, the Bazaar, and the Winchester Mystery House
Content type: Essay
Author/creator: Drew Breunig
Link/URL: https://www.dbreunig.com/2026/03/26/winchester-mystery-house.html
Who recommended it: Matt Mullenweg
Key takeaway: Mullenweg simply called it a great essay
Why it matters: Even with little added commentary, it was a direct founder recommendation with a clear link to the source
A targeted watch for operators
- Title: Short advice clip from Dylan Field with Patrick O'Shaughnessy
Content type: Video clip
Author/creator: Dylan Field with Patrick O'Shaughnessy
Link/URL: https://www.youtube.com/watch?v=LF3aUIM57uw&t=2514s
Who recommended it: Bill Gurley
Key takeaway: Gurley said everyone in marketing/PR at an AI company, especially large model companies, should watch it, and described it as crisp and to the point
Why it matters: It was the narrowest operational recommendation in the set, aimed directly at how AI companies communicate
Bottom line
If you open one resource first, start with the scaling laws paper for the strength of the endorsement. If you want a more immediately actionable reading stack, pair The Technology Trap with How Asia Works for historical and institutional context on how major technology shifts and durable economies are managed.
Marty Cagan
Cat Wu
Big Ideas
1) AI is widening the gap between project coordination and real product management
Marty Cagan separates three PM models: backlog-owning software-factory product owners, roadmap-driven feature-team PMs, and empowered product-model PMs. He argues the first two are directly exposed to AI automation and layoffs, while the third stays valuable because it owns outcomes, rapid discovery, and solution shaping. Anthropic Head of Product Cat Wu reaches a similar conclusion from the other side: as code gets cheaper, the scarce skill becomes deciding what to build.
- Why it matters: PM leverage is moving away from specs, handoffs, and roadmap translation toward judgment about customers, value, viability, and strategy.
- How to apply: If your week is dominated by project mechanics, deliberately shift time into customer work, business constraints, prototyping, and outcome-oriented discovery with design and engineering.
2) AI-native teams are replacing long planning cycles with tighter operating loops
Anthropic says timelines that used to run 6–12 months have compressed to one month, one week, and sometimes one day. Its response is not the absence of process but a tighter system: clear user and use-case definitions, research-preview launches, weekly metrics readouts, explicit team principles, and a lightweight launch lane across engineering, docs, marketing, and DevRel. PRDs still exist, but mainly for ambiguous or infrastructure-heavy work.
"The most important thing for building AI-native products is iterating quickly and finding a way to launch features every single week."
- Why it matters: Faster building makes ambiguity more expensive, not less. Teams need clearer goals and faster coordination even if they need fewer heavy documents.
- How to apply: Set the target user, target problem, and success condition before building; run a weekly metrics ritual; and reserve detailed docs for the places where ambiguity or infrastructure risk is still high.
3) AI should shrink research mechanics and expand insight mining
Sachin Rekhi argues AI improves research only if PMs reallocate time. His before/after split moves from 40% conducting, 50% producing, and 10% mining insights to 10%, 10%, and 80%. The new work is interactive: ask for more verbatims, challenge a theme, jump to a timestamp, or pull clips tied to a sub-theme. Cagan makes the same broader point: AI is most useful when it strengthens thinking, not when it replaces it.
- Why it matters: AI does not remove PM judgment; it reallocates judgment toward interpretation, skepticism, and decision-making.
- How to apply: Automate transcription, coding, and first-pass synthesis, then spend the saved time interrogating patterns and watching the raw customer moments that matter most.
4) High-signal discovery still depends on timing, framing, and human follow-up
Analysis of 4.2 million responses across nearly 6,000 in-app surveys found that exit surveys had the highest response rate at 15.5%, event-triggered surveys beat URL-based targeting 11.7% to 8.8%, and surveys that open with a single-choice question get a 15.6% response rate versus 4.3% for those that open with an open-ended question. Contextual open-ended questions nearly doubled response, from 3.2% to 6%.
- Why it matters: AI can make research cheaper, but signal quality still depends on asking the right person at the right moment in the right format.
- How to apply: Trigger surveys after relevant behavior, lead with an easy structured question, then add contextual open-ended follow-ups and close the loop with a human when possible.
Tactical Playbook
1) Borrow Anthropic’s AI-native shipping loop
Step 1: Narrow the goal. Define the key user, key use case, and clear success condition up front. Anthropic’s example is explicit: professional enterprise developers safely reaching zero permission prompts.
Step 2: Ship in research preview. Anthropic uses research previews to get features into users’ hands quickly while signaling that the product is still early and may change.
Step 3: Align the team weekly. Use recurring metrics readouts and explicit team principles so people understand goals, drivers, and trade-offs without waiting on PM approval.
Step 4: Build a tight launch lane. When a feature is ready, engineering posts it in the launch room and docs, PMM, and DevRel can turn around launch materials the next day.
Step 5: Write heavier docs selectively. Save PRDs and one-pagers for ambiguous work or infrastructure-heavy projects, not for every feature.
Why it matters: This keeps speed high without pretending structure is optional.
2) Rebuild your research loop around insight mining
Step 1: Let AI compress the mechanics. Shift more of the conducting and producing work to AI so you can spend proportionally more time mining insights.
Step 2: Query the corpus, not just the summary. Ask for more verbatims, contradictory evidence, frequency checks, timestamps, and theme-specific clips .
Step 3: Inspect the source moment. When a theme matters, go back to the actual customer clip or transcript moment instead of stopping at synthesized output .
Step 4: Keep PM judgment active. Rekhi’s warning is explicit: if you skip the mining step, you will get worse insights .
Why it matters: The quality gain comes from deeper questioning, not from automation alone .
3) Use the survey sequence that maximizes response and signal
Step 1: Always instrument an exit survey. Exit surveys produced the highest response rate at 15.5%; trigger them in cancellation or downgrade flows.
Step 2: Trigger on behavior, not URL. Event-triggered surveys outperform page-load targeting 11.7% to 8.8%.
Step 3: Start shallow. Lead with single-choice or multiple-choice, then ratings, then open-ended prompts.
Step 4: Make open-ended questions contextual. Questions tied to the user’s recent action outperform generic prompts 6% to 3.2%.
Step 5: Treat PMF surveys as a targeted instrument. Run them after activation, plan for 300–400 active users, and add a self-description question to reveal the ICP.
Why it matters: Each change removes friction or noise before you ask users for richer input.
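The sequence above can be encoded as a declarative survey definition. Everything here (the `Survey` and `Question` types, field names, and the trigger event) is a hypothetical sketch, not any particular survey tool's API:

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    kind: str    # "single_choice", "rating", or "open_ended"
    text: str
    options: list[str] = field(default_factory=list)

@dataclass
class Survey:
    trigger_event: str          # behavioral trigger, not a URL match
    questions: list[Question]   # ordered shallow -> deep

# Event-triggered survey that starts with an easy structured question
# and ends with a contextual open-ended follow-up tied to the action.
replay_exit_survey = Survey(
    trigger_event="session_replay_closed",
    questions=[
        Question("single_choice", "Did you find what you were looking for?",
                 ["Yes", "Partially", "No"]),
        Question("rating", "How useful was this replay? (1-5)"),
        Question("open_ended", "What were you trying to learn from this replay?"),
    ],
)
```

The ordering encodes the playbook directly: the trigger is an event, the opener is single-choice, and the open-ended question comes last and references the triggering behavior.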
4) Pressure-test product ideas before you build them
Step 1: Reframe the idea with forcing questions. YC’s GStack Office Hours starts with six questions before any building begins .
Step 2: Ask the demand question first. Start with: what is the strongest evidence that someone actually wants this? .
Step 3: Force competitive and business-model pushback. The demo challenged whether TurboTax, H&R Block, or Plaid already solved the need, then reframed the product as a wedge into tax-prep matchmaking instead of just document aggregation .
Step 4: Compare multiple approaches explicitly. The tool evaluates smaller and larger solution paths before committing .
Step 5: Run adversarial review. In the demo, adversarial review found and auto-fixed 16 issues, raising the design-doc score from 6/10 to 8/10 .
Step 6: Only then move into design variants and implementation planning. The flow continues into design shotgun, CEO review, engineering review, and auto-plan .
Why it matters: It makes demand, feasibility, and failure modes explicit before code creates false confidence .
Case Studies & Lessons
1) Anthropic: speed works best as a company-wide decision system
Anthropic credits two factors for execution speed: a unifying safe-AGI mission and focus over diversification. Cat Wu says teams are willing to deprioritize their own local goals in service of the broader company mission, which makes cross-org trade-offs faster . Its PM team is also organized around research, developer platform, Claude Code and Co-work, enterprise, and growth, reflecting how much product work sits around model launches, APIs, enterprise controls, and adoption .
- Lesson: Speed is easier when teams share a clear decision filter, not just a faster engineering stack .
- Watchout: Anthropic also says the trade-off can be less product consistency, overlapping features, and more onboarding needs, which is why features like /powerup were added .
- How to apply: If you want faster shipping, pair it with explicit prioritization rules and deliberate onboarding support .
2) PostHog, Superhuman, and Slack: surveys can shape roadmap, onboarding, and retention
PostHog routes every survey response into a dedicated Slack channel and has a human respond quickly; its Session Replay exit survey reaches a 42% response rate . Superhuman used its PMF survey to learn both who to build for and what to build next: feedback from somewhat disappointed users who valued speed pointed to a mobile app, and a self-description question helped narrow the ICP and lift the PMF score from 22% to above 40% . Slack kept a three-question onboarding survey across multiple product iterations because it personalizes setup and creates durable segmentation data .
- Lesson: Survey systems work when they drive decisions, not spreadsheets .
- How to apply: Close the loop with humans, reuse onboarding answers inside the product, and design PMF surveys to learn both who and what .
3) Honeywell: structured pre-work can make workshops evidence-driven
Strategyzer describes Honeywell growth symposiums with 10–14 teams across regions, pre-work in playbooks, weekly office hours, a one-day workshop, and final pitches evaluated on how teams create value, capture value, and support claims with evidence . The system uses smaller targeted workspaces and reusable assets, which lets teams stay autonomous and keeps leadership focused on evidence rather than presentation polish .
- Lesson: Move concept learning and artifact creation into pre-work so the live workshop can focus on synthesis and decisions .
- How to apply: Run lightweight pre-learning, reuse the same customer/problem assets across exercises, and keep live sessions for review, trade-offs, and leadership decisions .
4) Gusto: disciplined friction removal compounds into trust
Tony Fadell highlighted Gusto’s 75 product changes, all sourced from real customer problems, and summarized the operating principle as fixing friction, earning trust, and compounding over time . Joshua Reeves adds that Gusto sees its role as taking work off small businesses’ shoulders rather than acting as just a tool .
- Lesson: Products become partners when they repeatedly remove customer work, not just when they add capabilities .
- How to apply: Treat friction removal as a continuing product habit and measure it as part of trust-building, not as one-off cleanup .
Career Corner
1) The PM market is splitting into vulnerable and advantaged roles
Cagan’s framework is blunt: backlog-owning product owners and project-model PMs are under direct pressure from AI automation, while empowered product-model PMs remain in demand because they shape outcomes and solutions .
- Why it matters: The safest PM path is moving toward judgment-heavy work, not doubling down on coordination-heavy work .
- How to apply: Build credibility in discovery, strategy, customer value, and business viability—not just planning rituals and handoffs .
2) Product taste is becoming a hiring filter, but team design still matters
Wu says product taste is the rarest skill as code gets cheaper, and Anthropic is comfortable hiring engineers with strong product taste because it reduces shipping overhead. She also says an engineering background is especially useful right now because it improves judgment about how hard something will be to build . At the same time, Cagan argues true product-design-engineering triple threats are rare and not scalable, so most companies still do better with a strong product trio .
- Why it matters: Hiring may blur role boundaries, but scalable teams still need both taste and complementary depth .
- How to apply: Train taste by spending more time with users, shipped product details, and model behavior; build enough technical fluency to reason about effort; and use language models as a coach for product sense rather than as a PRD generator .
3) Business savvy and reflective time are compounding skills
Cagan says aspiring PMs need product sense plus real business savvy, and he is more positive than many tech leaders on MBAs as one possible foundation for that breadth . Nir Eyal adds a practical operating layer: timeboxing is about planning what and when, measuring how much focused work you can do without distraction, and turning values into time across self, relationships, and work .
"You can do it all, you just can’t do it all at once."
- Why it matters: AI can raise leverage, but career progress still depends on judgment, breadth, and protecting reflective work from constant reactivity .
- How to apply: Invest in business fluency, block time for reflective work, and automate repetitive tasks only when you can make them reliably work end to end—Wu’s view is that 95% automation is not really automation .
Tools & Resources
1) GStack Office Hours demo
This YC-style skill turns early product or startup ideas into an interactive pressure test with six forcing questions, competitive pushback, adversarial review, and design exploration .
- Why explore it: It is useful when you need sharper pre-build validation, especially around evidence of demand and failure modes .
- How to use it: Bring a rough idea, answer the demand question first, compare multiple solution paths, then run adversarial review before planning .
2) Strategyzer Playbooks webinar
Strategyzer positions playbooks as a more scalable alternative to books, training, or consulting for business-model and value-proposition work. The core ingredients are step-by-step guidance, video explanations, pre-structured workspaces, reusable data assets, and built-in facilitation .
- Why explore it: It is aimed at teams that want less blank-canvas workshop time and more repeatable outcomes .
- How to use it: Filter the library by tool, time, and expertise; launch a project; assign pre-learning; and let teams work through smaller focused workspaces instead of giant boards .
3) In-App Surveys: The Playbook from 4M PostHog Responses
This is a benchmarked survey resource built from 4.2 million responses across nearly 6,000 in-app surveys .
- Why explore it: It gives concrete response-rate baselines for exit surveys, event-triggered surveys, question order, open-ended framing, and PMF survey design .
- How to use it: Start with exit and event-triggered surveys, redesign question order, and add the self-description question to PMF surveys after activation .
4) A practical note for API PMs: make specs LLM-friendly before you overbuild agent support
One community thread notes that much AI product-development advice is still front-end-heavy. A concrete takeaway for API teams is to make schemas and API specs LLM-friendly with more context, and to consider CLI enablement before jumping to MCP if your usage scale does not justify it.
- Why explore it: It is a useful reminder that AI-native PM practice is not only about UI prototyping; API surfaces need their own adaptation path.
- How to use it: Start by rewriting specs for clearer machine-readable context, then test simpler CLI flows before adding heavier protocol layers.
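As an illustration of what "more context" means in practice, compare a terse spec entry with an enriched one. The endpoint, parameter names, and structure are invented for the example:

```python
# Terse entry: syntactically valid, but gives an LLM little to reason with.
terse = {
    "path": "/v1/txns",
    "method": "POST",
    "params": {"amt": "number", "cur": "string"},
}

# LLM-friendly entry: spelled-out names, descriptions, constraints,
# and a worked example the model can pattern-match against.
enriched = {
    "path": "/v1/transactions",
    "method": "POST",
    "summary": "Create a payment transaction.",
    "params": {
        "amount": {
            "type": "number",
            "description": "Transaction amount in major currency units; must be > 0.",
        },
        "currency": {
            "type": "string",
            "description": "ISO 4217 code, e.g. 'USD' or 'EUR'.",
        },
    },
    "example": {"amount": 19.99, "currency": "USD"},
}
```

The same principle applies to CLI help text: descriptive flags and an example invocation give a model far more to work with than a bare usage string.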
OpenAI turns GPT-5.5 into the day’s center of gravity
GPT-5.5 arrives with longer context, lower token use, and near-term API access
OpenAI says GPT-5.5 is a new class of intelligence for real work and agents: it is built to understand complex goals, use tools, check its work, and carry tasks through to completion. It is rolling out now to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with GPT-5.5 Pro for Pro, Business, and Enterprise in ChatGPT; API access is planned shortly, once security and safeguards work is complete.
Quick facts:
- Context window: 1M tokens in the API
- API pricing: $5 per 1M input tokens and $30 per 1M output tokens
- Serving profile: GPT-5.4-like per-token latency, but significantly fewer tokens per task
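At the listed API prices, per-task cost is straightforward to estimate. The token counts below are illustrative, not benchmarks:

```python
INPUT_PRICE = 5.0 / 1_000_000    # $ per input token ($5 per 1M)
OUTPUT_PRICE = 30.0 / 1_000_000  # $ per output token ($30 per 1M)

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the listed GPT-5.5 prices."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical agent task: 200k tokens of context in, 20k tokens out.
print(f"${task_cost(200_000, 20_000):.2f}")  # $1.60
```

This is also where the "fewer tokens per task" claim matters: at these prices, token efficiency translates directly into lower per-task cost even when per-token latency is unchanged.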
Artificial Analysis says GPT-5.5 (xhigh) now leads its Intelligence Index and tops GDPval-AA, Terminal-Bench Hard, and APEX-Agents-AA; it also says GPT-5.5 (medium) matches Claude Opus 4.7 (max) at roughly one-quarter the cost. Its notable caveat is reliability: on AA-Omniscience, GPT-5.5 posted the highest accuracy at 57% but an 86% hallucination rate, above Opus 4.7 and Gemini 3.1 Pro Preview.
Why it matters: The launch ties raw model performance to token efficiency, pricing, and product availability, which is the mix needed for heavier agent workflows.
Codex moves closer to general computer work
With GPT-5.5, OpenAI says Codex now gets more of the job done across the browser, files, docs, and the computer itself; it can interact with web apps, test flows, click through pages, capture screenshots, and iterate until a task is finished. Greg Brockman said the combination is no longer just for coders but for broader computer work, including spreadsheets and slides, and OpenAI is now rolling Codex out across whole companies after a pilot with NVIDIA.
Early users described a step change in autonomy. At Ramp, GPT-5.5 plugged into an internal harness and began discovering how to use databases and telemetry tools without explicit guidance; inside OpenAI, one engineer said it produced a tidal wave of pull requests and cases where the model worked on a single task for more than 40 hours. OpenAI also turned on auto-review in Codex, where a guardian agent evaluates higher-risk actions so long tasks can continue with fewer approvals.
Why it matters: The noteworthy change is not only a stronger model, but a broader product surface for handing off longer, multi-tool workflows inside real organizations.
The subtext: inference is becoming strategy
Sam Altman separately said OpenAI now has to become an AI inference company and praised the inference team for serving GPT-5.5 efficiently.
“To a significant degree, we have to become an AI inference company now.”
NVIDIA said GPT-5.5-powered Codex runs on GB200 NVL72 systems, that more than 10,000 NVIDIA employees are already using it, and that those systems deliver 35x lower cost per million tokens and 50x higher token output per second per megawatt than prior-generation platforms. NVIDIA also described the launch as part of a long-running OpenAI partnership that now includes a commitment to deploy more than 10 gigawatts of NVIDIA systems for next-generation AI infrastructure.
Why it matters: On the same day as a flagship model launch, both OpenAI and NVIDIA framed the bottleneck as serving capable agents economically and at enterprise scale, not only training the next model.
The rest of the field kept moving on openness and infrastructure
DeepSeek open-sources V4 with 1M context
DeepSeek said its V4 preview is live and open-sourced, pitching cost-effective 1M context length. The release includes V4-Pro at 1.6T total / 49B active parameters and V4-Flash at 284B total / 13B active parameters, with updated API access, public weights, and links to the technical report.
Emad Mostaque estimated the final training runs at under $14M for Pro and under $4M for Flash, with total compute across data prep, tuning, and testing around 10x those figures.
Why it matters: The open-model push is still advancing on long context and cost claims at the same time, rather than conceding those fronts to closed labs.
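Taken at face value, those figures imply small active-parameter fractions per token and, with Mostaque's ~10x multiplier, rough upper bounds on total compute spend. A back-of-envelope reading of the cited numbers:

```python
# Parameter counts (billions) and estimated final-run cost ($M) from the V4 release.
models = {
    "V4-Pro":   {"total_b": 1600, "active_b": 49, "run_cost_m": 14},
    "V4-Flash": {"total_b": 284,  "active_b": 13, "run_cost_m": 4},
}

for name, m in models.items():
    active_pct = 100 * m["active_b"] / m["total_b"]
    total_compute_m = 10 * m["run_cost_m"]  # Mostaque's ~10x estimate
    print(f"{name}: ~{active_pct:.1f}% of params active per token, "
          f"total compute under ~${total_compute_m}M")
```

Since the run costs are stated as "under" $14M and $4M, the 10x totals are upper bounds, not point estimates; the active fractions (~3% and ~5%) are what make the cost-per-token claims plausible for a mixture-of-experts design.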
Google DeepMind shows more failure-tolerant frontier training
Google DeepMind introduced Decoupled DiLoCo, a system for training advanced models across multiple data centers without halting when hardware fails. The company says it combines Pathways and DiLoCo, is self-healing during induced failures, trained a 12B Gemma model across four U.S. regions over low-bandwidth networks, and mixed TPUv5p with TPU v6e without slowing training.
Jeff Dean said the approach lets (N-1)/N units proceed when one fails, framing it as a continuation of Google’s long-running work on large-scale fault-tolerant training.
Why it matters: As cluster sizes grow, resilience across regions and hardware types is becoming a research advantage in its own right.