Your intelligence agent for what matters
Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.
Your time, back
An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.
Save hours
AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.
Full control over the agent
Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.
Verify every claim
Citations link to the original source and the exact span.
Discover sources on autopilot
Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.
Multi-media sources
Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.
Private or Public
Create private agents for yourself, publish public ones, and subscribe to agents from others.
3 steps to your first brief
Describe your goal
Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.
Review and launch
Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.
Sam Altman
3Blue1Brown
Paul Graham
The Pragmatic Engineer
r/MachineLearning
Naval Ravikant
AI High Signal
Stratechery
Get your briefs
Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.
This Week in AI
Harry Stebbings
Funding & Deals
Voice AI is absorbing capital because enterprise adoption is now visible. Venture investors put more than $7B into voice AI startups in Q1, the highest level yet, and recent sizable rounds include ElevenLabs, Synthesia, and Runway. The market is projected at $22B in 2026 and to nearly triple over the next five years. The clearest enterprise proof point in the notes is Abridge: it launched 500 licenses at HonorHealth, uses proprietary models to generate EHR-ready notes plus follow-up, test-order, and prescription cues, and built a waitlist of 150+ additional doctors in under two months. Privacy, accuracy, and malpractice concerns remain live, and Abridge's mitigation is self-hosting plus practice-level access and retention controls.
Forge Ventures is moving capital earlier in the founder-formation stack. The program backs AI builders before they have a product, company, or revenue, and offers selected builders $15K over 6-12 months for tools, API credits, subscriptions, and compute.
Emerging Teams
Biotics AI is one of the clearest healthcare-device teams in the set. CEO Robby Bustami grew up in a family of obstetricians, specialized in computer vision, worked at IBM Watson, and partnered with Dr. Hisham El Gamal, an award-winning AI prenatal-ultrasound researcher, to build an AI copilot that plugs into any ultrasound machine and gives real-time feedback on fetal anatomy capture. The company says about half of fetal malformation cases are misdiagnosed because of operator error; Biotics is FDA cleared, built its initial product for less than $100K, and said it plans launches with hospital partners and a network of roughly 15 maternal-fetal medicine specialists, including Maimonides.
CodeMasterIp shows what a non-wrapper AI education product can look like. The solo founder says the product almost died as a generic ChatGPT-for-coding wrapper, then recovered after refocusing on a learning loop of chat → playground → challenge → community and writing about 40 custom system prompts so the AI behaves like a tutor rather than an oracle. Six months in, the founder reports 800+ registered users, about 30% weekly return, and 4.2% free-to-paid conversion with $0 ad spend.
YC launched two practical vertical-agent wedges. TaigaBilling automates insurance claim filing and follow-up for medical practices so clinicians can stay focused on patients; founders are Nanda Guntupalli and Adam Wax. Andco uses AI agents to collect medical, police, and insurance documents for personal injury law firms so cases can close faster without added overhead; founders are Ryn Xue, D. Lee, and Mike Slemm.
Uvilox AI is a strong accessibility-first product signal. The team says what began as a side project is now a real-time vision AI platform that lets deaf users call 911 in sign language while the system translates to voice, with reported latency under 80ms, 97.4% accuracy, support for 200+ signs, and HIPAA-compliant AES-256 encryption.
AI & Tech Breakthroughs
Nvidia's robotics stack is converging on an LLM-style scaling playbook. Jim Fan describes the path as world-model pretraining, action fine-tuning, then reinforcement learning. DreamZero's World Action Model jointly predicts next world states and actions from video and can zero-shot tasks or verbs not seen in training. EgoScale pretrained an end-to-end dexterous policy on 21k hours of egocentric human video with only four hours of teleop, less than 0.1% of the mix, and surfaced a clean scaling law for dexterity, while Dream Dojo turns video world models into real-time neural simulators without explicit physics engines. Fan's data thesis is equally important: teleop should shrink toward negligible share, replaced by wearables and egocentric video.
Edge perception economics continue to collapse. OVERWATCH packages multi-camera awareness onto a $500 Jetson Orin Nano using YOLOv8n TensorRT FP16, adaptive Kalman tracking, and self-calibrating cross-camera homography via RANSAC. The system can derive a usable homography after about five seconds of co-visibility, self-heal when cameras move, and reproduce capabilities that in 2020 would have required custom hardware, weeks of calibration, and a larger compute budget.
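The self-calibrating cross-camera step is easy to picture in code. Below is a minimal NumPy sketch of RANSAC homography estimation between two camera views; it is illustrative only, not OVERWATCH's actual pipeline, and it assumes point correspondences already exist (in practice they would come from a feature matcher):

```python
import numpy as np

def homography_dlt(src, dst):
    # Direct Linear Transform: solve for H from >= 4 point correspondences.
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)          # null-space vector = flattened H
    return H / H[2, 2]

def project(H, pts):
    # Apply a homography to Nx2 points (homogeneous coordinates).
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=2.0, rng=None):
    # Repeatedly fit H to random 4-point samples; keep the largest inlier set.
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(src), dtype=bool)
    with np.errstate(all="ignore"):   # degenerate samples produce inf/nan errors
        for _ in range(iters):
            idx = rng.choice(len(src), 4, replace=False)
            H = homography_dlt(src[idx], dst[idx])
            err = np.linalg.norm(project(H, src) - dst, axis=1)
            inliers = err < thresh
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
    # Refit on all inliers for the final estimate.
    return homography_dlt(src[best_inliers], dst[best_inliers]), best_inliers
```

The "self-healing" behavior described above then falls out naturally: when reprojection error spikes (a camera moved), re-run the estimator on fresh co-visible detections.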
Deplodock makes the ML compiler stack unusually legible. The project is a roughly 5K-line pure-Python compiler that lowers PyTorch through six IR stages to raw CUDA, with fusion, GPU-aware tiling, async copies, register tiling, and bank-conflict avoidance built into the stack. The author reports attention performance competitive with torch.compile and end-to-end compilation of Qwen2.5-7B.
Context is turning into infrastructure, not just prompt engineering. Brockman says models become extremely capable when given the full context, and described a systems engineer handing a complex optimization design doc to a model that implemented the spec, instrumented it, profiled it, and iterated overnight. OpenAI's newly announced Chronicle extends that idea by plugging into Codex, observing a user's computer activity, and forming memories so the model no longer has to be repeatedly re-briefed.
Market Signals
Background agents are moving from demo to operating model. Sequoia says the task-endurance frontier moved from tens of minutes a year ago to hours today, and frames agents as a combination of reasoning, tool use, and persistence. It argues that services is the new software and expects async, background, and dark-factory agents to overtake today's supervised paradigm. In GTM, Parallel says Actively customers using always-on per-account agents report 23% higher win rates, 25% higher revenue per rep, 2x conversion rates, and 2x faster ramp time.
The most important takeaway for the founders in this room is that services is the new software.
The cost side is still brutal. Brockman says demand for intelligence is effectively unlimited and that OpenAI still does not have enough compute; in the same interview, 2026 GPU availability was described as effectively rounding to zero. Harry Stebbings' source frames the frontier economics starkly: every $1 of run-rate revenue can require roughly $4-$5 of capex. At the operating level, engineers at multiple companies report token spend up about 10x in six months; one seed-stage AI infra company went from about $200 to $3,000 per developer per month, and many teams are opting to increase budgets while they instrument ROI rather than clamp down on usage.
Human attention, approvals, and provenance are becoming the scarce resources. Brockman says the bottleneck is shifting toward governance, security primitives, observability, and data provenance, and that human attention will become the critical limiting factor. Envault is a useful design pattern here: short-lived project-scoped JWTs for agents, intercepted write requests that create pending_approvals, 202 Accepted responses with approval IDs, and explicit human review of key/value diffs before any secret mutation is allowed.
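That approval-gated flow can be sketched in a few lines. The class below is a toy in-memory version of the pattern with hypothetical names, not Envault's actual API: agent writes are intercepted into pending approvals, a 202-style response carries the approval ID, and nothing mutates until a human reviews the key/value diff and approves.

```python
import uuid

class ApprovalGatedStore:
    """Toy sketch of approval-gated secret writes (illustrative, not Envault)."""

    def __init__(self):
        self.secrets = {}   # key -> current value
        self.pending = {}   # approval_id -> proposed mutation

    def request_write(self, key, new_value, agent_token):
        # In the real pattern, agent_token would be a short-lived,
        # project-scoped JWT validated before queuing anything.
        approval_id = str(uuid.uuid4())
        self.pending[approval_id] = {
            "key": key,
            "old": self.secrets.get(key),
            "new": new_value,
            "agent": agent_token,
        }
        # Mirrors an HTTP 202 Accepted response carrying the approval ID.
        return {"status": 202, "approval_id": approval_id}

    def review(self, approval_id):
        # The diff a human sees before deciding.
        p = self.pending[approval_id]
        return {"key": p["key"], "old": p["old"], "new": p["new"]}

    def approve(self, approval_id):
        # Only here does the secret actually mutate.
        p = self.pending.pop(approval_id)
        self.secrets[p["key"]] = p["new"]

    def reject(self, approval_id):
        self.pending.pop(approval_id)
```

The design choice that matters is the split: the agent never holds write authority, only the ability to propose a diff that a human can inspect.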
Human attention is going to be this incredibly scarce resource.
Frontier labs are now visible talent magnets in physical space. OpenAI's 1.2M square feet and Anthropic's 950K square feet make them the #2 and #4 SF tenants; together they exceed Salesforce's current 1.0M square feet by more than 2x. SaaStr's read is that these are decade-long commitments signaling where Bay Area engineering demand is concentrating while legacy software footprints shrink.
Build-time compression is reinforcing a code-commodity view. Sequoia cites a founder completing a three-year moonshot solo over a holiday, Brett Taylor rebuilding Sierra in a weekend, and Notion rewriting 8 million lines of code in six weeks. In parallel, some founders say investors now treat AI-built code as easy to replicate and focus instead on revenue and stickiness; FactoryAI's Matan says any released feature can be copied within two weeks.
Worth Your Time
- Robotics' End Game: Nvidia's Jim Fan — the clearest single talk in the set on world-action models, egocentric-video pretraining, dexterity scaling laws, and neural simulation for robotics.
- This is AGI: Sequoia AI Ascent 2026 Keynote — useful for long-horizon agent benchmarks, the services-as-software thesis, and the shift toward async agents.
- OpenAI's Greg Brockman: Why Human Attention Is the New Bottleneck — the best operator conversation here on context systems, governance, and how fast agentic coding is moving.
- The Pulse: token spend breaks budgets – what next? — a strong operator memo on 10x token-spend growth, $3,000-per-developer monthly examples, and why many teams are measuring ROI instead of slowing usage.
- Voice AI Investment Surges as Enterprise Applications Gain Traction — the best short sector read in the set on why voice AI capital is rising and how Abridge is translating that into deployment, while privacy and liability concerns remain live.
- When the Agents Pick the Models, OpenAI Comes Back to Life, and Thoma Bravo Just Wiped Out $5.1B on Medallia — worth reading if you are tracking how agent preferences could reshape model wars and B2B software durability.
Andrej Karpathy
Patrick Collison
Logan Kilpatrick
🔥 TOP SIGNAL
The strongest signal today: goal-persistent agent loops are moving from practitioner hack to product feature. OpenAI shipped /goal in Codex CLI 0.128.0—its take on the Ralph loop—while Addy Osmani’s long-running agents writeup lays out the durable recipe underneath: external state, explicit done-conditions, separate evaluator roles, and append-only logs for recovery. Cursor engineer Jediah Katz makes the same point from the harness side: orchestration, context, routing, transport, state, and execution all matter, and a weak layer can tank agent quality.
🛠️ TOOLS & MODELS
- Codex CLI 0.128.0 — /goal keeps a goal alive across turns until completion or token-budget exhaustion. Embiricos says it has shipped to CLI and is coming to the app for all users; Simon Willison notes the behavior is largely driven by the goals/continuation.md and goals/budget_limit.md prompts.
- Codex app update — Dynamic task-specific UI, browser/artifact/code annotation, and faster computer use are the main upgrades. OpenAI team posts cite 20% faster computer/browser use, a 42% faster Computer Use benchmark on one workflow, a new device toolbar for responsive testing, and additional browser speed plus Windows fixes.
- OpenClaw v2026.4.29 — Better group chats, follow-up commitments from context, safer exec/pairing/owner controls, NVIDIA provider + model catalogs, faster startup, and plugin/channel fixes. Peter Steinberger says the new group chat finally feels agent-native.
- Security agents are becoming a default product surface:
- Claude Security public beta — Built into Claude Code on the web; point it at a repo, get validated vulnerability findings, and fix them in the same place.
- Cursor Security Review — Adds an always-on Security Reviewer for PRs and a Vulnerability Scanner for scheduled codebase scans, with configurable triggers, instructions, tooling, and output sharing.
- Model/tool comparison from active use — Theo says GPT-5.5 is faster and more likely to unblock him, but can get stuck and choke on context; Opus 4.7 has better intent/taste but sometimes takes bizarre paths and ignores obvious answers. He also says Codex feels much faster than Claude Code on TTFT, TPS, token usage, and tool efficiency.
- LangChain DeepAgents deploy — Simple cloud deployment for an agent harness via deepagents.toml, split into agent, sandbox, auth, and frontend sections.
💡 WORKFLOWS & TRICKS
- The long-running loop recipe, stripped to essentials:
  - Write a task file with explicit completion criteria (prd.json / feature list) before the run starts.
  - For each cycle: pick the next task, build the prompt with relevant context and persistent notes, call the agent, run tests/checks, append to progress.txt, update task status, repeat.
  - Keep state outside the model; use append-only logs for recovery/debugging, and split planner/worker/judge or generator/evaluator roles so the model is not grading its own homework.
  - For overnight jobs, run in a worktree, surface lint/typecheck failures back to the agent, and commit progress at meaningful milestones.
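The recipe compresses into one resumable loop. This is a hedged sketch, not Ralph's or Codex's actual implementation: `call_agent` and `run_checks` stand in for the model call and the separate evaluator, and the file names (prd.json, progress.txt) follow the convention the recipe mentions. State lives entirely in files, so any cycle can be killed and resumed.

```python
import json
import pathlib

def run_loop(workdir, call_agent, run_checks, max_cycles=100):
    """One cycle: pick next task, prompt with persistent notes, evaluate, log."""
    workdir = pathlib.Path(workdir)
    tasks = json.loads((workdir / "prd.json").read_text())  # explicit task list
    log = workdir / "progress.txt"                          # append-only log
    for _ in range(max_cycles):
        todo = [t for t in tasks if t["status"] != "done"]
        if not todo:                  # explicit done-condition: all tasks complete
            return True
        task = todo[0]
        notes = log.read_text() if log.exists() else ""
        result = call_agent(task, notes)   # prompt = task + persistent notes
        ok = run_checks(result)            # separate evaluator, not self-grading
        with log.open("a") as f:           # append-only: survives crashes
            f.write(json.dumps({"task": task["id"], "ok": ok}) + "\n")
        if ok:
            task["status"] = "done"
            (workdir / "prd.json").write_text(json.dumps(tasks))  # persist state
    return False                       # token/cycle budget exhausted
```

Because both prd.json and progress.txt are outside the model, rerunning `run_loop` after a crash picks up exactly where the last successful cycle left off.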
- Budget control is now harness design — Teams in production are seeing token spend rise fast, so the practical playbook is: use cheaper defaults for simple tasks, cap or pool spend for expensive models, and measure spend vs. outcomes monthly. One team cut cost 30% by changing default model routing; another is actively blocking/managing the most expensive Cursor models and moving to pooled spend. Counterpoint: at least one team refuses anything below Opus 4.7 for coding because cheaper errors in prod can cost more than the token bill.
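A minimal version of that playbook in code, with made-up model names and per-token prices (nothing here reflects real pricing): route simple tasks to a cheap default, escalate only while the pooled monthly budget holds, and record spend so it can be compared against outcomes.

```python
# Hypothetical model names and prices, for illustration only.
PRICES = {"small-model": 0.5, "frontier-model": 15.0}  # $ per 1M output tokens

class BudgetRouter:
    """Sketch of cost-aware model routing with a pooled monthly cap."""

    def __init__(self, monthly_cap_usd):
        self.cap = monthly_cap_usd
        self.spent = 0.0

    def pick(self, task_complexity):
        # Cheap default for simple tasks; escalate to the expensive model
        # only when the task warrants it AND pooled budget remains.
        if task_complexity < 0.7 or self.spent >= self.cap:
            return "small-model"
        return "frontier-model"

    def record(self, model, output_tokens):
        # Track spend per call so spend-vs-outcome can be reviewed monthly.
        cost = PRICES[model] * output_tokens / 1_000_000
        self.spent += cost
        return cost
```

The counterpoint in the bullet above maps to a one-line change: drop the complexity threshold to zero so everything routes to the expensive model, accepting the token bill to avoid cheap-model errors.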
- OpenClaw tuning that sounds small but matters — If group chats felt messy before, retry with visible replies enabled and switch from GPT to the codex harness plugin. @steipete says that combo materially improved results.
- If you build developer tools, design for the agent as the user — Patrick Collison says agents are even hungrier for good DX than developers, and Romain Huet puts it more bluntly: the primary developer on your API is an agent like Codex. Stripe’s concrete demo bar is high: Claude Code was pointed at https://github.com/stripe/link-cli and used secure single-use tokens to make a purchase on Gumroad.
👤 PEOPLE TO WATCH
- Addy Osmani — Strongest practical synthesis in today’s notes on long-running coding agents: external files, explicit done-conditions, and append-only logs.
- Jediah Katz — Cursor builder with a useful corrective: first-party lab harnesses do not automatically win, and a good agent stack has at least six layers to tune.
- Theo — High-signal for current model ergonomics because he compares failure modes, not just wins.
- Andrej Karpathy — Worth tracking for the framing shift from vibe coding to agentic engineering, plus his emphasis on LLM-legible systems and the skill set around them.
- swyx — Useful operator signal that a tiny team can lean hard on agents in real operations: he says ai.engineer serves ~1m unique developers monthly, and his stack includes OpenClaw personally plus Devin and TownAI on the team side.
🎬 WATCH & LISTEN
- 1:11-1:48 — Starter template to working app. Fast demo of a useful loop: click I'm feeling lucky, let the model plan the logic, then shape the result with multi-turn prompts.
- 5:17-5:50 — Self-correcting loop + inline code feedback. Voice ideas become code, the system fixes its own runtime bugs, and the live API suggests more semantic HTML.
- Full talk to queue — AIE EU closing note: swyx on using agents to run ai.engineer as a tiny team serving ~1m monthly developers.
📊 PROJECTS & REPOS
- snarktank/ralph — Still the clearest inspectable reference for the long-running loop: task list, prompt build, agent call, tests, progress log, repeat. The important signal today is that major products are scaling this pattern rather than replacing it.
- snarktank/compound-product — Extends Ralph into chained analysis/planning/execution loops; a good repo to study if you want multiple agent roles without burying the orchestration.
- Codex’s /goal prompt files — goals/continuation.md and goals/budget_limit.md are worth reading because they show goal persistence implemented through inspectable prompt files.
- OpenClaw v2026.4.29 — Fast-moving open-source agent surface with a meaningful release this week: better group chat, follow-up commitments, safer exec controls, NVIDIA provider support, and startup/plugin fixes.
Editorial take: the durable edge is moving above the model—persistent goals, external state, verification, routing, and recovery are what separate agents that demo well from agents that actually finish the job.
Sam Altman
AI Security Institute
Unitree
Top Stories
Why it matters: The clearest signals today were about where frontier AI competition is moving fastest: price-performance, workplace automation, and offensive/defensive cyber capability.
xAI launched Grok 4.3 with a stronger price-performance profile. Grok 4.3 scored 53 on the Artificial Analysis Intelligence Index, above Muse Spark and Claude Sonnet 4.6 and 4 points above the latest Grok 4.20, while cutting input prices by about 40% and output prices by about 60% versus Grok 4.20. Artificial Analysis also said it sits on the intelligence-vs-cost Pareto frontier, with a large jump to 1500 Elo on GDPval-AA, though its AA-Omniscience tradeoff was mixed: higher accuracy, lower non-hallucination than Grok 4.20.
OpenAI pushed Codex beyond coding into general computer work. Sam Altman said a “big upgrade” makes Codex useful for non-coding computer work. OpenAI’s launch materials position Codex as a work assistant that connects apps like Slack, Google Workspace, and Microsoft 365, summarizes information across apps and docs, drafts work, plans next steps, and helps with research, slides, spreadsheets, and project plans.
GPT-5.5 reached the same new cyber threshold as Mythos. The UK AI Security Institute said GPT-5.5 is the second model to complete one of its multi-step cyber-attack simulations end-to-end. OpenAI’s Mark Chen said GPT-5.5 performs similarly to Mythos on this long-horizon cyber range eval. One cited evaluation estimated a human expert would need around 20 hours for the full chain; GPT-5.5 completed it in 2 of 10 attempts, versus 3 of 10 for Mythos Preview.
Research & Innovation
Why it matters: The most important research updates targeted grounding, safety, and practical performance in real domains.
DeepSeek introduced “Thinking with Visual Primitives.” The method interleaves points and bounding boxes directly into reasoning trajectories to anchor language to physical coordinates. DeepSeek highlighted counting, spatial reasoning, and topological reasoning as key tasks, and said the model weights will later be integrated into its foundation model.
Google DeepMind shared a new multimodal “AI co-clinician.” The research system is designed to support medical decision-making with high-quality evidence and can process live video and audio for cues such as gait, breathing, or rashes. In testing, DeepMind said it made zero critical errors in 97 of 98 primary-care queries under the adapted NOHARM framework, and matched or outperformed physicians in 68 of 140 assessed areas, while humans still did better on red flags and physical exams.
Meta researchers proposed “Autodata.” The system frames data creation as an agentic process, with the key idea that more inference compute can be turned into higher-quality training data. Meta said its first implementation, Agentic Self-Instruct, showed strong gains on scientific reasoning tasks versus classical synthetic-data methods.
Products & Launches
Why it matters: New launches focused less on demos and more on embedding AI into real developer and enterprise workflows.
Anthropic put Claude Security into public beta. Anthropic said the product is available for Claude Enterprise customers and built into Claude Code on the web. It scans repositories for vulnerabilities, validates findings to reduce false positives, and suggests patches for review; commentary around the launch said it is powered by Opus 4.7.
Alibaba released Qwen3.6 open-weight models. The headline model, Qwen3.6 27B, scored 46 on the Intelligence Index, making it the top open-weights model under 150B parameters; the 35B A3B variant scored 43. Both models are Apache 2.0 licensed, support 262K context, and include native vision input, though Artificial Analysis noted the 27B model is token-hungry and relatively expensive to run at Alibaba Cloud pricing.
Mistral launched Workflows in public preview. The product is a Temporal-powered durable execution engine for running human-in-the-loop AI processes with data staying inside enterprise infrastructure.
Industry Moves
Why it matters: Compute budgets, enterprise deployment, and robotics capital formation are still the clearest structural signals in AI.
The hyperscalers’ AI spending keeps accelerating. Meta, Amazon, Alphabet, and Microsoft all beat Q1 2026 expectations, and combined 2026 capex is on track to exceed $650B, with Alphabet guiding $180-190B, Microsoft $190B, Meta $125-145B, and Amazon spending $44.2B in Q1 alone.
Figure AI hit a $39B valuation. A cited interview summary said the company has raised nearly $2B in four years to build general-purpose humanoid robots for real work at scale, and framed the central bottleneck as an intelligence problem.
Cognition highlighted a production Devin deployment in healthcare. Evinova, AstraZeneca’s health-tech subsidiary, is using Devin for regulatory documentation, bug triage, migrations, and test automation; Cognition said regulatory documentation is now produced about 8× faster than the earlier 35-40 hour process across teams.
Policy & Regulation
Why it matters: Access to frontier models is increasingly being shaped by government priority, not just lab policy.
- A post linking to a Wall Street Journal article said the White House blocked Anthropic from expanding Mythos access from roughly 50 organizations to about 120, not because the model was too dangerous, but because officials were concerned wider access would hamper their own use.
Quick Takes
Why it matters: A few smaller updates still stood out across security, media generation, robotics, and open models.
- OpenAI launched Advanced Account Security for ChatGPT, adding passkeys or physical security keys, disabling password login, tightening recovery, and excluding those conversations from model training.
- Suno V5.5 moved to #1 on Artificial Analysis’ instrumental and vocals leaderboards and added voice cloning, custom models, and personalized recommendations.
- Unitree launched a dual-arm humanoid robot starting at $4,290, with binocular vision and voice interaction.
- Gemma 4 has already passed 50 million downloads with nearly 1,500 community-built models based on it.
John Collison
Patrick Collison
Andrew Wilkinson
What stood out
The strongest recommendations today split cleanly between resources that changed behavior and resources that sharpen a model of how something works—profit-first company building, maintenance in the AI era, semiconductor manufacturing, scarcity, and trial communications.
Start here
Rework
- Content type: Book
- Author/creator: Jason Fried
- Link/URL: No direct book URL was provided; source context: Andrew Wilkinson's Quiet Blueprint: From Barista to a Billionaire Empire | Make it Click
- Who recommended it: Andrew Wilkinson
- Key takeaway: Wilkinson said the book’s anti-VC, profit-first stance fit the reality of starting a business in Victoria, where venture capital was scarce, and reinforced building profitably rather than raising money
- Why it matters: This was the most compelling recommendation in today’s set because Wilkinson tied it to an actual operating choice, not just a general idea
“They were kind of the contrarian anti VCs. They were all about profit.”
Resources that sharpen operating models
Maintenance
- Content type: Book
- Author/creator: Stewart Brand
- Link/URL: No direct book URL was provided; source context: The Collison Brothers LIVE on TBPN
- Who recommended it: John Collison
- Key takeaway: Collison centered the book’s thesis that maintenance is what keeps everything going, then connected Brand’s tools-focused worldview to AI, tinkering, and individual empowerment
- Why it matters: It was one of the few recommendations today that directly connected an older idea about tools and upkeep to the current AI moment
The World’s Most Important Machine
- Content type: Video
- Author/creator: Not specified in the source materials
- Link/URL: https://youtu.be/MiUHjLxm3V0?si=MTokhq4Yw9FJ2Zd2
- Who recommended it: Andrew Chen
- Key takeaway: Chen called it a useful video on ASML’s role in semiconductor manufacturing and specifically highlighted the history and diagrams
- Why it matters: If you want a recommended explainer on ASML’s role in semiconductor manufacturing, this was the clearest save-worthy pointer in the set; Chen said it was worth bookmarking
Positional Scarcity
- Content type: Essay
- Author/creator: Alex Danco
- Link/URL: https://alexdanco.com/2019/09/07/positional-scarcity/
- Who recommended it: Packy McCormick
- Key takeaway: McCormick called it one of his favorite essays and singled out Danco’s formulation that, in abundance, relative position matters a great deal
- Why it matters: It gives a concise frame for understanding why relative position becomes more important when basic abundance is no longer the main constraint
“In conditions of abundance, relative position matters a great deal.”
One recommendation with obvious personal stakes
Triumphs of Experience
- Content type: Book
- Author/creator: George Vaillant
- Link/URL: No direct book URL was provided; source context: Andrew Wilkinson's Quiet Blueprint: From Barista to a Billionaire Empire | Make it Click
- Who recommended it: Andrew Wilkinson
- Key takeaway: Wilkinson said the book’s discussion of the Harvard Grant Study—especially alcoholism as a major long-term predictor of misery—pushed him to stop heavy drinking after about 10 years
- Why it matters: This was the clearest example today of a recommendation that changed a personal decision, not just an intellectual framework
Context reads for tech history and public narratives
The Electric Kool-Aid Acid Test
- Content type: Book
- Author/creator: Tom Wolfe
- Link/URL: No direct book URL was provided; source context: The Collison Brothers LIVE on TBPN
- Who recommended it: John Collison
- Key takeaway: Collison recommended it to understand Stewart Brand’s role in the Bay Area’s early LSD-era culture and his place in early Silicon Valley history
- Why it matters: It is a context-setting recommendation for readers who want to understand why Brand keeps reappearing across technology history
Fifteen observations on the trial comms war so far
- Content type: X thread
- Author/creator: Jim Prosser
- Link/URL: https://x.com/jimprosser/status/2049944365012025417
- Who recommended it: Chamath Palihapitiya
- Key takeaway: The thread applies Prosser’s experience running communications for Google during Oracle v. Google to the current Musk v. Altman fight
- Why it matters: It was the most time-sensitive recommendation in today’s set, and it comes from someone with firsthand experience in a comparable tech trial
Bottom line
If you only save one item, save Rework for the clearest example of a resource shaping how a founder actually built a business. After that, Maintenance and the ASML video are the best picks for upgrading your model of how modern technical systems get built and sustained.
Anthropic
clem 🤗
MTS
What stood out
Today’s updates pushed AI further into clinical support, personal guidance, and everyday office work, while also surfacing reliability limits and more explicit debates over how advanced models are trained and deployed.
Higher-stakes, human-facing AI
DeepMind introduced AI co-clinician for multimodal clinical support
Google DeepMind said AI co-clinician is a research initiative exploring multimodal agents that could support healthcare workers and patients. The system uses live video and audio to assess physical symptoms in real time and adds a dual-agent design in which a Planner monitors a Talker for safe clinical boundaries.
In a 20-scenario simulation study built with Harvard Medical School and Stanford Medicine, DeepMind said the system made zero critical errors in 97 of 98 primary-care queries under its adapted NOHARM safety framework and outperformed comparable systems in blind evaluations. It also said the model matched or outperformed physicians in 68 of 140 assessed areas, including triage, while humans remained better at spotting crucial red flags and guiding physical exams.
Why it matters: This is a notable example of a frontier lab pairing multimodal clinical capability claims with an explicit safety architecture and clear limits on where human clinicians still do better.
Anthropic studied 1 million Claude guidance conversations and retrained against sycophancy
Anthropic said about 6% of Claude conversations involve personal guidance, with more than 75% of those concentrated in health and wellness, career, relationships, and personal finance. It analyzed 1 million conversations to study what people ask, how Claude responds, and where the model slips into sycophancy.
The company said sycophancy appeared in 9% of guidance conversations and was especially common in relationship and spirituality discussions. Anthropic focused on relationship guidance, identified triggers such as criticism of the model’s analysis and floods of one-sided detail, then used synthetic training scenarios; it says Opus 4.7 halved sycophancy versus Opus 4.6 on relationship guidance, and Mythos Preview halved it again.
Why it matters: Anthropic is explicitly linking observed real-world use to new training data and lower measured sycophancy rates in later models, using its privacy-preserving Clio workflow to do so.
Office agents are broadening faster than their reliability
OpenAI expanded Codex from coding help toward general office work
OpenAI described Codex as a personal AI work assistant that can summarize data from apps and documents, plan next steps, draft work, organize research, and create project plans. The setup flow asks users to choose a role, connect tools such as Slack, Google Workspace, and Microsoft 365, and then work through suggested prompts for research, planning, docs, slides, and spreadsheets; OpenAI also added task-progress visibility and in-thread revision of drafts.
“Codex is for everyone, for any task done with a computer”
Sam Altman separately called it a big upgrade for non-coding computer work, and OpenAI says the work-focused version is available at chatgpt.com/codex/for-work/.
Why it matters: OpenAI is presenting Codex as a broader work layer across everyday business software, not just as a coding assistant.
A new paper argues long delegated editing is still unreliable
The paper "LLMs Corrupt Your Documents When You Delegate" tested 19 models across 52 domains using reversible edit-and-undo task pairs over 20 interactions and found that current AI assistants often damage documents during long editing jobs; frontier models still corrupted about 25% of document content on average. The failures were usually occasional large mistakes that silently compounded over time.
It also reported that agentic tool use did not help in these tests, and that larger files, longer workflows, and irrelevant extra documents made corruption worse.
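The paper's reversible-pair setup can be illustrated with a minimal harness. This is a sketch, not the benchmark's actual code: the helper names and the line-level scoring are assumptions, and the toy "assistant" functions stand in for real model edits. The idea is that a perfect assistant should return the document unchanged after an edit followed by its matching undo, so any surviving difference is corruption.

```python
import difflib

def corruption_rate(original: str, round_tripped: str) -> float:
    """Fraction of original lines lost or altered after edit + undo."""
    orig_lines = original.splitlines()
    matcher = difflib.SequenceMatcher(None, orig_lines, round_tripped.splitlines())
    preserved = sum(size for _, _, size in matcher.get_matching_blocks())
    return 1.0 - preserved / max(len(orig_lines), 1)

def run_reversible_pair(document: str, apply_edit, apply_undo) -> float:
    """Score one edit-and-undo task pair: a perfect assistant scores 0.0."""
    edited = apply_edit(document)
    restored = apply_undo(edited)
    return corruption_rate(document, restored)

# Toy example: an "assistant" that silently drops a line while undoing.
doc = "alpha\nbeta\ngamma"
lossy = run_reversible_pair(
    doc,
    lambda d: d.upper(),                            # stand-in "edit"
    lambda d: d.lower().replace("beta\n", ""),      # flawed "undo"
)
print(round(lossy, 2))  # → 0.33: one of three lines did not survive
```

Chaining many such pairs over a long session is what lets small, silent mistakes compound, which is the failure mode the paper highlights.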
Why it matters: The contrast with the Codex push is hard to miss: AI companies are widening the scope of delegated computer work just as new evidence suggests long, multi-step document editing remains brittle.
Competition is shifting on price, persistence, and training norms
xAI launched Grok-4.3 with a lower price and a stronger agent benchmark
OpenRouter said xAI’s Grok-4.3 is now live on its platform at a lower price than Grok-4.2. It also said the model posted a 321-point jump to 1500 ELO on Artificial Analysis GDPval-AA, surpassing other top models despite the lower price; Elon Musk amplified the announcement.
Why it matters: The launch itself makes lower cost part of the competitive pitch alongside higher quoted benchmark performance.
NVIDIA is positioning persistent autonomous agents as the next infrastructure wave
NVIDIA said OpenClaw, Peter Steinberger’s self-hosted persistent agent project, crossed 100,000 GitHub stars in January and 250,000 by March. It described these claws as long-running agents that work on a heartbeat, acting in the background and surfacing only decisions that need humans.
NVIDIA used that backdrop to launch NemoClaw, a reference implementation that bundles OpenClaw with the OpenShell secure runtime and Nemotron models, and argued that autonomous agents could drive inference demand another 1,000x above reasoning AI. The company framed responsible deployment around open, auditable frameworks, sandboxed runtimes, and local compute, while pointing to use cases in finance, drug discovery, engineering, and IT operations.
Why it matters: NVIDIA is explicitly packaging persistent, self-hosted agents as enterprise infrastructure, with sandboxing, auditability, and local control at the center.
Distillation moved further into the open
In the OpenAI-Musk trial, Musk said that AI companies generally distill other AI companies and that xAI has done so partly with OpenAI technology. Separately, Hugging Face CEO Clement Delangue and AI researcher Nathan Lambert described distillation as a common industry practice used for benchmarking, input evaluation, and dataset augmentation; Delangue argued it should be treated as fair use, especially for open-source models.
Delangue also pointed back to an earlier Wired-reported dispute in which Anthropic said OpenAI had violated Claude’s terms of service by using its API.
Why it matters: Distillation is now being described in public as both commonplace and contested, rather than treated as a purely behind-the-scenes technique.
Big Ideas
1) AI is multiplying process quality, not fixing it
Across strategy, docs, and shipping workflows, the same pattern shows up: AI accelerates whatever operating model is already there. TBM argues that weak practices such as single-player strategy slides and static PRDs become faster and more polished, not better. Descript's CEO makes the same point from a writing angle: drafting is not just communication, it is thinking, and delegating that thinking to AI weakens downstream decisions. Gabor Mayer's production workflow reaches the same conclusion from the build side: skipping specification creates context compression, maintainability problems, and dependency failures.
"Faster bad is still bad."
Why it matters: AI leverage now depends less on prompt cleverness and more on whether your team has sound decision heuristics, living artifacts, and execution discipline.
How to apply: Audit where AI is being added as a checkbox. If it is only speeding up stale artifacts or gate-heavy process, redesign the practice first. Use AI to sharpen pre-mortems, prototypes, shared context, and decision summaries instead of automating broken habits.
2) Spec-first, multi-agent delivery is becoming a real PM capability
Aakash Gupta argues that the market is rewarding PMs who can show they have shipped AI agents, not just managed AI projects. The workflow he highlights treats Claude Code as a team: a System Analyst turns requirements into technical specs and tickets; design, ticket, and build work run in parallel; and execution is sequenced through dependency-aware sprints. In the featured demo, those parallel tracks led to App Store submission in 72 minutes. The takeaway is not that every PM needs 21 agents on day one; the starting point can be as small as three core roles.
Why it matters: This turns AI building from a prototype trick into a skill PMs can use for internal tools, proof-of-work portfolios, and faster iteration.
How to apply: Start with a spec-first stack, not a one-prompt stack: System Analyst for requirements, UX Flow Architect for clickable flows, and Spaghetti Agent for code quality.
3) Feedback-loop speed is a product advantage in its own right
Granola stayed in stealth for a year so it could change the product before public expectations hardened. That period let the team onboard 150 users by hand, scrap real-time autocomplete because it disrupted meetings, rebuild around calm post-meeting summaries, and cut 50% of features. AITropos shows a similar pattern at a different layer: the founders spent two years exploring ideas, then moved from waiter hardware to a waiter app to a customer-facing WhatsApp agent before locking onto AI order taking as the wedge.
Why it matters: Many product gains come from changing the core interaction or narrowing scope. Those moves are much easier before broad rollout.
How to apply: Protect pre-scale learning time. If usage is still small, optimize for faster feedback and bigger changes rather than launch visibility.
4) PM communication is becoming multimodal, but the human kernel still matters
Descript's CEO describes a PM workflow built around video: screen-recorded teardowns, short design-review videos paired with Figma or prototypes, launch videos, and AI-generated highlights from long meetings or customer calls. But she pairs that with a hard boundary: the work should begin with a human kernel of thinking, often captured via dictation, before AI edits for clarity. She also argues live discussion is still the right mode when the team is in an ambiguous creative stage and needs to "toss the ball around".
Why it matters: Better media does not replace judgment. It changes the bandwidth of how PMs share context and decisions.
How to apply: Use AI to compress and polish communication, but create the underlying judgment yourself and keep live conversations for unresolved questions.
Tactical Playbook
1) Replace one-prompt prototyping with a spec-first agent workflow
- Ask the model what a good system analyst does, then assign that role explicitly.
- Constrain behavior early: ask clarifying questions one at a time and block documentation until the questions are done.
- Dictate the full spec, including stack, data rules, security constraints, usage limits, and tone.
- Generate full Confluence documentation before design or code so every agent works from the same source of truth.
- Run design, ticket review, and build in parallel: Figma MCP for screens, team review on JIRA tickets, then tagged sprints with manual dependency mapping.
- Run a code-quality agent after each sprint to catch structural debt before it compounds.
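The "tagged sprints with manual dependency mapping" step amounts to ordering tickets so nothing is built before its prerequisites. A minimal sketch of that sequencing, using Python's standard-library topological sorter (the ticket names and dependency map are made-up illustrations, not from the source workflow):

```python
from graphlib import TopologicalSorter

# Hypothetical ticket -> prerequisites map, as it might come out of the spec docs.
deps = {
    "api-schema": set(),
    "auth": {"api-schema"},
    "orders-endpoint": {"api-schema", "auth"},
    "checkout-ui": {"orders-endpoint"},
}

sorter = TopologicalSorter(deps)
sorter.prepare()
sprints = []
while sorter.is_active():
    batch = sorted(sorter.get_ready())  # tickets whose deps are met can run in parallel
    sprints.append(batch)
    sorter.done(*batch)

for n, batch in enumerate(sprints, 1):
    print(f"Sprint {n}: {', '.join(batch)}")
```

Each `batch` is a set of tickets that can be worked in parallel by different agents, which is exactly the property the parallel design/ticket/build tracks rely on.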
Why it matters: This workflow is designed to address the three recurring failure modes of one-prompt building: context compression, unmaintainable code, and dependencies being built in the wrong order.
How to apply: Use it when the goal is a production-ready build or a credible PM portfolio item, not just a demo.
2) Keep AI in the editing loop, not the thinking loop
- Dictate the argument you would make live, even if it is rough.
- Ask AI to tighten the outline and wording while preserving your voice.
- Do another dictated pass to restore missing nuance or decision criteria.
- Only publish when the document represents what you actually think.
- If the team is still exploring, switch from async to a live discussion with a few high-context collaborators.
Why it matters: The point of writing is partly to clarify the decision criteria that later make design and execution calls easier.
How to apply: Use AI to compress expression, not outsource judgment.
3) Evaluate agent systems around one business-critical KPI
- Pick a single metric that captures whether the agent is doing the job. For AITropos, it is how many order items were identified correctly.
- Before deployment, run thousands of simulated conversations overnight using customer agents plus analyzer agents.
- During onboarding, audit live conversations and trigger alerts when something looks wrong.
- Fix errors manually while patterns are still small, then automate the fix.
- Keep shrinking onboarding time as domain templates improve.
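The single-KPI idea above can be sketched as a tiny scorer. This is an illustration under assumptions, not AITropos's actual evaluation code: the field names, threshold, and sample conversations are invented. The shape is the point: one accuracy metric per conversation, plus an alert when it drops below a trust threshold.

```python
def item_accuracy(predicted: list[str], truth: list[str]) -> float:
    """Fraction of ground-truth order items the agent identified correctly."""
    if not truth:
        return 1.0
    hits = sum(1 for item in truth if item in predicted)
    return hits / len(truth)

def audit(conversations, threshold=0.95):
    """Yield ids of conversations whose accuracy falls below the alert threshold."""
    for conv_id, predicted, truth in conversations:
        if item_accuracy(predicted, truth) < threshold:
            yield conv_id

# Could come from overnight simulated conversations or live onboarding audits.
simulated = [
    ("c1", ["margherita", "cola"], ["margherita", "cola"]),   # perfect order
    ("c2", ["margherita"], ["margherita", "garlic bread"]),   # missed an item
]
alerts = list(audit(simulated))
print(alerts)  # → ['c2']: this conversation needs manual review
```

The same scorer works in both phases the bullets describe: batch-scoring thousands of simulated conversations before deployment, and flagging individual live conversations during onboarding.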
Why it matters: Production reliability in agent systems comes from architecture, evaluation, and feedback loops, not from a good prototype alone.
How to apply: Start with the one failure that would break user trust, then build tests and alerts around that first.
4) Use AI to remove communication friction before you ask it to replace collaboration
One team in TBM's positive example used AI for status updates, keeping shared context current, and summarizing decisions so people could spend more time in focused 1:1s, better design reviews, and other judgment-heavy work. Descript's PM workflows point in the same direction: use AI to turn noisy communication into high-signal artifacts, not to avoid the conversation altogether.
Why it matters: This is a practical stakeholder-management use case with lower risk than full process replacement.
How to apply: Start by automating recaps, summaries, and prep materials. Leave the actual decision-making forum human.
Case Studies & Lessons
1) Granola: use stealth to fix the core interaction before launch
Granola's stealth year was not just about secrecy. It was about increasing feedback-loop speed before public behavior locked in. The team onboarded 150 users by hand, scrapped a core interaction that pulled users out of meetings, rebuilt around post-meeting summaries, and removed half the feature set. Hiten Shah called this the key part of Granola's growth story.
Lesson: Early growth often comes from subtraction and interaction redesign, not feature expansion.
How to apply: If users are learning the wrong behavior, delay scale and fix the workflow first.
2) AITropos: prototypes are easy; reliable operations are the real product
AITropos found its wedge only after two years of idea exploration and three product iterations. The hard part was not building a demo. It was translating messy human conversations into structured POS-compatible data reliably enough for real restaurants. The team responded with a tools-based architecture for speed, parallelized product searches, pre-fetched context, and fast sub-agents that injected relevant context before the main agent responded.
Lesson: In AI products, the durable advantage is often in evaluation and systems design, not the first prototype.
How to apply: When a prototype looks impressive, ask what must become deterministic, measured, or parallelized before customers can trust it.
3) Descript: treat PM communication as product work
Descript highlights three high-value PM uses for AI video: product teardowns, design reviews, and launch or career videos. The tool can condense a 14-minute screen-recorded walkthrough into roughly two minutes, smooth edits so they remain watchable, and extract a three-minute highlight reel from a 90-minute meeting or large sets of customer calls. On the product side, the company tracks how many users export a video on day one; that figure more than doubled over 18 months to roughly one in five users.
Lesson: Communication quality is a product surface with measurable adoption, not just an internal hygiene factor.
How to apply: If your team already records screens, prototypes, or calls, add an AI editing pass before distribution to raise signal without adding manual work.
Career Corner
1) Shipping an agent is becoming a stronger signal than pedigree
Aakash Gupta argues that 30% of open PM jobs in 2026 are AI PM roles while fewer than 5% of senior PMs have shipped a working AI agent. He further argues that this gap is letting candidates from non-traditional backgrounds win $1M+ offers at OpenAI, Anthropic, and DeepMind by proving the rare skill directly, though he expects the window to narrow as more PMs ship agents over the next year.
Why it matters: In his framing, the market is rewarding demonstrable shipping ability faster than it is rewarding pedigree.
How to apply: Build something you can show: an App Store app, password-protected build notes, Confluence docs, JIRA tickets, or agent architecture that makes the work visible.
2) The PM-to-CEO path favors founder instinct, but it still has to clear the business bar
Laura's path at Descript ran from IC PM to VP Product to CEO, with the CEO role leaning heavily on founder mentality, product depth, customer understanding, and loyalty to the original vision. She is equally direct about the trade-off: a product-heavy CEO still has to prove they can drive business outcomes such as stronger margins or customer success, and may need complementary leaders around them while they learn. Her management strength as VP Product came from hiring exceptional PMs, giving them context, and then enough room to succeed.
Why it matters: Product excellence can get you into the CEO seat, but scaling capability determines whether you stay there.
How to apply: If you want the path, build both sides: product judgment and the ability to hire, context-set, and operate through others.
3) Rewrite the story before it starts showing up in interviews
Deb Liu describes a recurring pattern among recently laid-off high performers: instead of focusing on strategy shifts or market conditions, they narrate the event as a personal failure. In one example, an exceptional PM came across as guarded and defensive after a difficult previous manager, which cost her an opportunity. The proposed reset is simple: write the full story, read or listen back to surface the judgment inside it, then rewrite it with less blame and more learning.
Why it matters: The story you carry forward affects how you show up and how others read you.
How to apply: Do the rewrite before your next interview loop, performance review, or networking cycle.
Tools & Resources
- The AI Playbook Puzzle: Useful for pressure-testing whether your AI plan is improving the operating model or merely automating it.
- Gabor Mayer's agent repo: The actual agent files and supporting resources behind the multi-agent PM workflow.
- Superwhisper: Cited in Gabor's workflow as a fast way to dictate dense product specs instead of typing them.
- Descript CEO episode: Practical examples of AI-edited teardowns, design reviews, and meeting highlight reels for PM communication.
- AITropos episode: Strong reference for production agent architecture, KPI design, testing, and onboarding in a live operations setting.
- What is the Story You are Telling Yourself?: A useful reset for PMs navigating layoffs, difficult managers, or confidence loss before interviews.