Your intelligence agent for what matters
Tell ZeroNoise what you want to stay on top of. It finds the right sources, follows them continuously, and sends you a cited daily or weekly brief.
Your time, back
An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers verified daily or weekly briefs.
Save hours
AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.
Full control over the agent
Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.
Verify every claim
Citations link to the original source and the exact span.
Discover sources on autopilot
Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.
Multi-media sources
Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.
Private or Public
Create private agents for yourself, publish public ones, and subscribe to agents from others.
3 steps to your first brief
Describe your goal
Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.
Review and launch
Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.
Sam Altman
3Blue1Brown
Paul Graham
The Pragmatic Engineer
r/MachineLearning
Naval Ravikant
AI High Signal
Stratechery
Get your briefs
Get concise daily or weekly updates with precise citations directly in your inbox. You control the focus, style, and length.
prinz
Nick Levine
Naval Ravikant
Funding & Deals
Profluent Bio x Eli Lilly: Profluent and Eli Lilly announced a multibillion-dollar partnership to use Profluent’s AI models to design custom recombinases, described as a new class of gene editor for large-scale DNA editing across multiple diseases. Nathan Benaich called Profluent an "n=1 company."
Actively: Actively raised a $45M Series B led by TCV and firstharmonic, with Bain Capital Ventures, First Round, and Alkeon participating. The company’s pitch is an AI agent for every sales account—working 24/7 with persistent context, org-chart research, briefing memos, and real-time coaching—and it says teams at Samsara, Ramp, Ironclad, and Attentive have standardized on it. Bain Capital Ventures and First Round are doubling down.
Emerging Teams
Pallo: A small team says its Cambridge/IB study app now has about 4,000 students using it. The clearest signal is voice: voice users send roughly 5x more messages, stay almost 2x longer, and show higher conversion and retention than typing-only users. The team also reports ~17% D30 retention for new users and around 50% W4 for some cohorts; 73% of active users upload worksheet photos. Their product takeaway is blunt: students want tutor-style back-and-forth, not more curriculum, and AI-generated tests saw only 12 completions in two months.
Magnifly AI: Social Channel Group has turned its Business Enablement Guide service—a 2-4 week GTM workflow covering personas, market intelligence, messaging, prospecting, sales enablement, and content direction—into SaaS, and says some high-ticket clients already use it. The interesting part is distribution: the agency says it already works with Fortune 100-500 companies, distributors, MSPs, SIs, resellers, and mid-market/SMB customers, giving it a real starting channel if the product keeps converting service demand into software.
Saffron: YC highlighted Saffron, which evaluates how well software engineers code with AI tools to help companies identify "the next 10x engineer." Founders named in the launch post are Rob Lukan, Kazuma Choji, and MJ Yao.
AI & Tech Breakthroughs
Poolside AI: Poolside released Laguna M.1 and Laguna XS.2, its first public open-weight models, framing them as the first output of a full-stack approach spanning data, training, reinforcement learning, and inference for coding agents. The company says it is making the models available to everyone.
"Asking AI to ask questions is hugely underrated."
Question-asking is becoming a capability frontier: Imbue’s open-source Blueprint reads code, asks grounded questions, and produces executable plans; Kanjun says it is "10x better" than Claude Code Plan Mode because the questions are better. In research workflows, Sebastien Bubeck says OpenAI’s internal agents are already finding mistakes in papers and surfacing questions that humans then want to turn into papers.
SenseNova-U1: SenseTime open-sourced SenseNova-U1 under Apache 2.0. Its NEO-Unify architecture removes the visual encoder and VAE, uses a Mixture-of-Transformer backbone on near-lossless pixel inputs, and claims strengths in text rendering, dense visual layouts, interleaved text-image generation, and open-source unified-model benchmarks. Tradeoffs noted in the post: weaker high-resolution photorealism than specialized diffusion models, training code/report still pending, and an ecosystem that still needs to be built.
On-device AI is getting materially better: llm-autotune reports average improvements of 39% lower time-to-first-token, 67% less KV-cache RAM, 46% lower agent wall time, and 67% lower KV prefill time via dynamic KV sizing, live RAM management, and system-prompt prefix caching. Separately, Nova shows a local-first assistant that runs fully offline on consumer hardware with about 8GB RAM, including local chat, local storage, document reading, and offline text-to-speech.
Market Signals
Naval’s investability reset: Naval argues pure software is now "uninvestable" because coding agents let people hack together apps today and are improving quickly enough to build scalable software with good architecture. His suggested targets are hardware, network effects, and AI models; he also says coding agents hit an inflection point around December 2025 and that 1-2 person software companies will increasingly be able to reach massive scale.
Model price/performance is compressing: Abacus.AI CEO Bindu Reddy says her team is moving workloads to Kimi 2.6 because it beats Opus 4.7 medium on some use cases, beats GPT 5.5 on front-end work, performs well on tool calling and instruction following, and is 5x cheaper.
HCM looks like a large remaining AI-native wedge: a16z argues Workday remains deeply embedded—more than 10,000 organizations, tens of millions of users, and approaching $10B in annual revenue—while HCM is still the last large enterprise software category without a serious AI-native challenger.
AI-native engineering workflows are becoming evaluation surfaces: One SaaS builder claims YC Spring applications are asking founders to submit Claude Code /export files as a signal of taste and caliber, while Saffron is explicitly assessing how well engineers code with AI tools. The broader read-through is that engineering quality is starting to be screened through agent usage, not just raw coding output.
VC workflows are fair game for agents: Elizabeth Yin agreed with the claim that agents can replace 90% of VC associate and principal work, adding that venture was "never about diligence…but access."
AI infrastructure sentiment is turning more constructive: Elizabeth Yin says recent AI improvements and new frameworks shifted her from worrying about data-center debt to seeing an inflection for further AI acceleration. Garry Tan goes further, arguing AI data centers are San Francisco’s most important growth industry and the engine of downtown recovery.
Worth Your Time
On Vibe Coding and the transcript: primary source for Naval’s view on coding-agent inflection, software moats, and why company size may compress.
Workday’s last workday: concise thesis on why HCM may still be open to an AI-native attacker despite Workday’s scale and stickiness.
Blueprint launch thread: useful if you are tracking where coding agents may differentiate next—planning quality and better questions, not just faster code generation.
Vintage models thread: a speculative but interesting framework for testing whether historical corpora contain future-predictive latent structure, paired with a 13B model trained only on pre-1931 text.
SenseNova-U1 GitHub: worth reviewing if unified multimodal architectures are on your map; the key architectural claim is removing the visual encoder and VAE rather than layering adapters onto separate systems.
Tibo
Riley Brown
Salvatore Sanfilippo
🔥 TOP SIGNAL
The biggest practical shift today is agentic CI/CD. Peter Steinberger says Codex now reviews every landed commit, spawns a fresh Codex to open a fix PR when it finds a bug, then hands that PR to a review/fix loop that can run up to five times; in a separate per-commit-to-main setup, it found one of his own regressions within 10 minutes.
Timeless takeaway: put the agent on the commit path, keep the loop bounded, and use PRs as the handoff + audit boundary.
🛠️ TOOLS & MODELS
- DeepSeek V4 Flash — Salvatore Sanfilippo says Flash is the real local-agent story in the V4 release: he implemented 2-bit asymmetric quantization, runs it on a 128GB MacBook, and says tool calling works perfectly. He compares it to recent Sonnet-level performance, while being more cautious than frontier-model claims.
- Flash vs. Pro — Same source argues V4 Flash, not Pro, is ready now for local inference, and says Pro's training is not finished yet. He also says Flash beats Kimi 2.6 at roughly 1/3 the size and has more disciplined thinking behavior than Qwen 3.6.
- Local throughput reality — On his MacBook, Sanfilippo reports about 120-130 TPS prefill and 21+ TPS generation, with the warning that prefill is the real bottleneck for coding agents.
- Sourcegraph Deep Search — now writes and executes scripts to analyze codebases, then feeds results back into the agent. Sourcegraph frames this as custom tools on demand; example query: “Top Files in VS Code Bugfix Commits (Last 6 Months) and Contributors.” (changelog)
- Cursor / EndorLabs benchmark signal — Jediah Katz shared EndorLabs' latest correctness-and-security benchmark, which he says had Cursor's optimized harness on top. Useful if you're tracking harness quality, not just model choice.
- Codex endurance — Tibo says that with some small tweaks, Codex can work for days on hard tasks, and that changes are coming to make this easier to use.
💡 WORKFLOWS & TRICKS
- Per-commit auto-fix loop
- Run Codex on every landed or main-branch commit.
- If it finds a regression or security issue, spawn a new Codex instance to open a fix PR.
- Hand that PR to a review agent.
- If review finds problems, spawn another fix agent and loop again — Steinberger caps it at 5 passes.
- Use the PR as the audit trail. Example PR and example commit record
- Script-escape pattern for repo analysis — When the native agent loop gets stuck, let the agent write and run a one-off script, then feed the result back into the loop. That's the core pattern in Sourcegraph's Deep Search update, and it generalizes to churn analysis, migration prep, and codebase archaeology.
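A minimal sketch of that escape hatch, assuming nothing about Sourcegraph's actual implementation: the agent hands the harness a throwaway script, the harness runs it in a subprocess, and the captured stdout goes back into the loop as context.

```python
import subprocess
import sys
import tempfile

def run_escape_script(script_source: str, timeout: int = 60) -> str:
    """Execute an agent-written one-off script in a subprocess and return
    its stdout, so the result can be fed back into the agent loop."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script_source)
        path = f.name
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout

# Hypothetical example: a churn-analysis one-off the agent might emit
# (a real script would parse `git log --name-only` instead of a literal).
script = """
from collections import Counter
churn = Counter({"src/editor.ts": 42, "src/render.ts": 17})
for path, n in churn.most_common(2):
    print(path, n)
"""
print(run_escape_script(script))
```

In practice the harness would also sandbox the subprocess and cap its output; the pattern generalizes to any analysis the agent can express as a script but not as a single tool call.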
- Measure the latency that actually hurts — For local coding agents, Sanfilippo says prefill, not generation, is the real constraint. If you're evaluating quantization or model swaps, benchmark the part that blocks long-context coding work.
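One way to measure the right number, sketched against a generic streaming interface (the `stream` argument is any iterator of tokens; nothing here is specific to a particular runtime): time-to-first-token is dominated by prefill, while tokens per second after the first token reflects generation.

```python
import time

def benchmark_stream(stream):
    """Split a streaming completion into the two latencies that matter:
    time-to-first-token (mostly prefill) and post-first-token tokens/sec
    (generation). `stream` is any iterator that yields tokens."""
    t0 = time.perf_counter()
    t_first = None
    n_tokens = 0
    for _ in stream:
        n_tokens += 1
        if t_first is None:
            t_first = time.perf_counter()   # first token: prefill is done
    t_end = time.perf_counter()
    if t_first is None:
        return None, 0.0                    # empty stream
    ttft = t_first - t0
    gen_tps = (n_tokens - 1) / max(t_end - t_first, 1e-9)
    return ttft, gen_tps
```

For a coding-agent workload, run this with a realistic long prompt: a quantization or model swap that improves `gen_tps` but worsens `ttft` can still make the agent feel slower.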
- Expect API exhaustion once agents scale — Steinberger says agents can hit GitHub rate limits even after a move to Enterprise. In his sessions, Codex worked around GitHub limits via the browser, typed into a comment box to close an issue, and opened Cloudflare to create a new API key when permissions were missing; he also plans to test ghx.
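If your agents do hit that wall, the unglamorous mitigation is still capped exponential backoff. This sketch is generic and assumes only that the wrapped call sometimes raises a recognizable rate-limit error; it does not reflect Codex or GitHub internals.

```python
import random
import time

def with_backoff(call, is_rate_limited, max_retries=5, base=1.0, cap=60.0):
    """Retry `call` when `is_rate_limited(exc)` says the failure was a
    rate limit, sleeping base * 2**attempt seconds (jittered, capped)
    between tries. Any other exception propagates immediately."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limited(exc) or attempt == max_retries - 1:
                raise
            delay = min(cap, base * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter
    raise RuntimeError("unreachable")
```

The bound matters for the same reason as in the review/fix loop: an agent that retries forever against a saturated API just deepens the exhaustion.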
- Claude Code config check — AI Builder Club warns that many setup guides still recommend a deprecated npm install, and Jason Zhou called out the overlooked .claude/rules/ directory. Small detail, real leverage.
👤 PEOPLE TO WATCH
- Peter Steinberger — strongest operator signal in today's notes: per-commit Codex reviewers, live fix PRs, browser fallbacks, and the ugly reality of GitHub API saturation.
- Salvatore Sanfilippo — brings the local-inference details most model chatter skips: quantization method, RAM target, throughput numbers, and a clear argument that Flash, not Pro, is the local-agent story right now.
- Daniel Neal Adler / Sourcegraph — high signal if you care about agents that inspect large codebases, because he's shipping the “write code to understand code” pattern into product.
- Jason Zhou — useful for catching low-drama but high-leverage setup details like deprecated Claude Code install paths and hidden config surfaces.
- Tibo — short post, strong implication: long-running Codex sessions are getting easier, which matters if your hardest tasks aren't one-shot edits.
🎬 WATCH & LISTEN
- 0:02-2:32 — Why DeepSeek V4 Flash matters more than Pro for local agents. Sanfilippo walks through the practical case: 2-bit asymmetric quantization, 128GB Mac target, working tool calls, and why Flash is the interesting part of the release for local inference.
- 21:02-23:58 — Codex using the browser as a test harness. Riley Brown shows Codex turning an HTML file into an app and then validating buttons, navigation, and quiz flows by controlling the browser itself.
📊 PROJECTS & REPOS
- openclaw/openclaw PR example + clawsweeper commit record — best repo-level signal today because it's live evidence of agentic CI, not a concept deck. One link shows the fix-PR loop; the other shows a regression caught almost immediately after launch.
- ghx — niche, but relevant if your bottleneck is GitHub API exhaustion rather than model quality; Steinberger singled it out while troubleshooting agent-heavy workflows.
Editorial take: today's durable edge is orchestration — commit hooks, bounded review/fix loops, script escapes, and browser fallbacks matter as much as the base model.
OpenAI
Tibo
Omar Sanseviero
Top Stories
Why it matters: The biggest signals today were an open multimodal push from NVIDIA, a broader scope for Codex, and stronger evidence that frontier models are contributing to serious technical work.
NVIDIA released a new open multimodal model built for agent loops. Nemotron 3 Nano Omni combines audio, image, video, and text in one reasoning loop, ships with 30B parameters and 256K context, and quickly landed across vLLM, Together AI, fal, and Ollama. fal highlighted roughly 9× higher throughput from fewer inference hops in multimodal agent workflows.
Codex moved closer to a general work agent. Recent updates added macOS computer use, an in-app browser for inspecting localhost builds, built-in image generation, plugins, first-class artifacts, and follow-up automations. OpenAI also added a /fast mode for GPT-5.5 in Codex at 1.5× speed and reset rate limits for all paid plans.
OpenAI’s math signal kept strengthening. OpenAI said GPT-5.4 Pro helped solve a 60-year-old Erdős problem, while GPT-5.5 Pro reached a new high of 159 on Epoch’s Capabilities Index and improved FrontierMath results, including solving two previously unsolved Tier 4 problems across runs.
Research & Innovation
Why it matters: The most useful research today focused on where current systems still break: retrieval, post-training efficiency, and safety visibility.
MathNet exposed a major retrieval gap in math AI. The MIT benchmark includes 30,676 Olympiad-level problems from 47 countries and 17 languages; top models reached 78.4% problem-solving accuracy, but retrieval Recall@1 was only about 5%, with RAG improving results by up to 12%.
Self-distillation is emerging as a serious post-training alternative. MIT and ETH Zurich researchers described a setup where models act as their own teacher using feedback or demonstrations; they highlighted SDPO for RL, SDFT for continual learning, and argued the approach is simpler and faster than GRPO, with production use already underway.
A new “Introspection Adapter” targets hidden model behavior. Researchers trained a single adapter that makes finetuned models describe their behavior and generalizes to detecting hidden misalignment, backdoors, and safeguard removal.
Products & Launches
Why it matters: The most notable launches were practical: privacy, enterprise research, and deployable coding models.
OpenAI shipped Privacy Filter. It is a 1.5B-parameter, open-source, on-device model for PII detection and redaction, scored at 96% F1 on PII-Masking-300k, and can detect sensitive text including API keys.
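For intuition only, here is a toy regex-based version of the detect-and-redact behavior described above. A model-based filter replaces heuristics like these with learned detection; both patterns here, including the assumed `sk-` key shape, are illustrative and far from exhaustive.

```python
import re

# Toy stand-in for PII detection and redaction. These regexes are
# illustrative assumptions, not the behavior of any shipped model.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # assumed key shape
}

def redact(text: str) -> str:
    """Replace each detected span with a bracketed category label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("contact: ada@example.com, key: sk-abcdefghijklmnop"))
# → contact: [EMAIL], key: [API_KEY]
```

The interesting part of an on-device model is exactly what this sketch cannot do: catch sensitive text that matches no fixed pattern.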
Google launched Deep Research and Deep Research Max. The new Gemini 3.1 Pro-powered agents combine open-web search with proprietary enterprise data via MCP in a single API call.
Poolside released its first open-weight coding model. Laguna XS.2 is a 33B total / 3B active MoE for agentic coding and long-horizon tasks, trained in-house, runnable on a single GPU, and released under Apache 2.0.
Industry Moves
Why it matters: Partnerships are increasingly about distribution, workflow control, and high-value verticals rather than just model access.
Profluent signed a major pharma deal with Eli Lilly. The partnership is worth $2.25B plus royalties and focuses on AI-designed proteins for large gene insertion therapeutics.
Google added Exa search inside Gemini. Exa said its agent-first search now powers Grounding With Exa for Gemini, giving models access to billions of websites, technical docs, papers, people, and companies.
Anthropic pushed Claude deeper into creative software. New partnerships with Blender, Autodesk, Adobe, Ableton, and others connect Claude directly to professional creative workflows; the Blender connector can debug scenes, build tools, and batch-apply changes across objects.
Policy & Regulation
Why it matters: Government AI contracts are becoming more consequential for both deployment norms and internal company politics.
- Google’s Pentagon contract became one of the day’s biggest governance stories. Posts citing The Information said Google signed a classified deal allowing use of its AI for “any lawful government purpose” and requiring help adjusting safety filters; more than 600 employees reportedly opposed the move, and lawyers said the contract’s “not intended for” language on surveillance and autonomous weapons carries no legal weight.
Quick Takes
Why it matters: A few smaller releases still stood out for real-time multimodality, evaluation infrastructure, and world-model tooling.
- MiniCPM-o 4.5 open-sourced a 9B full-duplex multimodal streaming model and said it can run offline on Windows and macOS hardware.
- fal launched World Model Accelerator, an inference engine for generative media and world models that scales from 1 to 1,000+ GPUs.
- ParseBench launched with 2,000 verified pages from real enterprise documents plus a Kaggle leaderboard for document understanding.
- VibeBench is recruiting 1,000 software engineers to rank models on real engineering work, with public reports planned after each evaluation round.
Garry Tan
Invest Like The Best
Tim Ferriss
What stood out
Today’s strongest recommendations came with unusually clear reasons to pay attention: Tim Ferriss extracted a reusable performance rule from The Inner Game of Tennis, Paul Tudor Jones said an Acquired episode on Berkshire Hathaway changed how he sees Buffett, and Garry Tan pointed back to Vannevar Bush’s Memex idea as a framework he has carried since 1999. A separate Paul Graham co-sign made one timely podcast episode on Silicon Valley Bank worth saving as well.
Start here
The Inner Game of Tennis
- Content type: Book
- Author/creator: W. Timothy Gallwey
- Link/URL: No direct book URL was provided in the source; source context: Tim Ferriss post
- Who recommended it: Tim Ferriss
- Key takeaway: Ferriss highlighted the book’s case for relaxed concentration and pulled out one operating rule: the secret to winning is “not trying too hard,” and the feeling of overexertion is a cue to reset priorities, technique, focus, or mindfulness rather than press harder.
- Why it matters: This was the most useful recommendation in the set because it came with a concrete diagnostic readers can apply immediately.
“The player of the inner game comes to value the art of relaxed concentration above all other skills; he discovers the true basis for self-confidence; and he learns that the secret to winning any game lies in not trying too hard.”
Two evergreen picks with different use cases
Acquired’s Berkshire Hathaway episode
- Content type: Podcast episode
- Author/creator: Acquired
- Link/URL: No direct episode URL was provided in the source; source context: Invest Like The Best interview
- Who recommended it: Paul Tudor Jones
- Key takeaway: Jones said the episode changed his view of Buffett enough that he apologized for underrating him, crediting the episode for showing that Buffett understood compound interest at age 9, sought out Benjamin Graham at 17, and later paired with Charlie Munger.
- Why it matters: This is the cleanest investing recommendation of the day because it connects Buffett’s early formation to Jones’s bigger point about compounding.
Vannevar Bush’s 1945 Memex article
- Content type: Article / paper
- Author/creator: Vannevar Bush
- Link/URL: No direct article URL was provided in the source; source context: Garry Tan post
- Who recommended it: Garry Tan
- Key takeaway: Tan said the Memex concept has been on his mind since 1999 and centered Bush’s claim that the mind “operates by association”.
- Why it matters: Tan’s emphasis was not on storing more pages, but on the links between them—his clearest explanation of why associative connections matter more than a static archive.
“The human mind operates by association. With one item in its grasp, it snaps instantly to the next.”
One timely episode with a direct link
Social Radars episode with Ron Conway on Silicon Valley Bank
- Content type: Podcast episode
- Author/creator: Social Radars
- Link/URL: https://pod.link/1677066062/episode/MmRjMWUwMmUtNWEwYi00OTY2LTg1YTctZTRmYmU3MjFlNjAz
- Who recommended it: Paul Graham
- Key takeaway: Graham boosted an episode Jess Livingston described as Ron Conway’s first public account of the frantic, behind-the-scenes effort that kept Silicon Valley Bank’s failure from triggering a Depression-style financial panic, and added that there are “real bombshells”.
- Why it matters: This is less evergreen than the other picks, but it appears to offer rare firsthand context on a defining startup-finance event.
Bottom line
If you only open one resource, start with The Inner Game of Tennis for the clearest reusable framework. If you want an investing listen, go to Acquired’s Berkshire Hathaway episode; if you want a foundational idea for knowledge work, go to Bush’s Memex. The Social Radars episode is the timely add-on for recent Silicon Valley history.
Aakash Gupta
Product School
a16z
Big Ideas
1) Enterprise software is moving from AI-inside-the-product to products consumable by agents
Product companies first tried to add AI directly into existing products, often as chat or a fused human/AI experience. The newer pattern is to treat AI as a user: make the product more like a CLI or headless tool that agents can consume, instead of forcing a hybrid model that speakers said has not worked well. Salesforce's move to full headless mode for agents was described as a bellwether for enterprise software and raises new monetization questions such as API taxes or agent seats.
Why it matters: PMs may need to design for both human users and machine users, with different interface and pricing assumptions.
How to apply: Review which workflows are currently being handled through AI overlays. For flows better suited to automation, ask whether the stronger move is to expose actions and data in a form agents can reliably consume.
2) As building gets cheaper, governance becomes more valuable
Retool's thesis is that once software creation gets cheaper, the harder problem becomes management: how to deploy, govern, monitor, and roll back software and agents. Their recommended operating model is federated: centralize the data and action layers that agents need, then let teams build on top of those foundations. The risk of skipping this is sprawl: one customer found multiple internally built versions of the same app, each showing different numbers.
"The writing of the software is actually not the hard part... how do you manage the software? How do you deploy the software? How do you govern the software?"
Why it matters: PMs now have to think not just about what gets built, but where central control is necessary and where local experimentation is safe.
How to apply: If your org is democratizing app or agent building, separate the stack into shared foundations and decentralized creation. Centralize data access and action layers first, then expand who can build.
3) Strong product leadership is aligned autonomy, not command-and-control
When uncertainty rises, companies often revert to command-and-control because it feels faster and safer. Teresa Torres and Petra Wille argue that this breaks down in complex environments because no single leader holds all the context. The alternative is strong direction with guardrails and feedback loops, plus decision-making by the person closest to the problem using consultative decision-making. Their "flotilla of kayaks" metaphor captures the goal: shared direction with independent exploration.
"Strong leadership is about direction, guardrails, and feedback loops, not control"
Why it matters: This is a better fit for product work, where expertise is distributed across PM, engineering, design, data, and go-to-market functions.
How to apply: Treat leadership style as a spectrum. Use direct direction in true "burning house" moments, but default to a model where one informed owner decides after taking input from others.
4) Behavior change works best when the product asks for one small action now
Across WHOOP and Big Health, the winning pattern was not bigger ambition; it was reducing change to one immediate action. WHOOP users who saw their "WHOOP Age" often wanted to change many behaviors, but the most effective prompt was a new bedtime to aim for tonight. Big Health found that people stalled when they chose large mood-lifting actions; engagement improved when the product helped them commit to one daily action and the smallest first step.
"Getting started is everything"
Why it matters: PMs working on behavior change, team habits, or AI skill-building can often improve outcomes by shrinking the first ask rather than increasing motivation.
How to apply: Replace broad change goals with one concrete action the user can take today, then break that action into the smallest possible first step.
Tactical Playbook
1) A consultative decision loop for product teams
- Identify the decision and the person with the most relevant expertise.
- Gather input from others without forcing consensus overload.
- Have one person decide after incorporating that input.
- Make leadership's job explicit: set direction, guardrails, and feedback loops.
- Adjust the amount of central control to the situation; urgent, high-risk moments may need faster direction, while normal product work scales better with distributed action.
- If you are lower in the hierarchy, manage up and earn trust over time to create more autonomy for the team.
Why it matters: It preserves speed without assuming a single leader can hold all the context.
How to apply: Start with one recurring decision type and make the decider plus consultative inputs explicit before the next meeting.
2) A smallest-step pattern for product-led behavior change
- Start with the user's desired outcome, but do not ask for a full transformation on day one.
- Turn the change into one specific action the user can take today or tonight.
- If the action still feels large, break it into the smallest possible first step.
- Repeat the cycle daily so momentum comes from starting, not from waiting for a less busy future.
Why it matters: WHOOP used this to drive progress on health metrics, and Big Health used it to sustain engagement and improve depression symptoms.
How to apply: Use this pattern anywhere users say they want change but feel "too busy." The source material argues that busyness often masks procrastination, not lack of intent.
3) A source-of-truth audit for product and go-to-market teams
If product details live across decks, docs, spreadsheets, websites, and different teams, turning them into usable sales and marketing assets gets harder. Use this audit:
- Where does the source of truth live?
- How do updates get collected from the right teams?
- How do you align when people describe the same thing differently?
- How do you distinguish a feature, capability, benefit, and proof point?
- How will it stay current over time?
- Which outputs does it need to support: battlecards, launch assets, sales decks, enablement, competitive comparisons, or analyst and customer materials?
Why it matters: This is a recurring PM pain point when teams need consistent messaging and fast asset creation.
How to apply: Even if you do not solve the whole system this week, use the questions above to expose ownership gaps and classification problems before the next launch.
4) Measure AI leverage at the review layer, not just the build layer
In one Box example, AI built probably 80-90% of a new feature, but release speed was still constrained by security review, code review, and production pipeline steps. The result was still meaningful (estimated at 2-3x across the board) but not the 5-10x gain people may imagine if the rest of the product development life cycle stays unchanged.
Why it matters: PMs can overestimate delivery gains if they measure generation speed and ignore the rest of the release system.
How to apply: When AI accelerates prototyping or coding, track where the work stalls next. In this example, the next bottlenecks were review and release, not generation.
Case Studies & Lessons
1) Retool: separate the new bet, then be willing to replace your own assumptions
Retool's AI agents effort worked well when it was set up as a separate team and product with a different use case from the core app-building business; the company says it avoided cannibalization and grew rapidly. By contrast, Retool says it made the wrong call on the core product by doubling down on drag-and-drop and teaching LLMs to use that interface instead of letting LLMs generate code directly. The company is now considering a full rearchitecture despite having nine figures of revenue on the existing product.
Key takeaway: Protect new bets when they are genuinely distinct, but once conviction changes, do not let installed revenue freeze product architecture.
How to apply: Ask two separate questions: should this be isolated from the core, and later, has the core assumption itself changed? Retool answered those questions differently at different stages.
2) Retool widened both the builder base and the enterprise envelope
Over the last 12-18 months, Retool saw an inflection in non-engineers building production applications; today, the majority of builders are non-developers, a shift that started before AI and accelerated with it. Retool also chose not to be a system of record: it connects to data wherever it lives and allows deployment in customer environments, a choice grounded in its view that 90-95% of internal tools rely on external data. The company says that helped unlock customers including the US Air Force, Navy, Army, and Coinbase. The upside can be large: customers build hundreds or thousands of apps they otherwise would not build, and one application reportedly saved around $50 million despite being far down the normal priority list. The tradeoff is governance as building gets easier.
Key takeaway: Democratized building can expand value far beyond developer time savings, but only if the org can keep outputs consistent and governed.
How to apply: When positioning internal tools or AI-builder products, look beyond "faster development" and ask what previously unprioritized workflows become viable, and what governance layer must exist for them to be trusted.
3) Box shows what agentic UX looks like when it beats human workflow limits
Box's agent can search across an entire Box environment, run multiple queries, inspect hundreds of results instantly, and rerank them, rather than following the one-query, one-results-page pattern of human search. The lesson is not just "add an assistant." It is to design agent experiences that outperform human process constraints instead of inheriting them.
Key takeaway: If an agent can do parallel retrieval and ranking, PMs should not force it through a human-speed UI mental model.
How to apply: For search, triage, or research workflows, identify which steps exist only because humans are sequential and see whether an agent can collapse them.
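To make the fan-out-then-rerank pattern concrete, here is a minimal sketch of collapsing sequential search steps into parallel retrieval plus a merged rerank. The corpus, the keyword `search` scorer, and the `agent_search` helper are all hypothetical stand-ins for illustration, not Box's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory corpus standing in for a document store.
CORPUS = {
    "doc1": "q3 revenue report for the emea region",
    "doc2": "security review of the vendor contract",
    "doc3": "emea vendor contract renewal terms",
}

def search(query: str) -> list[tuple[str, float]]:
    """Toy keyword search: score = fraction of query terms found in the doc."""
    terms = query.lower().split()
    results = []
    for doc_id, text in CORPUS.items():
        score = sum(t in text for t in terms) / len(terms)
        if score > 0:
            results.append((doc_id, score))
    return results

def agent_search(queries: list[str], top_k: int = 3) -> list[str]:
    """Fan out several query formulations at once, then rerank the union."""
    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(search, queries))
    # Keep each document's best score across all query formulations.
    best: dict[str, float] = {}
    for results in result_sets:
        for doc_id, score in results:
            best[doc_id] = max(best.get(doc_id, 0.0), score)
    return sorted(best, key=best.get, reverse=True)[:top_k]

print(agent_search(["emea vendor contract", "contract renewal terms"]))
# -> ['doc3', 'doc2', 'doc1']
```

The point of the sketch is the shape, not the scoring: a human runs one query and scans one results page, while an agent can issue several reformulations concurrently and rank the combined pool in a single pass.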
4) WHOOP and Big Health improved outcomes by shrinking the ask
WHOOP users who wanted to improve Healthspan metrics made more progress when the product suggested a new bedtime for tonight. Big Health saw a common failure mode when patients chose actions that were too ambitious; breaking them down into a daily mood-lifting action and then the first step helped keep users engaged and drove clinically validated improvements in depression symptoms.
Key takeaway: When users stall, the right move may be a smaller next step, not more information or ambition.
How to apply: In products designed to change behavior, reduce the first commitment until it becomes hard to avoid starting.
Career Corner
1) The AI builder PM path still starts with fundamentals
Aakash Gupta summarizes Mahesh Yadav's framework as a three-stage path. Stage 1 is 2-3 weeks of fundamentals: what a model is, what intelligence and knowledge mean in these systems, and how the tools fit together. Stage 2 is Claude Code and Cowork, where the job is building systems that learn your patterns through checklists, learners, and a human-in-the-loop update layer. Stage 3 is OpenClaw: delegating one full job task to a sandboxed agent with its own world and permissions.
The sequencing is the point. The source says Stage 2 without Stage 1 is why agents hallucinate and are hard to debug, while Stage 3 without Stage 2 is how people hand autonomous agents dangerous permissions too early.
Why it matters: Gupta's claim is that the PMs earning $500K+ share this foundation work, and most others try to skip it.
How to apply: Do not jump straight to autonomy. Start with fundamentals, then pattern-learning systems, then sandboxed delegation.
2) Clear decisions empower teams more than nuanced non-decisions
Retool CEO David Hsu argues that leadership decisions are additive, not zero-sum: when top leaders fail to decide, others are not empowered, they are blocked. His operating rule is blunt: nuance does not scale, and if company strategy cannot be explained in a sentence or two, it is probably a bad strategy.
"If you cannot communicate your strategy in a sentence or two, you probably have a bad strategy."
Why it matters: Teams need clarity on where the company is going, even if a later correction is required.
How to apply: Pressure-test your own strategy statements. If they cannot fit into one or two sentences without caveats, they may not be clear enough to guide execution.
3) You can earn more autonomy even inside hierarchical orgs
Teresa Torres and Petra Wille note that some command-and-control companies still work because teams earn unofficial autonomy over time, and they discuss how teams can manage up to build that trust.
Why it matters: Career growth is not only about title; it also changes what your team is trusted to decide.
How to apply: Use these reflection questions in your next retro or 1:1: Where does your team sit on the command-and-control versus autonomy spectrum? Are decisions made by the people with the most relevant expertise? What would it take to increase trust and autonomy?
Tools & Resources
1) Aakash Gupta's builder PM note
Why explore: It lays out a usable sequence for AI leverage: fundamentals first, pattern learning second, sandboxed delegation third.
How to use: Follow the stages in order to avoid hallucination, debugging problems, and unsafe autonomy.
2) Enabling Non-Engineers to Build AI Agents & Apps | Retool CEO
Why explore: It is a strong discussion of non-developer builders, governance, and when to cannibalize a core product.
How to use: Use it to pressure-test your AI roadmap, internal-tools strategy, and governance model.
3) Box CEO: Why Big Companies Are Falling Behind on AI | a16z
Why explore: It offers concise framing on headless software, agentic search, and why PDLC bottlenecks cap AI gains.
How to use: Review it with engineering and product leadership when discussing AI architecture or release-process changes.
4) Your Couch-to-5K for AI
Why explore: It turns behavior-change lessons from WHOOP and Big Health into a practical model for skill-building with AI.
How to use: Adapt it to any product or team behavior that is currently asking for too much at once.
5) Teresa Torres on command-and-control leadership
Why explore: It gives useful language for consultative decision-making, spectrum thinking, and aligned autonomy.
How to use: Bring the reflection questions into team retros or leadership discussions about decision rights and trust.
6) Source-of-truth audit question set
Why explore: It is a compact checklist for teams whose product facts and messaging are fragmented across artifacts and teams.
How to use: Turn the six audit questions into a launch-readiness or enablement review template.
Today’s signal
A lot of today’s news pointed the same way: AI progress is being judged less by raw scale and more by useful work, such as solving harder math, staying correct in structured tasks, handling multiple modalities in real systems, and producing assets people can use immediately.
OpenAI says math models are crossing into research work
OpenAI said GPT-5.4 Pro helped solve a 60-year-old Erdős problem, and researchers on the OpenAI Podcast described a sharp jump from routine failures in early 2025 to gold-medal performance at the International Math Olympiad, day-to-day help for Fields Medalists, and more than 10 genuinely new combinatorics results that are publishable in top journals. Ernest Ryu also said he resolved a 42-year-old optimization question after about 12 hours of back-and-forth with ChatGPT, with the model proposing ideas and Ryu acting as verifier and guide.
Why it matters: OpenAI is presenting math as a proving ground for longer reasoning horizons: the podcast framed current progress as a move toward systems that can think for days today, and eventually weeks or months, in support of an automated researcher model.
NVIDIA pushes multimodal AI closer to production environments
NVIDIA launched Nemotron 3 Nano Omni, an open multimodal model spanning video, audio, image, and text, saying it tops six leaderboards and can deliver up to 9x higher throughput than comparable open omni models through its 30B-A3B hybrid mixture-of-experts design. NVIDIA also argued that manufacturing has entered a simulation-first phase, with high-fidelity synthetic data enabling production-grade physical AI; it cited ABB reaching 99% sim-to-real accuracy and cutting commissioning time by up to 80%, while JLR reduced a four-hour aerodynamics simulation step to one minute.
Why it matters: The notable shift is not just a new model release. It is the combination of open multimodal agent tooling with concrete deployment paths in computer-use agents, document intelligence, audio-video workflows, and factory operations.
A new benchmark argues that valid JSON is not enough
The Structured Output Benchmark proposes measuring exact leaf-value accuracy, faithfulness, and perfect-response rates, rather than treating schema validity and type safety as the main success criteria. Its early results say most models clear 90%+ JSON pass rates but still drop sharply on value accuracy, and the release says open-source GLM 4.7 ranks second behind GPT 5.4.
Why it matters: This lines up with a broader shift in how experts are talking about progress. Sara Hooker argued that recent returns on compute look better in post-training, alignment, data targeting, and gradient-free learning than in brute-force model growth alone.
"It is the slow death of brute-force scaling alone. Innovation now lies in how a model interacts with the world."
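The distinction the benchmark draws can be sketched in a few lines: an output can parse and type-check against a schema yet still get the actual values wrong. The `schema_keys`, `gold` reference, and helper functions below are illustrative assumptions, not the benchmark's actual harness:

```python
import json

# Hypothetical extraction schema: field name -> expected Python type.
schema_keys = {"invoice_id": str, "total": float}

def schema_valid(payload: str) -> bool:
    """Pass if the output parses and every field has the expected type."""
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return all(isinstance(obj.get(k), t) for k, t in schema_keys.items())

def leaf_accuracy(payload: str, gold: dict) -> float:
    """Fraction of leaf values that exactly match the reference extraction."""
    obj = json.loads(payload)
    return sum(obj.get(k) == v for k, v in gold.items()) / len(gold)

gold = {"invoice_id": "INV-204", "total": 1890.0}
model_output = '{"invoice_id": "INV-204", "total": 1980.0}'  # digits transposed

print(schema_valid(model_output))         # valid JSON, types check out
print(leaf_accuracy(model_output, gold))  # but only half the leaves are right
```

A harness that stops at `schema_valid` would score this output as a clean pass; a leaf-level check exposes the transposed digits, which is the gap the benchmark's 90%+ pass rates versus lower value-accuracy numbers describe.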
DeepMind’s Korea push ties AI progress to science, safety, and robotics
Demis Hassabis said Google DeepMind is partnering with Korea on AI for science work including materials science and weather prediction, youth education, and international safety standards, building on Korea’s role in hosting last year’s AI summit. In the same interview, he said Gemini’s multimodality puts physical AI on the threshold of major breakthroughs in factories, automotive settings, homes, and automated labs, and pointed to ongoing ties with Samsung, Hyundai, and SK Hynix.
Why it matters: This looks like more than a ceremonial visit. It connects frontier AI work to a country that Hassabis described as well positioned in robotics, manufacturing, mobile devices, and chips, and he separately said Korea has a leading part to play in AI safety and AI for science.
Image generation looks more like a work tool than a novelty feature
OpenAI’s ChatGPT Images 2.0 was described as materially more useful for practical tasks such as slide decks, multi-image carousels, storyboards, content calendars, and accurate visual explainers. Matt Wolfe showed it pulling context from URLs to build ads, real-estate flyers, and infographics from source pages, while Greg Brockman highlighted product ideas being shared internally through image generation and a one-shot Codex app screen mockup.
Why it matters: The emerging use case is less about standalone art and more about fast design, marketing, and product-spec work that can move from prompt to working asset in one step.