Hours of research in one daily brief–on your terms.
Tell us what you need to stay on top of. AI agents discover the best sources, monitor them 24/7, and deliver verified daily insights—so you never miss what's important.
Recent briefs
Your time, back.
An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers one verified daily brief.
Save hours
AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.
Full control over the agent
Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.
Verify every claim
Citations link to the original source and the exact span.
Discover sources on autopilot
Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.
Multi-media sources
Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.
Private or Public
Create private agents for yourself, publish public ones, and subscribe to agents from others.
Get your briefs in 3 steps
Describe your goal
Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.
Confirm your sources and launch
Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.
Sam Altman
3Blue1Brown
Paul Graham
The Pragmatic Engineer
r/MachineLearning
Naval Ravikant
AI High Signal
Stratechery
Receive verified daily briefs
Get concise, daily updates with precise citations directly in your inbox. You control the focus, style, and length.
Top Stories
Why it matters: Product strategy, model competition, and deployment controls are shifting at the same time. The result is a market where coding agents are being monetized, orchestration is becoming a first-class product, and the most sensitive models are staying gated.
OpenAI creates a new Codex-heavy price tier
OpenAI said it is updating ChatGPT Pro and Plus to support growing Codex use, introducing a new $100/month Pro tier with 5x more Codex usage than Plus, access to the exclusive Pro model, and unlimited Instant and Thinking models. Through May 31, subscribers get up to 10x Plus usage on Codex, while Plus is being rebalanced toward more sessions across the week rather than longer single-day sessions; the existing $200 Pro plan remains the highest-usage option.
In a recent discussion, OpenAI Devs' @reach_vb said Codex had reached 3M weekly users.
Impact: OpenAI is no longer treating coding assistance as just another chat feature. It is building explicit pricing and usage tiers around agentic software work.
Anthropic turns multi-model routing into a product feature
Anthropic brought its advisor strategy to the Claude Platform: Opus acts as the advisor, while Sonnet or Haiku executes, with the goal of near Opus-level intelligence at a fraction of the cost. In Anthropic's evals, Sonnet with an Opus advisor scored 2.7 percentage points higher on SWE-bench Multilingual than Sonnet alone while costing 11.9% less per task. Anthropic's Alex Albert said this "phone a friend" pattern improves performance while cutting total cost by reducing wasted tokens on hard tasks.
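The routing logic itself is straightforward to sketch. Below is a minimal Python illustration of the advisor/executor pattern as described; `call_model` is a stub standing in for real API calls, the model names are placeholders, and the hard-task routing rule is an illustrative assumption, not Anthropic's implementation:

```python
# Hypothetical sketch of the advisor/executor pattern described above.
# call_model is a stub for real API calls; "advisor"/"executor" are
# placeholders, not real model names, and the hard-task rule is illustrative.

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real LLM API call."""
    if model == "advisor":
        return "Plan: 1) reproduce the bug 2) patch the parser 3) rerun tests"
    return f"executed with guidance: {prompt[:40]!r}"

def solve(task: str, hard: bool) -> str:
    # Pay for the expensive advisor only on hard tasks; the cheap executor
    # does all the actual work, which is where the cost savings come from.
    guidance = call_model("advisor", f"Outline a plan for: {task}") if hard else ""
    return call_model("executor", f"{guidance}\nTask: {task}")

print(solve("fix the failing parser test", hard=True))
```

The point is that the advisor's tokens are spent once on planning, while the cheaper executor consumes the bulk of the context.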
Impact: Model quality is no longer the whole story. How models are combined is becoming a competitive product surface.
Qwen and Gemma show the open stack still has momentum
Alibaba launched Qwen3.6-Plus, described as a frontier agentic coding model that matches or beats Claude Opus 4.5 on SWE-bench and Terminal-Bench 2.0. At the same time, Google DeepMind said Gemma 4 outperforms models 10x its size without massive compute and crossed 10M downloads in its first week, with the Gemma family above 500M total downloads. Unsloth also said Gemma-4-31B can be fine-tuned for free on Kaggle and fits in roughly 22GB VRAM on two free Tesla T4 GPUs.
Impact: Frontier pressure is not only coming from closed U.S. labs. Strong coding performance and easier access are keeping the open ecosystem relevant.
AI use at work is becoming normal, not exceptional
An Epoch AI/Ipsos survey of 2,021 U.S. respondents found that among people who used AI in the past week, about half use it at least as much for work as for personal tasks. Among regular work AI users, 27% said AI had replaced some tasks and 21% said it had enabled new tasks. Work use rose with paid access, from 38% among free users to 76% among employer-provided users; Microsoft Copilot was the most-used paid service for work, followed by ChatGPT and Gemini.
Impact: The center of gravity is moving from casual experimentation to job workflows and employer-backed adoption.
Advanced cyber models are staying behind access controls
OpenAI clarified that the model being tested with a trusted tester group is a separate cyber product, not Spud, and that it is not being released publicly. Earlier reporting described a limited rollout to a small set of companies, similar to Anthropic's restricted cyber deployment pattern.
Impact: For the most sensitive capability areas, frontier labs are moving toward staged, enterprise-style access rather than broad public release.
Research & Innovation
Why it matters: The most useful research this cycle focused on making agents more trainable, more reliable, and more efficient—not just bigger.
- Atomic skills for coding agents: A new approach breaks software work into five skills—code localization, code editing, unit-test generation, issue reproduction, and code review—and trains them jointly with RL. The reported gain is 18.7% on unseen tasks like bug fixing and refactoring, without task-specific training.
- Better verifiers for agents: Microsoft's Universal Verifier separates process and outcome rewards, distinguishes controllable from uncontrollable failures, and manages long screenshot trajectories. It reportedly cuts false positives to near zero from 45%+ on WebVoyager and 22%+ on WebJudge.
- Memory as experience: MIA splits an agent into Manager, Planner, and Executor, with a loop of retrieve memory → plan → execute → store → improve. The authors report a new SOTA among memory agents, including a +5.5 average gain and up to +9.1 on harder tasks, with a 7B model matching or beating larger closed models.
- Reasoning before post-training: Meta FAIR's mid-training recipe adds interleaved thoughts, SFT mid-training, and RL mid-training before post-training. On base Llama-3-8B, the reported result is a 3.2x improvement on reasoning benchmarks versus direct RL post-training.
- Reality check for web agents: ClawBench tests 153 live online tasks across shopping, booking, and job applications. Top models drop from around 70% on sandbox benchmarks to as low as 6.5% here.
- Medical multimodal progress: Google's MedGemma 1.5 combines 3D radiology, whole-slide pathology, longitudinal X-ray analysis, and clinical document understanding in a single open-weight 4B model. Reported gains include +47% F1 in pathology and +11% in MRI classification over v1, and it outperforms Gemini 3.0 Flash on out-of-distribution CT analysis.
- Cheaper inference research: Squeeze Evolve reports up to ~3x API cost reduction and up to ~10x higher fixed-budget serving throughput across benchmarks including AIME 2025, GPQA-Diamond, ARC-AGI-V2, and MMMU-Pro.
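Of these, the MIA-style memory loop is the most mechanical, and a toy version fits in a few lines. The sketch below reduces the retrieve → plan → execute → store cycle to naive keyword retrieval over stored experiences; the retrieval, planning, and execution steps are illustrative stand-ins, not the paper's actual method:

```python
# Hypothetical reduction of a retrieve -> plan -> execute -> store loop,
# loosely modeled on the Manager/Planner/Executor split described above.
# Keyword-overlap retrieval and the stubbed executor are toy stand-ins.

memory: list[str] = []

def retrieve(task: str, k: int = 2) -> list[str]:
    # Naive keyword-overlap scoring over stored experiences.
    words = set(task.lower().split())
    scored = sorted(memory, key=lambda m: -len(words & set(m.lower().split())))
    return scored[:k]

def run_task(task: str) -> str:
    past = retrieve(task)                               # retrieve relevant experience
    plan = f"plan for {task!r} using {len(past)} memories"  # plan (stubbed)
    result = f"done: {task}"                            # execute (stubbed)
    memory.append(f"{task} -> {result}")                # store for next time
    return result

run_task("fix the login bug")
print(retrieve("login bug again"))  # the earlier experience is now retrievable
```

The improvement the paper reports comes from this loop compounding: later tasks start from retrieved experience rather than from scratch.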
Products & Launches
Why it matters: User-facing releases are increasingly about complete workflows—deployment, finance, healthcare, and visualization—not just chat.
- LangChain Deep Agents deploy: LangChain launched Deep Agents deploy in beta as a model-agnostic, open-source agent harness for production deployment. It uses open conventions like AGENTS.md and /skills, deploys with short- and long-term memory on LangSmith, and exposes agents through MCP, A2A, and agent protocol.
- Glass 5.5 API: Glass Health released Glass 5.5 via its developer API, saying it outperforms frontier models from OpenAI, Anthropic, and Google across nine clinical accuracy benchmarks. It also cut pricing by 70% to $3/1M input and $16/1M output.
- Perplexity Computer + Plaid: Perplexity's Computer now connects to Plaid so users can link bank accounts, credit cards, and loans, then track spending, build custom budget tools, and view net worth alongside investment portfolios. Computer tasks remain exclusive to Pro and Max subscribers.
- Gemini adds more interactive output: Gemini can now create customizable interactive visualizations directly in chat, including adjustable variables, rotating 3D models, and data exploration. Google also made longer Lyria 3 Pro music tracks available for free inside Gemini.
- Claude Cowork general availability: Claude Cowork is now available on all paid plans, while enterprise customers get role-based access controls, group spend limits, usage analytics, and expanded OpenTelemetry support.
Industry Moves
Why it matters: New labs, enterprise partnerships, and infrastructure scale are defining where AI capability gets commercialized.
- ElorianAI launches: Former Brain/DeepMind researchers Andrew Dai, Yinfei Yang, and Seth launched ElorianAI as a multimodal reasoning lab focused on direct visual reasoning rather than translating images into text.
- DatologyAI + Thomson Reuters: DatologyAI said its legal-domain mid-training work with Thomson Reuters improved legal benchmarks by 5% and general evaluations by 2.5%, with 2.5x amplification on Thomson Reuters' private legal evals using <1% of the original pre-training token budget.
- Sandbox infrastructure is scaling fast: A post on Modal's sandbox system said a major AI lab is already running about 100,000 concurrent sandboxes for RL workloads and aiming for 1 million. Modal says it can spin up hundreds per second for a single customer.
- AI in regulated services is attracting capital: Chapter, which uses AI to help seniors navigate Medicare enrollment, reached a $3 billion valuation.
Policy & Regulation
Why it matters: Formal rules are still catching up, but release controls and governance warnings are already shaping deployment decisions.
- Some frontier cyber systems are moving to controlled access: OpenAI's cyber product is being tested with a trusted group rather than released publicly, and reporting compared its limited rollout to Anthropic's restricted cyber deployments.
- Demis Hassabis warns the next phase is harder to govern: Hassabis said ChatGPT's launch locked labs into a "ferocious commercial pressure race" and warned that the coming "agentic era" in the next 2-4 years will make alignment a much harder technical problem, calling for cooperation among labs, AI safety institutes, and academia.
"How do we make sure the guardrails are put in place so they do exactly what they’ve been told to do, and there’s no way of them circumventing that or accidentally breaching those guardrails?"
- OpenAI's chief scientist is pointing to social fallout: Jakub Pachocki said automating intellectual work raises major societal challenges around job displacement, wealth concentration, and governance of AI-controlled entities, and that these issues are coming faster than expected.
Quick Takes
Why it matters: Smaller updates still show where momentum is building across video, local AI, developer tools, and open-source agents.
- Muse Spark reached 4th in Text Arena, ahead of GPT-5.4 and Grok 4.2.
- The Meta AI app climbed to #6 in the App Store overnight.
- HappyHorse-1.0 ranked #1 or #2 across Artificial Analysis video leaderboards, with API access planned for April 30.
- YOLO26-MLX brought native YOLO26 to Apple Silicon, with up to 2.6x faster inference and 1.7x faster training.
- Hermes Agent hit #1 on GitHub Trending and reached 40K stars in 45 days, faster than OpenClaw's path to the same mark.
- Anthropic's new Monitor tool lets Claude run background scripts that wake the agent only when needed; NousResearch said Hermes added a similar "notify when done" pattern three days earlier.
- Baseten BDN promises 2–3x faster cold starts for large models at scale.
- Seedance 2.0 is now available to everyone on fal without restrictions.
What stood out
After filtering for direct, organic recommendations, three resources made the cut. The strongest pattern was use-case clarity: Patrick Collison shared a paper because it sharpened a specific causal story about birth order and wages, Elon Musk pointed readers to Durant for a civilizational lens on societal decline, and Lena Waters recommended The Corporation as a way to think about AI agents and legal standing.
Most compelling recommendation
Working paper on respiratory pathogens, birth order, and wages
- Title: Not specified in source material
- Content type: Working paper
- Author/creator: Not specified in source material
- Link/URL: NBER PDF
- Who recommended it: Patrick Collison
- Key takeaway: Using Danish administrative data, the paper argues that respiratory pathogens passed from older siblings to younger ones explain a large share of birth-order effects on long-run wages; Collison notes the paper puts the figure at 70%
- Why it matters: This is the clearest high-signal recommendation today because Collison says he had not previously seen anyone convincingly show that standard respiratory pathogens impose long-term costs on infant siblings
"I haven’t until now seen anyone convincingly show that standard respiratory pathogens impose long-term costs on infant siblings."
Two other authentic picks
The Story of Civilization
- Content type: Book
- Author/creator: Durant
- Link/URL: No direct book URL provided in source material
- Who recommended it: Elon Musk
- Key takeaway: Musk recommends it in response to a Teddy Roosevelt quote arguing that civilizations weaken when materialism, luxury, safety, and pacifism erode fighting capacity
- Why it matters: The context gives the book a specific reading job: studying how civilizations lose resilience over time
- Source conversation: X post
The Corporation
- Content type: Movie
- Author/creator: Not specified in source material
- Link/URL: No direct resource URL provided in source material
- Who recommended it: Lena Waters, on Office Hours with Tom Tunguz and Lena Waters
- Key takeaway: Waters says the film is "prescient" for thinking about why an AI agent will not have the same kind of legal standing as a corporation
- Why it matters: It connects a non-AI film to a current question about AI agents and institutional structure
- Source conversation: Office Hours with Tom Tunguz and Lena Waters
"Everybody should watch that movie, the corporation, because it's kind of prescient..."
Bottom line
Today’s strongest recommendation is Collison’s NBER paper pick because it comes with a direct URL, a concrete claim, and a clear explanation of what was new to him. Musk and Waters offer thinner but still useful recommendations, each tied to a specific question: how civilizations decay, and how to think about legal standing for AI agents.
🔥 TOP SIGNAL
Today’s best signal is a convergence: Ryan Lopopolo (OpenAI), Vincent Potch (OpenClaw), and Izzy Miller (Hex) are all describing the same shift — coding-agent gains are now coming from harness design more than prompt cleverness. The pattern repeats across talks: encode quality as docs/lints/reviewer agents, shrink or search tool surfaces, and run agents in parallel lanes with real evals over long horizons.
🛠️ TOOLS & MODELS
LangChain: Deep Agents deploy (beta). New single-command deploy path for a model-agnostic, open-source agent harness. You configure AGENTS.md, skills, and mcp.json, choose a model + sandbox, and it spins up a LangSmith deployment with MCP, A2A, Agent Protocol, human-in-the-loop, and memory endpoints. LangChain is explicitly positioning it as an open alternative to Claude Managed Agents, with memory ownership as the key differentiation.
Anthropic: advisor/executor routing. Claude Platform is adding an “advisor” strategy: pair Opus as the advisor with Sonnet or Haiku as the executor for near-Opus-level intelligence at lower cost. Anthropic’s eval claim: Sonnet + Opus advisor scored +2.7 points on SWE-bench Multilingual versus Sonnet alone while costing 11.9% less per task.
Claude Code: real product upgrades, not just model chatter. Anthropic added a fileSuggestion setting so large-codebase users can plug in custom indexes like Sourcegraph or internal systems. Boris Cherny says the latest improvement shipped in Claude Code v2.1.85, where a Claude-driven port from Rust+NAPI to native TypeScript made @-mentions 3x faster at P99. Separately, Claude Code now has a setup wizard for Amazon Bedrock and Google Vertex, plus detection for pinned older models with suggestions for newer ones.
Cursor: cloud-agent ergonomics keep getting better. Cursor cloud agents can now attach demos and screenshots to the PRs they open, so teammates can review artifacts directly in GitHub. Jediah Katz also showed a smaller but important feature: agents can now wait on background jobs and wake back up based on log output.
TanStack AI Code Mode is a notable execution-layer bet. The idea: let the model write and execute TypeScript instead of chaining tools, because LLMs are strong at TS but weak at math/orchestration. Claimed benefits are 1 call instead of N, parallel execution, fewer tokens, and correct results.
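The "1 call instead of N" claim is easiest to see in miniature. The toy below (Python for brevity; TanStack's implementation targets TypeScript) contrasts per-call tool chaining with executing one model-emitted program that has the tools in scope. The tool function and the generated snippet are hypothetical:

```python
# Toy contrast between tool-chaining and "code mode".
# get_price and the model-generated snippet are hypothetical examples.

calls = {"round_trips": 0}

def get_price(item: str) -> float:
    return {"apple": 1.5, "bread": 3.0}[item]

# Tool chaining: the model requests one tool call per item (N round trips).
def chained_total(items: list[str]) -> float:
    total = 0.0
    for item in items:
        calls["round_trips"] += 1   # each tool call is a separate model round trip
        total += get_price(item)
    return total

# Code mode: the model emits one program; the runtime executes it once
# with the tools exposed in its scope (a single round trip).
MODEL_GENERATED = "result = sum(get_price(i) for i in items)"

def code_mode_total(items: list[str]) -> float:
    calls["round_trips"] += 1       # one round trip for the whole program
    scope = {"get_price": get_price, "items": items}
    exec(MODEL_GENERATED, scope)
    return scope["result"]

print(chained_total(["apple", "bread"]))    # 2 round trips
print(code_mode_total(["apple", "bread"]))  # 1 round trip, same answer
```

The token savings follow the same shape: intermediate results stay inside the program instead of being serialized back into the context after every call.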
OpenAI: Codex capacity is becoming a product tier. OpenAI launched a $100/month Pro tier with 5x more Codex usage than Plus, targeted at longer, high-effort sessions, plus a limited-time promo of up to 10x Plus usage through May 31. Separate signal: Codex is now at 3 million weekly users, up from 2 million less than a month earlier.
💡 WORKFLOWS & TRICKS
Use the agent as a performance engineer, not just a code generator. Boris Cherny’s prompt pattern is excellent: tell Claude to port an implementation, require it to pass the original test suite, benchmark against the old path, and keep iterating until it proves it is faster. He then tightened the loop with explicit profiling goals (“hoping for p99 < 10ms”) and follow-up refinement prompts, which led to concrete wins like pre-computing without blocking the main thread and avoiding NAPI overhead for small result sets.
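The same loop can be reproduced outside Claude Code with a small harness: verify behavior parity against the baseline first, then compare tail latency. In this sketch the two implementations and the workload are placeholders for a real old/new pair:

```python
# Benchmark-loop sketch: require behavior parity with the baseline, then
# compare p99 latency. Implementations and workload are placeholders.
import time

def old_impl(xs):   # baseline path
    return sorted(xs, key=lambda x: (x % 7, x))

def new_impl(xs):   # candidate "port" under test
    return sorted(xs, key=lambda x: (x % 7, x))

def p99_ms(fn, xs, runs=100):
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(xs)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[int(len(samples) * 0.99) - 1]

data = list(range(1000, 0, -1))
assert new_impl(data) == old_impl(data)   # step 1: pass the original tests
old_p99, new_p99 = p99_ms(old_impl, data), p99_ms(new_impl, data)
print(f"old p99={old_p99:.2f}ms  new p99={new_p99:.2f}ms  target=<10ms")
```

Giving the agent this harness as the success criterion is what turns "make it faster" into an iterable, verifiable task.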
Turn repo standards into machine-enforced prompts. Ryan Lopopolo’s playbook: document non-functional requirements once, then reinforce them everywhere — agents.md, reviewer agents, lints, tests, CI comments, and error messages with remediation steps. His concrete examples: enforce retries/timeouts on network code, add tests that cap file length, and run security/reliability review agents continuously so the model keeps getting reminded what “good” looks like.
Prefer deep modules with simple interfaces. In the AIE Europe talk, shallow-module sprawl was called out as actively hostile to AI navigation: too many tiny blobs, too many dependencies, too much searching. The better pattern is fewer, deeper modules with well-designed boundaries; then test at the interface and let the agent work more freely inside the boundary.
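One of those examples, a test that caps file length, is small enough to sketch. The directory, glob pattern, and limit below are placeholder values, not Lopopolo's actual rule, and the remediation hint in the message is the part doing the "prompting":

```python
# Repo-standard-as-test sketch: fail CI when a source file exceeds a line
# cap, with a remediation hint in the failure message. The directory,
# glob pattern, and limit are placeholders, not a real repo's values.
from pathlib import Path

MAX_LINES = 500

def oversized_files(root: str, limit: int = MAX_LINES) -> list[str]:
    bad = []
    for path in Path(root).rglob("*.py"):
        lines = len(path.read_text(encoding="utf-8").splitlines())
        if lines > limit:
            bad.append(f"{path}: {lines} lines > {limit}; split it into modules")
    return bad

def test_file_length_cap():
    offenders = oversized_files("src")
    assert offenders == [], "\n".join(offenders)
```

Because the failure message states the fix, an agent that hits this test in CI gets the standard restated exactly when it matters.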
Run agents in “swim lanes.” Vincent Potch’s setup is a useful mental model: keep separate lanes for stable refactors/CI, feature work, and P0/P1 monitoring, and only babysit the risky lanes. Stable refactors can often run with minimal supervision; feature and incident lanes need a tighter conversation loop.
Shrink tool surfaces before you add more tools. Hex ended up with roughly 100k tokens worth of tools, which Izzy Miller flatly called “too many.” Their response is timeless: consolidate families of similar tools, add tool search/tool retrieval, and use specific tools when you want behavioral guidance instead of dropping the model into fully generic code execution.
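The tool-search half of that fix can be sketched as a retrieval step: score each tool's description against the query and expose only the top-k to the model. The tool names, descriptions, and word-overlap scoring below are toy assumptions, not Hex's system:

```python
# Toy tool-retrieval sketch: instead of putting ~100k tokens of tool
# definitions into context, score tools against the query and expose
# only the top-k. Tool names and descriptions are hypothetical.

TOOLS = {
    "run_sql": "execute a SQL query against the warehouse",
    "plot_chart": "render a chart from a dataframe",
    "send_email": "send an email to a user",
    "list_tables": "list tables available in the warehouse",
}

def retrieve_tools(query: str, k: int = 2) -> list[str]:
    q = set(query.lower().split())
    score = lambda name: len(q & set(TOOLS[name].split()))
    return sorted(TOOLS, key=score, reverse=True)[:k]

print(retrieve_tools("query the warehouse tables"))  # → ['run_sql', 'list_tables']
```

A real system would use embeddings rather than word overlap, but the contract is the same: the model sees a handful of relevant tools per turn, not the whole catalog.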
Evaluate compounding behavior, not just day-zero accuracy. Hex’s most interesting eval pattern is a mix of small handcrafted “trap” sets and a 90-day simulation where the agent answers tickets, updates knowledge, and keeps operating as the environment changes. In that setup, Sonnet reportedly went from about 4% on day 0 to 24% on day 90.
👤 PEOPLE TO WATCH
Ryan Lopopolo — High signal because he is specific about the boring stuff that actually moves quality: QA plans, reviewer agents, lint/test prompts, and durable repo-level guidance.
Boris Cherny — Worth following because he is using Claude Code on enterprise-scale codebases and showing exact prompts, perf targets, and benchmark loops instead of vague “it feels faster” claims.
Izzy Miller — One of the best current sources on context engineering for agents: tool explosion, long-running orchestration, scratchpad queries, memory/guide conflicts, and why data agents are harder to verify than coding agents.
Vincent Potch — Useful for anyone moving from one-agent demos to actual multi-agent operations. His “factory manager” framing is one of the cleanest models I’ve seen for running many parallel sessions without pretending tokens are the bottleneck.
Andrej Karpathy — His thread matters because it explains why AI discourse keeps splitting in two: casual users see weak free-tier behavior, while engineers on frontier coding models see systems that can restructure codebases and find vulnerabilities because technical domains have verifiable rewards and strong B2B incentives.
🎬 WATCH & LISTEN
- AIE Europe — Vincent Potch on “swim lanes” (3:04:43-3:05:58). Best quick explanation of how to supervise 5-20 coding agents at once: stable refactors can run mostly unattended, while feature and incident lanes need active conversation and triage.
- AIE Europe — deep modules > shallow modules (8:42:09-8:44:29). A crisp case for reorganizing AI-heavy codebases around deep modules with simple interfaces so agents can navigate, modify, and test them more reliably.
- Hex — evaluate the flywheel, not just day zero (1:00:20-1:04:21). Izzy Miller walks through a 90-day simulation with changing warehouse state, inbound tickets, and proactive agent work. This is one of the best arguments for long-horizon evals I’ve seen.
📊 PROJECTS & REPOS
OpenClaw. Peter Steinberger says the project is only five months old but already has about 30k commits, is closing in on 2,000 contributors, and is approaching 30,000 PRs. The architecture is also shifting toward a plugin model so teams can swap in their own memory/wiki/dreaming components without forking the whole thing.
Deep Agents / Deep Agents deploy. The important part is not just the beta launch — it’s that the underlying harness is MIT-licensed, available in Python and TypeScript, built around open standards like AGENTS.md, Agent Skills, and MCP/A2A/Agent Protocol, and can be self-hosted so memory stays yours.
TanStack AI Code Mode. Notable framework pattern to watch: move orchestration into executable TypeScript instead of ever-growing tool chains. The pitch is fewer calls, fewer tokens, and better parallelism for complex app logic.
Editorial take: the edge is moving from “which model is best?” to “how well did you design the repo, tool surface, eval loop, and operator workflow around the model?”
Big Ideas
1) AI is exposing whether your team is organized around learning or around feeding a bottleneck
The Beautiful Mess note argues that many teams still run a funnel optimized around scarce engineering capacity. In that setup, AI just speeds up the same problems: more ideas, more pre-shaped work, more negotiation, and more overload. In a learning-oriented model, AI instead helps teams explore more options, test faster, and focus on meaningful customer change. The deciding factor is the team’s demand mix—what enters the funnel, and how it is shaped before work starts.
- Why it matters: Discovery, prioritization, and WIP rules are contextual. High-interrupt teams need different mechanisms than teams that source demand through strategy and customer learning.
- How to apply: Map your inputs first—support, internal requests, production fires, strategic goals—then choose the operating response: structured intake and trade-off forums for interrupt-heavy work, or selective intake and continuous refinement when the team shapes its own bets.
2) For AI products, accuracy gets you to the starting line; trust and feedback loops win
"Accuracy is good. It gets you to the starting line, but you're not gonna win the race with it."
Vishal Jain’s framework pushes PMs to measure AI products end to end across model quality, user engagement, business impact, and operational reliability rather than focusing only on model performance. That includes use-case-specific task accuracy, hallucination rate, model drift, output edit rate, retry rate, A/B-tested revenue lift, P99 latency, fallback rate, guardrail hits, and version regression. He also notes that the weird 5% of queries often drive 40% of complaints, making edge cases a product priority, not a footnote.
- Why it matters: User behavior often changes before dashboards do, and outputs that feel wrong may go unused even when they are technically correct.
- How to apply: Pick a small recurring set of metrics, instrument them before launch, combine explicit and implicit signals, and run a standing improvement loop after release.
3) The AI PM bar has moved from understanding AI to having built AI
Aakash Gupta’s reporting says AI PM interviews now test whether candidates have shipped and operated AI products, not just learned the vocabulary. The updated bar includes deep probing on production issues and eval metrics, 45-minute vibe-coding rounds, AI product sense with quantitative prioritization, AI-specific behavioral questions, and safety woven throughout the interview loop.
- Why it matters: The source’s conclusion is blunt: prep that worked in 2023 can get candidates rejected in 2026, while competition is rising alongside high compensation for AI PM roles.
- How to apply: Prepare examples that cover the architecture, eval metrics, and business impact of work you actually drove; practice building simple prototypes in tools like Cursor, Bolt, Lovable, or Replit; and mention safety and trade-offs like accuracy versus latency without waiting to be prompted.
Tactical Playbook
1) Run AI product measurement as a closed loop
- Choose a few metrics, not dozens. Jain explicitly warns against tracking 47 metrics; he recommends selecting roughly 3-5, maybe 10, that you review consistently.
- Cover all four pillars. Make sure your shortlist spans model quality, user engagement, business impact, and operational reliability so you do not create blind spots.
- Instrument before launch. Planning to instrument later is called out as a trap; without instrumentation, you cannot see usage or improve the product.
- Use both explicit and implicit feedback. Pair thumbs up/down, ratings, open text, and edits with behavioral signals like reruns, time to act, copy-paste, back navigation, and downstream conversion.
- Keep ownership with PM. Jain says the AI team may own the model, but the PM owns the product metrics.
- Review on a fixed cadence and improve. The recommended pattern is simple: deploy, instrument, analyze, improve, repeat on a weekly or monthly rhythm depending on the feature.
Why this works: It guards against two common errors in AI products: overvaluing model accuracy and mistaking silence for success.
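Two of the metrics above, P99 latency and output edit rate, fall straight out of a plain interaction log. A minimal sketch, assuming a hypothetical log schema with `latency_ms` and `edited` fields:

```python
# Sketch: compute p99 latency and output edit rate from an interaction log.
# The log schema (latency_ms, edited) is a hypothetical example, not any
# product's real schema.
import math

log = [
    {"latency_ms": 420, "edited": False},
    {"latency_ms": 380, "edited": True},
    {"latency_ms": 2900, "edited": True},
    {"latency_ms": 450, "edited": False},
]

def p99_latency(events) -> float:
    xs = sorted(e["latency_ms"] for e in events)
    return xs[min(len(xs) - 1, math.ceil(0.99 * len(xs)) - 1)]

def edit_rate(events) -> float:
    return sum(e["edited"] for e in events) / len(events)

print(p99_latency(log), edit_rate(log))  # 2900 0.5
```

The point of instrumenting before launch is exactly this: once events like these are logged, the weekly review loop is a query, not a project.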
2) Make prioritization match your demand mix
- Map what is entering the funnel. Start with the actual mix: support tickets, internal requests, production fires, strategic goals, or self-sourced opportunities.
- Interrogate the input, not just the output. Ask where the work came from, what shaped it before it got here, what the team did with it, how much of the roadmap is controlled locally, and how much is handed down.
- If interrupts dominate, add structure. Use formal intake, prioritization forums, planning cadences, and economic trade-offs to manage noise.
- If demand is mostly self-shaped, lean into learning. Use continuous discovery, selective intake, and ongoing refinement instead of treating everything like pre-shaped delivery work.
- Watch WIP and organizational constraints. The notes argue there is never an excuse for too much WIP, but they also stress that even strong teams get overloaded when the wider organization drifts into chaos.
Why this works: The same practice can be sound in one context and harmful in another; there is no universal discovery or prioritization recipe.
3) Do discovery without a dedicated UX researcher
- Use frameworks as lenses. One PM cites Opportunity Solution Trees as a strong framing device and Marty Cagan’s Four Risks as a useful mental model.
- Improve the interview itself. The same source says The Mom Test changed how they run interviews, even if synthesis remained manual.
- Use AI coding tools to structure the workflow. The practical workflow they describe is: frame the hypothesis, generate interview questions, synthesize notes into patterns, and package findings for stakeholders.
- Treat skipped discovery as a real product risk. The source’s core claim is that skipping discovery still kills products more often than bad engineering or poor design.
Why this works: It gives solo PMs an operational path when they know discovery matters but do not have a researcher to run it for them.
4) If stakeholder demand is chaotic, make capacity negotiation explicit
- Bring stakeholders into the same room periodically. In one Scrum-heavy example, a product owner gathered stakeholders together instead of processing requests one by one.
- Put back-pressure on incoming work. The same team combined heavy requirements discovery with frequent delivery and explicit limits on what could fit.
- Run a visible auction on capacity. The PO’s mechanism was a rigorous auction for team capacity, which improved predictability and stakeholder trust.
- Do not confuse this with empowerment. A contrasting PM described an empowered company where teams still had to negotiate across dozens of competing priorities and AI-favored teams could command support from others.
Why this works: The efficient feature factory team went from being perpetually overwhelmed and seen as untrustworthy to being predictable and broadly trusted.
Case Studies & Lessons
1) Anthropic’s Cowork came from watching non-technical users hack around the product
Aakash Gupta highlights Boris’s idea of latent demand: users already wanted to query their own data, automate workflows, and prototype tools, but friction was blocking them. The signal was that non-engineers were willing to install a terminal tool meant for developers. After seeing that behavior, Anthropic built Cowork, a desktop product for non-technical users, in 10 days.
- Why it matters: User hacks can be a stronger demand signal than roadmap requests.
- Apply it: Watch for behaviors that look too hard for the target user. If they are doing it anyway, the next product move may be to remove friction, not add more explanation.
2) CASH shows what agentic growth work looks like when scoped narrowly
Anthropic’s Claude team built CASH—Claude Accelerates Sustainable Hypergrowth—to work across the lifecycle of growth experimentation: identifying opportunities, building the feature, running the test, and analyzing results. Today it is focused on copy changes and minor UI tweaks, and Lenny Rachitsky says its win rate is already comparable to a junior PM and improving rapidly.
- Why it matters: This is a concrete example of agentic PM work being applied to a bounded, high-volume problem rather than an undefined autonomous-PM promise.
- Apply it: Start where experiments are frequent and measurable, then compare the agent’s output against a human baseline, as this team is doing.
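As a rough sketch of what "bounded, high-volume" agentic experimentation means in practice, the loop below runs scoped experiments and tracks win rate against a human baseline. Anthropic has not published CASH's implementation; every name and structure here is a hypothetical illustration:

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    hypothesis: str
    change: str   # e.g. a copy string or minor UI tweak
    metric: str   # the win/loss signal being measured

def agent_growth_loop(experiments, run_test, baseline_win_rate):
    """Bounded agent loop: run each scoped experiment, then compare
    the agent's win rate to a human (e.g. junior PM) baseline."""
    wins = sum(1 for exp in experiments if run_test(exp))
    win_rate = wins / len(experiments)
    return {"win_rate": win_rate,
            "beats_baseline": win_rate >= baseline_win_rate}
```

The key design choice mirrored here is the comparison step: an agent is only promoted to more scope once its measured win rate holds up against the human baseline.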
3) The same good process can look very different depending on context
One product owner in a Scrum organization built an efficient feature factory with extensive requirements discovery, back-pressure on requests, frequent delivery, forecasting within confidence ranges, and periodic auctions on capacity. The reported outcome: the team moved from being perpetually overwhelmed and distrusted to predictable and liked by stakeholders. In contrast, a PM from a supposedly empowered company described constant negotiation across competing priorities, AI-favored teams commandeering resources, and a belief that the organization needed top-down prioritization and 50% less work.
"Maybe we need to re-org, but probably right now we need to be doing like 50% less..."
- Why it matters: Process labels tell you less than the underlying demand mix and constraints.
- Apply it: Judge an operating model by what it helps the team manage and deliver under its actual conditions, not by whether it matches a preferred product doctrine.
Career Corner
1) Interview prep for AI PM roles now needs receipts
"They asked me what the F1 score was. I said I’d have to check. Interview was over in their minds."
Across Gupta’s notes, the strongest signal is not abstract AI literacy. It is the ability to discuss a real AI system you built or drove: the architecture, evaluation metrics, production behavior, trade-offs, and business impact. Candidates may also face prototyping rounds, AI-specific behavioral questions, and safety testing throughout the loop.
- Why it matters: High-paying AI PM roles are drawing intense competition, and the screening bar is shifting accordingly.
- How to apply: Build your stories around one shipped AI system, one prototype you can recreate quickly, one quantified prioritization example, and safety woven into each answer.
2) Lean startup PM roles can compress learning and burnout into the same job
One fintech PM with a non-traditional background describes growing into a de facto product lead role in 2-3 years, managing a small overseas engineering team with high autonomy, and using AI heavily because there was no one to train them. The same account also describes burnout from wearing too many hats, weak product traction, and anxiety about long-term career trajectory in a rough market.
- Why it matters: More scope can be career acceleration, but it can also hide weak support, weak traction, or unsustainable workload.
- How to apply: Evaluate roles on actual ownership, founder attention, product traction, and workload—not just title or autonomy—and use AI to shorten self-training when mentorship is thin.
3) Hands-on AI use is becoming part of PM skill development
The sources converge on a practical pattern: PMs are using AI tools to structure discovery work, shipping prototypes without filing tickets, and being tested directly on prototype-building in interviews. That makes hands-on repetition more valuable than abstract familiarity alone.
- Why it matters: AI fluency is moving closer to day-to-day PM execution, not just strategy language.
- How to apply: Use AI on one real workflow you own—discovery, prototyping, or metrics review—and build enough reps that it becomes part of your actual practice, not just interview language.
Tools & Resources
1) Vibe-coding tools are now worth practicing even for non-engineer PMs
Tools named across the notes include Cursor, Bolt, Lovable, and Replit in short prototyping rounds. If you want a grounded starting point, Gupta links a vibe coding interview guide.
- Why explore it: Familiarity with these tools is now showing up in hiring loops, not just side projects.
- Use it for: Practicing a simple 45-minute prototype build.
2) A four-pillar AI metrics scorecard
Jain’s framework gives PMs a compact way to organize AI metrics across model quality, user engagement, business impact, and operational reliability. He also recommends measuring end to end, not stopping at model benchmarks.
- Why explore it: It is a practical antidote to the accuracy trap.
- Use it for: Building a weekly AI product review with a small number of metrics, explicit feedback, implicit feedback, and a standing improvement loop.
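A weekly review built on the four pillars could be organized like this minimal sketch. The pillar names come from the framework; the specific metrics, values, and the 5% regression tolerance are illustrative assumptions (and the helper assumes higher is better for every metric):

```python
# Illustrative scorecard grouped by the framework's four pillars.
SCORECARD = {
    "model_quality":           {"answer_acceptance_rate": 0.82},
    "user_engagement":         {"weekly_active_users": 5400},
    "business_impact":         {"tickets_deflected": 310},
    "operational_reliability": {"uptime": 0.999},
}

def flag_regressions(current, previous, tolerance=0.05):
    """Weekly review helper: flag any metric that dropped by more
    than `tolerance` (relative) versus the previous week."""
    flags = []
    for pillar, metrics in current.items():
        for name, value in metrics.items():
            prev = previous.get(pillar, {}).get(name)
            if prev and value < prev * (1 - tolerance):
                flags.append(f"{pillar}.{name}")
    return flags
```

Keeping the metric count small per pillar is deliberate: the review stays readable week over week, and regressions surface as named flags rather than a wall of dashboards.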
3) A lightweight solo-discovery stack
Three frameworks surface in the Reddit note: Opportunity Solution Trees, The Mom Test, and Marty Cagan’s Four Risks. The same PM says AI coding tools helped operationalize the work by framing hypotheses, generating interview questions, synthesizing notes, and packaging findings.
- Why explore it: It is a practical stack for PMs who know they need discovery but do not have dedicated research support.
- Use it for: Turning ad hoc customer conversations into a repeatable discovery workflow.
4) A live example of agentic experimentation
Lenny Rachitsky points to a full conversation on Anthropic’s CASH system. The X thread says the agent spans opportunity identification, build, test execution, and analysis, with current scope limited to copy changes and minor UI tweaks.
- Why explore it: It is a concrete, bounded example of AI taking on pieces of the growth loop.
- Use it for: Studying where agentic workflows are already credible today: repetitive, measurable experiments with clear win/loss signals.
The market for coding agents got more explicit
OpenAI introduced a new $100/month Pro tier for heavier Codex use
OpenAI launched a new $100/month ChatGPT Pro tier that offers 5x more Codex usage than Plus and is designed for longer, high-effort coding sessions. The plan includes all Pro features, including the exclusive Pro model and unlimited access to Instant and Thinking models, and OpenAI is temporarily boosting Codex access to up to 10x Plus through May 31. Sam Altman said the move follows strong interest in Codex.
Why it matters: OpenAI is now pricing around sustained agentic coding demand, while keeping its existing $200 Pro tier as the highest-usage option.
The strongest gains are still clustering in technical workflows
Andrej Karpathy said many people still judge AI by free or older chat products that fumble simple tasks, while users of frontier agentic systems such as OpenAI Codex and Claude Code are seeing much stronger progress in programming, math, and research. He tied that gap to domains with verifiable rewards and high B2B value. Hex's data-agent team described a similar asymmetry from the product side: coding is increasingly easy to verify, while analytical work still involves many hard-to-validate decisions, which is why their agents rely on long-running workflows and custom context handling. Gary Marcus argued that these advances remain concentrated in particular areas and should not be mistaken for AGI being "in striking distance".
Why it matters: The frontier is moving quickly, but unevenly: the biggest gains are appearing first where feedback is crisp and measurable.
Research pushed further beyond chat
Meta's TRIBE v2 models how the brain responds to media
Meta released TRIBE v2, a foundation model trained on more than 1,000 hours of brain imaging data from 720 people. Given video, audio, or text, it predicts which brain regions activate, how strongly, and in what order; on unseen subjects, its predictions were reported as more accurate than most real scans. Researchers also used it to recreate classic neuroscience experiments in software and identify face-recognition, language, and emotional-processing regions on its own.
Why it matters: It is a notable signal that leading labs are still investing in scientific foundation models, not only assistant and coding products.
OpenAI highlighted healthcare benchmarks and AI-assisted treatment analysis
OpenAI said its team has created public benchmarks for evaluating models in healthcare, deployed clinical copilots in primary care settings, and is working to democratize medical expertise in ChatGPT Health. In a featured osteosarcoma case, GPT-4o was used on bulk RNA-seq data to flag targets such as B7H3, and a custom agent system performed literature review and bioinformatics analysis across 600,000 single cells. The presentation linked that work to a personalized mRNA vaccine, TCR-T, and CAR-T efforts.
Why it matters: The example shows model providers trying to move from health chatbots toward tool-assisted clinical and research workflows.
The science-first argument got louder
Demis Hassabis said the commercial race crowded out slower scientific work
In a recent interview, Demis Hassabis said he would have preferred to keep AI in the lab longer and focus on more AlphaFold-like advances rather than getting pulled into a "ferocious commercial pressure race" after ChatGPT. He also warned that the next two to four years of the "agentic era" will make alignment and guardrail failures a much harder technical challenge, and called for cooperation across labs, safety institutes, and academia.
"If I'd had my way, I would have left AI in the lab for longer. Done more things like AlphaFold. Maybe cured cancer or something like that."
Why it matters: One of the most influential lab leaders is publicly arguing for a more science-oriented path even as he warns that more autonomous systems are arriving quickly.
U.S. officials are also framing AI as research infrastructure
The Department of Energy's Genesis Mission was presented as an AI-driven platform for accelerating scientific discovery by combining AI, supercomputing, and quantum technologies, alongside public-private partnerships and interagency coordination. Under Secretary Dario Gil also emphasized research security, allied collaboration, and a broader effort to revitalize the U.S. science and technology enterprise.
Why it matters: The policy conversation is not only about consumer products and risk; it is also moving toward national research capacity and scientific infrastructure.
Enterprise adoption still looks messy
A new survey found most employees are bypassing formal AI rollouts
A survey of 3,750 executives and employees found that 54% of workers bypassed their company's AI tools in the past 30 days and another 33% had not used AI at all, despite average deployments of $54 million this year. Only 9% of workers said they trust AI for complex business-critical decisions, versus 61% of executives, and workers were reported to lose the equivalent of 51 working days per year to technology friction. Marc Andreessen, by contrast, argued that adoption is still happening bottom-up inside companies, with workers and managers often using AI whether or not leaders see it.
Why it matters: For enterprises, the bottleneck now looks less like access and more like trust, training, and workflow fit.
Major Adoption News
Online / Global — Airbtc is positioning accommodation bookings as Bitcoin-native commerce
Airbtc described itself as a Bitcoin-only stay marketplace where every stay is priced in sats, settled on Bitcoin rails, and paid out to hosts in pure Bitcoin. It also published the marketplace URL: http://airbtc.online
Business impact: This is more than a merchant adding a Bitcoin checkout option. The listing price, settlement rail, and host payout are all described as Bitcoin-based, making payments central to the marketplace model.
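Pricing in sats is a deterministic conversion, since 1 BTC is fixed at 100,000,000 sats. The small sketch below shows how a fiat nightly rate might map to sats at a given spot price; the function is an illustration, not Airbtc's actual pricing code:

```python
SATS_PER_BTC = 100_000_000  # fixed protocol constant: sats per bitcoin

def fiat_to_sats(fiat_amount, btc_fiat_price):
    """Convert a fiat price (e.g. a nightly rate) into sats at a
    given BTC spot price, rounded to the nearest whole sat."""
    return round(fiat_amount / btc_fiat_price * SATS_PER_BTC)
```

For example, a $120 nightly rate at a $60,000 BTC price comes out to 200,000 sats; in a sats-native marketplace that number, not the fiat figure, is what the listing displays and what settles.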
Kenya — Bitika adds Bitcoin-powered ticketing for a Nairobi event
Bitika said it is the official ticketing partner for Adopting Bitcoin NBO in Nairobi and described the service as "Fast. Seamless. Bitcoin-powered." Tickets were directed to http://ke26.adoptingbitcoin.org
Business impact: This extends Bitcoin payments into event commerce and digital ticketing, rather than only in-person retail.
South Africa — six circular economies are being presented as a connected payments landscape
Bitcoin Ekasi promoted a 10-day, 1,500 km South Africa expedition spanning six Bitcoin circular economies: BitcoinWitsand, BitcoinKaroo, BitcoinLoxion, BitcoinPlett, BTCSedgefield, and BitcoinEkasi. The trip was organized by UnravelSurf and framed as taking visitors to places where Bitcoin adoption is already happening.
Business impact: The notable signal is the claimed density of operational local ecosystems. The sources present Bitcoin payments in South Africa as geographically distributed enough to be visited as a network, not just as isolated single-merchant anecdotes.
Netherlands — Arnhem foodhall merchant adds Bitcoin
Pasta Basta at Foodhall Arnhem now accepts Bitcoin.
Business impact: It is a small-scale addition, but it keeps everyday food spending in view as a Bitcoin use case in Europe.
Payment Infrastructure
Global — BTCPay Server is reducing navigation friction in merchant software
BTCPay Server previewed a global search/launch bar designed to remove "5 level of nested menu" navigation and enable keyboard browsing. A reply called it an "Amazing idea," and Nicolas Dorier said the feature was "vibe coded" by pavlenex and then refined.
Significance: The update targets usability in payment operations software, with a clear focus on faster navigation for users managing BTCPay Server.
Africa — Lightning aliases plus BTC Map remain the practical merchant stack
Current merchant posts repeatedly paired a Lightning endpoint with a public map listing. Examples include Rachael via rachael@8333.mobi with BTC Map, the Bitcoin Chama farm via mercyline@8333.mobi with BTC Map, the chicken coop project via Nyarandi@8333.mobi with BTC Map, Viwa Accessories via victormuraya@blink.sv with BTC Map, Chips pot via a Blink endpoint with BTC Map, and Siki's Koffeekafe via siki@blink.sv with BTC Map.
Significance: The recurring pattern is operationally important: a payment address for checkout and a public directory entry for discovery. That combination makes merchant acceptance easier to find and use.
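The "Lightning alias" half of that stack follows the Lightning address convention (LUD-16), where an address like user@domain maps to an HTTPS metadata endpoint the payer's wallet fetches to request an invoice. A minimal sketch of that mapping, with simplified validation rules as an assumption:

```python
def lnurlp_endpoint(lightning_address):
    """Map a Lightning address (LUD-16), e.g. user@domain, to its
    LNURL-pay metadata URL. This is the pure string mapping only;
    actually paying requires fetching this URL and then requesting
    an invoice from the callback it returns."""
    user, _, domain = lightning_address.partition("@")
    if not user or not domain or "." not in domain:
        raise ValueError(f"not a Lightning address: {lightning_address!r}")
    return f"https://{domain}/.well-known/lnurlp/{user}"
```

This is what makes the merchant pattern repeatable: any wallet that understands the convention can turn a published alias into a payable endpoint without coordination with the merchant.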
Kenya — Machankura appears in live farm-purchase activity
Bitcoin Chama showed payment for kales from its farm using a Machankura wallet.
Significance: This is direct evidence of a specific wallet being used for a real merchant payment in a rural setting.
Regulatory Landscape
Africa
No payment-specific legal or regulatory changes were cited in the current notes for Kenya, South Africa, or other African markets represented in this batch.
Europe and Global / Online
No legal or policy changes affecting Bitcoin payments were cited for the Netherlands, online marketplaces, or cross-border payment services in the current notes.
Usage Metrics
The current sources remain light on hard data. No transaction totals, settlement volumes, or merchant revenue figures were disclosed.
South Africa
- Bitcoin Ekasi's travel promotion named 6 circular economies across 10 days and 1,500 km. This is the clearest explicit scale indicator in the current batch.
- At merchant level, Siki's Koffeekafe in Green Point, Cape Town was shown in an actual coffee purchase and separately tied to a BTC Map listing.
Kenya
- The strongest usage signal is category breadth in routine spending: Unga wa Ugali at Grandsmatt, kales bought with Machankura, discounted eggs at the chicken coop project, a purchase at Chips pot, and electronics accessories offered by Viwa Accessories.
- Several of these posts included either a BTC Map entry or a Lightning endpoint, indicating live merchant readiness rather than generic advocacy.
Europe and Online
- The current notes add a new food merchant in Arnhem and describe a Bitcoin-only accommodation marketplace, but no booking or payment counts were provided.
Emerging Markets
Kenya — rural and low-ticket spending remains the clearest adoption channel
Bitcoin Chama and BitBiashara posts continue to place Bitcoin in small, frequent purchase categories: kales from a farm, eggs from a chicken coop, Unga wa Ugali, chips, and accessories such as chargers and earphones. Some posts explicitly framed this as "Bitcoin as everyday money" or "Bitcoin in action."
Why it matters: The current evidence ties Bitcoin to day-to-day commerce categories rather than one-off showcase purchases.
South Africa — mobile and neighborhood merchants continue to anchor the local spend story
Nick Darlington reported buying coffee with Bitcoin at Siki's mobile coffee shop in Green Point, Cape Town. Separately, BitcoinLoxion highlighted the same merchant with a Blink address and BTC Map listing, while thanking the MoneyBadgerPay team for visiting.
Why it matters: The same merchant appears both as a live consumer payment and as a mapped Lightning endpoint, strengthening the case that the acceptance is operational.
Adoption Outlook
"Bitcoin is not a replacement. It is a choice."
The current batch shows two parallel tracks in Bitcoin payments. One is Bitcoin-native service design, with Airbtc structuring accommodation pricing, settlement, and host payouts around Bitcoin, and Bitika adding Bitcoin-powered ticketing in Nairobi. The other is grassroots retail acceptance, especially in Kenya and South Africa, where merchants keep publishing Lightning aliases, BTC Map entries, and proof-of-purchase examples.
What remains missing is formal regulatory movement and hard volume data. For now, the strongest evidence of payment viability is operational: users can identify merchants publicly, reach them through repeatable payment endpoints, and see Bitcoin used in routine categories such as coffee, farm produce, groceries, prepared food, accessories, accommodation, and event tickets.