Hours of research in one daily brief, on your terms.

Tell us what you need to stay on top of. AI agents discover the best sources, monitor them 24/7, and deliver verified daily insights—so you never miss what's important.

Set up your daily brief agent
Discovering relevant sources...
Syncing sources 0/180...
Extracting information
Generating brief

Recent briefs

Verification Loops Tighten Up as Claude/OpenClaw Friction Surfaces
Apr 6
5 min read
85 docs
ovoDRIZZYxo
Emanuele Di Pietro
Theo - t3.gg
+7
Verification-first agent design was the real signal today: self-QA loops, trace-driven harness learning, and real software-verification budgets. Also inside: OpenClaw's GPT 5.4 dev update, Claude/OpenClaw friction, task-based model routing, and the AI-assisted build lessons behind syntaqlite.

🔥 TOP SIGNAL

Today's clearest alpha: serious coding-agent setups are moving from one-shot generation to verification loops. Peter Steinberger's new OpenClaw self-QA workflow has an orchestrator assign a task, verify the result, and spawn a repair subagent on failure. LangChain describes the same general move as harness improvement from traces; Andrew Yates says Dropbox has been running a "Ralph loop" Dark Factory since October; and Geoffrey Huntley says companies are now spending engineer-salary-level budgets to automate software verification.

🛠️ TOOLS & MODELS

  • OpenClaw — GPT 5.4 dev-channel upgrade. steipete says the claw harness now has GPT 5.4 upgrades; test with openclaw update --channel dev. Early user feedback moved from near-frustration to "way better" / "GOD MODE"
  • Claude Max / Claude Code — harness gating is now concrete. In testing, adding the exact system-prompt string "A personal assistant running inside OpenClaw." triggered a 400 error stating that third-party apps draw from extra usage, not plan limits. Simon Willison says exact-string prompt filtering is a step too far; separately, Theo says Claude Code now refuses the system-fix tasks he mainly kept his subscription for, while Codex still does the work
  • T3Code fork — task-specific handoff. Emanuele DPT's experimental open-source feature routes UI-heavy threads to Claude and logic-heavy threads to Codex. Push to main is planned, and Theo says these increasingly elaborate forks are exactly the mindset he wants encouraged in T3Code itself
  • Salesforce's model mix — real scale, bounded claims. Marc Benioff says Salesforce's 15,000 engineers use coding models from Anthropic, OpenAI Codex, Cursor, and others, plus agents that engineers supervise. His productivity number is more than 30%, not 100%, because models are still not autonomous

💡 WORKFLOWS & TRICKS

  • Self-QA your harness

    1. Add a synthetic message channel to your own agent.
    2. Let an orchestrator define a concrete task.
    3. Verify the result automatically.
    4. If verification fails, spin up a subagent to analyze and fix.
    5. steipete says he built this OpenClaw loop in about six hours and found it better than old-school end-to-end tests
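The loop above can be sketched in a few lines of Python. Every name here (run_task, verify, repair) is a hypothetical stand-in for illustration, not OpenClaw's actual API.

```python
# Minimal sketch of a self-QA verification loop for a generic agent harness.
# All function names are hypothetical stand-ins, not OpenClaw's API.

def run_self_qa(task, run_task, verify, repair, max_attempts=3):
    """Run a task, verify the result, and retry via a repair step on failure."""
    result = run_task(task)
    for attempt in range(max_attempts):
        ok, report = verify(task, result)
        if ok:
            return result
        # Verification failed: hand the failure report to a repair subagent.
        result = repair(task, result, report)
    raise RuntimeError(f"task still failing after {max_attempts} repair attempts")

# Usage with toy stand-ins: the "task" is to produce an even number.
result = run_self_qa(
    task="make an even number",
    run_task=lambda t: 3,                  # first attempt is wrong
    verify=lambda t, r: (r % 2 == 0, "odd result"),
    repair=lambda t, r, report: r + 1,     # the "repair subagent" fixes it
)
print(result)  # → 4
```

The key design point from the workflow above is that verification and repair are separate roles, so a failed check produces a structured report the repair step can act on.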
  • Route by task type, not brand loyalty

    • Send UI-heavy work to Claude.
    • Send logic-heavy work to Codex.
    • Keep the handoff explicit so the thread can continue in the model that fits the task
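The routing rule above can be made explicit with a tiny classifier. The keyword list and model labels below are illustrative assumptions, not the fork's actual implementation.

```python
# Sketch of explicit task-type routing between models. The keyword hints and
# model names are illustrative, not T3Code's actual routing logic.

UI_HINTS = ("css", "layout", "component", "styling", "frontend", "animation")

def route(task_description: str) -> str:
    """Send UI-heavy threads to one model and logic-heavy threads to another."""
    text = task_description.lower()
    if any(hint in text for hint in UI_HINTS):
        return "claude"   # UI-heavy work
    return "codex"        # logic-heavy work

print(route("Fix the flexbox layout on the settings page"))  # → claude
print(route("Refactor the retry logic in the sync engine"))  # → codex
```

Keeping the decision in one visible function is what makes the handoff explicit rather than implicit brand preference.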
  • Use AI where answers are checkable; keep architecture human-owned

    1. Use AI to crush tedious implementation work — Lalit Maganti used Claude Code to get past 400+ SQLite grammar rules and into concrete prototypes fast
    2. Be skeptical when the task has no objectively checkable answer — Maganti says AI led him into dead ends and encouraged deferring key design decisions
    3. If the prototype proves the idea but the architecture is muddy, throw it away and rebuild with more human-in-the-loop design decisions
  • Let traces improve the system at multiple layers

    1. Run the agent on real tasks and evaluate outcomes.
    2. Store traces.
    3. Use a coding agent to propose harness code changes from those traces.
    4. Update context separately via persistent memory — agent-level files like SOUL.md, tenant-level memory, offline "dreaming," or hot-path updates
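Steps 1-3 of the trace loop above can be sketched as follows; the helper names and trace schema are illustrative assumptions, not LangChain's actual API.

```python
# Sketch of a trace-driven improvement loop: run tasks, store traces, then
# bundle failing traces into a prompt for a code-proposing agent.
# `improvement_prompt` would be sent to that agent; the schema is made up.

import json

def record_trace(store, task, outcome, ok):
    store.append({"task": task, "outcome": outcome, "ok": ok})

def failing_traces(store):
    return [t for t in store if not t["ok"]]

def improvement_prompt(store):
    """Bundle failing traces into a request for harness code changes."""
    failures = failing_traces(store)
    return ("Propose harness changes to fix these failures:\n"
            + json.dumps(failures, indent=2))

traces = []
record_trace(traces, "lint repo", "passed", ok=True)
record_trace(traces, "run tests", "3 tests timed out", ok=False)

print(len(failing_traces(traces)))  # → 1
```

Storing traces as structured records (rather than raw logs) is what lets the same data feed both harness-code proposals and the separate memory-update path.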
  • Plan for supervised agents, not full autonomy

    • Salesforce's benchmark is the right planning assumption for now: engineers supervise coding agents, and even at 15,000-engineer scale the gain Benioff reports is more than 30%, not 100%

👤 PEOPLE TO WATCH

  • Peter Steinberger — shipping OpenClaw internals in public: self-QA loops, dev-channel GPT 5.4 harness changes, and concrete "make GPT better" tweaks rooted in prior Codex work
  • Lalit Maganti — one of the best firsthand build logs in the batch: fast AI-assisted parser implementation, then a disciplined reset once architecture quality slipped. Start with syntaqlite and the full post
  • Simon Willison — worth following because he tests vendor behavior directly. Today he highlighted the exact-string OpenClaw trigger and argued prompt-based billing filters go too far
  • Theo + Emanuele DPT — useful signal on model routing in the wild: an open-source T3Code fork that hands UI work to Claude and logic to Codex, with Theo explicitly wanting that extension mindset inside the main tool

🎬 WATCH & LISTEN

  • 10:25-11:40 — Marc Benioff on the real ceiling of coding agents today. Best calibration clip in the pack: Salesforce says engineers across a 15,000-person org are using coding models and agents, but the human role becomes supervisory rather than disappearing. The number to keep in your head is more than 30% productivity, not autonomy

📊 PROJECTS & REPOS

  • syntaqlite — high-fidelity SQLite parser, formatter, and verifier. The build story is the signal: eight years of wanting to build it, then three months with Claude Code to get it done
  • Deep Agents — LangChain's open-source, model-agnostic base harness. They say traces plus LangSmith CLI and Skills were used to improve it on terminal bench, and it supports user-scoped memory plus background consolidation
  • T3Code Claude/Codex handoff fork — experimental open-source feature, push to main planned. The practical signal is the routing rule itself: different models for UI vs. logic work
  • OpenClaw dev channel — not a new repo, but a live harness update worth testing if you use it: GPT 5.4 upgrades are available via openclaw update --channel dev

Editorial take: the edge is shifting out of raw model IQ and into the wrapper — verification loops, trace-driven harness updates, and blunt task routing between models.

Autonomous Research Advances as Anthropic Pushes into Biotech and Rethinks Agent Access
Apr 6
9 min read
427 docs
vLLM
LightSeek Foundation
Boris Cherny
+36
Autonomous research systems, Anthropic’s biotech acquisition, and tighter controls on agent compute dominated this cycle. The brief also covers Gemma 4’s spread into local developer workflows, new long-context research, evolving agent infrastructure, and policy moves in China and Maine.

Top Stories

Why it matters: This cycle’s biggest signals were about autonomous research, vertical expansion into biotech, the economics of agent usage, wider local-model distribution, and early labor-market measurements.

ASI-Evolve claims end-to-end autonomous AI research

Shanghai Jiao Tong University researchers released ASI-Evolve, an open-source system described as running the full AI research loop itself: reading papers, forming hypotheses, designing and running experiments, analyzing results, and iterating without human intervention. In neural architecture search, it ran 1,773 rounds, generated 1,350 candidates, and produced 105 models that beat the best human-designed baseline; the top model exceeded DeltaNet by +0.97 points. The same framework reportedly improved data curation by +3.96 average benchmark points and +18 on MMLU, and produced RL algorithms that beat GRPO by up to +12.5 on competition math.

"This is the first system to demonstrate AI-driven discovery across all three foundational components of AI development in a single framework."

A biomedicine test also showed +6.94 points in drug-target prediction on unseen drugs. One critic argued the work is not the first effort of its kind and said frontier labs still rely on data intuitions that may not be offloaded to a scaffold.

Impact: The paper presents this as a single framework improving architecture, data, and algorithms rather than optimizing only one part of the stack.

Anthropic acquires Coefficient Bio for biotech workflows

Anthropic acquired Coefficient Bio for about $400M. The sub-10-person startup builds AI to plan drug R&D, manage clinical regulatory strategy, and identify new drug opportunities. The team joins Anthropic’s healthcare and life sciences group, which already works with Sanofi, Novo Nordisk, AbbVie, and others.

"I’m talking about using AI to perform, direct, and improve upon nearly everything biologists do."

Posts around the deal frame it as execution on Dario Amodei’s "virtual biologist" idea, with Coefficient Bio covering drug discovery, clinical trials, and regulatory submissions end to end.

Impact: The deal pushes Anthropic further from being only a general-model vendor and deeper into healthcare-specific workflows.

Claude access rules now reflect harness economics

A notice said Claude subscriptions will no longer cover usage on third-party tools like OpenClaw, though users can still buy extra usage bundles or use a Claude API key. A later analysis argued some harnesses send repeated low-value requests with long contexts—often over 100K tokens—making costs tens of times higher than a subscription price. Another post framed Anthropic’s position as allowing products that complement Claude Code but not direct competitors, a characterization one critic rejected.

Impact: The dispute is no longer just about model quality. It is about who gets subsidized compute, who has to pay API rates, and how efficiently agent frameworks use context and caching.

Gemma 4 is turning into a distribution story

Gemma 4 is now integrated into Android Studio Agent mode for local feature development, refactoring, and bug fixing. Separate posts highlighted 1,500 free daily requests to Gemma 4 31B in Google AI Studio, and one user described Gemma running locally on a Pixel phone with no connectivity. Gemma 4 was also cited as the #1 trending model on Hugging Face.

Impact: Gemma 4 is showing up across IDEs, hosted inference, and offline edge use, which is a stronger adoption signal than benchmarks alone.

Goldman Sachs sees a net labor-market drag from AI substitution

Goldman Sachs estimated that, over the past year, AI substitution reduced monthly payroll growth by roughly 25,000 and raised unemployment by 0.16 percentage points, while augmentation added about 9,000 jobs and lowered unemployment by 0.06 points. Netting the two implies a 16,000 monthly drag on payroll growth and a 0.1 point boost to unemployment, with the negative effects concentrated among less experienced workers.

Impact: The note argues that today’s net labor effect is already negative and is falling disproportionately on entry-level workers.
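The netting in the estimate is simple arithmetic and can be checked directly; the figures below come from the note as reported above.

```python
# Checking the netting in the Goldman Sachs estimate: substitution vs.
# augmentation effects on monthly payroll growth and the unemployment rate.

substitution_payrolls = -25_000   # monthly payroll growth reduced by substitution
augmentation_payrolls = +9_000    # jobs added by augmentation
net_payrolls = substitution_payrolls + augmentation_payrolls

substitution_unemployment = +0.16  # percentage points added
augmentation_unemployment = -0.06  # percentage points removed
net_unemployment = round(substitution_unemployment + augmentation_unemployment, 2)

print(net_payrolls)      # → -16000
print(net_unemployment)  # → 0.1
```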

Research & Innovation

Why it matters: The strongest technical work this cycle focused on cheaper long-context inference, better credit assignment for reasoning, and more formal ways to generate theory with LLMs.

Long-context methods keep chipping away at attention cost

  • HISA replaces a flat sparse-attention token scan with a two-stage block-then-token pipeline, eliminating the indexing bottleneck at 64K context without extra training.
  • Screening Is Enough / Multiscreen replaces softmax-style global competition with threshold-based screening, matching Transformer-like validation loss with 40% fewer parameters and reducing inference latency by up to 3.2x at 100K context.
  • Commentary around this work framed sparse attention as a form of maximum inner product search, while noting that approaches with better theoretical complexity still have to work on GPUs and at datacenter scale to matter in practice.

New training methods target deeper reasoning and smaller working contexts

  • FIPO uses discounted future-KL signals in policy updates, pushing average chain-of-thought length past 10,000 tokens and reaching 56.0% AIME 2024 Pass@1 on Qwen2.5-32B.
  • SKILL0 tries to internalize agentic skills into model weights instead of retrieving them at runtime, reporting gains of over 9% on ALFWorld and 6% on Search-QA while cutting context usage to under 0.5K tokens per step.
  • Principia introduces benchmarks and training recipes for deriving mathematical objects, with gains from on-policy judge training and verifiers that also transfer to standard numerical and multiple-choice math benchmarks.

LLMs are starting to participate in theoretical science workflows

steepest-descent-lean formalizes convergence bounds and hyperparameter scaling laws in Lean using Codex. The work reproduces prior-style results under weaker assumptions, including support for Nesterov momentum and decoupled weight decay, and recovers a fixed-token-budget scaling law of BS ≍ T²⁄³. Its stated workflow is simple: formalize a peer-reviewed proof, ask an LLM to weaken assumptions and re-derive theorems, then keep only the changes that preserve or better match empirical results. The repo is here: steepest-descent-lean.

Products & Launches

Why it matters: Useful product progress this cycle was less about one big model launch and more about better infrastructure around coding agents, memory, routing, and interface layers.

GitNexus adds a code graph for agent workflows

GitNexus indexes a codebase into a graph using Tree-sitter, mapping calls, imports, inheritance, execution flows, and blast radius before code changes. The pitch is that agents get the repo’s dependency structure precomputed at index time, so smaller models can answer architecture questions without repeated exploration. Setup is a single command: npx gitnexus analyze. The project was cited as already reaching 9.4K GitHub stars and 1.2K forks.
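To illustrate the idea of precomputing a call/import graph at index time, here is a toy version. GitNexus itself uses Tree-sitter across many languages; this sketch uses Python's standard ast module on Python source only, so it is an analogy, not GitNexus's implementation.

```python
# Toy illustration of indexing source into a dependency graph up front,
# so an agent can answer structure questions without re-reading the code.
# GitNexus uses Tree-sitter; this sketch uses Python's ast module instead.

import ast

SOURCE = """
import json

def load(path):
    return json.loads(open(path).read())

def main():
    data = load("config.json")
"""

tree = ast.parse(SOURCE)
graph = {"imports": [], "calls": []}

for node in ast.walk(tree):
    if isinstance(node, ast.Import):
        graph["imports"] += [alias.name for alias in node.names]
    elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        graph["calls"].append(node.func.id)

print(graph["imports"])             # → ['json']
print(sorted(set(graph["calls"])))  # → ['load', 'open']
```

The payoff is the same as the pitch above: once the graph exists, "what calls load?" is a dictionary lookup rather than another pass over the repo.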

New building blocks are landing for agent memory and control planes

  • Memvid offers a single-file memory layer for agents with instant retrieval and portable, versioned long-term memory without a database.
  • Plano is an open-source AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing.

Hermes Agent expands its interfaces

Hermes Agent added support for OAuth-authenticated MCP servers, can expose an OpenAI-compatible endpoint for use with OpenWebUI as a chat interface, and now ships a Manim skill for generating programmatic math and technical animations via /manim-video. One demo combined the Manim skill with Math Code to produce an explanatory video for Jordan’s Lemma.

DESIGN.md turns visual style into plain text for coding agents

The awesome-design-md repo packages design-system descriptions from 31 real websites into markdown files that agents can read directly, covering colors, typography, spacing, buttons, shadows, and responsive rules. The project was presented as a way to avoid repetitive default AI UI aesthetics, and it has now been integrated with Hermes Agent.

Industry Moves

Why it matters: The business story this cycle was about sustainable economics, talent concentration, hardware experimentation, and the changing labor and data supply chains behind AI systems.

New pricing models are emerging for agent-heavy workloads

Alongside Anthropic’s tighter subscription rules, MiMo launched a Token Plan that supports third-party harnesses through token quotas and frames the model as long-term, stable delivery rather than open-ended subscription usage. The surrounding commentary argued the market is moving toward a combination of more token-efficient agent harnesses and more efficient models, not simply cheaper tokens.

PrimeIntellect added open-source training talent

Open-source researcher Elie Bakouch said he is joining PrimeIntellect to work on pre/mid training, citing the team’s open-frontier mission and the leverage of a small focused group. Peers called the hire an "unbelievable get" and a "phenomenal choice".

Neuromorphic computing patent activity is accelerating

A PatSnap-cited note said neuromorphic computing has moved from academic prototype to commercial product, with 596 patents filed through early 2026 and a 401% surge in activity during 2025.

A once-important data-labeling channel in China is weakening

Data-labelling workshops in rural Guizhou that were once part of China’s poverty alleviation effort and helped build AI systems are now struggling as state support and industry demand have fallen. One commentator suggested similar programs could still be repurposed toward graduate unemployment, but that was presented as an open question rather than a current policy.

Policy & Regulation

Why it matters: The policy signal this cycle was less about sweeping AI laws and more about concrete requirements on how companies govern AI internally and how jurisdictions handle AI infrastructure growth.

Beijing now requires AI ethics committees

New Beijing rules require all Chinese companies engaging in AI activities to establish internal AI ethics committees, effective immediately. The final version removed earlier wording that made such committees conditional on circumstances, and the move follows a 2023 ethics review system that had been criticized as too narrow and too perfunctory for AI-specific issues. One commentator said the plain reading could be especially hard on smaller startups and questioned how it will be enforced.

Maine is moving to pause large data-center projects

Maine is on track to become the first U.S. state to pause construction of large data centers—projects over 20 megawatts—until November 2027 while it studies environmental and energy impacts. Commentary around the move acknowledged concerns about rising electricity costs while arguing that infrastructure limits should not become a blanket brake on AI development. The linked report is from the Wall Street Journal: Maine data center ban.

Quick Takes

Why it matters: These smaller items are useful signals for where local deployment, inference optimization, tooling behavior, and public understanding are moving next.

  • A local grounded reasoning demo paired Gemma 4 with Falcon Perception, using Gemma to decide what to inspect and Falcon to return pixel-accurate coordinates; one example checked whether a soccer player was offside, fully local on M3 hardware.
  • TorchSpec said its kimi-k2.5-eagle3 draft model hit 40K downloads on Hugging Face in two weeks, and vLLM said it adopted the open-source EAGLE3 draft model for low-latency inference on Kimi 2.5.
  • A weekend project showed a fully local coding agent using Qwen3.5 30B A3B, llama.cpp/lemonade, ngrok, and OpenHands; the builder said performance was better than expected.
  • Claude Code now reportedly throws an error when asked to analyze its own source code. Separately, one user said the tool now refuses some system-fixing workflows they previously relied on, while Codex still accepts them.
  • François Chollet highlighted a tutorial for fine-tuning Gemma on TPU v5 using Kinetic + Keras + JAX, with a quick-start repo here: kinetic-finetuning-on-cloud-tpu.
  • Ryan Greenblatt argued that statements like "AGI is here" or "we’re far from AGI" are not meaningful unless the speaker defines the term being used.
  • The documentary The AI Doc is now showing in hundreds of theaters, and one commentator said non-technical viewers valued its plain explanation of how LLMs work.

Codegen Catch-Up Leads a Day of AI and Decision-Making Picks
Apr 6
3 min read
156 docs
نادي ريادة الأعمال
Amjad Masad
Bill Gates
+3
Today's recommendations split between frontier AI literacy and stronger judgment. Garry Tan's codegen interview pick leads the list, followed by Bill Gates' Stephen Wolfram book recommendation, three books from Anthropic's Amol Avasarala, and one founder-endorsed article on entrepreneurial mindset.

Today's signal

The clearest pattern today is a split between AI literacy and better judgment. Garry Tan and Bill Gates point readers toward resources that help explain current model capability and uncertainty, while Amol Avasarala and Amjad Masad recommend reading that sharpens mindset, reflection, and decision-making.

Most compelling recommendation

Newest @steipete / @lexfridman interview

  • Content type: Interview
  • Author/creator: @steipete and @lexfridman
  • Link/URL: Not provided in the source material
  • Who recommended it: Garry Tan
  • Key takeaway: Tan says it is "probably the best way to catch up to what is really going on" and calls it "a perfect encapsulation of the extreme edge of what is possible now in codegen."
  • Why it matters: This is the strongest recommendation in today's set because it combines a clear use case—getting up to speed quickly—with a concrete thesis about where the frontier is headed: "just in time smart personal software is here, just not evenly distributed."

"Probably the best way to catch up to what is really going on is listen to the newest @steipete @lexfridman interview"

AI mechanics and limits

How ChatGPT Works

  • Content type: Book
  • Author/creator: Stephen Wolfram
  • Link/URL: Not provided in the source material
  • Who recommended it: Bill Gates
  • Key takeaway: Gates recommends it as a useful resource on language model mechanics, while stressing Wolfram's point that the way these systems represent knowledge is still not fully understood.
  • Why it matters: The recommendation does more than point to an explainer; it also frames the current limit of expert understanding.

"the way that it's actually representing knowledge we don't fully understand."

Books on mindset and decision quality

Amol Avasarala's recommendations form a coherent mini-list: two books on reframing internal experience, and one on making product decisions more explicitly probabilistic.

The Joy of Living

  • Content type: Book
  • Author/creator: Yongey Mingyur Rinpoche
  • Link/URL: Not provided in the source material
  • Who recommended it: Amol Avasarala
  • Key takeaway: It helps readers think about life experience differently and offers tactics for changing how they think about things.
  • Why it matters: Avasarala says he has recommended it repeatedly and that people have "really, really enjoyed it."

Awareness

  • Content type: Book
  • Author/creator: Anthony de Mello
  • Link/URL: Not provided in the source material
  • Who recommended it: Amol Avasarala
  • Key takeaway: He describes it as offering similar value from a different angle.
  • Why it matters: He groups it with The Joy of Living as one of the books he consistently recommends.

Thinking in Bets

  • Content type: Book
  • Author/creator: Annie Duke
  • Link/URL: Not provided in the source material
  • Who recommended it: Amol Avasarala
  • Key takeaway: It is tactically useful in product because it pushes teams to convert vague timing claims into explicit probability estimates.
  • Why it matters: Avasarala gives a concrete operating example: asking for a percentage likelihood instead of accepting "I don't know, it'll get done in time."

Lower-context but still useful

Article on entrepreneurial mindset

  • Content type: Article
  • Author/creator: Not provided in the source material
  • Link/URL: http://x.com/i/article/2040473188568649728
  • Who recommended it: Amjad Masad
  • Key takeaway: Masad's endorsement is brief but explicit: he called it a "great article" on entrepreneurial mindset.
  • Why it matters: There is less context than the other picks, but this one is immediately accessible through a direct article link.

Activation, AI-Native UX, and the New PM Operating Model
Apr 6
12 min read
57 docs
Hiten Shah
The community for ventures designed to scale rapidly | Read our rules before posting ❤️
Aakash Gupta
+5
This brief covers the main PM shifts emerging from AI-native product work: activation is becoming the key growth lever, interface strategy is being rethought beyond simple chat-first assumptions, and AI-heavy teams may need more PM structure and capacity, not less. It also includes practical plays for onboarding, experimentation, leadership, and career progression.

Big Ideas

1) AI-native design is not the same as chat-first design

Andrew Chen’s test is simple: stop pitching products as “X but with AI,” and instead ask how the experience would be designed if AI existed from day one.

"the best products ask “if AI existed from day one, how would this experience be designed?”"

One thread says some SaaS products are already collapsing dense homepages into a single prompt field, which shifts defensibility from UI complexity to backend strengths like API surface, data model, and integrations. That same note points to Snowflake, Databricks, and Stripe as examples of companies that already treated the UI as a thin layer over a deeper engine.

A second thread adds an important constraint: many B2B SaaS users still do not want chat as the entry point. They want curated data and tools surfaced for them, and the builder’s job is curation.

Why it matters: PMs should rethink entry points for AI products, but not assume a prompt bar is always the answer.

How to apply:

  • Start with the first-value moment: should users see something useful immediately, or ask for it?
  • If you simplify the UI, audit what sits behind it: API quality, data model, and integrations become more important
  • Use AI-native design as the standard, but avoid lazy “existing product + AI” framing

2) In AI products, activation is the highest-leverage growth problem

In the Anthropic interview, activation is framed as critical because day-zero/day-one experience is often the highest-leverage input into long-term retention. The challenge is harder in AI because model capabilities improve so quickly that users often fail to discover what the product can actually do; the interview calls this “capability overhang”.

Anthropic’s response is to ask users who they are and what they care about, then use that information to recommend the right product or feature. The broader claim is that good friction can improve conversion when it personalizes the path to value. Lenny’s summary of the episode labels activation the single highest-leverage growth problem in AI.

Why it matters: In AI products, better models do not automatically create better user outcomes. Discovery of value is now a product problem.

How to apply:

  • Treat onboarding as routing, not just account creation
  • Ask a few high-signal questions early if they help match users to the right feature or workflow
  • Judge onboarding by downstream activation and retention, not just time-to-first-screen

3) AI-heavy product orgs may need more PM capacity, not less

Anthropic’s view is that engineers are currently getting the biggest leverage gains from AI tools like Claude Code, with engineering productivity described as roughly 2-3x higher. The consequence is that PMs and designers can end up managing the equivalent of a much larger engineering team, putting those functions under strain.

Their response is not to remove PMs. It is to hire more of them, while also hiring product-minded engineers who can act as mini-PMs on smaller projects.

Why it matters: The AI-era org question is not just “how many engineers can one PM support?” It is also “how much product direction and coordination is required when build capacity expands faster than planning capacity?”

How to apply:

  • Recalculate PM:engineering ratios if AI tools materially change engineering throughput
  • Hire for product-minded engineers who can own small, bounded workstreams
  • Move PM focus upward toward direction-setting and cross-functional alignment when execution speed increases

4) In exponential AI products, growth teams are biasing toward bigger swings

Anthropic says AI-first products should spend much more time on larger bets than a traditional growth team would, with roughly 50-70% of effort going to bigger swings instead of mostly small-to-medium optimizations. The reasoning is that if product value is expected to increase dramatically as model capabilities improve, the upside of finding the next major use case can outweigh many small wins. Small optimizations still matter and compound, but they are treated as secondary.

Why it matters: Prioritization rules change when the product’s value curve is changing quickly.

How to apply:

  • Keep a portfolio of small experiments, but reserve real capacity for larger product swings
  • Use this bias only where product value is truly AI-driven, not as a blanket rule for every software business
  • Revisit prioritization often as model capabilities change what is possible

Tactical Playbook

1) Design onboarding with good friction

Anthropic, Masterclass, Mercury, and Calm are all cited as cases where extra steps, quizzes, or broken-out screens improved conversion when they helped users understand why the product was for them.

How to do it:

  1. Ask a small number of questions that reveal user intent or identity
  2. Use those answers to recommend the right feature, content, or product path
  3. Split cognitively heavy forms into smaller steps when needed
  4. Remove friction that adds no value, but keep friction that improves relevance and comprehension
  5. Validate with conversion and funnel-completion data rather than intuition alone

Why it matters: Faster is not always better. More guided can outperform more minimal when users need help finding value.

2) Match process rigor to project size

Anthropic uses a clear execution rule: if a project is about two engineering weeks or less, the engineer can effectively act as the PM, with the PM advising as needed. Small changes may only need Slack messages and quick back-and-forth, while larger work gets a formal kickoff and, when useful, an AI-generated PRD built from prior examples.

How to do it:

  1. Define a size threshold for lightweight vs. heavy process
  2. For small work, rely on fast conversation and prototyping instead of default documentation
  3. For larger or riskier work, run a cross-functional kickoff with legal, safeguards, and other key stakeholders
  4. Use AI to draft PRDs from previous documents when documentation is needed
  5. Keep PMs accountable for larger bets and engineers accountable for bounded execution work

Why it matters: Faster building only helps if the process does not bottleneck small work or under-structure high-risk work.

3) Operationalize AI experimentation with a four-stage loop

Anthropic’s CASH initiative breaks experimentation into four parts: identifying opportunities, building, testing against quality and brand bars, and analyzing results after launch. The team scores model performance at each stage and started with narrow use cases like copy changes and minor UI tweaks. Human review is still in the loop, especially for brand-sensitive outputs.

How to do it:

  1. Separate the workflow into opportunity identification, build, test, and analysis
  2. Measure AI performance at each stage instead of treating “AI experimentation” as one block
  3. Start with high-volume, low-scope work such as copy or small UI changes
  4. Keep human approval where brand or stakeholder risk is high
  5. Track whether time spent is falling and results are improving week over week

Why it matters: AI is already useful in parts of the experimentation loop, but not equally across all parts.
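The four-stage, stage-scored loop described above can be sketched as a small data structure. The stage names come from the brief; the scoring scheme, quality bar, and review rule are illustrative assumptions.

```python
# Sketch of stage-scored AI experimentation, following the four-part split
# described above (identify, build, test, analyze). The numeric scoring and
# human-review rule are illustrative assumptions, not Anthropic's system.

STAGES = ["identify", "build", "test", "analyze"]

def run_experiment(stage_scores, brand_sensitive, quality_bar=0.7):
    """Score model performance per stage and flag where humans must review."""
    report = {}
    for stage in STAGES:
        score = stage_scores[stage]
        # Keep a human in the loop when the work is brand-sensitive or
        # the model's measured performance at this stage is below the bar.
        needs_human = brand_sensitive or score < quality_bar
        report[stage] = {"score": score, "human_review": needs_human}
    return report

report = run_experiment(
    stage_scores={"identify": 0.9, "build": 0.8, "test": 0.6, "analyze": 0.85},
    brand_sensitive=False,
)
print(report["test"]["human_review"])   # → True  (below the quality bar)
print(report["build"]["human_review"])  # → False
```

Measuring per stage, rather than treating "AI experimentation" as one block, is what makes it possible to see that AI is useful in some parts of the loop and not others.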

4) Replace siloed growth reviews with one scorecard

A startup founder proposed a PLG Growth Scorecard because growth reviews were fragmented across Mixpanel, Stripe, HubSpot, and spreadsheets, leaving traffic, activation, and MRR disconnected from one another. The scorecard covers seven self-serve stages: Awareness, Acquisition, Activation, Conversion, Engagement, Retention, and Expansion.

How to do it:

  1. Map your funnel across all seven stages
  2. Assign each metric to a named owner across Marketing, Product, Sales, RevOps, or CS
  3. Add goal and trend tracking for every stage
  4. Choose a North Star metric; the example defaults to Activation Rate
  5. Use the full view to diagnose where the funnel actually leaks

Why it matters: PMs can make better trade-offs when they can see the full self-serve system, not just the product slice.
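The scorecard's cross-stage leak detection can be sketched in a few lines of Python. Stage names come from the article; the owners, goals, and current values below are invented for illustration, and "biggest leak" is modeled simply as the largest shortfall against a stage's goal:

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str       # funnel stage, e.g. "Activation"
    owner: str      # named owner: Marketing, Product, Sales, RevOps, or CS
    goal: float     # target conversion rate for this stage
    current: float  # measured conversion rate

def biggest_leak(stages):
    """Return the stage with the largest shortfall against its goal."""
    return max(stages, key=lambda s: s.goal - s.current)

# Illustrative numbers only — not from the source.
funnel = [
    Stage("Awareness", "Marketing", 0.30, 0.28),
    Stage("Acquisition", "Marketing", 0.20, 0.19),
    Stage("Activation", "Product", 0.40, 0.25),  # the example North Star
    Stage("Conversion", "Sales", 0.10, 0.09),
    Stage("Engagement", "Product", 0.60, 0.55),
    Stage("Retention", "CS", 0.85, 0.80),
    Stage("Expansion", "RevOps", 0.15, 0.14),
]

print(biggest_leak(funnel).name)  # prints: Activation
```

The point of the single structure is exactly the article's argument: once every stage lives in one place with an owner, a goal, and a current value, "where does the funnel leak" becomes a one-line query instead of a five-dashboard scramble.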

Case Studies & Lessons

1) Anthropic: hypergrowth creates “success disasters”

Lenny’s post says Anthropic grew from $1B to $19B ARR in a year and added $6B in ARR in February alone. In the interview, Amol Avasare says about 70% of his time goes to what Anthropic calls “success disasters”: urgent scaling problems across acquisition, activation, and monetization created by rapid growth. The remaining 30% goes to more proactive work such as product prioritization, pricing, and new-product funnels.

The team is roughly 40 people, organized with cross-cutting horizontals like growth platform and monetization, plus audience-focused pods for B2B, Claude Code, knowledge workers, and API users.

Key takeaway: At sufficient scale, growth stops looking like a clean experimentation backlog and starts looking like systems management. Org design has to support both firefighting and focused audience work.

2) Mercury: a quarter spent on quality produced a significant onboarding uplift

While at Mercury, Avasare says the team spent an entire quarter fixing onboarding quality for a complex regulated flow and explicitly set aside the usual growth-metric mindset for that period. The result was a significant uplift in onboarding start-to-completion rates. His broader lesson from that experience is that quality drives growth.

Key takeaway: When a critical flow is broken or confusing, quality work can outperform another quarter of metric chasing.

3) CASH: AI is already useful for narrow, high-volume growth work

Anthropic’s internal CASH effort is still early, but it is already producing results on small-scale experiments such as copy changes and minor UI tweaks. Avasare describes the current win rate as closer to a junior PM’s than a senior PM’s, while noting that progress has been rapid and human approval remains in place today.

Key takeaway: The near-term opportunity is not full PM automation. It is targeted automation of repetitive experiment loops where volume is high and risk is manageable.

Career Corner

1) In AI product work, PM advantage comes from tool fluency, adaptability, and interdisciplinary depth

Avasare’s career advice is to stay on top of the tools, understand what each new model release changes, and apply that learning to your own work. He also argues that PMs should lean into their strongest interdisciplinary edge, whether that is design, finance, sales, or something else, because mixed-skill operators become unusually valuable when roles blur. His warning is that 50-70% of old playbooks may no longer apply in AI-heavy environments.

How to apply:

  • Build a habit of testing new tools and releases directly
  • Double down on the cross-functional skill that makes you unusually useful
  • Assume some prior PM habits will need to be rewritten, not merely updated

2) Cold outreach still works when it is specific and tested

Avasare says he got his Anthropic role by cold emailing Mike Krieger, arguing the company needed a growth team. His tactics: use a tested subject line and message, reach out where others are not overwhelming the recipient, keep the pitch short, and follow up multiple times if it matters.

How to apply:

  • Lead with a crisp point of view on the company’s need
  • Keep the message short: who you are, why you fit, why you should talk
  • Follow up persistently when the opportunity matters

3) An adjacent operator role can be a bridge into PM

In one r/ProductManagement thread, a Data Analyst opportunity was described as owning tools, managing data pipelines, fixing bugs, shipping enhancements, and potentially building new capabilities over time. A commenter’s advice was to take that role, learn PM while building new capabilities, gradually evolve the work toward full PM scope, then negotiate the title change internally with manager support.

How to apply:

  • Favor adjacent roles with real ownership over tools or workflows
  • Start practicing PM as soon as you are shaping new capabilities
  • Use internal mobility and manager sponsorship to formalize the transition

4) When leadership feedback is vague, treat the executive like a user and force clarity

In another r/ProductManagement thread about VP-level expectations, commenters suggested treating the CEO or manager like a user: figure out what they say they want, then uncover the underlying need. The practical advice was to define the yardstick for success, run experiments to show course correction, and socialize a draft plan quickly. Some commenters also recommended getting external coaching from experienced leaders.

The same thread also raised a warning: vague expectations, unclear KPIs, and treating ambiguity as a failure rather than part of the role can indicate level mismatch or broader trouble.

How to apply:

  • Turn fuzzy feedback into explicit success metrics and review checkpoints
  • Socialize a draft plan early rather than waiting for perfect clarity
  • If expectations remain subjective and unstable, treat that as data about fit, not just performance

Tools & Resources

1) PLG Growth Scorecard

What it is: A unified dashboard across Awareness, Acquisition, Activation, Conversion, Engagement, Retention, and Expansion, with named owners, goals, trends, and a configurable North Star metric.

Why explore it: It replaces the common “five-dashboard scramble” where traffic, product, and revenue reviews do not line up.

Try it: Start with Activation Rate if you need one leading indicator, then add cross-stage leak detection.

2) The CASH experiment loop

What it is: Anthropic’s framework for AI-assisted growth experimentation: identify opportunities, build, test against brand/quality, and analyze outcomes.

Why explore it: It gives PMs a practical way to break AI experimentation into measurable stages instead of treating it as one black box.

Try it: Pilot it on copy changes or small UI tweaks, and keep a human approval step for brand-sensitive output.

3) A lightweight kickoff + AI-generated PRD pattern

What it is: A process pattern where small work happens in Slack and prototypes, while larger work gets a proper kickoff plus a lightweight AI-generated PRD built from prior documents.

Why explore it: It keeps teams from over-documenting small changes while still adding structure where risk is higher.

Try it: Define one size threshold in engineering weeks and one kickoff template for cross-functional work.

4) Loom AI for product demo cleanup

What it is: A tool recommendation from r/startups for automatically trimming pauses and generating transcripts and timestamps for product demos.

Why explore it: Demo editing can take longer than recording; this reduces cleanup time.

Try it: Use it for internal walkthroughs, stakeholder demos, and early customer-facing product tours.

5) Prototype before you pitch

What it is: Andrew Chen argues that we should hear fewer investor pitches based on a “drawing on a napkin” because, if you can draw it, you can often prompt it into existence now.

Why explore it: The bar for pre-product storytelling is moving toward something interactive or tangible.

Try it: Before a concept review or fundraising conversation, turn the sketch into a thin prototype first.

X Targets Agent Builders as Local AI and Model Auditing Advance
Apr 6
3 min read
154 docs
X Freeze
Machine Learning
clem 🤗
+1
Infrastructure was the clearest theme today: X repackaged its API around agent workflows, Gemma 4 gained more local deployment traction, and open-source work pushed both model serving and auditing forward.

Builder platforms and deployment

X is repositioning its API for AI agents

The X API was presented as a major update for AI agents and builders, with pay-per-use pricing, native XMCP Server + Xurl support for agents, official Python and TypeScript XDKs, and a free API Playground for simulated testing. X also said purchases can return up to 20% in xAI API credits and pointed developers to docs.x.com.

Why it matters: This is a real packaging change, not just a feature tweak: X is trying to make its real-time data and action surface easier to use as agent infrastructure.

Kreuzberg pushed document intelligence deeper into code workflows

Kreuzberg v4.7.0 introduced tree-sitter-based code intelligence for 248 formats, including AST-level extraction of functions, classes, imports, symbols, and docstrings, with scope-aware chunking for repo analysis, PR review, and codebase indexing. The release also reported markdown-quality gains across 23 formats, added a TOON wire format that cuts prompt tokens by 30-50%, and shipped integration as a document backend for OpenWebUI.

Why it matters: The project is explicitly positioning itself as infrastructure for agents, with better extraction quality and smaller prompt payloads aimed at making code and document analysis more reliable and cheaper.

Project: GitHub

Gemma 4 kept picking up local deployment paths

NVIDIA said it is accelerating Gemma 4 for local agentic AI across hardware "from RTX to Spark", and Hugging Face CEO Clement Delangue said Gemma 4 had reached the top spot on Hugging Face. Separately, an open-source FPGA project reported roughly 450 tokens per second on an AMD Kria KV260 using a custom 36-core heterogeneous pipeline and a smaller distilled INT4/KAN runtime model, though the team said the quantized weights are still a work in progress.

Why it matters: The notable signal here is ecosystem traction: vendor support and community experimentation are creating more concrete local and edge paths around Gemma 4.

Resources: NVIDIA blog · FPGA repo

Research and evaluation

A pure-Triton MoE kernel posted inference-time wins

A fused MoE dispatch kernel written in pure Triton reported faster forward-pass performance than Stanford's CUDA-optimized Megablocks on Mixtral-8x7B at inference batch sizes, with gains of 131% at 32 tokens and 124% at 128 tokens on A100. The writeup attributes this to a fused gate+up projection that removes about 470MB of intermediates and cuts memory traffic by 35%, plus a block-scheduled grouped GEMM that handles variable-sized expert batches without padding; tests also passed on AMD MI300X without code changes.

Why it matters: This is the kind of low-level serving work that can materially improve MoE deployment efficiency at the inference-relevant batch sizes many teams care about.

Code: GitHub · Writeup: subhadipmitra.com
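The fused gate+up idea can be illustrated outside Triton. This NumPy sketch (shapes and the SiLU gating are illustrative, not taken from the kernel) shows why concatenating the gate and up weight matrices lets one matmul replace two, eliminating a second pass over the activations and one full-size intermediate:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, tokens = 64, 256, 32
x = rng.standard_normal((tokens, d_model))
w_gate = rng.standard_normal((d_model, d_ff))
w_up = rng.standard_normal((d_model, d_ff))

def silu(v):
    # SiLU / swish activation used in SwiGLU-style expert MLPs
    return v / (1.0 + np.exp(-v))

# Unfused: two matmuls, two reads of x, two full-size intermediates.
gate = x @ w_gate
up = x @ w_up
y_unfused = silu(gate) * up

# Fused: one matmul against concatenated weights, then split the halves.
w_fused = np.concatenate([w_gate, w_up], axis=1)  # (d_model, 2 * d_ff)
h = x @ w_fused
y_fused = silu(h[:, :d_ff]) * h[:, d_ff:]

assert np.allclose(y_unfused, y_fused)
```

In a GPU kernel the split halves stay in registers instead of round-tripping through HBM, which is where the reported intermediate-memory and traffic savings come from; the NumPy version only demonstrates the algebraic equivalence.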

Reference-free auditing may make hidden behaviors easier to detect

A new AuditBench result fit a Ridge regression from early-layer to late-layer activations and treated the residuals as candidates for planted behavior, avoiding the need for a clean reference model. Reported AUROCs were 0.889 for hardcode_test_cases, 0.844 for animal_welfare, 0.833 for anti_ai_regulation, and 0.800 for secret_loyalty, with 3 of 4 matching or exceeding reference-based methods.

Why it matters: The study was small, but it suggests targeted behaviors may be auditable even when a base model is unavailable, which is a practical constraint in many real evaluation settings.

Code: GitHub · Writeup: Substack
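The core mechanic is simple enough to sketch. This is not the AuditBench code: the "activations" below are synthetic stand-ins, the planted behavior is a crude additive shift, and the closed-form Ridge fit replaces whatever training setup the study used. It only demonstrates the principle that a linear early-to-late map leaves large residuals on prompts whose late activations carry behavior the map cannot explain:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 32
alpha = 1.0

# Synthetic stand-ins for early- and late-layer activations on clean prompts.
early = rng.standard_normal((n, d))
W_true = 0.3 * rng.standard_normal((d, d))
late = early @ W_true + 0.05 * rng.standard_normal((n, d))

# Closed-form Ridge regression: W = (X^T X + alpha * I)^-1 X^T Y
W_hat = np.linalg.solve(early.T @ early + alpha * np.eye(d), early.T @ late)

def residual_score(e, l):
    """Residual norm: how poorly the early->late map explains these prompts."""
    return np.linalg.norm(l - e @ W_hat, axis=1)

# Probe prompts: the first 10 carry a planted shift in their late activations.
probe_early = rng.standard_normal((20, d))
probe_late = probe_early @ W_true + 0.05 * rng.standard_normal((20, d))
probe_late[:10] += 2.0

scores = residual_score(probe_early, probe_late)
assert scores[:10].mean() > scores[10:].mean()  # planted probes stand out
```

Ranking prompts by residual score is what turns this into a reference-free detector: no clean copy of the model is needed, only its own early and late activations.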

AI-Native School Models Expand as Education Tools Shift Toward Scaffolding and Guardrails
Apr 6
9 min read
1246 docs
Ethan Mollick
Justin Reich
MacKenzie Price
+24
This brief covers the week’s strongest education AI signals: school models built around AI tutoring and compressed schedules, a new wave of tools that guide research and study rather than just answer, and a sharper move toward assignment-level governance, safety boundaries, and evidence-based caution.

AI-native school models are moving from pilots to full operating systems

The biggest signal this week is that AI is starting to define whole learning models, not just classroom tasks. Across Alpha School and Once, AI is being used to restructure time, staffing, and tutoring rather than simply add a chatbot to existing lessons.

Alpha leaders described a mastery-based model where students spend about two hours each morning on AI-driven academics in math, science, and reading, while guides focus on motivation at roughly 1:15 ratios, or 1:5 in K-2. The system assesses what a student knows, identifies gaps, and generates lessons at the right level; Joe Liemandt said the lesson engine uses the curriculum plus a student’s knowledge graph and interest graph, with cognitive load theory planned for 2026. Alpha also draws a hard line between guided lesson generation and open-ended academic chatbots, which its leaders argue mostly encourage cheating rather than learning. Operationally, the product goes as far as surfacing a “waste meter” when students skip explanations or use time inefficiently.

In interviews, Alpha leaders reported top-1% standardized-test performance across grades and subjects, an average senior SAT of 1550, and movement from bottom-half entrants to above the 90th percentile within two years. Those are school-reported outcomes, and a news segment noted that some educators remain skeptical because AI-based school models are still seen as unproven.

Expansion is moving on multiple fronts. Liemandt said Alpha would have 25 campuses this year and make Time Back broadly accessible in 2026, while MacKenzie Price said Alpha expected about 50 campuses in 2026 and noted a $1 billion capital commitment from Liemandt. Variants are already appearing in specialized formats: Texas Sports Academy says voucher-eligible families can access Alpha academics through its program, and Bennett School pairs two hours of AI-powered learning with elite baseball development. Texas Sports Academy has also cited individual gains from 6th- to 11th-grade reading and from the 42nd to the 82nd percentile.

A narrower, more human-centered implementation comes from Once, which uses AI software to help support staff deliver one-on-one early reading tutoring to children ages 3 to 7. Its origin story is practical: pandemic-era pilots suggested that 15 minutes of daily tutoring from non-experts could help kindergarten-age children learn to read, and the company is now trying to scale that approach through software inside schools.

“young children learn best from adults, like actual in-person human-to-human instruction”

The strongest new tools guide process rather than replace it

The most useful product pattern this week was not broader generation. It was more scaffolding.

Microsoft’s Search Progress asks students to evaluate source reputation and consequence while they research, then gives teachers visibility into searches, links opened, and sources saved. Built with the Digital Inquiry Group, it is explicitly framed as a way to make research thinking visible at a moment when Microsoft argues students’ baseline media-literacy skills are weak and PISA is preparing a 2029 assessment on media and AI literacy.

Microsoft’s Study and Learn Agent applies the same idea to tutoring. In preview, it shifts Copilot from answer engine to coach: instead of solving a problem outright, it asks what the student has tried, gives just enough explanation to move them forward, and can generate flashcards, quizzes, and study plans grounded in uploaded notes or files. The limitations are clear too: it is still in preview, requires Copilot Chat to be enabled, and is currently for students 13+.

On the teacher workflow side, Microsoft’s free Teach Module is expanding from drafting into modification: aligning activities to recognized standards in 40+ countries and U.S. states, differentiating instructions, adjusting reading level while preserving key terms, and adding real-world examples. One current constraint is localization: presenters said grade levels are U.S.-based for now and only becoming more localized over the coming months.

Ellis pushes this scaffolding pattern into educator support. It uses a retrieval-augmented system built on trusted sources such as CAST, Understood, NCLD, Digital Promise, and the Reading League to generate classroom strategies and action plans from a teacher’s scenario. Its boundaries matter as much as its features: it stores scenarios for follow-up, strips or replaces student names, and stops the conversation when self-harm or suicidal ideation appears, directing educators back to school protocols and crisis supports.

For self-directed learners, NotebookLM added topic summaries and next-study suggestions after quizzes and flashcards, plus a regenerate option for more practice on selected topics. At the more advanced end, Andrej Karpathy described using LLMs to compile source materials into a markdown wiki in Obsidian, query it for complex questions, and feed outputs back into the knowledge base — powerful for research, but still, in his words, closer to a “hacky collection of scripts” than a mainstream learning product.

Governance is shifting from bans to assignment-level rules and disclosure

Policy is also getting more concrete.

Pineville ISD shifted from “acceptable use” to “responsible use,” arguing that platform-specific rules become obsolete too quickly as AI gets embedded into existing tools. Its most practical move is an assignment-level AI scale that runs from no AI use to AI-focused projects, with teachers choosing the level per task. Microsoft is building the same concept into product workflow: Assignments will let teachers mark expected AI use as none, partial, or full, and attach an explicit prompt when full AI use is allowed.

In higher education, Lance Eaton and Carol Damm’s new transparency framework argues institutions should document their own GenAI use if they expect students to disclose theirs, and that improving export and import features across AI tools could make that record-keeping more realistic.

The urgency is real. One EdSurge essay cited a May 2025 study finding that 84% of high school students used generative AI for schoolwork, and pointed to reporting on pervasive, undisclosed AI use to grade and give feedback on student writing in some New Orleans schools. At the institutional level, Google and IDC warned that uneven adoption inside universities is creating a new digital divide: some students get AI-enabled learning and AI safety practice, while others get neither because faculty, departments, and institutions lack a shared strategy.

Some institutions are now responding at the curriculum level. Purdue is moving toward an AI skills graduation requirement, Ohio State wants every freshman through an AI literacy course, and Microsoft noted that PISA’s 2029 assessment will cover media and AI literacy.

Governance also has to cover new harms, not just plagiarism. Laura Knight described a recent UK school deepfake incident involving sexualized images of teachers and warned that AI “friend” chatbots can pull vulnerable children toward attachment and monetized intimacy. Her recommendation is less screen-time rhetoric and more scenario-based professional development, peer support, coaching, and digital self-regulation.

Research is sharpening the line between useful support and unsafe substitution

Research this week reinforced a simple rule: guided assistance can help, but automation is weak where judgment, relationships, or fairness matter.

Where AI is helping

  • In a UK math RCT with 165 students, both human and AI tutors beat written hints; the AI performed slightly better on novel problems and strong Socratic questioning, but human tutors were better at reading emotion and adjusting pace.
  • A Wharton and National Taiwan University study of 770 high-school Python learners found proactive adaptive problem selection outperformed reactive chatbots and produced gains equal to 6-9 extra months of learning.
  • India’s Shiksha Copilot reduced lesson-plan creation from 45-90 minutes to 15, but the study still emphasized teacher-AI collaboration and found English outputs stronger than local-language ones.

Where caution is warranted

  • More AI-driven revision is not automatically better. In a University of Queensland study, hybrid feedback produced more revisions, but all feedback types ended with similar quality, confidence, and grades.
  • A Stanford analysis of four LLMs giving feedback on 600 eighth-grade essays found the same writing received different feedback when models were told the student was low ability, high ability, Asian, male, or female; the practical recommendation was to minimize demographic data in prompts.
  • Thirteen AI detectors tested on 280,000 student works produced an average 41% false-positive rate on short texts, making them unsafe for high-stakes use.
  • Hidden prompt injections still manipulated older and smaller judge models in a new Wharton report, even if most frontier models resisted; Gemini 3 was the only tested frontier model reported as susceptible.
  • Chatbots were not a substitute for human contact: in a two-week RCT with 300 first-year students, only daily conversations with another human reduced loneliness; chatbot chats performed no better than journaling.

That is why Justin Reich argues schools should stop looking for universal AI “best practice” and instead run local experiments, compare student work over time, and decide where AI belongs in core versus peripheral curriculum.

What This Means

  • For school operators: AI is starting to change schedule design, staffing, and specialization. If you are evaluating new models, pair the claims with local experiments and work-sample review rather than copying operator narratives at face value.
  • For teachers and instructional designers: the practical wins are scaffolds and modifications — source evaluation, guided study, differentiated instructions, reading-level adjustment, and lesson planning.
  • For higher ed and L&D teams: the middle path is getting clearer. Ethan Mollick describes AI tutors outside class and more exercises, simulations, grading, and reflection inside class, while institutions like Ohio State and Purdue are moving AI literacy into the curriculum itself.
  • For self-directed learners: source-grounded study is getting better, from NotebookLM’s quiz guidance to LLM-built personal knowledge bases, but the best workflows still depend on curated source sets and active note-building.
  • For school leaders and compliance teams: assignment-level AI expectations and disclosure are likely more durable than blanket bans, especially when detector tools still misfire on short student work.
  • For buyers and investors: the strongest product signals this week were source grounding, teacher control, privacy boundaries, and human fallback — not broader claims of autonomy.

Watch This Space

  • AI-native school expansion. Alpha says Time Back will open more broadly in 2026, and Liemandt says specialized academies are expanding across new schools, sports, and cities.
  • AI literacy becoming a formal requirement. Purdue is moving to an AI skills graduation requirement, Ohio State wants every freshman through AI literacy, and PISA will assess media and AI literacy in 2029.
  • Personal study stacks and memory-aware workflows. NotebookLM’s quiz upgrade, author-created llms.txt reading experiences, Karpathy’s LLM wikis, and new work on memory-aware agents all point toward more cumulative, source-bound self-study workflows.
  • Student-built learning software. A high school student-built 3D chemistry app prompted Liemandt to predict that students will soon learn from apps built by other students.
  • AI-specific safeguarding. Deepfake sexualized imagery and synthetic-intimacy chatbots are likely to push schools toward more explicit AI safety education, not just generic screen-time rules.

Your time, back.

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers one verified daily brief.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

Get your briefs in 3 steps

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Stay updated on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Track startup funding trends and venture capital insights
Latest research on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Confirm your sources and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.

Discovering relevant sources...
Sam Altman · Profile
3Blue1Brown · Channel
Paul Graham · Account
The Pragmatic Engineer · Newsletter · Gergely Orosz
r/MachineLearning · Community
Naval Ravikant · Profile
AI High Signal · List
Stratechery · RSS · Ben Thompson

3

Receive verified daily briefs

Get concise, daily updates with precise citations directly in your inbox. You control the focus, style, and length.

Verification Loops Tighten Up as Claude/OpenClaw Friction Surfaces
Apr 6
5 min read
85 docs
ovoDRIZZYxo
Emanuele Di Pietro
Theo - t3.gg
+7
Verification-first agent design was the real signal today: self-QA loops, trace-driven harness learning, and real software-verification budgets. Also inside: OpenClaw's GPT 5.4 dev update, Claude/OpenClaw friction, task-based model routing, and the AI-assisted build lessons behind syntaqlite.

🔥 TOP SIGNAL

Today's clearest alpha: serious coding-agent setups are moving from one-shot generation to verification loops. Peter Steinberger's new OpenClaw self-QA workflow has an orchestrator assign a task, verify the result, and spawn a repair subagent on failure; LangChain describes the same general move as harness improvement from traces, and Andrew Yates says Dropbox has been running a "Ralph loop" Dark Factory since October while Geoffrey Huntley says companies are now spending engineer-salary-level budgets to automate software verification.

🛠️ TOOLS & MODELS

  • OpenClaw — GPT 5.4 dev-channel upgrade. steipete says the claw harness now has GPT 5.4 upgrades; test with openclaw update --channel dev. Early user feedback moved from near-frustration to "way better" / "GOD MODE"
  • Claude Max / Claude Code — harness gating is now concrete. In testing, adding the exact system-prompt string "A personal assistant running inside OpenClaw." triggered a 400 error saying third-party apps draw from extra usage, not plan limits. Simon Willison says exact-string prompt filtering is a step too far; separately, Theo says Claude Code now refuses the system-fix tasks he mainly kept his subscription for, while Codex still does the work
  • T3Code fork — task-specific handoff. Emanuele DPT's experimental open-source feature routes UI-heavy threads to Claude and logic-heavy threads to Codex. Push to main is planned, and Theo says these increasingly elaborate forks are exactly the mindset he wants encouraged in T3Code itself
  • Salesforce's model mix — real scale, bounded claims. Marc Benioff says Salesforce's 15,000 engineers use coding models from Anthropic, OpenAI Codex, Cursor, and others, plus agents that engineers supervise. His productivity number is more than 30%, not 100%, because models are still not autonomous

💡 WORKFLOWS & TRICKS

  • Self-QA your harness

    1. Add a synthetic message channel to your own agent.
    2. Let an orchestrator define a concrete task.
    3. Verify the result automatically.
    4. If verification fails, spin up a subagent to analyze and fix.
    5. steipete says he built this OpenClaw loop in about six hours and found it better than old-school end-to-end tests
  • Route by task type, not brand loyalty

    • Send UI-heavy work to Claude.
    • Send logic-heavy work to Codex.
    • Keep the handoff explicit so the thread can continue in the model that fits the task
  • Use AI where answers are checkable; keep architecture human-owned

    1. Use AI to crush tedious implementation work — Lalit Maganti used Claude Code to get past 400+ SQLite grammar rules and into concrete prototypes fast
    2. Be skeptical when the task has no objectively checkable answer — Maganti says AI led him into dead ends and encouraged deferring key design decisions
    3. If the prototype proves the idea but the architecture is muddy, throw it away and rebuild with more human-in-the-loop design decisions
  • Let traces improve the system at multiple layers

    1. Run the agent on real tasks and evaluate outcomes.
    2. Store traces.
    3. Use a coding agent to propose harness code changes from those traces.
    4. Update context separately via persistent memory — agent-level files like SOUL.md, tenant-level memory, offline "dreaming," or hot-path updates
  • Plan for supervised agents, not full autonomy

    • Salesforce's benchmark is the right planning assumption for now: engineers supervise coding agents, and even at 15,000-engineer scale the gain Benioff reports is more than 30%, not 100%
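The self-QA pattern at the top of this list can be sketched as a generic orchestrator loop. This is not OpenClaw's implementation: the worker, verifier, and repairer interfaces below are hypothetical stand-ins (in a real harness the worker would dispatch to a coding agent and the verifier would run tests or checks), and the toy demo fakes a first-attempt bug so the repair path actually fires:

```python
def self_qa_loop(task, worker, verifier, repairer, max_repairs=2):
    """Orchestrator: assign a task, verify the result, and spawn a repair
    pass when verification fails. All three callables are injectable."""
    result = worker(task)
    for attempt in range(max_repairs + 1):
        ok, report = verifier(task, result)
        if ok:
            return result
        if attempt == max_repairs:
            raise RuntimeError("verification still failing: " + report)
        # The repair subagent gets the task plus the failure report.
        result = worker(repairer(task, result, report))

# Toy demo: the "agent" implements doubling, but its first try is off by one.
state = {"tries": 0}

def worker(task):
    state["tries"] += 1
    if state["tries"] == 1:
        return lambda x: 2 * x + 1  # buggy first attempt
    return lambda x: 2 * x          # fixed on the repair pass

def verifier(task, fn):
    got = fn(3)
    return got == 6, "expected 6, got " + str(got)

def repairer(task, fn, report):
    return task + " (fix: " + report + ")"

fixed = self_qa_loop("double a number", worker, verifier, repairer)
print(fixed(3))  # prints: 6
```

The bounded `max_repairs` matters in practice: without it, a task the agent cannot fix loops forever instead of surfacing the failure report to a human.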

👤 PEOPLE TO WATCH

  • Peter Steinberger — shipping OpenClaw internals in public: self-QA loops, dev-channel GPT 5.4 harness changes, and concrete "make GPT better" tweaks rooted in prior Codex work
  • Lalit Maganti — one of the best firsthand build logs in the batch: fast AI-assisted parser implementation, then a disciplined reset once architecture quality slipped. Start with syntaqlite and the full post
  • Simon Willison — worth following because he tests vendor behavior directly. Today he highlighted the exact-string OpenClaw trigger and argued prompt-based billing filters go too far
  • Theo + Emanuele DPT — useful signal on model routing in the wild: an open-source T3Code fork that hands UI work to Claude and logic to Codex, with Theo explicitly wanting that extension mindset inside the main tool

🎬 WATCH & LISTEN

  • 10:25-11:40 — Marc Benioff on the real ceiling of coding agents today. Best calibration clip in the pack: Salesforce says engineers across a 15,000-person org are using coding models and agents, but the human role becomes supervisory rather than disappearing. The number to keep in your head is more than 30% productivity, not autonomy

📊 PROJECTS & REPOS

  • syntaqlite — high-fidelity SQLite parser, formatter, and verifier. The build story is the signal: eight years of wanting it, then three months with Claude Code to get it built
  • Deep Agents — LangChain's open-source, model-agnostic base harness. They say traces plus LangSmith CLI and Skills were used to improve it on terminal bench, and it supports user-scoped memory plus background consolidation
  • T3Code Claude/Codex handoff fork — experimental open-source feature, push to main planned. The practical signal is the routing rule itself: different models for UI vs. logic work
  • OpenClaw dev channel — not a new repo, but a live harness update worth testing if you use it: GPT 5.4 upgrades are available via openclaw update --channel dev

Editorial take: the edge is shifting out of raw model IQ and into the wrapper — verification loops, trace-driven harness updates, and blunt task routing between models.

Autonomous Research Advances as Anthropic Pushes into Biotech and Rethinks Agent Access
Apr 6
9 min read
427 docs
vLLM
LightSeek Foundation
Boris Cherny
+36
Autonomous research systems, Anthropic’s biotech acquisition, and tighter controls on agent compute dominated this cycle. The brief also covers Gemma 4’s spread into local developer workflows, new long-context research, evolving agent infrastructure, and policy moves in China and Maine.

Top Stories

Why it matters: This cycle’s biggest signals were about autonomous research, vertical expansion into biotech, the economics of agent usage, wider local-model distribution, and early labor-market measurements.

ASI-Evolve claims end-to-end autonomous AI research

Shanghai Jiao Tong University researchers released ASI-Evolve, an open-source system described as running the full AI research loop itself: reading papers, forming hypotheses, designing and running experiments, analyzing results, and iterating without human intervention. In neural architecture search, it ran 1,773 rounds, generated 1,350 candidates, and produced 105 models that beat the best human-designed baseline; the top model exceeded DeltaNet by +0.97 points. The same framework reportedly improved data curation by +3.96 average benchmark points and +18 on MMLU, and produced RL algorithms that beat GRPO by up to +12.5 on competition math.

"This is the first system to demonstrate AI-driven discovery across all three foundational components of AI development in a single framework."

A biomedicine test also showed +6.94 points in drug-target prediction on unseen drugs. One critic argued the work is not the first effort of its kind and said frontier labs still rely on data intuitions that may not be offloaded to a scaffold.

Impact: The paper presents this as a single framework improving architecture, data, and algorithms rather than optimizing only one part of the stack.

Anthropic acquires Coefficient Bio for biotech workflows

Anthropic acquired Coefficient Bio for about $400M. The sub-10-person startup builds AI to plan drug R&D, manage clinical regulatory strategy, and identify new drug opportunities. The team joins Anthropic’s healthcare and life sciences group, which already works with Sanofi, Novo Nordisk, AbbVie, and others.

"I’m talking about using AI to perform, direct, and improve upon nearly everything biologists do."

Posts around the deal frame it as execution on Dario Amodei’s "virtual biologist" idea, with Coefficient Bio covering drug discovery, clinical trials, and regulatory submissions end to end.

Impact: The deal pushes Anthropic further from being only a general-model vendor and deeper into healthcare-specific workflows.

Claude access rules now reflect harness economics

A notice said Claude subscriptions will no longer cover usage on third-party tools like OpenClaw, though users can still buy extra usage bundles or use a Claude API key. A later analysis argued some harnesses send repeated low-value requests with long contexts—often over 100K tokens—making costs tens of times higher than a subscription price. Another post framed Anthropic’s position as allowing products that complement Claude Code but not direct competitors, a characterization one critic rejected.

Impact: The dispute is no longer just about model quality. It is about who gets subsidized compute, who has to pay API rates, and how efficiently agent frameworks use context and caching.

Gemma 4 is turning into a distribution story

Gemma 4 is now integrated into Android Studio Agent mode for local feature development, refactoring, and bug fixing. Separate posts highlighted 1,500 free daily requests to Gemma 4 31B in Google AI Studio, and one user described Gemma running locally on a Pixel phone with no connectivity. Gemma 4 was also cited as the #1 trending model on Hugging Face.

Impact: Gemma 4 is showing up across IDEs, hosted inference, and offline edge use, which is a stronger adoption signal than benchmarks alone.

Goldman Sachs sees a net labor-market drag from AI substitution

Goldman Sachs estimated that, over the past year, AI substitution reduced monthly payroll growth by roughly 25,000 jobs and raised unemployment by 0.16 percentage points, while augmentation added about 9,000 jobs and lowered unemployment by 0.06 points. Netting the two implies a 16,000-job monthly drag on payroll growth and a 0.1-point boost to unemployment, with the negative effects concentrated among less experienced workers.

Impact: The note argues that today’s net labor effect is already negative and is falling disproportionately on entry-level workers.

Research & Innovation

Why it matters: The strongest technical work this cycle focused on cheaper long-context inference, better credit assignment for reasoning, and more formal ways to generate theory with LLMs.

Long-context methods keep chipping away at attention cost

  • HISA replaces a flat sparse-attention token scan with a two-stage block-then-token pipeline, eliminating the indexing bottleneck at 64K context without extra training.
  • Screening Is Enough / Multiscreen replaces softmax-style global competition with threshold-based screening, matching Transformer-like validation loss with 40% fewer parameters and reducing inference latency by up to 3.2x at 100K context.
  • Commentary around this work framed sparse attention as a form of maximum inner product search, while noting that approaches with better theoretical complexity still have to work on GPUs and at datacenter scale to matter in practice.
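The softmax-vs-screening contrast above can be made concrete with a toy example. This is only loosely inspired by the described idea: the threshold `tau` and the uniform re-weighting of survivors are illustrative assumptions, not the paper's actual formulation.

```python
# Toy contrast: dense softmax weights vs. threshold-based "screening".
# tau and the uniform survivor weighting are illustrative assumptions.
import math

def softmax(scores):
    """Standard softmax: global normalization couples every position."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def screen(scores, tau=0.0):
    """Keep only scores above tau; weight survivors uniformly.
    No global normalization over all positions, so the scan can stop
    at the threshold instead of touching every token."""
    kept = {i for i, s in enumerate(scores) if s > tau}
    w = 1.0 / len(kept) if kept else 0.0
    return [w if i in kept else 0.0 for i in range(len(scores))]

scores = [2.0, -1.0, 0.5, -3.0]
print(softmax(scores))  # dense: every position gets some weight
print(screen(scores))   # sparse: only positions 0 and 2 survive
```

The practical claim in the bullet is about exactly this difference: screening produces hard zeros without a global competition step, which is where the latency savings at long context would come from.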

New training methods target deeper reasoning and smaller working contexts

  • FIPO uses discounted future-KL signals in policy updates, pushing average chain-of-thought length past 10,000 tokens and reaching 56.0% AIME 2024 Pass@1 on Qwen2.5-32B.
  • SKILL0 tries to internalize agentic skills into model weights instead of retrieving them at runtime, reporting gains of over 9% on ALFWorld and 6% on Search-QA while cutting context usage to under 0.5K tokens per step.
  • Principia introduces benchmarks and training recipes for deriving mathematical objects, with gains from on-policy judge training and verifiers that also transfer to standard numerical and multiple-choice math benchmarks.

LLMs are starting to participate in theoretical science workflows

steepest-descent-lean formalizes convergence bounds and hyperparameter scaling laws in Lean using Codex. The work reproduces prior-style results under weaker assumptions, including support for Nesterov momentum and decoupled weight decay, and recovers a fixed-token-budget scaling law of BS ≍ T²⁄³. Its stated workflow is simple: formalize a peer-reviewed proof, ask an LLM to weaken assumptions and re-derive theorems, then keep only the changes that preserve or better match empirical results. The repo is here: steepest-descent-lean.

Products & Launches

Why it matters: Useful product progress this cycle was less about one big model launch and more about better infrastructure around coding agents, memory, routing, and interface layers.

GitNexus adds a code graph for agent workflows

GitNexus indexes a codebase into a graph using Tree-sitter, mapping calls, imports, inheritance, execution flows, and blast radius before code changes. The pitch is that agents get the repo’s dependency structure precomputed at index time, so smaller models can answer architecture questions without repeated exploration. Setup is a single command: npx gitnexus analyze. The project was cited as already reaching 9.4K GitHub stars and 1.2K forks.
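The "blast radius" idea is worth unpacking: given a call graph, the set of functions at risk from a change is everything that transitively calls the changed node. A minimal sketch, with a made-up graph and function names (GitNexus builds its real graph with Tree-sitter):

```python
# Minimal blast-radius sketch over a hypothetical call graph.
# CALLS maps each caller to its callees; names are illustrative only.
from collections import deque

CALLS = {
    "handler": ["validate", "save"],
    "save": ["db_write"],
    "report": ["db_write"],
    "validate": [],
    "db_write": [],
}

def blast_radius(changed):
    """Everything that (transitively) calls `changed` is at risk."""
    # Invert the graph: callee -> list of callers.
    callers = {}
    for caller, callees in CALLS.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    # BFS upward from the changed node.
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return sorted(seen)

print(blast_radius("db_write"))  # → ['handler', 'report', 'save']
```

Precomputing this reverse reachability at index time is what lets a small model answer "what breaks if I touch this?" without re-exploring the repo on every question.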

New building blocks are landing for agent memory and control planes

  • Memvid offers a single-file memory layer for agents with instant retrieval and portable, versioned long-term memory without a database.
  • Plano is an open-source AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing.

Hermes Agent expands its interfaces

Hermes Agent added support for OAuth-authenticated MCP servers, can expose an OpenAI-compatible endpoint for use with OpenWebUI as a chat interface, and now ships a Manim skill for generating programmatic math and technical animations via /manim-video. One demo combined the Manim skill with Math Code to produce an explanatory video for Jordan’s Lemma.

DESIGN.md turns visual style into plain text for coding agents

The awesome-design-md repo packages design-system descriptions from 31 real websites into markdown files that agents can read directly, covering colors, typography, spacing, buttons, shadows, and responsive rules. The project was presented as a way to avoid repetitive default AI UI aesthetics, and it has now been integrated with Hermes Agent.

Industry Moves

Why it matters: The business story this cycle was about sustainable economics, talent concentration, hardware experimentation, and the changing labor and data supply chains behind AI systems.

New pricing models are emerging for agent-heavy workloads

Alongside Anthropic’s tighter subscription rules, MiMo launched a Token Plan that supports third-party harnesses through token quotas and frames the model as long-term, stable delivery rather than open-ended subscription usage. The surrounding commentary argued the market is moving toward a combination of more token-efficient agent harnesses and more efficient models, not simply cheaper tokens.

PrimeIntellect added open-source training talent

Open-source researcher Elie Bakouch said he is joining PrimeIntellect to work on pre/mid training, citing the team’s open-frontier mission and the leverage of a small focused group. Peers called the hire an "unbelievable get" and a "phenomenal choice".

Neuromorphic computing patent activity is accelerating

A PatSnap-cited note said neuromorphic computing has moved from academic prototype to commercial product, with 596 patents filed through early 2026 and a 401% surge in activity during 2025.

A once-important data-labeling channel in China is weakening

Data-labelling workshops in rural Guizhou that were once part of China’s poverty alleviation effort and helped build AI systems are now struggling as state support and industry demand have fallen. One commentator suggested similar programs could still be repurposed toward graduate unemployment, but that was presented as an open question rather than a current policy.

Policy & Regulation

Why it matters: The policy signal this cycle was less about sweeping AI laws and more about concrete requirements on how companies govern AI internally and how jurisdictions handle AI infrastructure growth.

Beijing now requires AI ethics committees

New Beijing rules require all Chinese companies engaging in AI activities to establish internal AI ethics committees, effective immediately. The final version removed earlier wording that made such committees conditional on circumstances, and the move follows a 2023 ethics review system that had been criticized as too narrow and too perfunctory for AI-specific issues. One commentator said the plain reading could be especially hard on smaller startups and questioned how it will be enforced.

Maine is moving to pause large data-center projects

Maine is on track to become the first U.S. state to pause construction of large data centers—projects over 20 megawatts—until November 2027 while it studies environmental and energy impacts. Commentary around the move acknowledged concerns about rising electricity costs while arguing that infrastructure limits should not become a blanket brake on AI development. The linked report is from the Wall Street Journal: Maine data center ban.

Quick Takes

Why it matters: These smaller items are useful signals for where local deployment, inference optimization, tooling behavior, and public understanding are moving next.

  • A local grounded reasoning demo paired Gemma 4 with Falcon Perception, using Gemma to decide what to inspect and Falcon to return pixel-accurate coordinates; one example checked whether a soccer player was offside, fully local on M3 hardware.
  • TorchSpec said its kimi-k2.5-eagle3 draft model hit 40K downloads on Hugging Face in two weeks, and vLLM said it adopted the open-source EAGLE3 draft model for low-latency inference on Kimi 2.5.
  • A weekend project showed a fully local coding agent using Qwen3.5 30B A3B, llama.cpp/lemonade, ngrok, and OpenHands; the builder said performance was better than expected.
  • Claude Code now reportedly throws an error when asked to analyze its own source code. Separately, one user said the tool now refuses some system-fixing workflows they previously relied on, while Codex still accepts them.
  • François Chollet highlighted a tutorial for fine-tuning Gemma on TPU v5 using Kinetic + Keras + JAX, with a quick-start repo here: kinetic-finetuning-on-cloud-tpu.
  • Ryan Greenblatt argued that statements like "AGI is here" or "we’re far from AGI" are not meaningful unless the speaker defines the term being used.
  • The documentary The AI Doc is now showing in hundreds of theaters, and one commentator said non-technical viewers valued its plain explanation of how LLMs work.

Codegen Catch-Up Leads a Day of AI and Decision-Making Picks
Apr 6
3 min read
156 docs
نادي ريادة الأعمال
Amjad Masad
Bill Gates
+3
Today's recommendations split between frontier AI literacy and stronger judgment. Garry Tan's codegen interview pick leads the list, followed by Bill Gates' Steve Wolfram book recommendation, three books from Anthropic's Amol Avasarala, and one founder-endorsed article on entrepreneurial mindset.

Today's signal

The clearest pattern today is a split between AI literacy and better judgment. Garry Tan and Bill Gates point readers toward resources that help explain current model capability and uncertainty, while Amol Avasarala and Amjad Masad recommend reading that sharpens mindset, reflection, and decision-making.

Most compelling recommendation

Newest @steipete / @lexfridman interview

  • Content type: Interview
  • Author/creator: @steipete and @lexfridman
  • Link/URL: Not provided in the source material
  • Who recommended it: Garry Tan
  • Key takeaway: Tan says it is "probably the best way to catch up to what is really going on" and calls it "a perfect encapsulation of the extreme edge of what is possible now in codegen."
  • Why it matters: This is the strongest recommendation in today's set because it combines a clear use case—getting up to speed quickly—with a concrete thesis about where the frontier is headed: "just in time smart personal software is here, just not evenly distributed."

"Probably the best way to catch up to what is really going on is listen to the newest @steipete @lexfridman interview"

AI mechanics and limits

How ChatGPT Works

  • Content type: Book
  • Author/creator: Stephen Wolfram
  • Link/URL: Not provided in the source material
  • Who recommended it: Bill Gates
  • Key takeaway: Gates recommends it as a useful resource on language model mechanics, while stressing Wolfram's point that the way these systems represent knowledge is still not fully understood.
  • Why it matters: The recommendation does more than point to an explainer; it also frames the current limit of expert understanding.

"the way that it's actually representing knowledge we don't fully understand."

Books on mindset and decision quality

Amol Avasarala's recommendations form a coherent mini-list: two books on reframing internal experience, and one on making product decisions more explicitly probabilistic.

The Joy of Living

  • Content type: Book
  • Author/creator: Yongey Mingyur Rinpoche
  • Link/URL: Not provided in the source material
  • Who recommended it: Amol Avasarala
  • Key takeaway: It helps readers think about life experience differently and offers tactics for changing how they think about things.
  • Why it matters: Avasarala says he has recommended it repeatedly and that people have "really, really enjoyed it."

Awareness

  • Content type: Book
  • Author/creator: Anthony de Mello
  • Link/URL: Not provided in the source material
  • Who recommended it: Amol Avasarala
  • Key takeaway: He describes it as offering similar value from a different angle.
  • Why it matters: He groups it with The Joy of Living as one of the books he consistently recommends.

Thinking in Bets

  • Content type: Book
  • Author/creator: Annie Duke
  • Link/URL: Not provided in the source material
  • Who recommended it: Amol Avasarala
  • Key takeaway: It is tactically useful in product because it pushes teams to convert vague timing claims into explicit probability estimates.
  • Why it matters: Avasarala gives a concrete operating example: asking for a percentage likelihood instead of accepting "I don't know, it'll get done in time."

Lower-context but still useful

Article on entrepreneurial mindset

  • Content type: Article
  • Author/creator: Not provided in the source material
  • Link/URL: http://x.com/i/article/2040473188568649728
  • Who recommended it: Amjad Masad
  • Key takeaway: Masad's endorsement is brief but explicit: he called it a "great article" on entrepreneurial mindset.
  • Why it matters: There is less context than the other picks, but this one is immediately accessible through a direct article link.

Activation, AI-Native UX, and the New PM Operating Model
Apr 6
12 min read
57 docs
Hiten Shah
The community for ventures designed to scale rapidly | Read our rules before posting ❤️
Aakash Gupta
+5
This brief covers the main PM shifts emerging from AI-native product work: activation is becoming the key growth lever, interface strategy is being rethought beyond simple chat-first assumptions, and AI-heavy teams may need more PM structure and capacity, not less. It also includes practical plays for onboarding, experimentation, leadership, and career progression.

Big Ideas

1) AI-native design is not the same as chat-first design

Andrew Chen’s test is simple: stop pitching products as “X but with AI,” and instead ask how the experience would be designed if AI existed from day one.

"the best products ask “if AI existed from day one, how would this experience be designed?”"

One thread says some SaaS products are already collapsing dense homepages into a single prompt field, which shifts defensibility from UI complexity to backend strengths like API surface, data model, and integrations. That same note points to Snowflake, Databricks, and Stripe as examples of companies that already treated the UI as a thin layer over a deeper engine.

A second thread adds an important constraint: many B2B SaaS users still do not want chat as the entry point. They want curated data and tools surfaced for them, and the builder’s job is curation.

Why it matters: PMs should rethink entry points for AI products, but not assume a prompt bar is always the answer.

How to apply:

  • Start with the first-value moment: should users see something useful immediately, or ask for it?
  • If you simplify the UI, audit what sits behind it: API quality, data model, and integrations become more important
  • Use AI-native design as the standard, but avoid lazy “existing product + AI” framing

2) In AI products, activation is the highest-leverage growth problem

In the Anthropic interview, activation is framed as critical because day-zero/day-one experience is often the highest-leverage input into long-term retention. The challenge is harder in AI because model capabilities improve so quickly that users often fail to discover what the product can actually do; the interview calls this “capability overhang”.

Anthropic’s response is to ask users who they are and what they care about, then use that information to recommend the right product or feature. The broader claim is that good friction can improve conversion when it personalizes the path to value. Lenny’s summary of the episode labels activation the single highest-leverage growth problem in AI.

Why it matters: In AI products, better models do not automatically create better user outcomes. Discovery of value is now a product problem.

How to apply:

  • Treat onboarding as routing, not just account creation
  • Ask a few high-signal questions early if they help match users to the right feature or workflow
  • Judge onboarding by downstream activation and retention, not just time-to-first-screen

3) AI-heavy product orgs may need more PM capacity, not less

Anthropic’s view is that engineers are currently getting the biggest leverage gains from AI tools like Claude Code, with engineering productivity described as roughly 2-3x higher. The consequence is that PMs and designers can end up managing the equivalent of a much larger engineering team, putting those functions under strain.

Their response is not to remove PMs. It is to hire more of them, while also hiring product-minded engineers who can act as mini-PMs on smaller projects.

Why it matters: The AI-era org question is not just “how many engineers can one PM support?” It is also “how much product direction and coordination is required when build capacity expands faster than planning capacity?”

How to apply:

  • Recalculate PM:engineering ratios if AI tools materially change engineering throughput
  • Hire for product-minded engineers who can own small, bounded workstreams
  • Move PM focus upward toward direction-setting and cross-functional alignment when execution speed increases

4) In exponential AI products, growth teams are biasing toward bigger swings

Anthropic says AI-first products should spend much more time on larger bets than a traditional growth team would, with roughly 50-70% of effort going to bigger swings instead of mostly small-to-medium optimizations. The reasoning is that if product value is expected to increase dramatically as model capabilities improve, the upside of finding the next major use case can outweigh many small wins. Small optimizations still matter and compound, but they are treated as secondary.

Why it matters: Prioritization rules change when the product’s value curve is changing quickly.

How to apply:

  • Keep a portfolio of small experiments, but reserve real capacity for larger product swings
  • Use this bias only where product value is truly AI-driven, not as a blanket rule for every software business
  • Revisit prioritization often as model capabilities change what is possible

Tactical Playbook

1) Design onboarding with good friction

Anthropic, Masterclass, Mercury, and Calm are all cited as cases where extra steps, quizzes, or broken-out screens improved conversion when they helped users understand why the product was for them.

How to do it:

  1. Ask a small number of questions that reveal user intent or identity
  2. Use those answers to recommend the right feature, content, or product path
  3. Split cognitively heavy forms into smaller steps when needed
  4. Remove friction that adds no value, but keep friction that improves relevance and comprehension
  5. Validate with conversion and funnel-completion data rather than intuition alone

Why it matters: Faster is not always better. A more guided flow can outperform a more minimal one when users need help finding value.

2) Match process rigor to project size

Anthropic uses a clear execution rule: if a project is about two engineering weeks or less, the engineer can effectively act as the PM, with the PM advising as needed. Small changes may only need Slack messages and quick back-and-forth, while larger work gets a formal kickoff and, when useful, an AI-generated PRD built from prior examples.

How to do it:

  1. Define a size threshold for lightweight vs. heavy process
  2. For small work, rely on fast conversation and prototyping instead of default documentation
  3. For larger or riskier work, run a cross-functional kickoff with legal, safeguards, and other key stakeholders
  4. Use AI to draft PRDs from previous documents when documentation is needed
  5. Keep PMs accountable for larger bets and engineers accountable for bounded execution work

Why it matters: Faster building only helps if the process does not bottleneck small work or under-structure high-risk work.

3) Operationalize AI experimentation with a four-stage loop

Anthropic’s CASH initiative breaks experimentation into four parts: identifying opportunities, building, testing against quality and brand bars, and analyzing results after launch. The team scores model performance at each stage and started with narrow use cases like copy changes and minor UI tweaks. Human review is still in the loop, especially for brand-sensitive outputs.

How to do it:

  1. Separate the workflow into opportunity identification, build, test, and analysis
  2. Measure AI performance at each stage instead of treating “AI experimentation” as one block
  3. Start with high-volume, low-scope work such as copy or small UI changes
  4. Keep human approval where brand or stakeholder risk is high
  5. Track whether time spent is falling and results are improving week over week

Why it matters: AI is already useful in parts of the experimentation loop, but not equally across all parts.
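The four-stage loop above can be sketched as a small pipeline. Everything here is a hypothetical illustration: the stage names come from the brief, but the function signatures, per-stage scoring, and the human-review hook are assumptions about how such a loop might be wired up.

```python
# Hedged sketch of a four-stage experiment loop in the spirit of CASH:
# identify -> build -> test -> analyze, scoring the model at each stage.
# Stage names are from the brief; the rest is hypothetical.
STAGES = ["identify", "build", "test", "analyze"]

def run_experiment(experiment, stage_fns, reviewer=None):
    """Run one experiment through all four stages.

    stage_fns maps each stage name to a callable returning
    (output, model_score); reviewer is an optional human gate
    applied after the test stage for brand-sensitive work.
    """
    scores = {}
    for stage in STAGES:
        output, score = stage_fns[stage](experiment)
        scores[stage] = score                 # measure AI per stage, not as one block
        if stage == "test" and reviewer:      # human stays in the loop for brand risk
            if not reviewer(output):
                return {"shipped": False, "scores": scores}
        experiment = output
    return {"shipped": True, "scores": scores}
```

Scoring per stage, rather than grading "AI experimentation" as a single pass/fail, is what lets a team see where the model behaves like a junior PM and where it is already reliable.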

4) Replace siloed growth reviews with one scorecard

A startup founder proposed a PLG Growth Scorecard because growth reviews were fragmented across Mixpanel, Stripe, HubSpot, and spreadsheets, leaving traffic, activation, and MRR disconnected from one another. The scorecard covers seven self-serve stages: Awareness, Acquisition, Activation, Conversion, Engagement, Retention, and Expansion.

How to do it:

  1. Map your funnel across all seven stages
  2. Assign each metric to a named owner across Marketing, Product, Sales, RevOps, or CS
  3. Add goal and trend tracking for every stage
  4. Choose a North Star metric; the example defaults to Activation Rate
  5. Use the full view to diagnose where the funnel actually leaks

Why it matters: PMs can make better trade-offs when they can see the full self-serve system, not just the product slice.
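The seven-stage scorecard can be sketched as a tiny data model plus a leak detector. The stage names come from the post; the `StageMetric` fields and the drop-off calculation are illustrative assumptions, not the founder's actual template.

```python
# Illustrative sketch of a PLG scorecard with cross-stage leak detection.
# Stage names are from the post; the data model is an assumption.
from dataclasses import dataclass

STAGES = ["Awareness", "Acquisition", "Activation", "Conversion",
          "Engagement", "Retention", "Expansion"]

@dataclass
class StageMetric:
    stage: str
    owner: str   # named owner: Marketing, Product, Sales, RevOps, or CS
    users: int   # users reaching this stage in the period
    goal: int

def biggest_leak(metrics):
    """Find the stage transition with the worst drop-off rate."""
    worst, worst_rate = None, 1.0
    for prev, cur in zip(metrics, metrics[1:]):
        rate = cur.users / prev.users if prev.users else 0.0
        if rate < worst_rate:
            worst, worst_rate = f"{prev.stage} -> {cur.stage}", rate
    return worst, worst_rate

# Example funnel with made-up numbers:
funnel = [StageMetric(s, "Product", u, 0)
          for s, u in zip(STAGES, [1000, 600, 200, 150, 120, 100, 30])]
print(biggest_leak(funnel))  # → ('Retention -> Expansion', 0.3)
```

The point is the diagnosis the unified view enables: with all seven stages in one structure, the worst leak is a one-line query instead of a five-dashboard reconciliation exercise.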

Case Studies & Lessons

1) Anthropic: hypergrowth creates “success disasters”

Lenny’s post says Anthropic grew from $1B to $19B ARR in a year and added $6B in ARR in February alone. In the interview, Amol Avasare says about 70% of his time goes to what Anthropic calls “success disasters”: urgent scaling problems across acquisition, activation, and monetization created by rapid growth. The remaining 30% goes to more proactive work such as product prioritization, pricing, and new-product funnels.

The team is roughly 40 people, organized with cross-cutting horizontals like growth platform and monetization, plus audience-focused pods for B2B, Claude Code, knowledge workers, and API users.

Key takeaway: At sufficient scale, growth stops looking like a clean experimentation backlog and starts looking like systems management. Org design has to support both firefighting and focused audience work.

2) Mercury: a quarter spent on quality produced a significant onboarding uplift

While at Mercury, Avasare says the team spent an entire quarter fixing onboarding quality for a complex regulated flow and explicitly set aside the usual growth-metric mindset for that period. The result was a significant uplift in onboarding start-to-completion. His broader lesson from that experience is that quality drives growth.

Key takeaway: When a critical flow is broken or confusing, quality work can outperform another quarter of metric chasing.

3) CASH: AI is already useful for narrow, high-volume growth work

Anthropic’s internal CASH effort is still early, but it is already producing results on small-scale experiments such as copy changes and minor UI tweaks. Avasare describes the current win rate as closer to a junior PM than a senior PM, while noting that progress has been rapid and human approval remains in place today.

Key takeaway: The near-term opportunity is not full PM automation. It is targeted automation of repetitive experiment loops where volume is high and risk is manageable.

Career Corner

1) In AI product work, PM advantage comes from tool fluency, adaptability, and interdisciplinary depth

Avasare’s career advice is to stay on top of the tools, understand what each new model release changes, and apply that learning to your own work. He also argues that PMs should lean into their strongest interdisciplinary edge, whether that is design, finance, sales, or something else, because mixed-skill operators become unusually valuable when roles blur. His warning is that 50-70% of old playbooks may no longer apply in AI-heavy environments.

How to apply:

  • Build a habit of testing new tools and releases directly
  • Double down on the cross-functional skill that makes you unusually useful
  • Assume some prior PM habits will need to be rewritten, not merely updated

2) Cold outreach still works when it is specific and tested

Avasare says he got his Anthropic role by cold emailing Mike Krieger, arguing the company needed a growth team. His tactics: use a tested subject line and message, reach out where others are not overwhelming the recipient, keep the pitch short, and follow up multiple times if it matters.

How to apply:

  • Lead with a crisp point of view on the company’s need
  • Keep the message short: who you are, why you fit, why you should talk
  • Follow up persistently when the opportunity matters

3) An adjacent operator role can be a bridge into PM

In one r/ProductManagement thread, a Data Analyst opportunity was described as owning tools, managing data pipelines, fixing bugs, shipping enhancements, and potentially building new capabilities over time. A commenter’s advice was to take that role, learn PM while building new capabilities, gradually evolve the work toward full PM scope, then negotiate the title change internally with manager support.

How to apply:

  • Favor adjacent roles with real ownership over tools or workflows
  • Start practicing PM as soon as you are shaping new capabilities
  • Use internal mobility and manager sponsorship to formalize the transition

4) When leadership feedback is vague, treat the executive like a user and force clarity

In another r/ProductManagement thread about VP-level expectations, commenters suggested treating the CEO or manager like a user: figure out what they say they want, then uncover the underlying need. The practical advice was to define the yardstick for success, run experiments to show course correction, and socialize a draft plan quickly. Some commenters also recommended getting external coaching from experienced leaders.

The same thread also raised a warning: vague expectations, unclear KPIs, and treating ambiguity as a failure rather than part of the role can indicate level mismatch or broader trouble.

How to apply:

  • Turn fuzzy feedback into explicit success metrics and review checkpoints
  • Socialize a draft plan early rather than waiting for perfect clarity
  • If expectations remain subjective and unstable, treat that as data about fit, not just performance

Tools & Resources

1) PLG Growth Scorecard

What it is: A unified dashboard across Awareness, Acquisition, Activation, Conversion, Engagement, Retention, and Expansion, with named owners, goals, trends, and a configurable North Star metric.

Why explore it: It replaces the common “five-dashboard scramble” where traffic, product, and revenue reviews do not line up .

Try it: Start with Activation Rate if you need one leading indicator, then add cross-stage leak detection .

2) The CASH experiment loop

What it is: Anthropic’s framework for AI-assisted growth experimentation: identify opportunities, build, test against brand/quality, and analyze outcomes.

Why explore it: It gives PMs a practical way to break AI experimentation into measurable stages instead of treating it as one black box.

Try it: Pilot it on copy changes or small UI tweaks, and keep a human approval step for brand-sensitive output.

3) A lightweight kickoff + AI-generated PRD pattern

What it is: A process pattern where small work happens in Slack and prototypes, while larger work gets a proper kickoff plus a lightweight AI-generated PRD built from prior documents.

Why explore it: It keeps teams from over-documenting small changes while still adding structure where risk is higher.

Try it: Define one size threshold in engineering weeks and one kickoff template for cross-functional work.

4) Loom AI for product demo cleanup

What it is: A tool recommendation from r/startups for automatically trimming pauses and generating transcripts and timestamps for product demos.

Why explore it: Demo editing can take longer than recording; this reduces cleanup time.

Try it: Use it for internal walkthroughs, stakeholder demos, and early customer-facing product tours.

5) Prototype before you pitch

What it is: Andrew Chen argues that we should hear fewer investor pitches based on a “drawing on a napkin” because, if you can draw it, you can often prompt it into existence now.

Why explore it: The bar for pre-product storytelling is moving toward something interactive or tangible.

Try it: Before a concept review or fundraising conversation, turn the sketch into a thin prototype first.

X Targets Agent Builders as Local AI and Model Auditing Advance
Apr 6
3 min read
154 docs
X Freeze
Machine Learning
clem 🤗
+1
Infrastructure was the clearest theme today: X repackaged its API around agent workflows, Gemma 4 gained more local deployment traction, and open-source work pushed both model serving and auditing forward.

Builder platforms and deployment

X is repositioning its API for AI agents

The X API was presented as a major update for AI agents and builders, with pay-per-use pricing, native XMCP Server + Xurl support for agents, official Python and TypeScript XDKs, and a free API Playground for simulated testing. X also said purchases can return up to 20% in xAI API credits and pointed developers to docs.x.com.

Why it matters: This is a real packaging change, not just a feature tweak: X is trying to make its real-time data and action surface easier to use as agent infrastructure.

Kreuzberg pushed document intelligence deeper into code workflows

Kreuzberg v4.7.0 introduced tree-sitter-based code intelligence for 248 formats, including AST-level extraction of functions, classes, imports, symbols, and docstrings, with scope-aware chunking for repo analysis, PR review, and codebase indexing. The release also reported markdown-quality gains across 23 formats, added a TOON wire format that cuts prompt tokens by 30-50%, and shipped integration as a document backend for OpenWebUI.

Why it matters: The project is explicitly positioning itself as infrastructure for agents, with better extraction quality and smaller prompt payloads aimed at making code and document analysis more reliable and cheaper.

Project: GitHub
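Kreuzberg’s actual tree-sitter pipeline is not shown here, but the idea of AST-level symbol extraction can be sketched with Python’s stdlib `ast` module as a minimal, single-language analog. The `extract_symbols` helper and the sample source are invented for illustration; Kreuzberg’s own API and its 248-format coverage are not reproduced.

```python
import ast

def extract_symbols(source: str) -> dict:
    """Collect function names, class names, imports, and docstrings from Python source."""
    tree = ast.parse(source)
    out = {"functions": [], "classes": [], "imports": [], "docstrings": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out["functions"].append(node.name)
            doc = ast.get_docstring(node)
            if doc:
                out["docstrings"].append(doc)
        elif isinstance(node, ast.ClassDef):
            out["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            out["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            out["imports"].append(node.module or "")
    return out

sample = '''
import os
from pathlib import Path

class Indexer:
    def run(self):
        """Walk the repo and index files."""
        return os.listdir(".")
'''
symbols = extract_symbols(sample)
```

A tree-sitter-based extractor does the same walk over a language-agnostic concrete syntax tree, which is what lets one pipeline cover many formats.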

Gemma 4 kept picking up local deployment paths

NVIDIA said it is accelerating Gemma 4 for local agentic AI across hardware "from RTX to Spark", and Hugging Face CEO Clement Delangue said Gemma 4 had reached the top spot on Hugging Face. Separately, an open-source FPGA project reported roughly 450 tokens per second on an AMD Kria KV260 using a custom 36-core heterogeneous pipeline and a smaller distilled INT4/KAN runtime model, though the team said the quantized weights are still a work in progress.

Why it matters: The notable signal here is ecosystem traction: vendor support and community experimentation are creating more concrete local and edge paths around Gemma 4.

Resources: NVIDIA blog · FPGA repo

Research and evaluation

A pure-Triton MoE kernel posted inference-time wins

A fused MoE dispatch kernel written in pure Triton reported faster forward-pass performance than Stanford's CUDA-optimized Megablocks on Mixtral-8x7B at inference batch sizes, with gains of 131% at 32 tokens and 124% at 128 tokens on A100. The writeup attributes this to a fused gate+up projection that removes about 470MB of intermediates and cuts memory traffic by 35%, plus a block-scheduled grouped GEMM that handles variable-sized expert batches without padding; tests also passed on AMD MI300X without code changes.

Why it matters: This is the kind of low-level serving work that can materially improve MoE deployment efficiency at the inference-relevant batch sizes many teams care about.

Code: GitHub · Writeup: subhadipmitra.com
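The algebraic shape of the gate+up fusion can be shown with a NumPy analog: instead of two separate projections that each materialize a (tokens × d_ff) intermediate, concatenate the weight matrices, do one matmul, and split the result. This is only a sketch of the math; the reported gains come from a Triton kernel that fuses this work inside one GPU launch, and all shapes and names below are invented.

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x), as used in Mixtral-style SwiGLU blocks.
    return x / (1.0 + np.exp(-x))

d_model, d_ff, tokens = 64, 256, 32
rng = np.random.default_rng(0)
x = rng.standard_normal((tokens, d_model))
w_gate = rng.standard_normal((d_model, d_ff))
w_up = rng.standard_normal((d_model, d_ff))

# Unfused: two matmuls, each materializing its own (tokens, d_ff) intermediate.
ref = silu(x @ w_gate) * (x @ w_up)

# "Fused": one matmul against concatenated weights, then a split.
w_cat = np.concatenate([w_gate, w_up], axis=1)
both = x @ w_cat
fused = silu(both[:, :d_ff]) * both[:, d_ff:]
```

The two paths are numerically identical; the fused form simply touches memory once per input tile, which is where the claimed 35% traffic reduction would come from in a real kernel.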

Reference-free auditing may make hidden behaviors easier to detect

A new AuditBench result used ridge regression to predict late-layer activations from early-layer activations and treated the residuals as candidates for planted behavior, avoiding the need for a clean reference model. Reported AUROCs were 0.889 for hardcode_test_cases, 0.844 for animal_welfare, 0.833 for anti_ai_regulation, and 0.800 for secret_loyalty, with 3 of 4 matching or exceeding reference-based methods.

Why it matters: The study was small, but it suggests targeted behaviors may be auditable even when a base model is unavailable, which is a practical constraint in many real evaluation settings.

Code: GitHub · Writeup: Substack
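As a toy version of the residual idea (not the AuditBench code; the dimensions, noise level, and planted perturbation are all invented), one can fit a closed-form ridge map from early-layer to late-layer activations and score each example by its residual norm:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_early, d_late = 500, 32, 32

# Synthetic "activations": late layers are mostly a linear map of early ones.
early = rng.standard_normal((n, d_early))
W_true = 0.3 * rng.standard_normal((d_early, d_late))
late = early @ W_true + 0.05 * rng.standard_normal((n, d_late))

# Plant an anomalous behavior in the first 10 examples' late activations.
late[:10] += 3.0

# Closed-form ridge regression: W_hat = (X'X + lam*I)^-1 X'Y
lam = 1.0
W_hat = np.linalg.solve(early.T @ early + lam * np.eye(d_early), early.T @ late)

# Residual norm per example is the anomaly score.
residual = np.linalg.norm(late - early @ W_hat, axis=1)
```

Examples whose late activations are not explained by the early-to-late map stand out by residual norm, with no clean reference model in the loop, which is the property the paper exploits.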

AI-Native School Models Expand as Education Tools Shift Toward Scaffolding and Guardrails
Apr 6
9 min read
1246 docs
Ethan Mollick
Justin Reich
MacKenzie Price
+24
This brief covers the week’s strongest education AI signals: school models built around AI tutoring and compressed schedules, a new wave of tools that guide research and study rather than just answer, and a sharper move toward assignment-level governance, safety boundaries, and evidence-based caution.

AI-native school models are moving from pilots to full operating systems

The biggest signal this week is that AI is starting to define whole learning models, not just classroom tasks. Across Alpha School and Once, AI is being used to restructure time, staffing, and tutoring rather than simply add a chatbot to existing lessons.

Alpha leaders described a mastery-based model where students spend about two hours each morning on AI-driven academics in math, science, and reading, while guides focus on motivation at guide-to-student ratios of roughly 1:15, or 1:5 in K-2. The system assesses what a student knows, identifies gaps, and generates lessons at the right level; Joe Liemandt said the lesson engine uses the curriculum plus a student’s knowledge graph and interest graph, with cognitive load theory planned for 2026. Alpha also draws a hard line between guided lesson generation and open-ended academic chatbots, which its leaders argue mostly encourage cheating rather than learning. Operationally, the product goes as far as surfacing a “waste meter” when students skip explanations or use time inefficiently.

In interviews, Alpha leaders reported top 1% standardized-test performance across grades and subjects, an average senior SAT of 1550, and movement from bottom-half entrants to above the 90th percentile within two years. Those are school-reported outcomes, and a news segment noted that some educators remain skeptical because AI-based school models are still seen as unproven.

Expansion is moving on multiple fronts. Liemandt said Alpha would have 25 campuses this year and make Time Back broadly accessible in 2026, while MacKenzie Price said Alpha expected about 50 campuses in 2026 and noted a $1 billion capital commitment from Liemandt. Variants are already appearing in specialized formats: Texas Sports Academy says voucher-eligible families can access Alpha academics through its program, and Bennett School pairs two hours of AI-powered learning with elite baseball development. Texas Sports Academy has also cited individual gains from 6th- to 11th-grade reading and from the 42nd to the 82nd percentile.

A narrower, more human-centered implementation comes from Once, which uses AI software to help support staff deliver one-on-one early reading tutoring to children ages 3 to 7. Its origin story is practical: pandemic-era pilots suggested that 15 minutes of daily tutoring from non-experts could help kindergarten-age children learn to read, and the company is now trying to scale that approach through software inside schools.

“young children learn best from adults, like actual in-person human-to-human instruction”

The strongest new tools guide process rather than replace it

The most useful product pattern this week was not broader generation. It was more scaffolding.

Microsoft’s Search Progress asks students to evaluate source reputation and consequence while they research, then gives teachers visibility into searches, links opened, and sources saved. Built with the Digital Inquiry Group, it is explicitly framed as a way to make research thinking visible at a moment when Microsoft argues students’ baseline media-literacy skills are weak and PISA is preparing a 2029 assessment on media and AI literacy.

Microsoft’s Study and Learn Agent applies the same idea to tutoring. In preview, it shifts Copilot from answer engine to coach: instead of solving a problem outright, it asks what the student has tried, gives just enough explanation to move them forward, and can generate flashcards, quizzes, and study plans grounded in uploaded notes or files. The limitations are clear too: it is still in preview, requires Copilot Chat to be enabled, and is currently for students 13+.

On the teacher workflow side, Microsoft’s free Teach Module is expanding from drafting into modification: aligning activities to recognized standards in 40+ countries and U.S. states, differentiating instructions, adjusting reading level while preserving key terms, and adding real-world examples. One current constraint is localization: presenters said grade levels are U.S.-based for now and will only become more localized over the coming months.

Ellis pushes this scaffolding pattern into educator support. It uses a retrieval-augmented system built on trusted sources such as CAST, Understood, NCLD, Digital Promise, and the Reading League to generate classroom strategies and action plans from a teacher’s scenario. Its boundaries matter as much as its features: it stores scenarios for follow-up, strips or replaces student names, and stops the conversation when self-harm or suicidal ideation appears, directing educators back to school protocols and crisis supports.
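The name-stripping and crisis-stop boundaries can be illustrated with a deliberately naive sketch. A production system like Ellis almost certainly uses far more robust detection than keyword matching; every function, term list, and message below is invented for illustration.

```python
import re

# Naive keyword list; real safeguarding systems use trained classifiers.
CRISIS_TERMS = ("self-harm", "suicide", "suicidal")
CRISIS_MESSAGE = (
    "This scenario mentions a student safety concern. "
    "Please follow your school's crisis protocol and contact support staff."
)

def redact_names(scenario: str, student_names: list[str]) -> str:
    """Replace known student names with a neutral placeholder."""
    for name in student_names:
        scenario = re.sub(rf"\b{re.escape(name)}\b", "[student]", scenario)
    return scenario

def guard(scenario: str, student_names: list[str]) -> tuple[bool, str]:
    """Return (allowed, text): block and redirect when crisis terms appear."""
    if any(term in scenario.lower() for term in CRISIS_TERMS):
        return False, CRISIS_MESSAGE
    return True, redact_names(scenario, student_names)

ok, text = guard("Maya struggles with fractions.", ["Maya"])
blocked, msg = guard("A student mentioned suicidal thoughts.", [])
```

The design point is the ordering: the crisis check happens before any generation, so the system declines and redirects rather than producing advice it should not give.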

For self-directed learners, NotebookLM added topic summaries and next-study suggestions after quizzes and flashcards, plus a regenerate option for more practice on selected topics. At the more advanced end, Andrej Karpathy described using LLMs to compile source materials into a markdown wiki in Obsidian, query it for complex questions, and feed outputs back into the knowledge base — powerful for research, but still, in his words, closer to a “hacky collection of scripts” than a mainstream learning product.
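The compile-then-query pattern behind such a markdown wiki can be sketched in a few lines. The note titles, directory layout, and keyword search below are invented stand-ins; Karpathy’s actual scripts are not public, and a real setup would use an LLM rather than substring matching for the query step.

```python
from pathlib import Path
import re
import tempfile

def build_wiki(sources: dict[str, str], wiki_dir: Path) -> None:
    """Write each source as a markdown note with a title heading."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    for title, body in sources.items():
        (wiki_dir / f"{title}.md").write_text(f"# {title}\n\n{body}\n")

def query(wiki_dir: Path, term: str) -> list[str]:
    """Return note titles whose text mentions the term (case-insensitive)."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return sorted(
        p.stem for p in wiki_dir.glob("*.md") if pattern.search(p.read_text())
    )

wiki = Path(tempfile.mkdtemp()) / "wiki"
build_wiki(
    {
        "moe-serving": "Notes on fused MoE kernels and Triton.",
        "auditing": "Ridge-regression residuals for behavior auditing.",
    },
    wiki,
)
hits = query(wiki, "triton")
```

The cumulative part of the workflow is simply that answers get written back as new notes, so the wiki grows with each question asked.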

Governance is shifting from bans to assignment-level rules and disclosure

Policy is also getting more concrete.

Pineville ISD shifted from “acceptable use” to “responsible use,” arguing that platform-specific rules become obsolete too quickly as AI gets embedded into existing tools. Its most practical move is an assignment-level AI scale that runs from no AI use to AI-focused projects, with teachers choosing the level per task. Microsoft is building the same concept into product workflow: Assignments will let teachers mark expected AI use as none, partial, or full, and attach an explicit prompt when full AI use is allowed.

In higher education, Lance Eaton and Carol Damm’s new transparency framework argues institutions should document their own GenAI use if they expect students to disclose theirs, and that improving export and import features across AI tools could make that record-keeping more realistic.

The urgency is real. One EdSurge essay cited a May 2025 study finding that 84% of high school students used generative AI for schoolwork, and pointed to reporting on pervasive, undisclosed AI use to grade and give feedback on student writing in some New Orleans schools. At the institutional level, Google and IDC warned that uneven adoption inside universities is creating a new digital divide: some students get AI-enabled learning and AI safety practice, while others get neither because faculty, departments, and institutions lack a shared strategy.

Some institutions are now responding at the curriculum level. Purdue is moving toward an AI skills graduation requirement, Ohio State wants every freshman to take an AI literacy course, and Microsoft noted that PISA’s 2029 assessment will cover media and AI literacy.

Governance also has to cover new harms, not just plagiarism. Laura Knight described a recent UK school deepfake incident involving sexualized images of teachers and warned that AI “friend” chatbots can pull vulnerable children toward attachment and monetized intimacy. Her recommendation is less screen-time rhetoric and more scenario-based professional development, peer support, coaching, and digital self-regulation.

Research is sharpening the line between useful support and unsafe substitution

Research this week reinforced a simple rule: guided assistance can help, but automation is weak where judgment, relationships, or fairness matter.

Where AI is helping

  • In a UK math RCT with 165 students, both human and AI tutors beat written hints; the AI performed slightly better on novel problems and strong Socratic questioning, but human tutors were better at reading emotion and adjusting pace.
  • A Wharton and National Taiwan University study of 770 high-school Python learners found proactive adaptive problem selection outperformed reactive chatbots and produced gains equal to 6-9 extra months of learning.
  • India’s Shiksha Copilot reduced lesson-plan creation from 45-90 minutes to 15, but the study still emphasized teacher-AI collaboration and found English outputs stronger than local-language ones.

Where caution is warranted

  • More AI-driven revision is not automatically better. In a University of Queensland study, hybrid feedback produced more revisions, but all feedback types ended with similar quality, confidence, and grades.
  • A Stanford analysis of four LLMs giving feedback on 600 eighth-grade essays found the same writing received different feedback when models were told the student was low ability, high ability, Asian, male, or female; the practical recommendation was to minimize demographic data in prompts.
  • Thirteen AI detectors tested on 280,000 student works produced an average 41% false-positive rate on short texts, making them unsafe for high-stakes use.
  • Hidden prompt injections still manipulated older and smaller judge models in a new Wharton report, even if most frontier models resisted; Gemini 3 was the only tested frontier model reported as susceptible.
  • Chatbots were not a substitute for human contact: in a two-week RCT with 300 first-year students, only daily conversations with another human reduced loneliness; chatbot chats performed no better than journaling.

That is why Justin Reich argues schools should stop looking for universal AI “best practice” and instead run local experiments, compare student work over time, and decide where AI belongs in core versus peripheral curriculum.

What This Means

  • For school operators: AI is starting to change schedule design, staffing, and specialization. If you are evaluating new models, pair the claims with local experiments and work-sample review rather than taking operator narratives at face value.
  • For teachers and instructional designers: the practical wins are scaffolds and modifications — source evaluation, guided study, differentiated instructions, reading-level adjustment, and lesson planning.
  • For higher ed and L&D teams: the middle path is getting clearer. Ethan Mollick describes AI tutors outside class and more exercises, simulations, grading, and reflection inside class, while institutions like Ohio State and Purdue are moving AI literacy into the curriculum itself.
  • For self-directed learners: source-grounded study is getting better, from NotebookLM’s quiz guidance to LLM-built personal knowledge bases, but the best workflows still depend on curated source sets and active note-building.
  • For school leaders and compliance teams: assignment-level AI expectations and disclosure are likely more durable than blanket bans, especially when detector tools still misfire on short student work.
  • For buyers and investors: the strongest product signals this week were source grounding, teacher control, privacy boundaries, and human fallback — not broader claims of autonomy.

Watch This Space

  • AI-native school expansion. Alpha says Time Back will open more broadly in 2026, and Liemandt says specialized academies are expanding across new schools, sports, and cities.
  • AI literacy becoming a formal requirement. Purdue is moving to an AI skills graduation requirement, Ohio State wants every freshman to take an AI literacy course, and PISA will assess media and AI literacy in 2029.
  • Personal study stacks and memory-aware workflows. NotebookLM’s quiz upgrade, author-created llms.txt reading experiences, Karpathy’s LLM wikis, and new work on memory-aware agents all point toward more cumulative, source-bound self-study workflows.
  • Student-built learning software. A 3D chemistry app built by a high school student prompted Liemandt to predict that students will soon learn from apps built by other students.
  • AI-specific safeguarding. Deepfake sexualized imagery and synthetic-intimacy chatbots are likely to push schools toward more explicit AI safety education, not just generic screen-time rules.

Discover agents

Subscribe to public agents from the community or create your own—private for yourself or public to share.

  • Coding Agents Alpha Tracker (active, 110 sources): Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.
  • AI in EdTech Weekly (active, 92 sources): Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning, covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.
  • Bitcoin Payment Adoption Tracker (active, 107 sources): Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics.
  • AI News Digest (active, 114 sources): Daily curated digest of significant AI developments, including major announcements, research breakthroughs, policy changes, and industry moves.
  • Global Agricultural Developments (active, 86 sources): Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs.
  • Recommended Reading from Tech Founders (active, 137 sources): Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media.
