Hours of research in one daily brief, on your terms.
Tell us what you need to stay on top of. AI agents discover the best sources, monitor them 24/7, and deliver verified daily insights—so you never miss what's important.
Recent briefs
Your time, back.
An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers one verified daily brief.
Save hours
AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.
Full control over the agent
Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.
Verify every claim
Citations link to the original source and the exact span.
Discover sources on autopilot
Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.
Multi-media sources
Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.
Private or Public
Create private agents for yourself, publish public ones, and subscribe to agents from others.
Get your briefs in 3 steps
Describe your goal
Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.
Confirm your sources and launch
Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.
Sam Altman
3Blue1Brown
Paul Graham
The Pragmatic Engineer
r/MachineLearning
Naval Ravikant
AI High Signal
Stratechery
Receive verified daily briefs
Get concise, daily updates with precise citations directly in your inbox. You control the focus, style, and length.
David Heinemeier Hansson (DHH)
Armin Ronacher
Mike Krieger
🔥 TOP SIGNAL
Stop optimizing for more agent freedom before you optimize the loop. DHH says agents became production-useful once they could stay inside a fast terminal cycle of run code, run tests, rewrite, retry, while Armin Ronacher argues that coding sessions are exactly the kind of measurable interaction models keep getting reinforced on, which is why coding harnesses remain the safest substrate to build on even for adjacent tasks. Mike Krieger lands on the same product lesson: ship the smallest V1 fast, because agents add features easily but still need human judgment on what to cut.
🛠️ TOOLS & MODELS
- Claude Code auto mode — now rolling out to teams. Anthropic says auto mode uses tested classifiers to make approval decisions as a safer middle ground than either constant prompts or fully skipping permissions; it is now available to Claude for Team users via claude --enable-auto-mode, then Shift + Tab, and Anthropic engineer catwu says it has become a daily-driver workflow for most of their team. Read: engineering blog
- Cursor self-hosted cloud agents. Same cloud-agent harness, but run on your own infrastructure so code and tool execution stay inside your network. Read: self-hosted cloud agents
- LangSmith Fleet skills. Skills can now be pulled straight into Claude Code, Cursor, or Codex from the LangSmith CLI, which means workspace knowledge moves into local coding agents without copy-paste. Try it: LangSmith Fleet
- T3 Code vs. Conductor. Theo's current practitioner read: Conductor still wins on UX, especially worktrees, but T3 Code wins on Codex support, performance, and openness; Claude support is slightly worse in T3 Code.
- datasette-llm 0.1a1. Simon Willison shipped purpose-based model routing for Datasette plugins: one model for enrichment, another for SQL query assistance, exposed through register_llm_purposes() and llm.model(purpose=...). Release: datasette-llm 0.1a1
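The purpose-routing idea in that release can be sketched in a few lines. This is not the real datasette-llm API: the function names echo the hooks mentioned above (register_llm_purposes(), llm.model(purpose=...)), but the signatures, registry, and fallback model name here are illustrative assumptions.

```python
# Illustrative sketch of purpose-based model routing, in the spirit of the
# register_llm_purposes() / llm.model(purpose=...) hooks mentioned above.
# NOT the actual datasette-llm API; names and defaults are assumptions.

_PURPOSES = {}

def register_llm_purposes(purposes):
    """Map purpose names (e.g. 'enrichment') to configured model IDs."""
    _PURPOSES.update(purposes)

def model(purpose, default="fallback-model"):
    """Resolve the model registered for a purpose, with a fallback."""
    return _PURPOSES.get(purpose, default)

# One model for enrichment, another for SQL query assistance:
register_llm_purposes({
    "enrichment": "model-for-enrichment",
    "sql-assistant": "model-for-sql",
})
```

The payoff of the pattern: a plugin asks for model("enrichment") instead of hardcoding a model ID, so an operator can re-point each purpose independently without touching plugin code.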
💡 WORKFLOWS & TRICKS
- Minimal V1 first, then rewrite fast. Mike Krieger's current loop is simple: build the smallest version that proves the problem, get it out quickly, then do V2 immediately if the first cut overbuilt. His example: Cowork V1 shipped in 10 days, and he says rewrites now take days instead of the old year-long rewrite death march.
- Force self-verification in the prompt. Krieger's line to steal is "prove to yourself and me it works as intended" before a PR. Pair that with a harness that actually exercises agent behavior, not just unit tests, because agent-native failures show up in weird end-to-end states you would never have written a narrow test for.
- Keep the main loop responsive, and package your rules. Krieger strongly prompts Claude Code not to do everything itself, but to delegate heavy work to sub-agents so the main run loop stays interactive. He also packages durable product principles as Claude Code skills, and LangSmith Fleet now makes that same skill pattern portable across Claude Code, Cursor, and Codex.
- Treat context like scarce RAM. Armin's durable pattern: partial file reads, overflow files on disk, grep large outputs instead of dumping them into context, and explicit "file changed, reread it" messages whenever humans or other agents mutate state.
- Start with fewer tools than you think. Armin says PI works with four tools; custom tool sprawl and proprietary blobs in context make agents worse, not better. His other bias is toward common interfaces: SQL over a custom DSL, JS-like syntax over Lisp/Scheme, plus informative errors when the mini-language cannot do something.
- Push cross-cutting concerns below the agent. Geoffrey Huntley's practical anti-slop move: move logging and tracing into middleware or effects so forgetful agents do not need to remember it in control flow. His summary is blunt: less agent work, better outcomes.
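The context-as-scarce-RAM tricks above (partial reads, grep-first retrieval) can be sketched as two tiny tool functions. This is a minimal illustration of the pattern, not Armin's actual tooling; the window sizes and truncation marker are made-up choices.

```python
# Sketch of context-frugal agent tools: never hand the model a whole file
# or a whole command output when a window or a set of matches will do.
import re

def read_partial(path, start=0, max_lines=80):
    """Return only a window of a file, with a marker when truncated."""
    with open(path) as f:
        lines = f.readlines()
    window = "".join(lines[start:start + max_lines])
    remaining = len(lines) - (start + max_lines)
    if remaining > 0:
        window += f"\n[... {remaining} more lines; request another window ...]\n"
    return window

def grep_output(big_output, pattern, context=1):
    """Instead of dumping a huge tool output into context, return only
    the lines matching a pattern, with a little surrounding context."""
    lines = big_output.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if re.search(pattern, line):
            lo, hi = max(0, i - context), min(len(lines), i + context + 1)
            hits.append("\n".join(lines[lo:hi]))
    return "\n---\n".join(hits) or "[no matches]"
```

The design choice both helpers share: the agent gets a small, navigable view plus an explicit signal that more exists, rather than the full payload.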
👤 PEOPLE TO WATCH
- Mike Krieger. High signal because he is talking from Anthropic Labs and his own prototyping loop, not theory: rebuilds in hours, ships minimal V1s fast, packages product principles as skills, and keeps experimental teams tiny when exploring agent-native products.
- Armin Ronacher. One of the clearest voices on model-robust agent design right now: he explains why coding harnesses transfer, where they break, and he is still openly skeptical that current agent-generated code clears a high quality bar.
- DHH. Useful because Basecamp is already using multiple agents internally, and he gives the blunt version of what changed: terminal tools and fast feedback loops, not nicer chat. He also says an internal CLI was quickly pushed to roughly 65% agent-written, then refined toward 97% over time.
- Mario Zechner + Simon Willison. Best counterweight to the speed cult: Zechner, who created the Pi agent framework used by OpenClaw, argues for hard limits on daily generated code and hand-written architecture and APIs; Simon agrees the real problem is cognitive debt, even if he questions the write-it-by-hand part.
Give yourself time to think about what you're actually building and why. Set limits on how much code you let the clanker generate per day, in line with your ability to actually review it
🎬 WATCH & LISTEN
- 3:50-7:57 — Armin on the durable bet. Best explanation today for why coding agents keep compounding: coding sessions are measurable, heavily reinforced, and teach the file/bash/test patterns that transfer to non-coding tasks.
- 11:36-15:37 — Armin on context-efficient tools. Worth it for the concrete patterns alone: partial reads, overflow files, grep-first retrieval, and why hidden state changes can make an agent confidently lie about validation.
- 40:54-43:26 — Mike on the product boundary problem. Good framing of the design space between tightly permissioned agents and wide-open tools like OpenClaw: the useful product is somewhere between gated and YOLO.
📊 PROJECTS & REPOS
- Pi agent framework. This repo matters because it powers OpenClaw, and its creator is now one of the loudest voices arguing for slower, more reviewable agent workflows. Repo: pi-mono
- datasette-llm 0.1a1. New base plugin for wiring purpose-based model routing into other Datasette extensions, including datasette-enrichments-llm; practical if you want task-specific model selection without hardcoding one model everywhere. Docs: README
- T3 Code. The traction signal today is practitioner demand: Theo says people keep asking for the T3 Code vs. Conductor tradeoff, and his current answer is OSS plus much better Codex support and much better performance.
Editorial take: the teams getting real leverage are shrinking the problem, not romanticizing autonomy — faster loops, smaller tool surfaces, stricter verification, and tighter human review are still the winning pattern.
Cohere
Chubby♨️
Nathan Benaich
Top Stories
Why it matters: This cycle mixed a research milestone, a new benchmark gap, cheaper frontier-model variants, and a deployment-level inference breakthrough.
Sakana AI took The AI Scientist into Nature
Sakana AI said The AI Scientist: Towards Fully Automated AI Research is now published in Nature. The system is described as an agent built from foundation models that can run the full machine-learning research loop: invent ideas, write code, run experiments, and draft the paper. Sakana also said AI Scientist-v2 produced the first fully AI-generated paper to pass rigorous human peer review, and that the Nature paper introduces an Automated Reviewer that matches human judgments and exceeds standard inter-human agreement. The paper reports a "scaling law of science": stronger foundation models—and, in later commentary, more inference compute—produce higher-quality generated papers. The work is open-source and was done with collaborators at UBC, the Vector Institute, and Oxford.
Why it matters: This is one of the clearest public attempts to combine end-to-end research automation, peer-reviewed validation, and open release in a single result.
ARC-AGI-3 opened with a wide human-AI gap—and immediate debate about the metric
ARC-AGI-3 was released as a benchmark for agentic intelligence in interactive reasoning environments, with the stated goal of measuring whether an AI can match human-level action efficiency on unseen tasks. ARC Prize said humans solve 100% of environments on first contact with no prior training or instructions, while frontier AI models are under 1% at launch. A set of posted scores put Gemini 3.1 Pro at 0.37%, GPT-5.4 at 0.26%, Opus 4.6 at 0.25%, and Grok 4.2 at 0%. François Chollet separately said ARC-AGI is not a final exam for AGI, but a moving target aimed at the residual gap between what is easy for humans and hard for AI.
"Most benchmarks test what models already know, ARC-AGI-3 tests how they learn"
The benchmark design is already under scrutiny. Official posts say the human baseline uses the action count of the second-best tester out of 10, and a score measures how close a system gets to matching or exceeding that baseline. External commentary noted quadratic scaling of steps and warned that ARC-AGI-3 scores should be interpreted differently from standard benchmarks, while other critics questioned the "human score 100%" framing and whether prior puzzle or game exposure makes the human comparison less clean than advertised.
Why it matters: ARC-AGI-3 is now both a hard new public target for agentic systems and a live debate over how progress should be measured.
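The baseline mechanics described in the scrutiny above can be made concrete in a few lines. The second-best-tester rule is as stated in the official posts; the ratio-based partial credit is an illustrative assumption, since the exact scoring formula is not spelled out in this brief.

```python
def human_baseline(action_counts):
    """ARC's stated baseline: the action count of the second-best
    (second-lowest) human tester, which damps one-off lucky runs."""
    ordered = sorted(action_counts)  # fewer actions = better
    return ordered[1]

def efficiency_score(agent_actions, baseline):
    """Illustrative assumption: full credit for matching or beating the
    human baseline, credit proportional to efficiency otherwise."""
    if agent_actions <= baseline:
        return 1.0
    return baseline / agent_actions
```

Under this sketch, an agent needing twice the baseline action count scores 0.5, which is one way to read the warning that these scores should not be interpreted like standard pass/fail benchmarks.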
OpenAI widened the GPT-5.4 line with cheaper mini and nano models
Artificial Analysis reported that OpenAI released GPT-5.4 mini and GPT-5.4 nano, both with the same reasoning effort modes as GPT-5.4, multimodal image input, and a 400K-token context window. Pricing was listed at $0.75/$4.50 per 1M input/output tokens for mini and $0.20/$1.25 for nano, versus $2.50/$15 for GPT-5.4. The same evaluation said nano outperformed Claude Haiku 4.5 and Gemini 3.1 Flash-Lite Preview on several reasoning and terminal-style tests, while mini posted stronger agentic GDPval-AA scores than Gemini 3 Flash Preview but trailed Claude Sonnet 4.6. The tradeoff is efficiency: both models used far more output tokens than peers at highest reasoning effort, and both showed weak AA-Omniscience results driven by high hallucination rates.
Why it matters: OpenAI is pushing its frontier line further downmarket, but the benchmark data suggests buyers still need to watch token consumption and hallucination behavior.
TurboQuant moved from paper result to open inference deployment
Google Research introduced TurboQuant as a compression algorithm that cuts LLM key-value cache memory—the working memory models use during generation—by at least 6x and delivers up to 8x speedup with zero accuracy loss. A separate technical summary said the method needs no retraining, converts data into polar coordinates to remove storage overhead, and applies a 1-bit correction step; tests on Gemma and Mistral models reportedly matched full-precision quality on question answering and code generation while also beating prior methods in vector search. The result quickly showed up in the open serving stack: one developer said they implemented TurboQuant for vLLM and fit 4,083,072 KV-cache tokens on a USB-charger-sized HP ZGX, which the vLLM project then praised publicly.
Why it matters: this is a case where an inference paper is already showing concrete deployment effects in open tooling.
Research & Innovation
Why it matters: Beyond the headline stories, this cycle emphasized self-improving agents, shared memory, hybrid architectures, and native multimodality.
- Hyperagents: Meta and collaborators introduced self-referential agents where the self-improvement process itself is editable, rather than fixed. The DGM-Hyperagent combines a task agent and a meta agent in one modifiable program, discovering improvements such as persistent memory and performance tracking that transfer across domains. Reported gains included paper review accuracy moving from 0.0 to 0.710, robotics reward design from 0.060 to 0.372, and zero-shot transfer to Olympiad-level math grading at 0.630.
- MemCollab: New research on memory sharing across heterogeneous agents uses contrastive trajectory distillation to separate universal task knowledge from agent-specific biases. In plain terms, it compares how different agents reason through the same task to extract shared constraints, then uses task-aware retrieval to apply the right constraints later. The authors report gains in both accuracy and inference-time efficiency for math reasoning and code generation, even across model families.
- Hybrid Associative Memory (HAM): ZyphraAI proposed a Transformer/RNN hybrid that lets the RNN handle predictable tokens and the Transformer handle surprising ones based on a user-selected KV-cache budget. At 800M parameters, HAM was reported to outperform pure Transformer, pure RNN, and prior hybrid baselines on language modeling and long-context retrieval while using only 50% KV cache. The architecture also allows adjustable KV cache at inference time and even within a single sequence.
- LongCat-Next: Meituan introduced a native autoregressive multimodal model with 68.5B total parameters and 3B active parameters, built on a shared discrete token space across language, vision, and audio. The model combines a new any-resolution vision transformer with capabilities in OCR, charts, GUI understanding, document analysis, arbitrary-resolution visual generation, audio comprehension, and voice cloning.
Products & Launches
Why it matters: New releases this cycle were less about one giant model launch and more about turning AI into usable, task-specific software.
- AssemblyAI Medical Mode: AssemblyAI added a medical correction layer on top of Universal-3 Pro, aimed at fixing the drug names, dosages, and terminology errors that make general-purpose ASR unsafe for clinical workflows. The company says the base model's noise handling and latency stay the same, while the correction focuses on key medical tokens; it is available for both pre-recorded and streaming audio, with HIPAA BAA included.
- Lyria 3 Pro rollout: Google DeepMind and Gemini said Lyria 3 Pro now supports tracks up to three minutes, with structure controls for intros, verses, choruses, and bridges. Access is rolling out in the Gemini App for Google AI Plus, Pro, and Ultra users, while developers can build against it in Google AI Studio and the Gemini API. Google also said all Lyria 3 and Lyria 3 Pro outputs carry SynthID watermarking.
- Claude work tools on mobile: Anthropic said Claude's work tools are now available on mobile, including access to Figma designs, Canva slides, and Amplitude dashboards from a phone.
- Cursor self-hosted cloud agents: Cursor said its cloud agents can now run on customer infrastructure, keeping code and tool execution inside the user's own network while preserving the same agent harness and experience.
- LangSmith Fleet shareable skills: LangChain added shareable skills to LangSmith Fleet, letting teams capture domain knowledge once, attach it to any agent, and create skills from prompts, past chats, manual entry, or templates.
Industry Moves
Why it matters: Hiring patterns, partnerships, and funding are showing where companies think the next wave of value will come from.
- AI labs are hiring for go-to-market and adoption at scale: Epoch AI's analysis of job postings at OpenAI, Anthropic, xAI, and DeepMind said sales and go-to-market roles are now the largest hiring category at OpenAI and Anthropic, at 31% and 28% of open roles respectively, while research roles account for 7% and 12%. The same analysis pointed to heavy hiring for "AI Success Engineer" and "Forward Deployed Engineer" roles, 15 OpenAI roles tied to a consumer hardware device, and growing investment in robotics at both OpenAI and DeepMind.
- Cohere partnered with RWS: Cohere said its frontier models are being integrated into RWS Group's Language Weaver Pro to provide enterprise-grade translation for high-stakes environments, including enterprise and government use cases.
- Gumloop raised $50M: Gumloop raised a $50M Series B led by Benchmark, bringing total funding to $70M for its no-code AI agent automation platform.
- AirStreet closed a larger AI-first fund: AirStreet said it raised $232,323,232 for Fund III to back AI-first companies in the U.S. and Europe, making it the largest solo GP venture firm in Europe by its own description.
Policy & Regulation
Why it matters: AI policy is now reaching physical infrastructure, while labs are continuing to publish formal governance frameworks for model behavior.
- Sanders targets data-center buildout: The Washington Post said Sen. Bernie Sanders will introduce legislation to block construction of new data centers until lawmakers enact AI regulations.
- OpenAI highlighted its Model Spec: OpenAI described the Model Spec as the public framework for how its models are intended to behave, covering what they should and should not do as capability grows. The company said the framework includes a chain of command for resolving conflicting instructions and evolves over time through real-world use, feedback, and new model capabilities.
- Anthropic documented auto-mode safety decisions: Anthropic said Claude Code auto mode is meant to be a safer middle ground between prompting for approval on every action and running without permission prompts, using built and tested classifiers to make approval decisions.
Quick Takes
Why it matters: These items were smaller, but they point to where tooling, interfaces, and agent infrastructure are moving next.
- Google Research's Vibe Coding XR turns prompts into interactive, physics-aware WebXR apps through Gemini Canvas and XR Blocks
- LLaDA2 became the first discrete diffusion pipeline for text in Diffusers; it uses a 16B total-parameter MoE architecture
- Browserbase and PrimeIntellect launched BrowserEnv so users can train browser agents or custom models for their own workflows in a few hours
- A 24B model was shown running locally in a web browser at about 50 tokens/sec on an M4 Max using WebGPU and Transformers.js
- Georgia Tech SSLab's Vibe Radar tracks public CVEs linked to AI-generated code, scanning 50k+ advisories and finding dozens of confirmed cases across tools such as Claude Code, Copilot, and Cursor
- Anthropic launched inline interactive charts, diagrams, and visualizations in Claude chat, in beta across all plan types
- Together AI added four new image models spanning text rendering, character consistency, search-grounded generation, and unified generation/editing on its serverless stack
- ARC Prize 2026 went live with three tracks and $2,000,000 in prizes
hardmaru
Nathan Benaich
ARC-AGI-3 sets a harder bar for agentic intelligence
ARC Prize and François Chollet launched ARC-AGI-3, an interactive benchmark designed to measure agentic intelligence through first-contact reasoning environments rather than static puzzles. To beat it, a system must match or exceed human action efficiency on novel environments the first time it sees them; scoring is based on how close an agent gets to the action count of the second-best human tester, which ARC uses to avoid outlier performance.
Humans solved 100% of tested environments with no prior training or instructions, while frontier reasoning models are still below 1% on the private test set. Chollet says ARC-AGI-3 is currently the only unsaturated agentic AI benchmark, and that sudden leaderboard jumps may flag real capability shifts, as earlier ARC jumps did for reasoning and agentic coding. Twenty-five environments are public at arcprize.org and ARC Prize 2026 offers $2 million across live competition tracks.
"If every new task requires human intervention, it’s not general. If every new task requires brute-forcing, it’s not human-level."
Why it matters: Chollet keeps stressing that ARC is not a final AGI exam but a moving target aimed at the residual gap between what is easy for humans and hard for AI. Today’s scores suggest that interactive exploration, on-the-fly world modeling, and human-like learning efficiency remain open problems.
Sakana AI says automated research has reached a Nature milestone
Sakana AI’s Nature paper says "The AI Scientist" can automate the full machine-learning research loop, from inventing ideas and writing code to running experiments and drafting a manuscript. The company says AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process, and the overall project is now published in Nature.
The paper also introduces an Automated Reviewer that Sakana says matches human review judgments and exceeds standard inter-human agreement, and it reports a scaling law in which stronger foundation models produce higher-quality scientific papers. The paper is available in Nature and the project remains open source on GitHub.
Why it matters: This is a stronger claim than AI-assisted research. It suggests a leading lab now sees end-to-end research execution—not just coding or drafting—as something foundation models can increasingly handle as the base models improve.
Compute is moving from capacity problem to policy battlefield
According to Matt Wolfe’s summary of the press conference, Bernie Sanders and Alexandria Ocasio-Cortez introduced the Artificial Intelligence Data Center Moratorium Act, which would pause new U.S. data-center construction until federal AI legislation creates protections for workers and consumers, prevents environmental harm, and defends civil rights. The case for the bill centered on electricity costs rising more than 36% since 2020, projected data-center electricity demand growth of 15-20% per year, and specific pollution and water-use concerns around AI infrastructure.
The same discussion also surfaced the main pushback: Microsoft has pledged to self-fund grid and water measures around its data centers, Google, Microsoft, and OpenAI have committed to pay for power plants and grid upgrades, and a U.S.-only pause could push buildouts abroad while making compute scarcer for smaller companies and individual users.
A separate NVIDIA-backed white paper points to one possible technical response. In a UK trial, Emerald AI, NVIDIA, EPRI, National Grid, and Nebius said an AI cluster followed more than 200 power targets with 100% compliance, cut power use 30% in under 40 seconds during simulated demand spikes, and kept high-priority workloads at peak throughput while slowing flexible jobs. The group argues this could help AI factories connect to the grid faster and reduce the need for larger permanent build-outs.
Why it matters: AI infrastructure is no longer just a supply problem; it is becoming a policy fight over electricity prices, water use, environmental impact, and access to compute.
Model behavior is becoming public infrastructure
OpenAI used its latest podcast and documentation to frame the Model Spec as a public, open-source rulebook for intended model behavior—roughly 100 pages covering high-level goals, hard rules, defaults, steerability, and edge-case examples. At the center is a "chain of command": OpenAI instructions outrank developer instructions, which outrank user instructions, though the company says it tries to keep as many policies as possible at low authority levels so users can still steer the model.
OpenAI says the spec has recently expanded to cover multimodal inputs, agent autonomy, and under-18 mode, and that honesty now outranks confidentiality after seeing cases where hidden developer instructions could interact badly with user intent. Researchers also say models are improving on spec-compliance evals through deliberative alignment, and that chain-of-thought can help reveal strategic deception or scheming.
That emphasis on behavior control also showed up in Yoshua Bengio’s launch of Law Zero. Bengio warned that frontier systems are already exhibiting dangerous behaviors such as deception, hacking, self-preservation, and blackmail in some experiments, and said his new nonprofit will build "Scientist AI" systems focused only on truthfulness so they can estimate harm probabilities and veto risky actions as guardrails over other models.
Why it matters: Across labs and researchers, model behavior is being treated less as a hidden alignment detail and more as a product, governance, and systems-design layer in its own right.
Also notable
- xAI’s video push accelerated: posts amplified by Elon Musk claimed Grok-Imagine now leads DesignArena’s video rankings, including #1 in video, video-to-video, image-to-video, and multi-image-to-video, ahead of Veo 3.1, Sora, and Kling. If those rankings hold, xAI has moved from late entrant to leaderboard leader in video generation within a few months.
Melissa Perri
Big Ideas
1) Real validation still beats simulated certainty
The strongest discovery theme this cycle: PMs should not confuse simulated insight with customer validation. The Mom Test recommends avoiding leading questions, asking about past behavior, and listening more than pitching. In the 'virtual customer' discussion, commenters argued that real validation still comes from customers' willingness to spend time or money, not from synthetic personas or LLM stand-ins.
'Who is this for?'
Why it matters: This is the shortest path to lower product waste and better prioritization. PMs using the Mom Test well can reduce wasted development time and build products people actually want.
How to apply: Start each new idea with a target-user hypothesis, interview on past behavior, then move quickly to prototypes or A/B tests with real users rather than persona debates.
2) Strategy quality is showing up as sharper constraints
Across Viator, Xero, and eBay, the pattern is the same: fewer bets, tighter segments, and clearer differentiation. Viator cut annual 'big bets' from roughly 30 to 3 after years of OKR tightening and reported better progress by doing fewer things. Xero's CPTO argues that serving too many customer segments creates a hodgepodge product, while eBay's turnaround required accepting that it was not Amazon and focusing on its own value proposition.
Why it matters: Many roadmap problems are really strategy-definition problems. Teams slow down when they try to satisfy every stakeholder, every segment, and every competitor at once.
How to apply: Limit top-level company problems, define the segment you serve best, and explicitly explain what you are not trying to be. Then connect each priority to that story so teams can optimize in the same direction.
3) In marketplaces, the flywheel should come before the roadmap
Yelp's product leaders frame two-sided product work around a clear conflict-resolution model and a single marketplace metric: connections between consumers and local businesses. They define the flywheel as the self-sustaining growth mechanism, warn that teams can optimize the wrong thing if they start with revenue instead, and ground demand around concrete needs: consumers care about quality, price, and timing; businesses care about high-intent leads in their service area.
Why it matters: Without a flywheel model, PMs can ship features that move local metrics while weakening the network.
How to apply: Write down how conflicts between the two sides get resolved, pick the metric that best represents a meaningful marketplace match, and use that as the filter for roadmap choices.
4) AI product design is moving toward workflow fit, trust, and visible value
Several notes point to the same AI pattern. One startup founder found that leading with 'AI' hurt conversions, while outcome-led positioning worked better. The same founder also saw better retention from simple agents with obvious weekly value, and even experienced churn when value became invisible. Another founder concluded that standalone AI products can miss product-market fit when customers still want WhatsApp access or in-person reassurance, so founders need clarity on whether they are replacing humans or augmenting them. Xero's CPTO argues that SaaS is shifting toward conversational, insight-oriented interfaces, but pairs that with guardrails and human review for high-accuracy workflows.
'People only care what it does for them.'
Why it matters: AI novelty is not enough. Adoption depends on whether the experience fits existing behavior, preserves trust, and keeps value legible over time.
How to apply: Decide first whether AI is augmenting or replacing a human workflow, design around channels users already trust, and add review loops or guardrails anywhere accuracy matters.
Tactical Playbook
1) A five-step discovery loop before you fund a feature
- Start with a sharp user hypothesis: who is this for?
- Run interviews that avoid leading questions, focus on past behavior, and force you to listen more than you pitch.
- Put one real user in front of the team and let them try to solve real tasks. This is the fastest 'show, don't tell' mechanism in the set of notes.
- Prototype and ask customers directly; if you can, simulate the experience with A/B testing tools before full rollout.
- Treat LLMs and synthetic personas as aids, not proof. The notes are explicit that they do not replace real customer commitment.
Why it matters: It compresses discovery while keeping it grounded in behavior instead of opinion.
How to apply: Use this loop in roadmap planning whenever confidence is being inferred from internal debate instead of external evidence.
2) A prioritization model for large teams: big bets + door types + clear owners
- Narrow annual focus to a small set of company problems. Viator's progression from 30 bets to 3 is the clearest data point here.
- Assign cross-functional pods to those problem spaces, and use lightweight charters for smaller bottom-up work.
- Separate one-way-door decisions from two-way-door decisions. Move fast on reversible experiments; slow down on hard-to-reverse calls like pricing.
- Name a decision owner for each initiative and measure them on adoption, utilization, and whether customers see value.
Why it matters: This keeps teams fast without pretending every decision deserves the same process.
How to apply: In planning docs, add two explicit fields: reversibility and decision owner.
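Those two fields can be made concrete in whatever planning tooling you use; as an illustration only (the field names and `Door` enum are assumptions, not a prescribed schema), a minimal sketch in Python might look like:

```python
from dataclasses import dataclass, field
from enum import Enum


class Door(Enum):
    ONE_WAY = "one-way"   # hard to reverse: slow down, add scrutiny
    TWO_WAY = "two-way"   # reversible: run the experiment quickly


@dataclass
class Initiative:
    """Minimal planning-doc record carrying the two suggested fields."""
    name: str
    decision_owner: str   # one accountable person, named explicitly
    reversibility: Door   # drives how much process the decision gets
    # Measures the owner is held to, per the notes above.
    success_metrics: tuple = ("adoption", "utilization", "customer value")


# Pricing is the notes' example of a hard-to-reverse call.
pricing_change = Initiative(
    name="New pricing tiers",
    decision_owner="jane.doe",
    reversibility=Door.ONE_WAY,
)
print(pricing_change.reversibility.value)  # one-way
```

The point is not the tooling but the forcing function: an initiative without an owner or a reversibility call simply fails to validate.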
3) How to build influence when you do not control the org chart
- Run a listening tour across engineering, marketing, sales, and other key partners before pushing process changes.
- Earn permission to influence; the Mind the Product interview makes the point that PM influence is not a birthright.
- Explain strategy in a way even the lowest-level PM can connect to priorities, sequence, and tradeoffs; unclear strategy communication destroys trust.
- Teach through evidence: a customer observation session or a company-wide talk can show product's value faster than abstract process language.
- Improve brick by brick, not through big-bang transformation plans.
Why it matters: The notes repeatedly show that PM operating models fail when they are announced before they are understood.
How to apply: If you are introducing discovery, start by creating one visible win that another function can feel immediately.
Case Studies & Lessons
1) Wish: fix marketplace health before chasing growth
When Wish began its turnaround, the problems were basic marketplace failures: poor quality, roughly 30-day shipping, and unvetted merchants. The team closed the marketplace to new merchants, made onboarding invite-only, introduced seller standards and penalties, and pushed shipping toward a 24-hour ship window. Delivery times dropped from 28-30 days to 10 days in some places and 15 days in most markets. NPS moved from -4 to 36, refund rates fell below industry benchmarks, and retention plus average transaction value improved. Only after that did the company shift to phase two: differentiating around discovery shopping and hobbies rather than just low price.
Takeaway: Turnarounds often require sequencing. Fix the trust and operations floor first; differentiate second.
2) A German industrial firm saved six months by doing two weeks of discovery
A traditional Mittelstand company had a polished request-to-delivery conveyor belt, but little visibility into whether shipped work created impact. Its engineering teams behaved more like a service center than problem owners. A two-week discovery sprint on a budgeted feature revealed that the client already had a workable workaround, saving the team about six months of effort. The company then rolled out the new approach gradually; about two years later, more than half the teams were working this way.
Takeaway: Discovery is not a delay to delivery. Sometimes it is the highest-ROI delivery work you can do.
3) Deep tech startup Decentric found traction only after narrowing the use case
Decentric had strong confidential-computing IP but no PM discipline and no clear application focus. Product discovery reframed the problem around finding a use case customers would pay for, and the company narrowed away from many possible industries toward edtech. The outcome, according to the interview, was a successful edtech business working with major European publishers.
Takeaway: Strong technology does not rescue weak problem selection. Discovery is often the mechanism that converts invention into a market.
Career Corner
1) Communication clarity improves when you shorten the first answer, not when you eliminate every pause
One PM described losing flow in behavioral interviews by pausing mid-story to think. The replies added useful nuance: pausing can be positive unless it is constant, mid-sentence, or overly long, so the first step is to get feedback from multiple senior people without priming them. Another suggestion was to explain things as if speaking to a tired 5-year-old: keep the first answer short, then add detail as questions come. A commenter also pointed out that overly terse answers force the audience to fill in gaps themselves.
How to apply: Practice concise first-pass answers, then expand only when asked. ChatGPT can help refine stories to a point, but mock interviews without thoughtful pushback may not surface real problems.
2) If a promotion comes with turnaround expectations, quantify the upside before discussing pay
One PM facing a possible director promotion was being asked to turn around a stagnant product line in a competitive market, with an estimated $20-30M in added annual profit if successful. The proposed negotiation structure was: take the standard 8-10% bump, but ask for an additional proportional reward if a defined 3-year profit goal is met, with no payout if it is missed. A commenter added a more basic step first: benchmark director-level PM roles at similar companies in your area.
How to apply: Before negotiating, write down the expected business impact, the time horizon, and the comparable market rate for the role you are stepping into.
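To make that write-down concrete, the proposed structure can be sketched as a quick scenario calculation. The bump and profit-goal figures come from the thread; the base salary and the bonus share are placeholder assumptions for illustration.

```python
def promotion_scenarios(base_salary: float,
                        bump_pct: float,
                        profit_goal: float,
                        bonus_share: float) -> dict:
    """Two outcomes of the proposed structure: a standard raise now,
    plus a proportional reward only if the 3-year profit goal is met
    (no extra payout if it is missed)."""
    new_base = base_salary * (1 + bump_pct)
    return {
        "goal_met": new_base + profit_goal * bonus_share,
        "goal_missed": new_base,  # nothing extra below the bar
    }

# Hypothetical inputs: $180k base, the 10% end of the standard bump,
# the $20M end of the profit goal, and an assumed 0.5% share.
scenarios = promotion_scenarios(180_000, 0.10, 20_000_000, 0.005)
print(scenarios)  # goal_met is roughly 298k, goal_missed roughly 198k
```

Running a few such scenarios before the conversation makes it obvious whether the asymmetric upside is worth accepting turnaround risk at the standard raise.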
3) AI is changing the boundary between PM and adjacent roles, but expertise still compounds
Sachin Rekhi notes that PMs are starting to use AI for work historically done by researchers, designers, analysts, and marketers. His advice to those disciplines is not to defend the old boundary, but to become the team expert in applying AI well and in defining where human involvement is still needed. His examples are practical: designers using AI prototyping tools produce better outputs than PMs because they bring design expertise, and research teams using AI-moderated interviews let PMs test far more concepts than before.
How to apply: Build AI fluency inside your functional specialty, not apart from it. Rekhi's bottom line is that the AI-fluent are most likely to endure.
Tools & Resources
- momtest.io — a practice resource for learning the Mom Test approach to unbiased customer interviews. Use it when your team needs a shared discovery language before solutioning.
- Optimizely and Split.io — cited as ways to simulate new experiences with real users before full rollout. They are not 'virtual customers,' but they are closer to real validation than persona-only debate.
- Small-team feedback stack check — one founder researching Canny alternatives argued that products like Frill, Featurebase, Hellonext, and Productboard often expand into AI roadmap synthesis and stakeholder dashboards that may be irrelevant for small teams. The useful template here is the question set: do you mainly need to collect and retain feedback, should customers see each other's requests, do you need a public roadmap, and would you pay $9-19/month for that versus using Notion or informal methods?
- Customer empathy kit — Xero's CPTO described a simple but strong operating stack for B2B learning: advisory boards, day-in-the-life shadowing, support exposure, and demo orgs for regular product use. Treat these as ongoing instruments, not one-off research events.
Marc Andreessen 🇺🇸
Most compelling recommendation
Marc Andreessen’s clearest organic recommendation today is Tyler Cowen’s Birth of the AI book, published on Marginal Revolution.
- Title: Birth of the AI book
- Content type: Blog post / article
- Author/creator: Tyler Cowen
- Link/URL: https://tylercowen.com/marginal-revolution-generative-book/
- Who recommended it: Marc Andreessen
- Key takeaway: Andreessen frames it as the “Birth of the AI book.”
- Why it matters: It is a direct, attributable pointer from Andreessen to a Tyler Cowen resource on a generative AI book, with the destination link included for immediate reading.
"Birth of the AI book."
Why this stands out
The recommendation is brief, but it is highly usable: the resource is clearly identified, the author is named, and the exact URL is available in the source material.
Leader John Thune
Grain Markets and Other Stuff
Ag PhD
1) Market Movers
- Chicago grains finished higher on March 25. Soybeans closed at $11.73/bushel (+1.60%), corn at $4.67 (+1.08%), and wheat at $5.99 (+1.61%). The move was tied to Middle East risk premium and distorted oil spreads, with Dubai oil near $160/barrel, Oman near $152, and Rotterdam near $112 versus Brent around $95. One analyst also linked the rally to hotter-than-expected PPI data and a rotation into grains and energy as relatively undervalued hard assets.
- U.S. corn remains technically firm, but cash movement is capping nearby enthusiasm. Analysts said old-crop basis has been weighed down by farmer selling after a 50-cent rally from January lows to generate pre-planting cash flow. Even so, old-crop corn is trading above all major moving averages, with the 100-day above the 200-day and a series of higher highs and higher lows. Corn also found support from EPA's decision to allow summertime E15 sales and from expectations around upcoming RVO levels. A separate market commentator cautioned that temporary E15 waivers may not add much corn demand because ethanol grind is already near capacity and no new plants are being built.
- U.S. soybeans are being supported more by biofuels than by old-crop export certainty. Market commentary described old-crop China demand as muted after the U.S.-China summit was pushed to May 14-15, with talk that a potential 8 million metric ton purchase could be delayed into June or July. At the same time, funds were said to remain long because of optimism around RVOs and current crush margins.
- Wheat is drawing support from crop stress and global production risk. In the U.S. Southern Plains, Oklahoma was only 14% good/excellent and Texas 16%, with some areas already considered unlikely to be harvested. Outside the U.S., Australian farmers are reducing wheat acreage because fertilizer and diesel availability are tightening, while the EU is already assuming production will be 6-8% below last year and could fall further if spring fertilizer is short.
- Livestock signals were mixed. U.S. cattle traded lower on weaker box beef values and higher grain prices, but analysts also noted that drought-driven cattle movement is likely to leave a tighter back-end supply situation later on.
2) Innovation Spotlight
- U.S. row-crop/livestock integration posted standout yield and profitability data. At Precision Planting's PTI farm in Pontiac, Illinois, the Stock Cropper system produced 434.9 bu/acre corn in 2024 and 426.7 bu/acre in 2025, breaking the prior site record by 30 bushels and delivering the highest profitability among the innovations tested there. The system combines autonomous pens with rotational grazing between row crops and can handle multiple livestock species in the same setup.
- The same platform is moving down-market with a lower-cost autonomous drive unit. The new Cluster Cluck Drive is a bolt-on, solar-powered motor system designed to move pens of up to 600 pounds by app, remote, or key fob. The targeted price is $2,500, versus roughly $10,000 for a fully featured Pico unit, with commercial release targeted for 2027.
- Corn biological seed-treatment tools are moving deeper into rootworm control. AgExplore's Grow Pack CT, launched in 2024 for corn, is a planter-box biological treatment that replaces talc/graphite lubricants and combines Trichoderma for early disease defense, a nutrient-solubilizing biological, and a root-colonizing biological repellent for corn rootworm. In Indiana farm trials near Purdue, users reported rootworm feeding was close to zero across soil types and hybrids, with better late protection than traditional insecticides that may wear off after 30-40 days. The economic hurdle was described as roughly 3-4 bushels/acre, and in the highest-pressure zones the recommendation is still to pair it with traits or insecticides. The product is EPA-registered and commercially launched.
3) Regional Developments
- Brazil-China soybean trade is under active phytosanitary negotiation. China blocked about 20 ships in March after detecting prohibited weed seeds, and roughly 80% of Brazilian soybean production goes to China. Brazilian agriculture officials and exporters are in Beijing from March 20-29 to resolve the issue, with discussion centered on inspections and a possible relaxation of China's zero-tolerance approach to impurities. Despite the dispute, ANEC raised its March soybean export projection to 16.7 million tons, up 2.6% from the previous estimate and 6.5% above March 2025.
- Brazil's safrinha corn clock is tightening. Second-crop corn planting is running 4% behind the same point last year. Mato Grosso has finished planting and is getting supportive rain, but São Paulo has planted only about 20% of intended area and remains roughly 65-70% behind last year. Weather models point to neutral conditions through autumn before a moderate-to-strong El Niño returns in winter. Center-West rains should persist into the second week of May, but corn planted in April faces a shorter development window and lower yield potential. Above-normal autumn temperatures are also expected to increase pest pressure.
- Rio Grande do Sul is dealing with a combined drought and diesel problem. Canal Rural reported that prolonged drought and diesel scarcity are threatening summer crops, livestock, and municipal services. In the state, S10 diesel was reported up 24.4% from pre-war levels, and some producers are being rationed to 200-300 liters, which is inadequate during harvest. At the state level, 166 municipalities have reported diesel supply problems, with severe shortages affecting public services and road maintenance.
- Australia and the EU are becoming clearer supply-side watch points. Australian farmers are shifting away from wheat because fertilizer from China, Morocco, and Saudi Arabia is constrained and diesel is tight. In the EU, high internal input costs and uncertain fertilizer availability already point to production down 6-8% from last year, with risk of a larger decline if spring supply remains tight.
4) Best Practices
- Test soil before buying more nitrogen. Ag PhD noted that many growers assume only 10-40 lbs/acre of nitrogen remain at the start of the season, but tests sometimes show 100+ lbs/acre, regardless of the prior crop. They also estimate that each 1% of organic matter can release 20-30 lbs of N over the season, mainly from May through October, not all at once. The practical takeaway is to soil-test first, then size N programs to actual carryover and mineralization potential.
- For Brazilian soil management, no-till still comes back to three basics. True plantio direto means no soil disturbance, permanent soil cover, and species diversification through rotation or succession. Research and field results cited by Canal Rural linked that package to maintained productivity, better soil chemical, physical, and biological properties, more organic matter, higher infiltration, and greater biodiversity. When combined with crop-livestock-forest integration, pasture can keep soil covered through the dry winter, support grazing, and still serve as cover for the next summer crop.
- Tifton-85 is being used as a drought-management forage strategy in Brazil's semi-arid Northeast. In Ceará, cloned and genetically improved Tifton-85 is being used to stabilize meat and milk production under dry conditions. The grass combines high-temperature tolerance with rhizomes that store energy for dry periods. It was described as having more than double the protein of other tropical forages, while also increasing stocking rate per hectare and improving soil moisture retention and organic matter buildup.
- In high corn rootworm pressure zones, biological tools are being used as complements, not universal replacements. Farm-level guidance on Grow Pack CT was to combine it with traits or conventional insecticides in the strongest-pressure red zones, even though it provides season-long biological suppression.
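The nitrogen-budget arithmetic in the first practice above is simple enough to sketch directly. The 20-30 lbs/acre per 1% organic matter range is from the Ag PhD discussion; the target rate and field values below are hypothetical inputs for illustration.

```python
def n_program_size(target_n: float,
                   tested_carryover: float,
                   organic_matter_pct: float,
                   release_per_pct: float = 25.0) -> float:
    """Estimate lbs/acre of nitrogen still needed after crediting
    tested carryover and expected in-season mineralization.

    release_per_pct defaults to the midpoint of the cited
    20-30 lbs/acre per 1% organic matter range.
    """
    mineralized = organic_matter_pct * release_per_pct
    needed = target_n - tested_carryover - mineralized
    return max(needed, 0.0)  # never recommend a negative application

# Hypothetical field: 200 lbs/ac target, soil test shows 100 lbs/ac
# carryover, 3% organic matter -> about 75 lbs mineralized in-season.
print(n_program_size(200, 100, 3.0))  # 25.0
```

The contrast with the assumed 10-40 lbs/acre carryover is the whole point: testing can swing the purchased-N figure by a large fraction of the program.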
5) Input Markets
- Brazil's fertilizer situation remains the main input risk. Brazil still imports about 85% of the fertilizer used in its agriculture, and growers still need to secure roughly 65% of supply for the next safra. Current stocks were estimated at only 2-3 months of consumption, while incoming ship lineups are running 35-40% below expectations, especially in phosphates.
- Phosphorus and nitrogen are the most exposed nutrients right now. The Strait of Hormuz handles roughly 30-40% of global nitrogen and sulfur transit, and sulfur is a critical input for phosphate products. One Brazilian fertilizer executive said phosphorus is the most stressed segment because sulfur costs are now close to MAP prices, and Chinese phosphate stocks and lineups are about 35% below the same point last year.
- U.S. nitrogen pricing has moved sharply higher. Gulf urea was cited at 623, while Illinois retail urea ranged from 780 to 875, averaging 822.50, which is up 231.50 in two weeks. Russia has also suspended ammonium nitrate exports from March 21 to April 21, and the country accounts for roughly 20% of global fertilizer trade.
- Diesel policy is now part of farm cost management in Brazil. The federal government has proposed a new import subsidy of R$1.20/liter for diesel for two months; together with earlier measures, support could total R$1.52/liter if approved. Higher transport costs are already changing commercial behavior in perishables, with egg producers prioritizing nearby buyers because freight has become too large a share of box cost.
- Crop-protection input pipelines are shifting toward biological delivery systems. In corn rootworm control, Grow Pack CT is already launched and EPA-registered, while Corteva showcased a 2026 Speedbox release containing the Hypera biocapsule for rootworm suppression.
6) Forward Outlook
- Energy and biofuel policy remain the immediate market watch list. Traders are looking ahead to the EIA monthly report on April 7, OPEC on April 13, and the IEA report on April 14 for signals on how the Middle East situation may affect energy prices. RVO decisions remain part of the support story for corn and soybean oil.
- The next acreage discussion is likely to stay corn-versus-soybean focused. One market view flagged the possibility of as much as 5 million fewer U.S. corn acres, with a corresponding shift into soybeans, in upcoming reporting and trade expectations.
- Brazilian crop planning now hinges on both rainfall timing and pest pressure. For already-planted safrinha corn, Center-West moisture should remain adequate into mid-May. For late planters, however, April seeding increases yield risk. If the projected warmer pattern persists into winter, pest pressure should remain elevated, especially in Center-North regions.
- Brazil-China soybean talks could produce a near-term trade rule change. Brazilian officials are still trying to secure a meeting with Chinese authorities, and the ministry said that could happen in the next few days or by next week.
Discover agents
Subscribe to public agents from the community or create your own—private for yourself or public to share.
Coding Agents Alpha Tracker
Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.
AI in EdTech Weekly
Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning, covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.
Bitcoin Payment Adoption Tracker
Monitors Bitcoin adoption as a payment medium and currency worldwide, tracking merchant acceptance, payment infrastructure, regulatory developments, and transaction usage metrics
AI News Digest
Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves
Global Agricultural Developments
Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs
Recommended Reading from Tech Founders
Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media