Hours of research in one daily brief–on your terms.

Tell us what you need to stay on top of. AI agents discover the best sources, monitor them 24/7, and deliver verified daily insights—so you never miss what's important.

Setup your daily brief agent
Discovering relevant sources...
Syncing sources 0/180...
Extracting information
Generating brief

Recent briefs

OpenAI’s Superintelligence Push Meets Anthropic’s Compute Buildout
Apr 7
8 min read
615 docs
Anthropic
SAIR
Omead Pooladzandi
+28
OpenAI published a policy blueprint for the 'Intelligence Age' as Anthropic disclosed $30B run-rate revenue and secured multi-gigawatt TPU capacity. This cycle also brought new warnings on agent security, fresh evidence of brittle reasoning, and a wave of speech, developer, and infrastructure launches.

Top Stories

Why it matters: This cycle centered on three frontier questions at once: how leading labs are framing advanced AI politically, how fast they can secure future compute, and whether current systems are reliable enough for wider deployment.

OpenAI says the superintelligence transition has started — and treats it as a policy problem now

OpenAI published a 13-page blueprint, Industrial Policy for the Intelligence Age: Ideas to keep people first, and said it is “beginning a transition toward superintelligence” . The proposal combines economic and safety measures, including a Public Wealth Fund, tax shifts away from payroll, a right to AI, containment playbooks for dangerous models, auto-triggered safety nets, and an international AI safety network . Altman also warned that soon-to-be-released models could enable a “world-shaking cyberattack” this year and argued the U.S. may need a new social contract on the scale of the Progressive Era or New Deal .

“We’re beginning a transition toward superintelligence: AI systems capable of outperforming the smartest humans even when they are assisted by AI.”

Impact: OpenAI is framing frontier AI as an immediate governance and labor issue, not a distant scenario .

Anthropic pairs revenue acceleration with a long-horizon compute deal

Anthropic said its run-rate revenue has surpassed $30 billion, up from $9 billion at the end of 2025, as demand for Claude continues to accelerate . It also signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models . Separate reporting on OpenAI and Anthropic financials said inference still consumes more than half of revenue at both labs, while Anthropic expects profitability sooner than OpenAI once training costs are included .

Impact: Frontier competition is increasingly about securing future energy and hardware capacity, not just model quality .

Agent security is emerging as a deployment bottleneck

A widely shared summary of Google DeepMind work described a large empirical study of AI manipulation covering 502 participants across 8 countries and 23 attack types tested on frontier models including GPT-4o, Claude, and Gemini . The reported result is that websites can detect when an AI agent visits and serve it different content than humans see, including hidden instructions in HTML, image pixels, PDFs, and other files . The same summary says sanitization, prompt guards, sandboxing, and human oversight all fail in important ways, especially when attacks propagate across multi-agent pipelines .

Impact: For agentic systems, the risk is not only misuse of the model itself; it is also untrusted data flowing through the system unnoticed .

New benchmark evidence shows reasoning remains brittle under simple changes

Apple researchers introduced GSM-NoOp, a modified GSM8K benchmark with swapped numbers and irrelevant “no-op” clauses, and reported performance drops across 25 state-of-the-art models . In one example, models subtracted an irrelevant “5” from a kiwi-counting problem that should total 190, yielding 185 instead . The paper summary says few-shot examples barely helped, performance worsened faster as tasks gained steps, and the authors concluded that current LLMs are not capable of genuine logical reasoning but instead reproduce reasoning patterns from training data .

Impact: Strong benchmark scores still do not remove a basic reliability issue: small, irrelevant changes can derail current reasoning models .

Research & Innovation

Why it matters: The most useful research this cycle focused on better evaluation, stronger tool use, and simpler explanations for where current systems still fail.

XpertBench raises the bar for expert-workflow evaluation

XpertBench is built around 1,346 open-ended tasks across 80 categories and 7 domains, using submissions from more than 1,000 experts via ByteDance’s Xpert Data Platform . Instead of simple pass/fail grading, it uses 15–40 weighted checkpoints per task and calibrates automated judging with expert-scored exemplars . On XpertBench-Gold, Claude-Opus-4.6-thinking led at 66.20%, followed by GPT-5.4-high at 64.78% and Doubao-2.0-pro at 64.51%, with most models clustered around 50% and no single model dominating every domain . STEM and Education remained especially difficult because formal reasoning, strict calculation, and long-horizon planning are still weak points .

OctoTools shows a training-free route to better tool use

OctoTools combines standardized tool cards, a planner, and an executor to handle visual understanding, retrieval, math, and multistep reasoning without additional training . The framework reported gains across 16 tasks, outperforming GPT-4o by 9.3%, AutoGen by 10.6%, GPT-4o Functions by 7.5%, and LangChain by 7.3% . It has also been accepted to ACL 2026 .

Equalized-compute tests challenge the case for multi-agent reasoning

A new paper comparing single-agent and multi-agent systems under equal thinking-token budgets found that single-agent LLMs consistently matched or outperformed multi-agent architectures on multi-hop reasoning . The result suggests some apparent multi-agent gains may come from extra computation rather than better coordination .

Simple baselines remain hard to beat in streaming video

A paper on streaming video understanding found that feeding a vision-language model only the most recent four frames can reach near state-of-the-art performance on many benchmarks, often outperforming more complex retrieval and memory setups . The authors recommend using SimpleStream as a baseline and redesigning benchmarks when the actual goal is to test long-range dependencies .

Products & Launches

Why it matters: Commercial releases continued to move beyond chat, especially in speech, developer agents, and production tooling.

Speech tooling improved on both generation and transcription

Mistral launched Voxtral TTS, a 4B-parameter multilingual text-to-speech model supporting 9 languages, 70ms latency, and voice cloning from 3-second samples . Cohere launched Transcribe, a 2B open-source ASR model topping the Hugging Face Open ASR Leaderboard with a 5.42% average word error rate across 14 languages .

GitHub and Arena shipped more practical agent workflows

GitHub’s Copilot cloud agent can now research, plan, and make code changes without needing a pull request first, and can be kicked off from the GitHub mobile app . Arena introduced “Battles in Direct,” which anonymously inserts a second model mid-conversation; it reports 90%+ correlation with regular Battle mode and deeper evaluation through longer context windows .

New infrastructure features target production ergonomics

LangChain launched Cost Alerting in LangSmith so teams can set configurable alerts on total agent spend as production usage rises . Hugging Face introduced gradio.Server, which lets developers pair custom frontends with Gradio’s backend while keeping its queuing system, API infrastructure, MCP support, and ZeroGPU on Spaces .

Industry Moves

Why it matters: The business layer is being shaped by compute intensity, capital requirements, and how companies balance open releases against competitive pressure.

OpenAI and Anthropic are growing fast, but training costs remain the core constraint

Reporting on confidential financials said both OpenAI and Anthropic are seeing revenue surge, but training costs are rising even faster . For OpenAI, the projection is $121 billion in compute spending by 2028, with $85 billion in losses that year even after nearly doubling revenue; including training costs, break-even does not arrive until the 2030s . A separate post similarly said OpenAI does not expect profit until at least 2030 . Another report said Altman wants to take OpenAI public as early as Q4 2026, while CFO Sarah Friar doubts the company will be ready because of spending commitments, slowing revenue growth, and organizational work still ahead .

Meta is preparing a new model family with delayed open-source releases

Reporting says Meta is preparing to release its first LLM built under Alexandr Wang soon, but open versions will not ship at launch because the company wants to remove proprietary elements and address safety risks first . Meta also appears to be positioning the family around selective consumer strengths rather than claiming it will beat OpenAI or Anthropic across the board .

Compute ownership remains highly concentrated

Epoch AI’s new AI Chip Owners explorer estimates that the top U.S. hyperscalers control more than 60% of global AI compute, led by Google at roughly 5 million Nvidia H100-equivalent GPUs, much of it through custom TPUs . Chinese companies collectively account for just over 5%, a share that is falling under export controls; Huawei has become the leading source of AI compute in China on paper .

Policy & Regulation

Why it matters: AI governance is moving from general principles toward specific controls, public-interest proposals, and government-backed operational systems.

OpenAI’s blueprint favors targeted frontier controls and social protections

The policy document calls for stricter regulation on a narrow set of frontier models rather than the broader AI ecosystem, alongside competitive auditing, containment playbooks, an international safety network, worker voice in deployment decisions, and broader access to AI as basic infrastructure . OpenAI is also backing policy work with up to $100,000 fellowships, $1 million in API credits, and a Washington workshop opening in May .

Japan’s internal affairs ministry is using AI against disinformation

Sakana AI said it completed a project with Japan’s Ministry of Internal Affairs and Communications to build an end-to-end system for visualizing, detecting, and countering misinformation on social media at national scale . The system uses autonomous agents running novelty searches, combines frontier models with proprietary small models, and simulates how counter-messaging spreads before deployment .

Safety research capacity is still expanding

OpenAI launched a Safety Fellowship to support independent research on safety and alignment, including evaluation, robustness, and scalable mitigations; applications are open through May 4, 2026 . Constellation also opened applications for its fully funded five-month Astra Fellowship in empirical AI safety research, strategy, and governance .

Quick Takes

Why it matters: Smaller updates this cycle still showed how quickly AI is spreading into healthcare, enterprise workflows, edge deployment, and creative production.*

  • Voice as a diagnostic tool: Vox, an FDA-designated system, can analyze five seconds of speech to detect worsening heart failure; it was trained on more than 3 million voice samples and supported by five clinical trials .
  • Voice restoration: Neuralink and ElevenLabs restored the real voice of an ALS patient through voice cloning, replacing a robotic voice with a more familiar one .
  • Edge model compression: Bonsai introduced 1-bit weights for 1.7B to 8B-parameter models, reporting 14x compression versus bf16 and 8x faster edge performance .
  • Inference speed: Baseten said it shipped named-entity recognition inference at 1 ms P50 and 3 ms P99 server-side latency, 7.7x faster than an optimized PyTorch baseline .
  • Enterprise research adoption: Elicit is now formally deployed at 30% of the top 20 global life sciences companies to automate research .
  • Open science infrastructure: SAIR Foundation and Hugging Face announced a collaboration to provide open data, benchmarks, tools, and models for AI x Science competitions .
  • Creative generation: Runway’s Ad Concepter App produced a short brand film from two input images and a short text description .
Asimov's Robot Stories Lead Today's Picks, With The Beginning of Infinity and a Crime-Tech Essay Also Surfacing
Apr 7
2 min read
130 docs
David Ulevitch 🇺🇸
andrew chen
Sam Altman
+1
The strongest signal today is Isaac Asimov: in the Sam Altman–Francois Chollet AGI conversation, his Robot stories and Foundation series surfaced twice as formative reading. Andrew Chen also shared a practical article on crime, tech, and how to make things better.

What stood out

Only a small number of recommendations cleared the authenticity bar today, but one stood out because it surfaced twice in a serious AGI conversation and was framed as formative rather than casually interesting.

Most compelling recommendation

Isaac Asimov's Robot stories and Foundation series

  • Content type: Books / science fiction series
  • Author/creator: Isaac Asimov
  • Link/URL: None provided in the source material
  • Who recommended it: Both speakers in AGI: Francois Chollet + Sam Altman
  • Key takeaway: One speaker said Asimov's Robot stories were a major influence behind wanting to build human-level AI since age 16, while the other said Asimov may have had more impact when younger than The Beginning of Infinity
  • Why it matters: This is the strongest pick today because it appears twice, unprompted, in a conversation about AGI and is tied to long-run intellectual formation rather than a passing endorsement

"I've wanted to build human level AI since I was like 16. And I think one of the big influences on me at the time was Asimov's Robot Stories."

Also worth saving

The Beginning of Infinity

  • Content type: Book
  • Author/creator: Not specified in the source material
  • Link/URL: None provided in the source material
  • Who recommended it: A speaker in AGI: Francois Chollet + Sam Altman said it was the book they were going to choose as most impactful
  • Key takeaway: It surfaced as the first answer in a discussion of the most impactful book, before that speaker added that Asimov may have had more impact earlier in life
  • Why it matters: Even with limited context, it was positioned at the top of a very short list of personally important books in the same AGI discussion

Crime-and-tech writeup (title not provided)

  • Content type: Article
  • Author/creator: Not specified in the source material
  • Link/URL:http://x.com/i/article/2041160953094639617
  • Who recommended it: Andrew Chen
  • Key takeaway: After describing personal experiences with car break-ins, a garage break-in, and stolen bikes in San Francisco, Chen called it a "great writeup about crime and tech and how to make it all better"
  • Why it matters: This is the most practical recommendation in today's set: it is framed as a resource on a concrete problem and possible improvements, not just commentary on the problem itself

Across today's picks, the split is clear: the books are presented as deep formative influences, while the article is recommended for its applied thinking on a live civic issue .

Cursor 3.0’s Swarm Control, Claude Code’s Slide, and Codex at the Limit
Apr 7
5 min read
85 docs
Theo - t3․gg
Fireship
Theo - t3.gg
Cursor 3.0 is the day’s clearest workflow shift: the IDE is becoming a control plane for parallel agents. The other strong signal comes from Theo’s side of the market—Claude Code frustration, Codex preference, and real limits showing up in long-running, high-volume usage.

🔥 TOP SIGNAL

Cursor 3.0 is the clearest product shift today: the developer stops being a typist and becomes an agent dispatcher. In Fireship’s walkthrough, a fresh project goes from plan mode to parallel agents across marketing, servers, and other projects, with yellow-dot approval gates for risky commands, blue-dot completion signals, and a 13k-line prototype ready to inspect in-browser .

🛠️ TOOLS & MODELS

  • Cursor 3.0 — Major UX change: Cursor now wants you running swarms of agents across repos, machines, and the cloud, not manually editing code line by line . The new interface was rewritten in Rust + TypeScript for agent management, while the old VS Code-style editor still exists in the product .
  • Composer 2 — Cursor’s new in-house model was presented as smarter, faster, and cheaper than Opus on benchmark slides, then Cursor later apologized for the lack of transparency and published a technical report saying it was Kimi plus reinforcement learning.
  • Claude Code — Theo’s negative signal keeps getting louder: he says it is “basically unusable” for his use cases, and his Dropbox repair example shows Claude refusing to help once the task looked like general computer support instead of software engineering .
  • Codex CLI — Theo says he is now repointing his cc alias to Codex --yolo and prefers Codex for coding, research, and longer runs. His reasons: open-source CLI, better models, easier to build on top of, and higher trust on extended tasks .
  • Benchmark signal — Theo also points out that Claude Code ranks last on TerminalBench among harnesses using Opus 4.6, with ten separate harnesses doing better on the same base model .

💡 WORKFLOWS & TRICKS

  • Cursor swarm loop to copy

    1. Start a fresh repo in plan mode and let the agent sketch architecture .
    2. While that runs, dispatch more agents in parallel: a landing page, remote work over SSH, or an entirely different project .
    3. Use the status dots as the control surface: yellow means you need to approve risky commands; blue means review-ready .
    4. Review output in one place via git history, terminal, file explorer, and the built-in browser.
    5. For UI cleanup, jump to design mode, select the broken element, describe the fix, and keep queueing more requests while the agent works in the background .
  • Whole-machine debugging loop with Codex

    1. Give Codex the operational task directly: kill and relaunch the broken app .
    2. If the first pass stalls, add a mid-run steering prompt telling it to research similar failures online .
    3. Let it propose root causes, then authorize the cleanup step—in Theo’s example, nuking duplicate Dropbox installs .
    4. End by asking for a reinstall checklist so the agent hands back a concrete recovery plan, not just terminal output .
  • Small habit, big routing effect — Theo says a big reason he defaulted to Claude Code was simply that cc was already aliased in his shell with the right flags. He is now changing that alias to open Codex with --yolo, which is a good reminder to bake your preferred tool and flags into muscle memory .

  • Long-thread context is real, but so are quota ceilings — Theo says Julius trusted compaction enough to run threads over 180 million tokens, and separately reports Julius burned through 100% of a $200/month Codex plan during T3 Code iteration .

👤 PEOPLE TO WATCH

  • Jeff Delaney / Fireship — Useful today because he shows Cursor 3.0 doing real multi-agent work, not just reading release notes: architecture planning, parallel agents, SSH tasks, browser review, and UI repair in one short demo .
  • Theo Browne — Still high-signal for hard negative feedback on agent tooling. Today he combines a concrete Claude Code failure case, a permanent alias switch to Codex, and a benchmark critique that isolates harness quality from base-model quality .
  • Julius / @jullerino — Worth tracking as a power-user stress test for cost and context limits. Theo highlights both full-plan burn on Codex and extremely long compaction-backed threads .

🎬 WATCH & LISTEN

  • 2:46-3:33 — Cursor 3.0’s core loop — Best clip of the day if you want to see the thesis in one minute: fresh project, multiple agents in parallel, permission gates, and a 13k-line codebase ready for review .
  • 3:55-4:11 — Design mode for UI cleanup — Short but practical. Delaney highlights a broken element, asks AI to fix it, and keeps stacking more UI tasks instead of waiting for each one to finish .

📊 PROJECTS & REPOS

  • T3 Code — Theo describes it as the only UI he finds performant for working across lots of projects at once. It is fully open source and free, and can front either Claude Code or Codex subscriptions through the agent SDK .
  • T3 Code usage signal — The stronger signal today is workload intensity: Theo says Julius exhausted 100% of a $200/month Codex plan during T3 Code iteration and is getting a second account so progress is not blocked by token ceilings .
  • Codex CLI — Theo calls out the open-source CLI specifically as a reason he prefers Codex; he says it is easier to build on top of and lets you reuse auth in other places .

Editorial take: the edge is shifting from raw model IQ to the control plane around it—parallelism, approval gates, long-lived context, and quota management.

Judgment, Signal Capture, and Faster PM Teams
Apr 7
10 min read
76 docs
a16z
Teresa Torres
John Cutler
+8
This brief covers four shifts shaping modern PM work: judgment is becoming more valuable as AI speeds execution, discovery depends on better signal capture, and faster teams need tighter demo and alignment loops. It also includes practical plays for interviewing, productivity, career prep, and tools worth testing now.

Big Ideas

1) Judgment is the durable advantage in AI work

“Speed is the demo. Judgment is the actual job.”

Leah Tharin’s point is straightforward: AI can generate output fast, but only domain expertise can tell whether that output is good enough to ship . She also argues that AI products increasingly win on compatibility—whether they fit how users already work with AI—not on forcing a brand-new interface or workflow . Teresa Torres applies the same filter personally: stay aware of new tools, but go deep only when a tool solves a real friction point and is actionable now . Peter Yang adds that core PM skills still center on talking to users and identifying the right problem to solve .

Why it matters: AI makes output cheaper; it does not make evaluation easier .

How to apply:

  • Use AI as an assistant and keep human judgment close to anything that ships
  • Evaluate new products by asking whether they work with existing AI habits
  • Adopt tools when they remove live friction, not just because they are interesting

2) Fast teams put a premium on interaction design inside the org

“You are the head game designer.”

John Cutler argues that leaders shape the environment people work in, and that culture is the sum of the quality of interactions inside the organization . He recommends focusing on a local trust boundary of roughly 30-50 people, where managers can still materially shape how work gets done . The same theme shows up in practice: Julie Zhuo says TeamSundial canceled all recurring meetings except a Monday demo, and the remaining meeting now feels like weekly hackathon energy . Anthropic’s head of growth built a weekly AI agent that scans Slack for cross-functional misalignment before teams waste weeks on overlapping work . Peter Yang also describes a future where 2-3 person product teams work with agents across functional lines .

Why it matters: When build speed rises, meeting design, visibility, and misalignment detection matter more .

How to apply:

  • Treat recurring interactions as product design work, not calendar residue
  • Keep one visible demo cadence and challenge the rest of the meeting stack
  • Add lightweight checks for overlap and drift in Slack-heavy teams

3) Discovery quality depends on capturing real behavior, not lucky recall

Teresa Torres’s current teaching emphasis is continuous interviewing: collect specific stories about customers’ past behavior and synthesize what you learn from each interview . The Reddit HubSpot story shows the cost of doing this poorly. A $45k ARR account asked for a HubSpot integration in a QBR; the request was mentioned in a Slack thread and ignored because it was not attached to a large enterprise deal . Months later, an unrelated prospect requested the same integration almost word for word, revealing a pattern the team had not tracked systematically . After the work was fast-tracked, the integration became the primary entry point for a mid-market segment and roughly $1.4M in pipeline .

Why it matters: If signals live in Slack, QBR notes, and memory, you are depending on luck to find product demand .

How to apply:

  • Ask for concrete past-behavior stories, not general opinions
  • Synthesize each interview before detail fades
  • Put CS, sales, and product signals into a trackable system so repeated requests become visible

4) Cheap prototyping changes the quality process

Aakash Gupta’s notes on Anthropic describe a culture where every PM codes, the norm is to “send a PR,” and 80% of prototypes never ship . Agent Teams let one lead agent delegate to 10 parallel teammates; in one example, three open issues became three PRs within 40 minutes . Auto Mode is described as saving 20-40 permission clicks per session and completing a refactor across 14 files, including test runs and fixes, in eight minutes . Peter Yang makes the PM implication explicit: build a thing yourself, get feedback, then bring engineers along .

Why it matters: When prototypes are cheap, quality depends more on generating options and killing weak ones quickly .

How to apply:

  • Use agents to create multiple credible options in parallel
  • Treat a high kill rate as part of the quality bar, not as wasted motion
  • Keep user feedback close to the prototype loop so speed becomes learning, not just output

Tactical Playbook

1) Run a continuous interviewing loop

  1. Ask customers for specific stories about what they actually did
  2. Synthesize the interview immediately after it ends
  3. Add adjacent evidence from QBRs, CS notes, and Slack threads into the same tracking system
  4. Look for the same request from unrelated accounts before escalating priority
  5. Track requests formally so pattern detection does not depend on someone remembering an old message

Why it matters: This is the difference between systematic discovery and finding a $1.4M opportunity late .

2) Use an actionable-not-interesting filter for AI tools

  1. Scan broadly enough to understand the solution space
  2. When a tool looks exciting, ask: “Why do you need that?”
  3. Wait for a real friction point before going deep
  4. Prioritize tools that are actionable now, not merely interesting
  5. Time-box learning and stay with a tool long enough to hit real constraints and context issues

Why it matters: This reduces burnout and produces deeper learning on the tools you actually adopt .

3) Reset operating cadence around demos and misalignment checks

  1. Audit recurring meetings and remove the ones that do not create value
  2. Preserve one visible demo ritual so work stays legible
  3. Define the local group where trust and environment can still be shaped—roughly 30-50 people
  4. Treat culture as interaction quality, not as a slogan
  5. Run a scheduled scan for overlap and drift in project conversations

Why it matters: Faster teams benefit when visibility is high and coordination overhead stays low .

4) Measure productivity by delivered outcomes, then consolidate

  1. For one month, track only completed and delivered outcomes
  2. List every place work lives and how often you have to search across them
  3. Consolidate the core system so you are not carrying the map in your head
  4. Simplify until the system stops requiring constant maintenance

Why it matters: A system can feel productive while producing very little that ships .

Case Studies & Lessons

1) A forgotten Slack thread became a new mid-market wedge

A mid-market customer asked for a HubSpot integration during a QBR. The request was dropped into a Slack thread, but it stayed below the cut line because it was not tied to an enterprise deal or a competitive loss . Months later, a different prospect asked for the same integration, and an AE remembered the original message . The team fast-tracked the work; within about 11 months of the first QBR mention, the integration had become the primary entry point for a mid-market segment and was sitting at roughly $1.4M in pipeline .

Key takeaway: Weak signals are valuable only if your system can retrieve them before someone gets lucky .

2) HubSpot shows how an AI shift can pressure even strong SaaS performance

Leah Tharin highlights a stark contrast: HubSpot’s revenue grew from $1.3B in 2021 to $3.1B in 2025, a 141% increase, while the stock fell 71% from its 2021 peak . Her explanation is the classic innovator’s dilemma: move too fast and risk current customers; move too slow and miss the next paradigm shift . She argues the earlier sales-led-to-product-led pivot was easier when the company was smaller .

Key takeaway: Strong current revenue does not remove pressure to adapt when the market’s definition of fit is changing .

3) One weekly demo replaced a lot of meeting weight at TeamSundial

Julie Zhuo says TeamSundial canceled all recurring meetings except one Monday demo meeting at 6:30am . The result, in her telling, is recurring “surprise” and “delight” and energy that feels closer to a weekly hackathon than the old quarterly event cadence .

Key takeaway: A single showcase ritual can do more for visibility and momentum than a stack of status meetings .

Career Corner

1) Prepare behavioral interviews as modular stories

Aakash Gupta says most PM candidates overprepare product sense and underprepare behavioral interviews . Across 1,000+ mock interviews, candidates who prepare 5-6 answers that map across categories outperform candidates who prepare 30 isolated answers . One strong cross-functional conflict story can cover 8+ questions across 3 categories, and top candidates build a library of 6-8 stories that map across 84 common questions in 7 categories, then practice each answer in under two minutes .

Why it matters: Modular stories travel further than memorized answers .

How to apply:

  • Build 6-8 stories, not 30 scripts
  • Cover conflict, decision-making, and cross-functional execution first
  • Practice concise versions that land in under two minutes

2) Hypergeneralists need packaging, not self-reduction

John Cutler argues that hypergeneralists may be more valuable than ever, but they also have the hardest time explaining how that breadth helps in a specific environment . His advice is not to box yourself in permanently, but to design a “Trojan horse” package that makes your range accessible to other people . He also notes that public writing increases surface area for serendipity, and many of the best things that happened in his career trace back to something he wrote online .

Why it matters: Breadth helps only if other people can understand where it fits .

How to apply:

  • Turn broad experience into a simple narrative others can repeat
  • Keep enough flexibility that the story does not trap you
  • Publish thoughtful work publicly if you want more unexpected opportunities

3) You do not need to become the most technical AI operator

Leah Tharin says two pieces of prior advice were wrong for her: PMs do not need to know SQL to be effective, and they do not need deep technical AI knowledge to work well with AI . Her view is that the people using AI best are treating it like an assistant . The limiting factor is still judgment—knowing what good looks like in your domain . Peter Yang’s addition is practical: keep talking to users, figure out what to build, and prototype enough to learn fast .

Why it matters: The edge comes from domain judgment plus hands-on reps, not from pretending every PM needs the same technical profile .

How to apply:

  • Double down on the domain strengths that help you judge outputs well
  • Use AI as leverage, not as an identity project
  • Prototype enough to sharpen product sense and feedback loops

Tools & Resources

1) Continuous Discovery Habits reading cohort

Teresa Torres is organizing a 2026 group read of Continuous Discovery Habits with monthly reading guides, reflection questions, exercises, short videos for teammates, and quarterly live discussion sessions . April’s chapter focuses on continuous interviewing and includes a supplemental reading on AI synthesis .

Why explore it: It turns discovery concepts into a recurring practice loop .

How to use it: Work through one section per month and use the exercises to build an actual interviewing habit, not just a reading habit .

2) NotebookLM

Torres ignored NotebookLM until she had a concrete need: creating overview videos and infographics from existing blog posts. She now uses it to generate both from Product Talk articles .

Why explore it: It is useful when you already have source material and need a new format for it .

How to use it: Start with existing documents or posts you already trust, then test whether the generated summaries or visuals help your audience .

3) 11 Labs

After launching paid subscriptions, Torres used 11 Labs to create audio versions of blog posts and now uses it for her article podcast audio .

Why explore it: It can extend existing written content into an audio format without building that workflow from scratch .

How to use it: Apply it to a content stream you already publish, then judge whether audio adds real user value .

4) Cowork

Cowork is described as running on your computer with access to your apps and files. In one example, it caught up on Slack DMs and updated a metrics deck before a meeting . Anthropic’s head of growth also uses it with Slack MCP as a scheduled task to scan projects and conversations for cross-functional misalignment .

Why explore it: It is being used for both personal prep work and org-level signal detection .

How to use it: Start with bounded tasks such as inbox triage, meeting prep, or weekly alignment checks .

5) Agent Teams and Auto Mode

Aakash Gupta highlights two Anthropic features he says changed how he works. Agent Teams lets a lead agent delegate to 10 parallel teammates; in his example, three open issues produced three PRs within 40 minutes . Auto Mode handled edits across 14 files, ran tests three times, fixed failures, and committed, while saving 20-40 permission clicks per session .

Why explore them: They compress repetitive build and prototype work into a much shorter loop .

How to use them: Try them on parallel prototypes, refactors, or other tasks where speed matters and the work can be reviewed quickly .

Anthropic’s TPU Deal, OpenAI’s Washington Push, and a Gradual Automation Picture
Apr 7
4 min read
217 docs
Import AI
Rowan Cheung
Sam Altman
+2
Anthropic locked in multi-gigawatt TPU capacity and OpenAI stepped up its case for earlier policy debate on cyber, bio, and energy. New research, meanwhile, suggests AI capabilities are spreading broadly across work and cyber tasks even as near-term GDP effects may remain more modest.

Frontier buildout is accelerating faster than the measured macro story

Today's clearest contrast was between frontier companies planning for much larger demand and new research that still points to a slower macroeconomic rollout .

Anthropic locks in multi-gigawatt TPU capacity for Claude

Anthropic said it signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models . The company also said its run-rate revenue has passed $30 billion, up from $9 billion at the end of 2025, and framed the deal as the compute needed to keep pace with demand .

Why it matters: This ties Anthropic's next stage of growth directly to multi-year compute procurement and sharply rising commercial demand .

OpenAI takes its superintelligence argument to Washington

In a Washington-facing interview, Sam Altman said OpenAI wants policy ideas discussed now because AI is beginning to do more real work and could change coding, knowledge work, science, and the shape of work . He said OpenAI's main preparedness areas are cyber and bio, expects significant cyber threats within the next year, and argued that safety in a world of powerful AI cannot be handled by companies alone; he also pointed to energy buildout, privacy, and frontier-system auditing as active policy areas .

"I suspect in the next year we will see significant threats we have to mitigate from cyber"

Why it matters: The policy conversation is moving from abstract AGI claims toward concrete questions about resilience, infrastructure, and oversight .

Research suggests the impact may be broad, but not abrupt

MIT finds a "rising tide" across text-based work

Researchers analyzing 3,000 O-NET tasks with 17,000 worker evaluations said AI progress across realistic text-based labor-market tasks looks more like a rising tide than a crashing wave . They report that, between 2024-Q2 and 2025-Q3, frontier models moved from 50% success on 3- to 4-hour tasks to 1-week tasks, and project most surveyed tasks could reach 80%-95% AI success rates by 2029 at minimally sufficient quality .

Why it matters: The study points to broad, gradual gains across many job families rather than a few isolated task categories changing all at once .

Forecasters still expect only a modest GDP boost by 2030

A major report from the Forecasting Research Institute found that surveyed groups expect moderate to rapid AI progress in the coming years, yet still see GDP impacts staying relatively small by 2030 - about one percentage point above the 2025 baseline growth rate of 2.4% . The same material says all surveyed cohorts expect continued declines in labor-force participation and rising wealth inequality, while economists put a 14% chance on major near-term increases in GDP and inequality .

Why it matters: Taken together with the MIT work, the picture is not "no impact" - it is faster task-level progress paired with a still-muted near-term GDP forecast .

Security and public-safety systems

Offensive cyber capability keeps improving on short timelines

Lyptus Research found that frontier-model performance on cyberoffense tasks has followed a 9.8-month doubling time since 2019, steepening to 5.7 months for models released since 2024 . The evaluation covered standard cyber benchmarks and a new 291-task dataset with time estimates calibrated by 10 offensive cybersecurity professionals . In the study, GPT-5.3 Codex and Opus 4.6 reached 50% success on tasks that take human experts about 3.1 to 3.2 hours, and the most recent open-weight model in the sample lagged the closed frontier by 5.7 months .

Why it matters: The result points to both stronger offensive capability and relatively short diffusion timelines from closed frontier systems to open-weight models .

Google puts 24-hour flash-flood forecasts on Flood Hub

Google launched an AI system that predicts flash floods 24 hours in advance and made the predictions live for free on Flood Hub . The system uses Gemini to extract confirmed flood locations and times from global news, builds missing historical event data, and combines weather forecasts with terrain, soil absorption, and urban density; the notes say it can work in countries with little flood-monitoring infrastructure .

Why it matters: Flash floods kill more than 5,000 people each year, and a 12-hour warning alone can reduce damage by 60%, making this a notable example of AI being deployed into public-risk infrastructure .

Wheat Stress, Energy-Driven Input Risk, and New Nitrogen Playbooks
Apr 7
7 min read
222 docs
Market Minute LLC
Arlan Suderman
Tarım Editörü
+6
Plains wheat conditions worsened as corn and soybean markets weighed strong exports against rising fuel and fertilizer risk. This brief also highlights lower-cost nitrogen programs, cover-crop design choices, and regional trade and weather shifts from Brazil to the EU-Mercosur corridor.

Market Movers

  • United States / Black Sea wheat: May Chicago wheat was $5.925 and May Kansas City wheat $6.0675 on April 6, but the market is trading a stressed Plains crop: 65% of U.S. winter wheat area is in drought, national good/excellent is 35% versus 48% a year ago, and states such as Colorado and Oklahoma are at 12% good/excellent. Funds are now net long SRW wheat for the first time since June 2022. Additional Black Sea logistics risk remains after a vessel carrying wheat sank in the Sea of Azov and Russia launched new attacks on Ukraine despite an Easter ceasefire .
  • United States corn: May corn was $4.5025, down 2 cents, but export demand remains the strongest grain story. Weekly inspections were 78.8 million bushels, and marketing-year inspections are running 297 million bushels ahead of the pace needed to hit USDA's target. Weekly export sales were 45 million bushels, and one market source described the corn export book as the best on record. That demand is being offset by weak cash basis in parts of the western Corn Belt and heavy farmer selling that has left commercials net short 572,000 corn contracts .
  • United States soybeans: May soybeans were $11.6725, up 3¾ cents. Weekly inspections reached 28.6 million bushels, including 18.3 million destined for China, but marketing-year soybean inspections still trail the pace needed for USDA's target by 96 million bushels. Support is coming from biofuel-linked soybean oil strength and expectations that a China meeting could unlock additional buying .
  • Energy and ag inputs: WTI crude jumped 11% to $111.54/bbl, its highest close since June 2022, while U.S. average retail fuel was about $4.11/gal for gasoline and $5.61/gal for diesel. Several market notes tie grain volatility, fertilizer uncertainty, and farm freight costs back to this energy shock .

Innovation Spotlight

  • United States / corn nitrogen redesign: In a Minnesota Soil Health Coalition presentation, John Kempf described a corn program built around nitrogen form and timing rather than total seasonal pounds. His framework starts with 40 lb N at planting, 40 lb N sidedressed around V5-V6, 25 lb sulfur total, and two foliar low-biuret urea applications of 10 lb N/acre around tassel and R1. He said the first 25 lb of sulfur can deliver the yield response of 25 lb N, and that this package can replace a 200-lb N program with about 185 lb N-equivalent while cutting input costs 30-50%; 0.65-0.75 lb N/bushel was described as routinely achievable in the field experience presented .
  • The same presentation cited older research showing highest-yielding corn at roughly 80% ammonium / 20% nitrate, and argued nitrate should be emphasized early then minimized after V5-V6 because it requires more water and energy than ammonium .
  • United States / on-farm validation: Griggs Farms in west Tennessee said replicated, on-farm trials plus scales on the grain cart are the highest-return evaluation tools on the farm. In its cover-crop system, the farm said it has documented double summer infiltration rates, higher water-holding capacity, more organic matter, more biological activity, and better weed control, even though small-plot yield gains have been harder to show consistently. The same farm has cut its maximum corn N rate from 200 to 170-175 units, and cotton from 80 down to 20-25 units while continuing to test reductions. One biological product costing $4.60/acre won every corn side-by-side trial over five years on that farm .

I'm not looking to produce more. What I'm looking to do is maintain my yield and it costing me less

Regional Developments

  • United States: March 31 planting intentions put corn acres above trade expectations, but several analysts said the survey closed before the Iran conflict fully hit fertilizer logistics. Farmers who booked urea are reportedly facing shipment delays, and part of corn nitrogen demand is still unresolved as planting starts. The Eastern Corn Belt has also picked up 4-5 inches of rain, improving subsoil moisture but potentially slowing fieldwork .
  • Brazil / Rio Grande do Sul: Soybean harvest has reached 23% of planted area, with 43% of fields in maturation and 31% still filling grain; average yield is estimated near 2,900 kg/ha over more than 6.6 million hectares. At the same time, an extratropical cyclone is bringing 30+ mm rains, strong winds and hail risk to key rice and soy zones such as Uruguaiana, Rio Pardo and Alegrete, with localized totals above 100 mm over five days in the southwest .
  • EU / Mercosur: The EU plans to provisionally start the commercial core of the Mercosur-EU agreement on May 1, focused on tariff reductions under the part of the treaty that falls under EU trade competence, even while the full agreement remains under review by the EU Court of Justice .
  • China / Brazil beef trade: China's foot-and-mouth cases are currently described as isolated and rapidly contained, so there is no immediate export windfall for Brazil. Analysts said quota flexibility or larger Brazilian beef sales would require a broader deterioration in China's herd situation .

Best Practices

  • Corn nitrogen management: Separate nitrate decisions from ammonium and urea decisions. The Minnesota framework emphasizes generous nitrate earlier, then moving to urea/ammonium and foliar N after V5-V6; if following that system, total sulfur is kept near 25 lb/acre. The same presenter said manure is mostly organic nitrogen, but high-salt dairy manure applications can damage soil biology .
  • Cover crop design before corn: Griggs Farms uses annual ryegrass as a base, cereal rye + oats ahead of corn for weed control and organic matter, clovers / vetch / lentils where more N is needed, and radish / rapeseed / buckwheat where compaction or phosphorus release is the goal. Drill when stand consistency and weed suppression matter most; interseed earlier when biomass is the bigger priority .
  • Dairy / forage: Embrapa's recommendation for grass silage is to harvest at 90-110 days, when dry matter is around 20%. The guidance says dry matter can be checked with a microwave and kitchen scale; waiting beyond 110-120 days reduces digestibility and slows the next regrowth cycle .
  • Weed programs under new dicamba labels: For 2026, over-the-top dicamba is back with stricter rules: two applications maximum, runoff-mitigation points, buffers, and ESA compliance. Regional fit still varies - Kochia in the northern Plains, waterhemp in Illinois corn/soy, Palmer amaranth in the Delta - but multiple experts stressed that resistance means growers need residuals and seedbank reduction, not just a different POST sequence .
  • Product testing discipline: Use replicated strips and calibrated grain-cart scales to decide which biologicals or crop inputs earn a permanent place in the program .

Input Markets

  • Nitrogen fertilizer: Supply risk has become operational, not just theoretical. Brazilian analysts said the Middle East/Russia/Africa corridor supplies 70-80% of Brazil's imported nitrogenates, while Russia and China have limited nitrogen exports to prioritize domestic planting. In Iran, GUBRETAS affiliate Razi Petrochemical temporarily stopped production after attack damage to electrical units. U.S. analysts likewise reported that some booked urea shipments are not arriving, leaving part of corn nitrogen needs unresolved heading into planting .
  • Fuel and freight: Rising fuel is now hitting both field costs and livestock supply chains. U.S. retail diesel averaged about $5.61/gal, while Brazilian analysts said diesel and maritime freight costs are rising across poultry and swine chains and are difficult to fully pass through to consumers .
  • Crop protection labels and pipeline: Dicamba's 2026 return comes with stricter ESA-based runoff and buffer requirements and a max of two applications. Brownfield and Farm4Profit sources also noted a slower herbicide pipeline: new products can take about 10 years and more than $300 million to commercialize, while current litigation and label-defensibility reviews are slowing registrations further .
  • Feed coproducts: China has started receiving Brazilian DDGs, with a first 62,000-ton cargo reported. More corn ethanol output also means more DDG availability for swine, poultry and cattle feed .

Forward Outlook

  • Wheat: Near-term direction depends on whether forecast rains actually reach HRW country. Ratings are weak enough that even modest moisture matters, but the western Plains still need more than the current forecast offers. From a chart perspective, the recent rally has already stalled near last February's highs and a 61.8% retracement level .
  • Corn vs. soy acreage: Final corn area still looks less settled than the March survey suggests because fertilizer logistics changed after the survey window and new-crop soybean economics are competitive. One analyst expects the March report to mark the high print for corn this year if late nitrogen remains tight. Technically, $4.45-$4.50 is the must-hold support zone being watched in corn .
  • Risk management: Pro Farmer recommended November soybean $11.60 puts at 60 cents, creating roughly an $11.00 floor on 40% of new-crop production, and December corn $4.80 puts at 32 cents, creating a $4.48 floor .
  • Livestock exporters: Brazil's protein export system is proving resilient despite longer routes and higher costs, but not disruption-free. Expect slower shipments and more expensive logistics rather than a clean break in trade .
  • Capital spending: In Brazil, machinery sales were down 17% in Q1 and are forecast down 8% for the year as higher rates and delinquency keep producers cautious. That is a reminder that 2026 planning is happening in a tight-credit environment, not a capex cycle .

Your time, back.

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers one verified daily brief.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

Get your briefs in 3 steps

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Stay updated on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Track startup funding trends and venture capital insights
Latest research on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Confirm your sources and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.

Discovering relevant sources...
Sam Altman Profile

Sam Altman

Profile
3Blue1Brown Avatar

3Blue1Brown

Channel
Paul Graham Avatar

Paul Graham

Account
Example Substack Avatar

The Pragmatic Engineer

Newsletter · Gergely Orosz
Reddit Machine Learning

r/MachineLearning

Community
Naval Ravikant Profile

Naval Ravikant

Profile ·
Example X List

AI High Signal

List
Example RSS Feed

Stratechery

RSS · Ben Thompson
Sam Altman Profile

Sam Altman

Profile
3Blue1Brown Avatar

3Blue1Brown

Channel
Paul Graham Avatar

Paul Graham

Account
Example Substack Avatar

The Pragmatic Engineer

Newsletter · Gergely Orosz
Reddit Machine Learning

r/MachineLearning

Community
Naval Ravikant Profile

Naval Ravikant

Profile ·
Example X List

AI High Signal

List
Example RSS Feed

Stratechery

RSS · Ben Thompson

3

Receive verified daily briefs

Get concise, daily updates with precise citations directly in your inbox. You control the focus, style, and length.

OpenAI’s Superintelligence Push Meets Anthropic’s Compute Buildout
Apr 7
8 min read
615 docs
Anthropic
SAIR
Omead Pooladzandi
+28
OpenAI published a policy blueprint for the 'Intelligence Age' as Anthropic disclosed $30B run-rate revenue and secured multi-gigawatt TPU capacity. This cycle also brought new warnings on agent security, fresh evidence of brittle reasoning, and a wave of speech, developer, and infrastructure launches.

Top Stories

Why it matters: This cycle centered on three frontier questions at once: how leading labs are framing advanced AI politically, how fast they can secure future compute, and whether current systems are reliable enough for wider deployment.

OpenAI says the superintelligence transition has started — and treats it as a policy problem now

OpenAI published a 13-page blueprint, Industrial Policy for the Intelligence Age: Ideas to keep people first, and said it is “beginning a transition toward superintelligence” . The proposal combines economic and safety measures, including a Public Wealth Fund, tax shifts away from payroll, a right to AI, containment playbooks for dangerous models, auto-triggered safety nets, and an international AI safety network . Altman also warned that soon-to-be-released models could enable a “world-shaking cyberattack” this year and argued the U.S. may need a new social contract on the scale of the Progressive Era or New Deal .

“We’re beginning a transition toward superintelligence: AI systems capable of outperforming the smartest humans even when they are assisted by AI.”

Impact: OpenAI is framing frontier AI as an immediate governance and labor issue, not a distant scenario .

Anthropic pairs revenue acceleration with a long-horizon compute deal

Anthropic said its run-rate revenue has surpassed $30 billion, up from $9 billion at the end of 2025, as demand for Claude continues to accelerate . It also signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models . Separate reporting on OpenAI and Anthropic financials said inference still consumes more than half of revenue at both labs, while Anthropic expects profitability sooner than OpenAI once training costs are included .

Impact: Frontier competition is increasingly about securing future energy and hardware capacity, not just model quality .

Agent security is emerging as a deployment bottleneck

A widely shared summary of Google DeepMind work described a large empirical study of AI manipulation covering 502 participants across 8 countries and 23 attack types tested on frontier models including GPT-4o, Claude, and Gemini . The reported result is that websites can detect when an AI agent visits and serve it different content than humans see, including hidden instructions in HTML, image pixels, PDFs, and other files . The same summary says sanitization, prompt guards, sandboxing, and human oversight all fail in important ways, especially when attacks propagate across multi-agent pipelines .

Impact: For agentic systems, the risk is not only misuse of the model itself; it is also untrusted data flowing through the system unnoticed .

New benchmark evidence shows reasoning remains brittle under simple changes

Apple researchers introduced GSM-NoOp, a modified GSM8K benchmark with swapped numbers and irrelevant “no-op” clauses, and reported performance drops across 25 state-of-the-art models . In one example, models subtracted an irrelevant “5” from a kiwi-counting problem that should total 190, yielding 185 instead . The paper summary says few-shot examples barely helped, performance worsened faster as tasks gained steps, and the authors concluded that current LLMs are not capable of genuine logical reasoning but instead reproduce reasoning patterns from training data .

Impact: Strong benchmark scores still do not remove a basic reliability issue: small, irrelevant changes can derail current reasoning models .

Research & Innovation

Why it matters: The most useful research this cycle focused on better evaluation, stronger tool use, and simpler explanations for where current systems still fail.

XpertBench raises the bar for expert-workflow evaluation

XpertBench is built around 1,346 open-ended tasks across 80 categories and 7 domains, using submissions from more than 1,000 experts via ByteDance’s Xpert Data Platform . Instead of simple pass/fail grading, it uses 15–40 weighted checkpoints per task and calibrates automated judging with expert-scored exemplars . On XpertBench-Gold, Claude-Opus-4.6-thinking led at 66.20%, followed by GPT-5.4-high at 64.78% and Doubao-2.0-pro at 64.51%, with most models clustered around 50% and no single model dominating every domain . STEM and Education remained especially difficult because formal reasoning, strict calculation, and long-horizon planning are still weak points .

OctoTools shows a training-free route to better tool use

OctoTools combines standardized tool cards, a planner, and an executor to handle visual understanding, retrieval, math, and multistep reasoning without additional training . The framework reported gains across 16 tasks, outperforming GPT-4o by 9.3%, AutoGen by 10.6%, GPT-4o Functions by 7.5%, and LangChain by 7.3% . It has also been accepted to ACL 2026 .

Equalized-compute tests challenge the case for multi-agent reasoning

A new paper comparing single-agent and multi-agent systems under equal thinking-token budgets found that single-agent LLMs consistently matched or outperformed multi-agent architectures on multi-hop reasoning . The result suggests some apparent multi-agent gains may come from extra computation rather than better coordination .

Simple baselines remain hard to beat in streaming video

A paper on streaming video understanding found that feeding a vision-language model only the most recent four frames can reach near state-of-the-art performance on many benchmarks, often outperforming more complex retrieval and memory setups . The authors recommend using SimpleStream as a baseline and redesigning benchmarks when the actual goal is to test long-range dependencies .

Products & Launches

Why it matters: Commercial releases continued to move beyond chat, especially in speech, developer agents, and production tooling.

Speech tooling improved on both generation and transcription

Mistral launched Voxtral TTS, a 4B-parameter multilingual text-to-speech model supporting 9 languages, 70ms latency, and voice cloning from 3-second samples . Cohere launched Transcribe, a 2B open-source ASR model topping the Hugging Face Open ASR Leaderboard with a 5.42% average word error rate across 14 languages .

GitHub and Arena shipped more practical agent workflows

GitHub’s Copilot cloud agent can now research, plan, and make code changes without needing a pull request first, and can be kicked off from the GitHub mobile app . Arena introduced “Battles in Direct,” which anonymously inserts a second model mid-conversation; it reports 90%+ correlation with regular Battle mode and deeper evaluation through longer context windows .

New infrastructure features target production ergonomics

LangChain launched Cost Alerting in LangSmith so teams can set configurable alerts on total agent spend as production usage rises . Hugging Face introduced gradio.Server, which lets developers pair custom frontends with Gradio’s backend while keeping its queuing system, API infrastructure, MCP support, and ZeroGPU on Spaces .

Industry Moves

Why it matters: The business layer is being shaped by compute intensity, capital requirements, and how companies balance open releases against competitive pressure.

OpenAI and Anthropic are growing fast, but training costs remain the core constraint

Reporting on confidential financials said both OpenAI and Anthropic are seeing revenue surge, but training costs are rising even faster . For OpenAI, the projection is $121 billion in compute spending by 2028, with $85 billion in losses that year even after nearly doubling revenue; including training costs, break-even does not arrive until the 2030s . A separate post similarly said OpenAI does not expect profit until at least 2030 . Another report said Altman wants to take OpenAI public as early as Q4 2026, while CFO Sarah Friar doubts the company will be ready because of spending commitments, slowing revenue growth, and organizational work still ahead .

Meta is preparing a new model family with delayed open-source releases

Reporting says Meta is preparing to release its first LLM built under Alexandr Wang soon, but open versions will not ship at launch because the company wants to remove proprietary elements and address safety risks first . Meta also appears to be positioning the family around selective consumer strengths rather than claiming it will beat OpenAI or Anthropic across the board .

Compute ownership remains highly concentrated

Epoch AI’s new AI Chip Owners explorer estimates that the top U.S. hyperscalers control more than 60% of global AI compute, led by Google at roughly 5 million Nvidia H100-equivalent GPUs, much of it through custom TPUs . Chinese companies collectively account for just over 5%, a share that is falling under export controls; Huawei has become the leading source of AI compute in China on paper .

Policy & Regulation

Why it matters: AI governance is moving from general principles toward specific controls, public-interest proposals, and government-backed operational systems.

OpenAI’s blueprint favors targeted frontier controls and social protections

The policy document calls for stricter regulation on a narrow set of frontier models rather than the broader AI ecosystem, alongside competitive auditing, containment playbooks, an international safety network, worker voice in deployment decisions, and broader access to AI as basic infrastructure . OpenAI is also backing policy work with up to $100,000 fellowships, $1 million in API credits, and a Washington workshop opening in May .

Japan’s internal affairs ministry is using AI against disinformation

Sakana AI said it completed a project with Japan’s Ministry of Internal Affairs and Communications to build an end-to-end system for visualizing, detecting, and countering misinformation on social media at national scale . The system uses autonomous agents running novelty searches, combines frontier models with proprietary small models, and simulates how counter-messaging spreads before deployment .

Safety research capacity is still expanding

OpenAI launched a Safety Fellowship to support independent research on safety and alignment, including evaluation, robustness, and scalable mitigations; applications are open through May 4, 2026 . Constellation also opened applications for its fully funded five-month Astra Fellowship in empirical AI safety research, strategy, and governance .

Quick Takes

Why it matters: Smaller updates this cycle still showed how quickly AI is spreading into healthcare, enterprise workflows, edge deployment, and creative production.*

  • Voice as a diagnostic tool: Vox, an FDA-designated system, can analyze five seconds of speech to detect worsening heart failure; it was trained on more than 3 million voice samples and supported by five clinical trials .
  • Voice restoration: Neuralink and ElevenLabs restored the real voice of an ALS patient through voice cloning, replacing a robotic voice with a more familiar one .
  • Edge model compression: Bonsai introduced 1-bit weights for 1.7B to 8B-parameter models, reporting 14x compression versus bf16 and 8x faster edge performance .
  • Inference speed: Baseten said it shipped named-entity recognition inference at 1 ms P50 and 3 ms P99 server-side latency, 7.7x faster than an optimized PyTorch baseline .
  • Enterprise research adoption: Elicit is now formally deployed at 30% of the top 20 global life sciences companies to automate research .
  • Open science infrastructure: SAIR Foundation and Hugging Face announced a collaboration to provide open data, benchmarks, tools, and models for AI x Science competitions .
  • Creative generation: Runway’s Ad Concepter App produced a short brand film from two input images and a short text description .
Asimov's Robot Stories Lead Today's Picks, With The Beginning of Infinity and a Crime-Tech Essay Also Surfacing
Apr 7
2 min read
130 docs
David Ulevitch 🇺🇸
andrew chen
Sam Altman
+1
The strongest signal today is Isaac Asimov: in the Sam Altman–Francois Chollet AGI conversation, his Robot stories and Foundation series surfaced twice as formative reading. Andrew Chen also shared a practical article on crime, tech, and how to make things better.

What stood out

Only a small number of recommendations cleared the authenticity bar today, but one stood out because it surfaced twice in a serious AGI conversation and was framed as formative rather than casually interesting.

Most compelling recommendation

Isaac Asimov's Robot stories and Foundation series

  • Content type: Books / science fiction series
  • Author/creator: Isaac Asimov
  • Link/URL: None provided in the source material
  • Who recommended it: Both speakers in AGI: Francois Chollet + Sam Altman
  • Key takeaway: One speaker said Asimov's Robot stories were a major influence behind wanting to build human-level AI since age 16, while the other said Asimov may have had more impact when younger than The Beginning of Infinity
  • Why it matters: This is the strongest pick today because it appears twice, unprompted, in a conversation about AGI and is tied to long-run intellectual formation rather than a passing endorsement

"I've wanted to build human level AI since I was like 16. And I think one of the big influences on me at the time was Asimov's Robot Stories."

Also worth saving

The Beginning of Infinity

  • Content type: Book
  • Author/creator: Not specified in the source material
  • Link/URL: None provided in the source material
  • Who recommended it: A speaker in AGI: Francois Chollet + Sam Altman said it was the book they were going to choose as most impactful
  • Key takeaway: It surfaced as the first answer in a discussion of the most impactful book, before that speaker added that Asimov may have had more impact earlier in life
  • Why it matters: Even with limited context, it was positioned at the top of a very short list of personally important books in the same AGI discussion

Crime-and-tech writeup (title not provided)

  • Content type: Article
  • Author/creator: Not specified in the source material
  • Link/URL:http://x.com/i/article/2041160953094639617
  • Who recommended it: Andrew Chen
  • Key takeaway: After describing personal experiences with car break-ins, a garage break-in, and stolen bikes in San Francisco, Chen called it a "great writeup about crime and tech and how to make it all better"
  • Why it matters: This is the most practical recommendation in today's set: it is framed as a resource on a concrete problem and possible improvements, not just commentary on the problem itself

Across today's picks, the split is clear: the books are presented as deep formative influences, while the article is recommended for its applied thinking on a live civic issue .

Cursor 3.0’s Swarm Control, Claude Code’s Slide, and Codex at the Limit
Apr 7
5 min read
85 docs
Theo - t3․gg
Fireship
Theo - t3.gg
Cursor 3.0 is the day’s clearest workflow shift: the IDE is becoming a control plane for parallel agents. The other strong signal comes from Theo’s side of the market—Claude Code frustration, Codex preference, and real limits showing up in long-running, high-volume usage.

🔥 TOP SIGNAL

Cursor 3.0 is the clearest product shift today: the developer stops being a typist and becomes an agent dispatcher. In Fireship’s walkthrough, a fresh project goes from plan mode to parallel agents across marketing, servers, and other projects, with yellow-dot approval gates for risky commands, blue-dot completion signals, and a 13k-line prototype ready to inspect in-browser .

🛠️ TOOLS & MODELS

  • Cursor 3.0 — Major UX change: Cursor now wants you running swarms of agents across repos, machines, and the cloud, not manually editing code line by line . The new interface was rewritten in Rust + TypeScript for agent management, while the old VS Code-style editor still exists in the product .
  • Composer 2 — Cursor’s new in-house model was presented as smarter, faster, and cheaper than Opus on benchmark slides, then Cursor later apologized for the lack of transparency and published a technical report saying it was Kimi plus reinforcement learning.
  • Claude Code — Theo’s negative signal keeps getting louder: he says it is “basically unusable” for his use cases, and his Dropbox repair example shows Claude refusing to help once the task looked like general computer support instead of software engineering .
  • Codex CLI — Theo says he is now repointing his cc alias to Codex --yolo and prefers Codex for coding, research, and longer runs. His reasons: open-source CLI, better models, easier to build on top of, and higher trust on extended tasks .
  • Benchmark signal — Theo also points out that Claude Code ranks last on TerminalBench among harnesses using Opus 4.6, with ten separate harnesses doing better on the same base model .

💡 WORKFLOWS & TRICKS

  • Cursor swarm loop to copy

    1. Start a fresh repo in plan mode and let the agent sketch architecture .
    2. While that runs, dispatch more agents in parallel: a landing page, remote work over SSH, or an entirely different project .
    3. Use the status dots as the control surface: yellow means you need to approve risky commands; blue means review-ready .
    4. Review output in one place via git history, terminal, file explorer, and the built-in browser.
    5. For UI cleanup, jump to design mode, select the broken element, describe the fix, and keep queueing more requests while the agent works in the background .
  • Whole-machine debugging loop with Codex

    1. Give Codex the operational task directly: kill and relaunch the broken app .
    2. If the first pass stalls, add a mid-run steering prompt telling it to research similar failures online .
    3. Let it propose root causes, then authorize the cleanup step—in Theo’s example, nuking duplicate Dropbox installs .
    4. End by asking for a reinstall checklist so the agent hands back a concrete recovery plan, not just terminal output .
  • Small habit, big routing effect — Theo says a big reason he defaulted to Claude Code was simply that cc was already aliased in his shell with the right flags. He is now changing that alias to open Codex with --yolo, which is a good reminder to bake your preferred tool and flags into muscle memory .

  • Long-thread context is real, but so are quota ceilings — Theo says Julius trusted compaction enough to run threads over 180 million tokens, and separately reports Julius burned through 100% of a $200/month Codex plan during T3 Code iteration .

👤 PEOPLE TO WATCH

  • Jeff Delaney / Fireship — Useful today because he shows Cursor 3.0 doing real multi-agent work, not just reading release notes: architecture planning, parallel agents, SSH tasks, browser review, and UI repair in one short demo .
  • Theo Browne — Still high-signal for hard negative feedback on agent tooling. Today he combines a concrete Claude Code failure case, a permanent alias switch to Codex, and a benchmark critique that isolates harness quality from base-model quality .
  • Julius / @jullerino — Worth tracking as a power-user stress test for cost and context limits. Theo highlights both full-plan burn on Codex and extremely long compaction-backed threads .

🎬 WATCH & LISTEN

  • 2:46-3:33 — Cursor 3.0’s core loop — Best clip of the day if you want to see the thesis in one minute: fresh project, multiple agents in parallel, permission gates, and a 13k-line codebase ready for review .
  • 3:55-4:11 — Design mode for UI cleanup — Short but practical. Delaney highlights a broken element, asks AI to fix it, and keeps stacking more UI tasks instead of waiting for each one to finish .

📊 PROJECTS & REPOS

  • T3 Code — Theo describes it as the only UI he finds performant for working across lots of projects at once. It is fully open source and free, and can front either Claude Code or Codex subscriptions through the agent SDK .
  • T3 Code usage signal — The stronger signal today is workload intensity: Theo says Julius exhausted 100% of a $200/month Codex plan during T3 Code iteration and is getting a second account so progress is not blocked by token ceilings .
  • Codex CLI — Theo calls out the open-source CLI specifically as a reason he prefers Codex; he says it is easier to build on top of and lets you reuse auth in other places .

Editorial take: the edge is shifting from raw model IQ to the control plane around it—parallelism, approval gates, long-lived context, and quota management.

Judgment, Signal Capture, and Faster PM Teams
Apr 7
10 min read
76 docs
a16z
Teresa Torres
John Cutler
+8
This brief covers four shifts shaping modern PM work: judgment is becoming more valuable as AI speeds execution, discovery depends on better signal capture, and faster teams need tighter demo and alignment loops. It also includes practical plays for interviewing, productivity, career prep, and tools worth testing now.

Big Ideas

1) Judgment is the durable advantage in AI work

“Speed is the demo. Judgment is the actual job.”

Leah Tharin’s point is straightforward: AI can generate output fast, but only domain expertise can tell whether that output is good enough to ship . She also argues that AI products increasingly win on compatibility—whether they fit how users already work with AI—not on forcing a brand-new interface or workflow . Teresa Torres applies the same filter personally: stay aware of new tools, but go deep only when a tool solves a real friction point and is actionable now . Peter Yang adds that core PM skills still center on talking to users and identifying the right problem to solve .

Why it matters: AI makes output cheaper; it does not make evaluation easier .

How to apply:

  • Use AI as an assistant and keep human judgment close to anything that ships
  • Evaluate new products by asking whether they work with existing AI habits
  • Adopt tools when they remove live friction, not just because they are interesting

2) Fast teams put a premium on interaction design inside the org

“You are the head game designer.”

John Cutler argues that leaders shape the environment people work in, and that culture is the sum of the quality of interactions inside the organization . He recommends focusing on a local trust boundary of roughly 30-50 people, where managers can still materially shape how work gets done . The same theme shows up in practice: Julie Zhuo says TeamSundial canceled all recurring meetings except a Monday demo, and the remaining meeting now feels like weekly hackathon energy . Anthropic’s head of growth built a weekly AI agent that scans Slack for cross-functional misalignment before teams waste weeks on overlapping work . Peter Yang also describes a future where 2-3 person product teams work with agents across functional lines .

Why it matters: When build speed rises, meeting design, visibility, and misalignment detection matter more .

How to apply:

  • Treat recurring interactions as product design work, not calendar residue
  • Keep one visible demo cadence and challenge the rest of the meeting stack
  • Add lightweight checks for overlap and drift in Slack-heavy teams

3) Discovery quality depends on capturing real behavior, not lucky recall

Teresa Torres’s current teaching emphasis is continuous interviewing: collect specific stories about customers’ past behavior and synthesize what you learn from each interview . The Reddit HubSpot story shows the cost of doing this poorly. A $45k ARR account asked for a HubSpot integration in a QBR; the request was mentioned in a Slack thread and ignored because it was not attached to a large enterprise deal . Months later, an unrelated prospect requested the same integration almost word for word, revealing a pattern the team had not tracked systematically . After the work was fast-tracked, the integration became the primary entry point for a mid-market segment and roughly $1.4M in pipeline .

Why it matters: If signals live in Slack, QBR notes, and memory, you are depending on luck to find product demand .

How to apply:

  • Ask for concrete past-behavior stories, not general opinions
  • Synthesize each interview before detail fades
  • Put CS, sales, and product signals into a trackable system so repeated requests become visible

4) Cheap prototyping changes the quality process

Aakash Gupta’s notes on Anthropic describe a culture where every PM codes, the norm is to “send a PR,” and 80% of prototypes never ship . Agent Teams let one lead agent delegate to 10 parallel teammates; in one example, three open issues became three PRs within 40 minutes . Auto Mode is described as saving 20-40 permission clicks per session and completing a refactor across 14 files, including test runs and fixes, in eight minutes . Peter Yang makes the PM implication explicit: build a thing yourself, get feedback, then bring engineers along .

Why it matters: When prototypes are cheap, quality depends more on generating options and killing weak ones quickly .

How to apply:

  • Use agents to create multiple credible options in parallel
  • Treat a high kill rate as part of the quality bar, not as wasted motion
  • Keep user feedback close to the prototype loop so speed becomes learning, not just output

Tactical Playbook

1) Run a continuous interviewing loop

  1. Ask customers for specific stories about what they actually did
  2. Synthesize the interview immediately after it ends
  3. Add adjacent evidence from QBRs, CS notes, and Slack threads into the same tracking system
  4. Look for the same request from unrelated accounts before escalating priority
  5. Track requests formally so pattern detection does not depend on someone remembering an old message

Why it matters: This is the difference between systematic discovery and finding a $1.4M opportunity late .

2) Use an actionable-not-interesting filter for AI tools

  1. Scan broadly enough to understand the solution space
  2. When a tool looks exciting, ask: “Why do you need that?”
  3. Wait for a real friction point before going deep
  4. Prioritize tools that are actionable now, not merely interesting
  5. Time-box learning and stay with a tool long enough to hit real constraints and context issues

Why it matters: This reduces burnout and produces deeper learning on the tools you actually adopt .

3) Reset operating cadence around demos and misalignment checks

  1. Audit recurring meetings and remove the ones that do not create value
  2. Preserve one visible demo ritual so work stays legible
  3. Define the local group where trust and environment can still be shaped—roughly 30-50 people
  4. Treat culture as interaction quality, not as a slogan
  5. Run a scheduled scan for overlap and drift in project conversations

Why it matters: Faster teams benefit when visibility is high and coordination overhead stays low .

4) Measure productivity by delivered outcomes, then consolidate

  1. For one month, track only completed and delivered outcomes
  2. List every place work lives and how often you have to search across them
  3. Consolidate the core system so you are not carrying the map in your head
  4. Simplify until the system stops requiring constant maintenance

Why it matters: A system can feel productive while producing very little that ships .

Case Studies & Lessons

1) A forgotten Slack thread became a new mid-market wedge

A mid-market customer asked for a HubSpot integration during a QBR. The request was dropped into a Slack thread, but it stayed below the cut line because it was not tied to an enterprise deal or a competitive loss . Months later, a different prospect asked for the same integration, and an AE remembered the original message . The team fast-tracked the work; within about 11 months of the first QBR mention, the integration had become the primary entry point for a mid-market segment and was sitting at roughly $1.4M in pipeline .

Key takeaway: Weak signals are valuable only if your system can retrieve them before someone gets lucky .

2) HubSpot shows how an AI shift can pressure even strong SaaS performance

Leah Tharin highlights a stark contrast: HubSpot’s revenue grew from $1.3B in 2021 to $3.1B in 2025, a 141% increase, while the stock fell 71% from its 2021 peak . Her explanation is the classic innovator’s dilemma: move too fast and risk current customers; move too slow and miss the next paradigm shift . She argues the earlier sales-led-to-product-led pivot was easier when the company was smaller .

Key takeaway: Strong current revenue does not remove pressure to adapt when the market’s definition of fit is changing .

3) One weekly demo replaced a lot of meeting weight at TeamSundial

Julie Zhuo says TeamSundial canceled all recurring meetings except one Monday demo meeting at 6:30am . The result, in her telling, is recurring “surprise” and “delight” and energy that feels closer to a weekly hackathon than the old quarterly event cadence .

Key takeaway: A single showcase ritual can do more for visibility and momentum than a stack of status meetings .

Career Corner

1) Prepare behavioral interviews as modular stories

Aakash Gupta says most PM candidates overprepare product sense and underprepare behavioral interviews . Across 1,000+ mock interviews, candidates who prepare 5-6 answers that map across categories outperform candidates who prepare 30 isolated answers . One strong cross-functional conflict story can cover 8+ questions across 3 categories, and top candidates build a library of 6-8 stories that map across 84 common questions in 7 categories, then practice each answer in under two minutes .

Why it matters: Modular stories travel further than memorized answers .

How to apply:

  • Build 6-8 stories, not 30 scripts
  • Cover conflict, decision-making, and cross-functional execution first
  • Practice concise versions that land in under two minutes

2) Hypergeneralists need packaging, not self-reduction

John Cutler argues that hypergeneralists may be more valuable than ever, but they also have the hardest time explaining how that breadth helps in a specific environment . His advice is not to box yourself in permanently, but to design a “Trojan horse” package that makes your range accessible to other people . He also notes that public writing increases surface area for serendipity, and many of the best things that happened in his career trace back to something he wrote online .

Why it matters: Breadth helps only if other people can understand where it fits .

How to apply:

  • Turn broad experience into a simple narrative others can repeat
  • Keep enough flexibility that the story does not trap you
  • Publish thoughtful work publicly if you want more unexpected opportunities

3) You do not need to become the most technical AI operator

Leah Tharin says two pieces of prior advice were wrong for her: PMs do not need to know SQL to be effective, and they do not need deep technical AI knowledge to work well with AI . Her view is that the people using AI best are treating it like an assistant . The limiting factor is still judgment—knowing what good looks like in your domain . Peter Yang’s addition is practical: keep talking to users, figure out what to build, and prototype enough to learn fast .

Why it matters: The edge comes from domain judgment plus hands-on reps, not from pretending every PM needs the same technical profile .

How to apply:

  • Double down on the domain strengths that help you judge outputs well
  • Use AI as leverage, not as an identity project
  • Prototype enough to sharpen product sense and feedback loops

Tools & Resources

1) Continuous Discovery Habits reading cohort

Teresa Torres is organizing a 2026 group read of Continuous Discovery Habits with monthly reading guides, reflection questions, exercises, short videos for teammates, and quarterly live discussion sessions . April’s chapter focuses on continuous interviewing and includes a supplemental reading on AI synthesis .

Why explore it: It turns discovery concepts into a recurring practice loop .

How to use it: Work through one section per month and use the exercises to build an actual interviewing habit, not just a reading habit .

2) NotebookLM

Torres ignored NotebookLM until she had a concrete need: creating overview videos and infographics from existing blog posts. She now uses it to generate both from Product Talk articles .

Why explore it: It is useful when you already have source material and need a new format for it .

How to use it: Start with existing documents or posts you already trust, then test whether the generated summaries or visuals help your audience .

3) 11 Labs

After launching paid subscriptions, Torres used 11 Labs to create audio versions of blog posts and now uses it for her article podcast audio .

Why explore it: It can extend existing written content into an audio format without building that workflow from scratch .

How to use it: Apply it to a content stream you already publish, then judge whether audio adds real user value .

4) Cowork

Cowork is described as running on your computer with access to your apps and files. In one example, it caught up on Slack DMs and updated a metrics deck before a meeting . Anthropic’s head of growth also uses it with Slack MCP as a scheduled task to scan projects and conversations for cross-functional misalignment .

Why explore it: It is being used for both personal prep work and org-level signal detection .

How to use it: Start with bounded tasks such as inbox triage, meeting prep, or weekly alignment checks .

5) Agent Teams and Auto Mode

Aakash Gupta highlights two Anthropic features he says changed how he works. Agent Teams lets a lead agent delegate to 10 parallel teammates; in his example, three open issues produced three PRs within 40 minutes . Auto Mode handled edits across 14 files, ran tests three times, fixed failures, and committed, while saving 20-40 permission clicks per session .

Why explore them: They compress repetitive build and prototype work into a much shorter loop .

How to use them: Try them on parallel prototypes, refactors, or other tasks where speed matters and the work can be reviewed quickly .

Anthropic’s TPU Deal, OpenAI’s Washington Push, and a Gradual Automation Picture
Apr 7
4 min read
217 docs
Import AI
Rowan Cheung
Sam Altman
+2
Anthropic locked in multi-gigawatt TPU capacity and OpenAI stepped up its case for earlier policy debate on cyber, bio, and energy. New research, meanwhile, suggests AI capabilities are spreading broadly across work and cyber tasks even as near-term GDP effects may remain more modest.

Frontier buildout is accelerating faster than the measured macro story

Today's clearest contrast was between frontier companies planning for much larger demand and new research that still points to a slower macroeconomic rollout .

Anthropic locks in multi-gigawatt TPU capacity for Claude

Anthropic said it signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models . The company also said its run-rate revenue has passed $30 billion, up from $9 billion at the end of 2025, and framed the deal as the compute needed to keep pace with demand .

Why it matters: This ties Anthropic's next stage of growth directly to multi-year compute procurement and sharply rising commercial demand .

OpenAI takes its superintelligence argument to Washington

In a Washington-facing interview, Sam Altman said OpenAI wants policy ideas discussed now because AI is beginning to do more real work and could change coding, knowledge work, science, and the shape of work . He said OpenAI's main preparedness areas are cyber and bio, expects significant cyber threats within the next year, and argued that safety in a world of powerful AI cannot be handled by companies alone; he also pointed to energy buildout, privacy, and frontier-system auditing as active policy areas .

"I suspect in the next year we will see significant threats we have to mitigate from cyber"

Why it matters: The policy conversation is moving from abstract AGI claims toward concrete questions about resilience, infrastructure, and oversight .

Research suggests the impact may be broad, but not abrupt

MIT finds a "rising tide" across text-based work

Researchers analyzing 3,000 O-NET tasks with 17,000 worker evaluations said AI progress across realistic text-based labor-market tasks looks more like a rising tide than a crashing wave . They report that, between 2024-Q2 and 2025-Q3, frontier models moved from 50% success on 3- to 4-hour tasks to 1-week tasks, and project most surveyed tasks could reach 80%-95% AI success rates by 2029 at minimally sufficient quality .

Why it matters: The study points to broad, gradual gains across many job families rather than a few isolated task categories changing all at once .

Forecasters still expect only a modest GDP boost by 2030

A major report from the Forecasting Research Institute found that surveyed groups expect moderate to rapid AI progress in the coming years, yet still see GDP impacts staying relatively small by 2030 - about one percentage point above the 2025 baseline growth rate of 2.4% . The same material says all surveyed cohorts expect continued declines in labor-force participation and rising wealth inequality, while economists put a 14% chance on major near-term increases in GDP and inequality .

Why it matters: Taken together with the MIT work, the picture is not "no impact" - it is faster task-level progress paired with a still-muted near-term GDP forecast .

Security and public-safety systems

Offensive cyber capability keeps improving on short timelines

Lyptus Research found that frontier-model performance on cyberoffense tasks has followed a 9.8-month doubling time since 2019, steepening to 5.7 months for models released since 2024 . The evaluation covered standard cyber benchmarks and a new 291-task dataset with time estimates calibrated by 10 offensive cybersecurity professionals . In the study, GPT-5.3 Codex and Opus 4.6 reached 50% success on tasks that take human experts about 3.1 to 3.2 hours, and the most recent open-weight model in the sample lagged the closed frontier by 5.7 months .

Why it matters: The result points to both stronger offensive capability and relatively short diffusion timelines from closed frontier systems to open-weight models .

Google puts 24-hour flash-flood forecasts on Flood Hub

Google launched an AI system that predicts flash floods 24 hours in advance and made the predictions live for free on Flood Hub . The system uses Gemini to extract confirmed flood locations and times from global news, builds missing historical event data, and combines weather forecasts with terrain, soil absorption, and urban density; the notes say it can work in countries with little flood-monitoring infrastructure .

Why it matters: Flash floods kill more than 5,000 people each year, and a 12-hour warning alone can reduce damage by 60%, making this a notable example of AI being deployed into public-risk infrastructure .

Wheat Stress, Energy-Driven Input Risk, and New Nitrogen Playbooks
Apr 7
7 min read
222 docs
Market Minute LLC
Arlan Suderman
Tarım Editörü
+6
Plains wheat conditions worsened as corn and soybean markets weighed strong exports against rising fuel and fertilizer risk. This brief also highlights lower-cost nitrogen programs, cover-crop design choices, and regional trade and weather shifts from Brazil to the EU-Mercosur corridor.

Market Movers

  • United States / Black Sea wheat: May Chicago wheat was $5.925 and May Kansas City wheat $6.0675 on April 6, but the market is trading a stressed Plains crop: 65% of U.S. winter wheat area is in drought, national good/excellent is 35% versus 48% a year ago, and states such as Colorado and Oklahoma are at 12% good/excellent. Funds are now net long SRW wheat for the first time since June 2022. Additional Black Sea logistics risk remains after a vessel carrying wheat sank in the Sea of Azov and Russia launched new attacks on Ukraine despite an Easter ceasefire .
  • United States corn: May corn was $4.5025, down 2 cents, but export demand remains the strongest grain story. Weekly inspections were 78.8 million bushels, and marketing-year inspections are running 297 million bushels ahead of the pace needed to hit USDA's target. Weekly export sales were 45 million bushels, and one market source described the corn export book as the best on record. That demand is being offset by weak cash basis in parts of the western Corn Belt and heavy farmer selling that has left commercials net short 572,000 corn contracts .
  • United States soybeans: May soybeans were $11.6725, up 3¾ cents. Weekly inspections reached 28.6 million bushels, including 18.3 million destined for China, but marketing-year soybean inspections still trail the pace needed for USDA's target by 96 million bushels. Support is coming from biofuel-linked soybean oil strength and expectations that a China meeting could unlock additional buying .
  • Energy and ag inputs: WTI crude jumped 11% to $111.54/bbl, its highest close since June 2022, while U.S. average retail fuel was about $4.11/gal for gasoline and $5.61/gal for diesel. Several market notes tie grain volatility, fertilizer uncertainty, and farm freight costs back to this energy shock .

Innovation Spotlight

  • United States / corn nitrogen redesign: In a Minnesota Soil Health Coalition presentation, John Kempf described a corn program built around nitrogen form and timing rather than total seasonal pounds. His framework starts with 40 lb N at planting, 40 lb N sidedressed around V5-V6, 25 lb sulfur total, and two foliar low-biuret urea applications of 10 lb N/acre around tassel and R1. He said the first 25 lb of sulfur can deliver the yield response of 25 lb N, and that this package can replace a 200-lb N program with about 185 lb N-equivalent while cutting input costs 30-50%; 0.65-0.75 lb N/bushel was described as routinely achievable in the field experience presented .
  • The same presentation cited older research showing highest-yielding corn at roughly 80% ammonium / 20% nitrate, and argued nitrate should be emphasized early then minimized after V5-V6 because it requires more water and energy than ammonium .
  • United States / on-farm validation: Griggs Farms in west Tennessee said replicated, on-farm trials plus scales on the grain cart are the highest-return evaluation tools on the farm. In its cover-crop system, the farm said it has documented double summer infiltration rates, higher water-holding capacity, more organic matter, more biological activity, and better weed control, even though small-plot yield gains have been harder to show consistently. The same farm has cut its maximum corn N rate from 200 to 170-175 units, and cotton from 80 down to 20-25 units while continuing to test reductions. One biological product costing $4.60/acre won every corn side-by-side trial over five years on that farm .

I'm not looking to produce more. What I'm looking to do is maintain my yield and it costing me less

Regional Developments

  • United States: March 31 planting intentions put corn acres above trade expectations, but several analysts said the survey closed before the Iran conflict fully hit fertilizer logistics. Farmers who booked urea are reportedly facing shipment delays, and part of corn nitrogen demand is still unresolved as planting starts. The Eastern Corn Belt has also picked up 4-5 inches of rain, improving subsoil moisture but potentially slowing fieldwork .
  • Brazil / Rio Grande do Sul: Soybean harvest has reached 23% of planted area, with 43% of fields in maturation and 31% still filling grain; average yield is estimated near 2,900 kg/ha over more than 6.6 million hectares. At the same time, an extratropical cyclone is bringing 30+ mm rains, strong winds and hail risk to key rice and soy zones such as Uruguaiana, Rio Pardo and Alegrete, with localized totals above 100 mm over five days in the southwest .
  • EU / Mercosur: The EU plans to provisionally start the commercial core of the Mercosur-EU agreement on May 1, focused on tariff reductions under the part of the treaty that falls under EU trade competence, even while the full agreement remains under review by the EU Court of Justice .
  • China / Brazil beef trade: China's foot-and-mouth cases are currently described as isolated and rapidly contained, so there is no immediate export windfall for Brazil. Analysts said quota flexibility or larger Brazilian beef sales would require a broader deterioration in China's herd situation .

Best Practices

  • Corn nitrogen management: Separate nitrate decisions from ammonium and urea decisions. The Minnesota framework emphasizes generous nitrate earlier, then moving to urea/ammonium and foliar N after V5-V6; if following that system, total sulfur is kept near 25 lb/acre. The same presenter said manure is mostly organic nitrogen, but high-salt dairy manure applications can damage soil biology .
  • Cover crop design before corn: Griggs Farms uses annual ryegrass as a base, cereal rye + oats ahead of corn for weed control and organic matter, clovers / vetch / lentils where more N is needed, and radish / rapeseed / buckwheat where compaction or phosphorus release is the goal. Drill when stand consistency and weed suppression matter most; interseed earlier when biomass is the bigger priority .
  • Dairy / forage: Embrapa's recommendation for grass silage is to harvest at 90-110 days, when dry matter is around 20%. The guidance says dry matter can be checked with a microwave and kitchen scale; waiting beyond 110-120 days reduces digestibility and slows the next regrowth cycle .
  • Weed programs under new dicamba labels: For 2026, over-the-top dicamba is back with stricter rules: two applications maximum, runoff-mitigation points, buffers, and ESA compliance. Regional fit still varies - Kochia in the northern Plains, waterhemp in Illinois corn/soy, Palmer amaranth in the Delta - but multiple experts stressed that resistance means growers need residuals and seedbank reduction, not just a different POST sequence .
  • Product testing discipline: Use replicated strips and calibrated grain-cart scales to decide which biologicals or crop inputs earn a permanent place in the program .

Input Markets

  • Nitrogen fertilizer: Supply risk has become operational, not just theoretical. Brazilian analysts said the Middle East/Russia/Africa corridor supplies 70-80% of Brazil's imported nitrogenates, while Russia and China have limited nitrogen exports to prioritize domestic planting. In Iran, GUBRETAS affiliate Razi Petrochemical temporarily stopped production after attack damage to electrical units. U.S. analysts likewise reported that some booked urea shipments are not arriving, leaving part of corn nitrogen needs unresolved heading into planting .
  • Fuel and freight: Rising fuel is now hitting both field costs and livestock supply chains. U.S. retail diesel averaged about $5.61/gal, while Brazilian analysts said diesel and maritime freight costs are rising across poultry and swine chains and are difficult to fully pass through to consumers .
  • Crop protection labels and pipeline: Dicamba's 2026 return comes with stricter ESA-based runoff and buffer requirements and a max of two applications. Brownfield and Farm4Profit sources also noted a slower herbicide pipeline: new products can take about 10 years and more than $300 million to commercialize, while current litigation and label-defensibility reviews are slowing registrations further .
  • Feed coproducts: China has started receiving Brazilian DDGs, with a first 62,000-ton cargo reported. More corn ethanol output also means more DDG availability for swine, poultry and cattle feed .

Forward Outlook

  • Wheat: Near-term direction depends on whether forecast rains actually reach HRW country. Ratings are weak enough that even modest moisture matters, but the western Plains still need more than the current forecast offers. From a chart perspective, the recent rally has already stalled near last February's highs and a 61.8% retracement level .
  • Corn vs. soy acreage: Final corn area still looks less settled than the March survey suggests because fertilizer logistics changed after the survey window and new-crop soybean economics are competitive. One analyst expects the March report to mark the high print for corn this year if late nitrogen remains tight. Technically, $4.45-$4.50 is the must-hold support zone being watched in corn .
  • Risk management: Pro Farmer recommended November soybean $11.60 puts at 60 cents, creating roughly an $11.00 floor on 40% of new-crop production, and December corn $4.80 puts at 32 cents, creating a $4.48 floor .
  • Livestock exporters: Brazil's protein export system is proving resilient despite longer routes and higher costs, but not disruption-free. Expect slower shipments and more expensive logistics rather than a clean break in trade .
  • Capital spending: In Brazil, machinery sales were down 17% in Q1 and are forecast down 8% for the year as higher rates and delinquency keep producers cautious. That is a reminder that 2026 planning is happening in a tight-credit environment, not a capex cycle .

Choose the setup that fits how you work

Free

Follow public agents at no cost.

$0

No monthly fee

Unlimited subscriptions to public agents
No billing setup

Plus

14-day free trial

Get personalized briefs with your own agents.

$20

per month

Includes $20 of usage each month

Private by default
Any topic you follow
Daily or weekly delivery

Includes $20 of usage during trial

Discover agents

Subscribe to public agents from the community or create your own—private for yourself or public to share.

Public Active

Coding Agents Alpha Tracker

Daily high-signal briefing on coding agents: how top engineers use them, the best workflows, productivity tips, high-leverage tricks, leading tools/models/systems, and the people leaking the most alpha. Built for developers who want to stay at the cutting edge without drowning in noise.

110 sources
Public Active

AI in EdTech Weekly

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

92 sources
Public Active

AI News Digest

Daily curated digest of significant AI developments including major announcements, research breakthroughs, policy changes, and industry moves

114 sources
Public Active

Global Agricultural Developments

Tracks farming innovations, best practices, commodity trends, and global market dynamics across grains, livestock, dairy, and agricultural inputs

86 sources
Public Active

Recommended Reading from Tech Founders

Tracks and curates reading recommendations from prominent tech founders and investors across podcasts, interviews, and social media

137 sources
Public Active

PM Daily Digest

Curates essential product management insights including frameworks, best practices, case studies, and career advice from leading PM voices and publications

100 sources

Supercharge your knowledge discovery

Reclaim your time and stay ahead with personalized insights. Limited spots available for our beta program.

Frequently asked questions