Hours of research in one daily brief, on your terms.

Tell us what you need to stay on top of. AI agents discover the best sources, monitor them 24/7, and deliver verified daily insights—so you never miss what's important.

Set up your daily brief agent

Recent briefs

Multi-agent reality check: worktree-based parallelism, new Claude Code skills, and Codex 5.3 low-level wins
Feb 28
6 min read
162 docs
Greg Brockman
eigenron
Peter Steinberger
+12
Today’s highest-signal theme: multi-agent setups break down on research rigor, even as raw coding capabilities keep climbing. You’ll get concrete tool updates (Claude Code /batch + /simplify, Remote Control rollout), replicable workflows (spec→async agent run→deploy, worktree-based parallelism), and two watchable clips on long-horizon loops and evaluation scaffolding.

🔥 TOP SIGNAL

Multi-agent coding looks very different when the task isn’t “implement this,” but “do research.” Andrej Karpathy tried running 8 agents (4 Claude + 4 Codex) in parallel on nanochat experiments (1 GPU each) and found the system “doesn’t work,” largely because agents’ idea generation and experimental rigor are weak—they skip solid baselines/ablations and run nonsensical variations, even if they can implement well-scoped instructions quickly. His framing: the real target is “programming an organization”—prompts, skills, tools, and rituals (even “daily standup”) become the “org code,” and the eval is how fast that org makes progress on arbitrary tasks.

🛠️ TOOLS & MODELS

  • Claude Code (next version): new Skills /simplify + /batch

    • /simplify: run parallel agents to improve code quality, tune efficiency, and ensure CLAUDE.md compliance.
    • /batch: interactively plan migrations, then execute with dozens of isolated agents using git worktrees; each agent tests before opening a PR.
    • Intended use: automate much of the work to shepherd PRs to production and to do straightforward, parallelizable migrations.
  • Claude Code Remote Control: rolling out to Pro users

    • Rollout: 10% and ramping; Team/Enterprise “coming soon”.
    • Enablement checklist: update to claude v2.1.58+, log out/in, then run /remote-control.
  • GPT-5.3-Codex: “default choice” signals for automation

    • OpenAI’s Tibo Sottiaux: since its release in the API, he’s “consistently hearing” at meetups that GPT-5.3-Codex is the model to use to “get actual work done,” and a “clear winner” for background agents / automation at scale.
    • Also notes it’s breaking through on raw coding ability and that “the secret is out” on best results per $.
    • Docs: https://developers.openai.com/api/docs/models/gpt-5.3-codex.
  • Codex 5.3-high: one-shot, low-level infra surgery

    • Reported “one-shotted” task: bypassed HuggingFace KV cache abstraction, monkey-patched attention at module level, handled M-RoPE, coordinated prompt-memory state with KV cache state, and performed granular eviction with span tracking.
    • Greg Brockman points to Codex 5.3 for “complicated software engineering”.
  • Cursor adoption lens (workflow evolution)

    • Karpathy’s sketch of the “optimal setup” evolution as capabilities improve: None → Tab → Agent → Parallel agents → Agent Teams (?) → ???.
    • His process heuristic: 80% of time on what reliably works, 20% exploring the next step up—even if it’s messy.

💡 WORKFLOWS & TRICKS

  • Parallel agents with real isolation: git worktrees are emerging as the default primitive

    • Karpathy’s research-org simulation: each “research program” as a git branch, each scientist forks a feature branch, and git worktrees provide isolation; “simple files” handle comms.
    • Claude Code’s /batch mirrors this: each migration agent runs in full isolation via git worktrees, tests, then opens a PR.
  • “Research org” orchestration pattern (Karpathy): tmux as your control plane

    • One setup: a tmux window grid of interactive agent sessions so you can watch work, and “take over” when needed.
    • His finding: agents are strong at implementation, weak at experiment design (baselines, ablations, runtime/FLOPs controls), so expect humans to still provide taste + rigor.
  • Fast app-to-prod loop with the Codex app (from a live demo)

    • Romain Huet highlights a <30 min workflow: scaffold the app, use docs + Playwright MCP, add features with plan mode, then use skills for OpenAI image generation and Vercel deploy.
    • Demo link: https://x.com/kagigz/status/2027444590895063313.
  • Spec-first → async agent run against a real repo (Simon Willison)

  • Context-window hygiene via “stop-and-reset” loops (Ringo/OpenClaw example)

    • Ringo’s “RALPH loop” executes a task markdown file one step at a time, then stops so the next step starts with a fresh context window.
    • Practical takeaway: if your runs degrade over time, consider deliberately chunking work into restartable steps instead of trying to one-shot long horizons.
  • Safety guardrails for agentic tools with destructive capabilities (OpenClaw talk)

    • Patterns called out: mandatory confirmations for destructive actions, sandboxing/read-only modes, and using a separate phone number/SIM for the bot.
    • Failure mode to design around: rules stored only in the model’s working memory can be lost after context compaction—leading to destructive behavior.
  • Eval realism check: scaffolding juice is real, but overfit risk is too

    • METR’s Joel Becker describes harness/scaffold tuning for high performance on dev tasks while trying to avoid overfitting; they invest heavily in scaffolds to upper-bound model capabilities for safety analysis.
    • He also notes how measuring productivity got harder: developers may refuse “AI-disallowed” randomization, and today’s concurrent workflows (multiple issues in parallel) don’t fit old study designs.
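The worktree isolation primitive recurring in these workflows is simple to reproduce. A minimal Python sketch (the repo layout, branch naming, and the commented-out agent launch are illustrative assumptions, not Claude Code’s or Karpathy’s actual implementation):

```python
import subprocess
from pathlib import Path

def spawn_isolated_agents(repo: Path, tasks: list[str]) -> list[Path]:
    """Give each task its own branch + working directory via git worktrees,
    so parallel agents never trample each other's files or index."""
    worktrees = []
    for i, task in enumerate(tasks):
        branch = f"agent/{i}-{task.replace(' ', '-')}"
        wt = repo.parent / f"{repo.name}-wt-{i}"
        # One branch and one checkout per agent; all share the same object store.
        subprocess.run(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(wt)],
            check=True,
        )
        worktrees.append(wt)
        # An agent process would then be launched inside its worktree, e.g.:
        # subprocess.Popen(["my-agent", "--task", task], cwd=wt)
    return worktrees
```

Each agent then commits on its own branch and opens a PR, which is exactly the /batch shape described above.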

👤 PEOPLE TO WATCH

  • Andrej Karpathy — concrete, instrumented look at why “agent research orgs” are still messy: implementation is easy; ideas + rigor are the bottleneck.
  • Boris Cherny (Claude Code) — shipping practical agent “skills” that encode repeatable team workflows: /simplify + /batch, plus Remote Control rollout details.
  • Romain Huet (OpenAI/Codex) — curating high-signal Codex workflows and capability examples (rapid app shipping; low-level infra tasks).
  • Max Woolf — detailed “skeptic tries agent coding” writeup; notable claim that Opus 4.6/Codex 5.3 feel “an order of magnitude better” for complex tasks than models from months earlier.
  • Simon Willison — repeatable “spec → async agent run → deploy” patterns with publicly inspectable artifacts.

🎬 WATCH & LISTEN

1) OpenClaw Manila — Ringo’s “idea → live prototype” loop (≈24:15–27:55)

How it works under the hood: a ReAct-style loop that writes a task file, executes one task per fresh context window, and uses infra integrations (GitHub/Cloudflare/etc.) to ship prototypes fast.
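The “one task per fresh context window” mechanic can be approximated with a plain checklist file. A minimal sketch, assuming a markdown task file with `- [ ]` checkboxes and a stand-in `run_step` callable where the real loop would invoke an agent:

```python
from pathlib import Path

def run_next_step(task_file: Path, run_step) -> bool:
    """Execute exactly one unchecked task, mark it done, then stop.
    Re-invoking the script gives the next step a fresh context window."""
    lines = task_file.read_text().splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("- [ ]"):
            run_step(line.split("]", 1)[1].strip())   # hand one task to the agent
            lines[i] = line.replace("- [ ]", "- [x]", 1)
            task_file.write_text("\n".join(lines) + "\n")
            return True       # one step done; caller exits and restarts
    return False              # all tasks complete
```

An outer loop (shell, cron, or CI) simply re-runs the script until it returns False, so no single run has to survive a long horizon.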

2) METR (Joel Becker) — harness/scaffold tuning and the overfit trap (≈56:25–57:35)

A grounded explanation of why different harnesses can swing results—and why METR invests in scaffolds to estimate “best possible” model capability without fooling themselves via overfitting.

Editorial take: Raw coding is getting solved; the leverage is moving to orchestration + isolation + guardrails—and the hardest remaining gap is still tasteful, rigorous idea generation, not implementation.

OpenAI’s $110B raise, U.S. defense deployment deals, and DeepSeek V4 signals
Feb 28
10 min read
1015 docs
Sakana AI
Nous Research
Taalas Inc.
+43
OpenAI announces a $110B funding round and expanded infrastructure partnerships, then signs a classified-network deployment agreement with the Department of War as Anthropic faces a supply-chain risk designation and prepares a court challenge. Also: DeepSeek V4 timing signals and systems commits, plus notable advances in video generation, KV-cache efficiency, and privacy-preserving inference wrappers.

Top Stories

1) OpenAI’s $110B funding round reshapes the infra/partner map

Why it matters: This round ties OpenAI’s growth directly to specific cloud + chip roadmaps (AWS/Trainium, NVIDIA systems, Azure API exclusivity), and signals how competitive advantage is increasingly negotiated through infrastructure access and distribution.

OpenAI CEO Sam Altman said OpenAI raised a $110B round from Amazon, NVIDIA, and SoftBank. Reporting in the same cycle pegged the round at a $730B pre-money valuation, with amounts broken out as $50B (Amazon), $30B (SoftBank), and $30B (NVIDIA). OpenAI’s own messaging framed this as scaling infrastructure “to bring AI to everyone,” supported by those partners.

Partnership details highlighted publicly include:

  • Amazon/AWS: New enterprise products including a stateful runtime environment and use of Trainium. A separate OpenAI-partner post described co-building a Stateful Runtime for agentic apps on Bedrock, scaling with 2GW of Trainium compute, and creating custom models for Amazon apps.
  • Microsoft: OpenAI said its stateless API will remain exclusive to Azure, alongside plans to build more capacity with Microsoft.
  • NVIDIA: OpenAI described NVIDIA chips as foundational, and said it’s excited to run NVIDIA systems in AWS. NVIDIA also said it’s entering a “next phase” with OpenAI to deploy 5GW on Vera Rubin for training and inference.

On finances, Epoch AI noted the round “nearly triples” OpenAI’s total raised so far and cited a projection (attributed to The Information) of $157B cash burn through 2028, saying this round plus $40B cash on hand roughly matches that projection.

2) OpenAI says it reached a classified-network agreement with the U.S. Department of War

Why it matters: Frontier-lab defense deployments are becoming contract + control-system design problems: what gets prohibited, who enforces it, and what technical safeguards accompany access.

Altman said OpenAI reached an agreement with the Department of War (DoW) to deploy models in its classified network. He said the agreement embeds OpenAI’s safety principles—prohibiting domestic mass surveillance and requiring human responsibility for use of force (including autonomous weapon systems)—and that DoW agrees and reflects these in law/policy and the agreement.

OpenAI also described additional deployment measures:

  • Technical safeguards to ensure model behavior
  • Field Deployed Engineers (FDEs) to help with the models and safety
  • Cloud networks only

Altman further said OpenAI is asking DoW to offer the “same terms to all AI companies,” and expressed a desire to de-escalate away from legal/government actions toward “reasonable agreements”.

A DoW official account characterized the OpenAI contract as flowing from a touchstone of “all lawful use”, referencing legal authorities and mutually agreed safety mechanisms (described as a compromise Anthropic was offered and rejected).

3) Anthropic faces a U.S. government crackdown; says it will fight a “supply chain risk” label in court

Why it matters: “Supply chain risk” designations and procurement rules can reshape the AI market indirectly—by forcing contractors and cloud ecosystems to pick sides.

Secretary of War Pete Hegseth’s account said the DoW is designating Anthropic a “Supply-Chain Risk to National Security”, and that “no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic,” while allowing Anthropic to provide services for no more than six months for transition. The same post tied this to the President’s directive for the federal government to cease all use of Anthropic’s technology.

Anthropic subsequently said it will challenge any supply chain risk designation in court, arguing the DoW cannot restrict customers’ use of Claude outside of DoW contract work.

Several posts highlighted second-order implications: Anthropic serves models via cloud providers including AWS (primary), Google Cloud, and Azure, and another post argued that since those providers do business with the U.S. military, a literal interpretation could block Anthropic from serving via them.

4) DeepSeek V4 countdown: release timeline + chip collaborations + major systems work

Why it matters: DeepSeek’s next release is being framed as both a capability event and a hardware-optimization event, alongside continued investment in the systems stack.

Multiple posts citing the Financial Times said DeepSeek is set to release DeepSeek-V4 next week, and is working with Chinese AI chipmakers Huawei and Cambricon to optimize V4 for their latest products. The same reporting said a brief technical note will accompany the release, followed by a more comprehensive report about a month later.

Separately, DeepSeek made a “major commit” to DeepGEMM, adding mHC integration, early support for NVIDIA Blackwell (SM100), and FP4 ultra-low precision computing.

There were also signals of a web update: a post claimed DeepSeek was updated with knowledge cutoff May 2025 and 1M token context length, “likely V4 (or V4 lite),” pushed to the web.

5) Kling 3.0 claims the top Text-to-Video spot (with and without audio)

Why it matters: Video generation is rapidly stratifying around quality tiers, audio integration, and leaderboard-driven iteration, with pricing now comparable across leading tools.

Artificial Analysis reported Kling 3.0 (1080p Pro) took #1 in Text-to-Video across both With Audio and Without Audio leaderboards, surpassing Grok Imagine, Runway Gen-4.5, and Veo 3.1. The release supports up to 15-second generations and native audio, with 1080p (Pro) and 720p (Standard) tiers.

Kling also released Kling 3.0 Omni, a unified multimodal model supporting image/video inputs, editing, and generation; Omni 1080p (Pro) placed #2 in Text-to-Video With Audio and #4 in No Audio. Pricing cited: ~$13/min (1080p Pro, no audio) and ~$20/min (with audio); 720p Standard ~$10/min (no audio) and ~$15/min (with audio).

Research & Innovation

Why it matters: Several releases this period target practical bottlenecks—long-context cost, KV-cache memory, and stable post-training—which increasingly determine what “agentic” systems can do in production.

  • Instant model customization via Doc-to-LoRA / Text-to-LoRA (Sakana AI): Sakana introduced hypernetwork methods that generate task- or document-specific LoRA adapters “on the fly,” turning customization into a single forward pass rather than fine-tuning or long prompts. Reported results include near-perfect needle-in-a-haystack performance on instances 5× longer than the base model’s context window and sub-second latency for rapid experimentation. A separate summary emphasized Doc-to-LoRA compressing long documents into adapters to avoid repeated context re-reading, improving memory/update latency and serving cost for long-document agents.

  • Self-managed KV cache (NVIDIA SideQuest): SideQuest has the reasoning model decide which tokens remain useful and “garbage collect” the rest, running this management as an auxiliary task so it doesn’t pollute the main context. Trained with 215 samples, it reduced peak token usage by up to 65% with minimal accuracy loss.

  • Off-policy RL for reasoning (Databricks OAPL): Databricks said its OAPL approach shows you don’t need strict on-policy training to improve reasoning. Reported metrics: matches/beats GRPO, remains stable with large policy lag, and uses ~3× fewer training generations.

  • Agentic inference systems (DeepSeek DualPath): A DeepSeek/THU/PKU paper summary described DualPath pooling otherwise-mismatched NIC bandwidth between prefill and decode to move KV cache more efficiently. Reported results: up to 1.87× speedup on DS-660B offline inference and positioning for higher concurrency/lower cost in multi-agent systems with repeated long-context KV-cache access.

  • Physics-aware image editing (PhysicEdit): PhysicEdit reframes editing as a physical state transition and distills transition priors from videos into a latent representation for more physically plausible edits. It introduced the PhysicTran38K dataset (38K video trajectories with reasoning traces) and reported benchmark improvements over prior approaches.

  • Long-term coherence eval (YC Bench): YC Bench simulates “running a startup” for three years to test long-horizon agent coherence. It reported GPT-5.2 (and sometimes Sonnet 4.6) “quickly goes bankrupt” and fails to beat a sub-optimal greedy baseline, while Gemini-3-Flash was described as matching the baseline via multi-stage strategy in the provided scratchpad.

Products & Launches

Why it matters: The ecosystem continues shifting from chat to systems that execute work—with privacy wrappers, agents that run while you’re away, and developer-grade infrastructure for evaluation and ranking.

  • Open Anonymity “unlinkable inference” (privacy wrapper for remote models): Open Anonymity described a “VPN for AI inference” layer that uses decentralized proxies and blind signatures to make requests hard to link back to users across time. It emphasized ephemeral keys per session/request to combat longitudinal tracking and shipped an open chat app, oa-chat, with local chat history and temporary keys for OpenAI calls. Resources: https://chat.openanonymity.ai/ and https://openanonymity.ai/blog/unlinkable-inference/.

  • Hermes Agent (NousResearch) adds OCR/document extraction skill: Hermes Agent is positioned as an open-source agent with multi-level memory and persistent dedicated machine access. A recent update added broad OCR/document extraction (PDFs, ePubs, DocX, PowerPoint, etc.).

  • Claude Code Remote Control: A rollout to Claude Code Pro users enables “remote control,” with instructions to update to v2.1.58+, log out/in for new flags, and use /remote-control.

  • Gemma on iOS via Google AI Edge Gallery: A post said the Google AI Edge Gallery app brings fully offline, on-device AI to iOS (chat, image Q&A, audio transcription/translation, voice commands), with an App Store link.

  • Perplexity embeddings open-sourced (bidirectional + context-aware variants): Perplexity open-sourced four bidirectional embedding models (0.6B and 4B parameters; standard and context-aware types). The “context-aware” version processes an entire document so chunks “know” the full document meaning. Collection: https://huggingface.co/collections/perplexity-ai/pplx-embed.

  • Arena-Rank (open-source leaderboard construction): Arena released Arena-Rank, a Python package for statistically grounded, reproducible leaderboards using pairwise comparison data. GitHub: https://github.com/lmarena/arena-rank.
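For intuition on what building a leaderboard from pairwise comparison data involves, a standard approach is fitting a Bradley-Terry model. This is a generic sketch using simple minorization-maximization updates, not Arena-Rank’s actual API:

```python
from collections import defaultdict

def bradley_terry(battles, iters=200):
    """battles: list of (winner, loser) pairs.
    Returns a positive strength score per model; higher means stronger."""
    wins = defaultdict(float)
    pairs = defaultdict(float)   # games played between each unordered pair
    models = set()
    for w, l in battles:
        wins[w] += 1
        pairs[frozenset((w, l))] += 1
        models.update((w, l))
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for m in models:
            # MM update: p_m = W_m / sum_o n_mo / (p_m + p_o)
            denom = sum(
                pairs[frozenset((m, o))] / (p[m] + p[o])
                for o in models if o != m
            )
            new[m] = wins[m] / denom if denom else p[m]
        total = sum(new.values())
        p = {m: v * len(models) / total for m, v in new.items()}  # normalize
    return p
```

Ratings like these come with the usual caveats: they assume transitivity and need confidence intervals (e.g., via bootstrap) before differences are meaningful.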

Industry Moves

Why it matters: This week’s biggest competitive moves were about capital + distribution + compute—and about who controls the “control plane” (cloud distribution, identity, and evaluation infrastructure).

  • AWS frames the OpenAI partnership as distribution + runtime + Trainium adoption: Amazon’s CEO described a stateful runtime environment on Amazon Bedrock powered by OpenAI intelligence for developers running OpenAI services on AWS. He also said OpenAI is “going big” on Trainium, describing Trainium as 30–40% more price performant than comparable GPUs, and said AWS will be the exclusive third-party cloud distribution provider for OpenAI Frontier (agent teams).

  • Microsoft–OpenAI joint statement on AGI definition unchanged: A Microsoft/OpenAI joint statement was shared alongside commentary that the contractual definition and determination process for AGI remains unchanged despite new funding and partnerships. The AGI definition quoted: a system that can perform “most economically valuable tasks better than humans,” and is officially declared AGI by the OpenAI board.

  • Guidde raises $50M (agent training from screen recordings): Guidde raised $50M to train AI agents on expert screen-recording videos rather than static documentation, claiming a 41% reduction in video creation time and 34% fewer support tickets.

  • Taalas launches first product (models encoded into chips): Taalas said it launched its first product after $30M in development by 24 people, emphasizing specialization, speed, and power efficiency. A separate summary described “Hardcore Models” chips that store weights on-chip (mask ROM) and can reach 16–17k tokens/sec inference, with RAM for KV cache and small updates like LoRA.

  • OpenAI Codex usage growth: A post said Codex added 600k users in three weeks, moving from 1M WAU (Feb 4) to 1.6M WAU (Feb 27).

Policy & Regulation

Why it matters: AI governance is being operationalized through procurement, designations, and deployment constraints, not just principles—and the effects spill into cloud ecosystems and enterprise buyers.

  • Anthropic refuses to enable domestic mass surveillance or fully autonomous weapons: A post quoted Anthropic’s position: “threats do not change our position: we cannot in good conscience accede to their request,” framed as a moral line against enabling mass domestic surveillance and fully autonomous weapons.

  • DoW designates Anthropic a supply-chain risk; broad procurement restrictions announced: DoW’s directive said Anthropic will be designated a Supply-Chain Risk to National Security, and barred contractors/suppliers/partners doing business with the U.S. military from commercial activity with Anthropic, with a ≤6-month transition window. Anthropic says it will challenge the designation in court.

  • OpenAI–DoW agreement highlights “no domestic mass surveillance” and “human responsibility for force”: OpenAI’s agreement announcement reiterated these core principles as incorporated into the contract and reflected in DoW law/policy framing.

  • Pentagon cyber tooling (FT-cited) aims at mapping/exploiting vulnerabilities in Chinese infrastructure: A post citing the FT said the Pentagon is developing AI-powered cyber tools to map and exploit vulnerabilities in Chinese infrastructure (e.g., power grids and sensitive networks), automating reconnaissance and speeding targeting.

Quick Takes

Why it matters: These smaller items collectively show where momentum is compounding: usage scale, agent reliability, model-serving efficiency, and fast-moving leaderboards.

  • ChatGPT scale: ChatGPT crossed 900M weekly users and 50M paying subscribers.
  • ChatGPT Android: The Android app (v1.2026.055) mentions a “Naughty chats” setting for 18+ users.
  • GPT-5.3-Codex cost/throughput notes: Reported as 28% cheaper than GPT-5.2 (xhigh) on Artificial Analysis, with a post also calling it more token efficient than 5.2. Another post cited 400k context and “extra high thinking” in settings.
  • Open models: Feb Text Arena: Arena’s Top 3 open models were GLM-5 (1455), Qwen-3.5 397B A17B (1454), and Kimi-K2.5 Thinking (1452).
  • vLLM on AMD GPUs: vLLM described ROCm attention backends delivering up to 4.4× decode throughput on AMD GPUs, with model-specific benchmarks (e.g., Qwen3-235B MHA 2.7–4.4× TPS) and a one-env-var enablement path.
  • UI-agent click accuracy fix: Tzafon claimed scaling positional embeddings improved click accuracy from 40% to 80% with no retraining.
  • RF-DETR on Apple MLX: A post said RF-DETR on MLX runs at 100+ FPS on an M4 Pro Mac.
OpenAI’s $110B raise and AWS compute deal; classified-network deployment agreement with DoW safety terms
Feb 28
6 min read
192 docs
Greg Brockman
eigenron
Sam Altman
+16
OpenAI disclosed a $110B funding round and a multi-year AWS partnership centered on Trainium and a “stateful runtime environment,” while also announcing an agreement to deploy models in the DoW’s classified network with explicit safety terms. Elsewhere, Codex adoption and agent tooling kept accelerating, alongside notable research on RL training, tool-calling inference speedups, and privacy-preserving “unlinkable inference.”

OpenAI’s $110B raise + Amazon partnership (compute, chips, and “stateful runtime”)

OpenAI confirms a $110B funding round backed by Amazon, NVIDIA, and SoftBank

OpenAI leaders said the company raised a $110B round from Amazon, NVIDIA, and SoftBank. In a CNBC “Squawk Pod” segment, hosts and guests also described the round as valuing OpenAI at $730B, with Amazon as the largest investor committing $50B (structured as $15B upfront plus $35B tied to milestones).

Why it matters: the scale and structure (two tranches, milestone-based) is a major signal about how capital is being deployed to secure compute supply and strategic distribution for frontier AI.

AWS becomes a multi-year strategic partner, with Trainium and a “stateful runtime environment” in Bedrock

OpenAI and Amazon announced a multi-year strategic partnership. Reporting in the same segment said OpenAI will consume 2 gigawatts of training capacity through AWS infrastructure—some of it described as exclusive to Amazon.

The partnership centers on a “stateful runtime environment” powered by OpenAI GPT models that will be available in Amazon Bedrock. Amazon CEO Andy Jassy described it as enabling developers to access state (e.g., memory, identity) and call tools/compute “in a stateful way,” claiming “there’s nothing else like that today”.

OpenAI CEO Sam Altman also highlighted Amazon’s Trainium as part of the relationship, while Jassy referenced 30–40% better price performance from leveraging Trainium for training and said Amazon now has “the two largest AI labs… significantly betting on Trainium”.

OpenAI’s cloud positioning: AWS expansion alongside continued Azure exclusivity for the stateless API

Altman said OpenAI “continue[s] to have a great relationship with Microsoft,” and that its stateless API will remain exclusive to Azure, while OpenAI will “build out much more capacity” with Microsoft.

Why it matters: OpenAI is explicitly describing a split deployment model: Azure exclusivity for one interface, while simultaneously scaling via AWS for other advanced workloads.


Pentagon / “Department of War” flashpoint: OpenAI reaches a classified-network deployment agreement

OpenAI says it reached an agreement to deploy models on the DoW’s classified network—with explicit safety principles

Altman said OpenAI reached an agreement with the Department of War (DoW) to deploy models in its classified network. He said the agreement includes two core safety principles: prohibitions on domestic mass surveillance and human responsibility for the use of force, including autonomous weapon systems—principles he said the DoW agrees with and reflects in law/policy.

Altman also said OpenAI will implement technical safeguards, deploy FDEs, and deploy “on cloud networks only”. He added OpenAI is asking the DoW to offer “these same terms to all AI companies” and said OpenAI wants de-escalation “away from legal and governmental actions” toward “reasonable agreements”.

Context: Anthropic dispute + DPA pressure, and diverging interpretations

Commentary across sources framed OpenAI’s agreement against the backdrop of the Pentagon/DoW dispute with Anthropic and reported talk of using the Defense Production Act (DPA) to compel access. Matt Wolfe summarized the dispute as the Pentagon demanding Claude for “all lawful purposes,” while Anthropic refused to lift safeguards around mass domestic surveillance and fully autonomous weapons.

Some reactions praised the stance as aligned with Anthropic’s position (Ilya Sutskever: “It’s extremely good that Anthropic has not backed down, and it’s significant that OpenAI has taken a similar stance.”). Others described the outcome as confusing or politically fraught, with Nathan Lambert warning it “stands to fracture the AI community” and separately calling Altman’s move a “weird scooping up of the DoW contract”.

An account labeled @UnderSecretaryF argued the DoW contract approach is grounded in “all lawful use”, references legal authorities, and includes mutually agreed safety mechanisms—calling this a compromise that Anthropic “was offered, and rejected”.

Critics (including Gary Marcus) questioned whether the government effectively offered OpenAI terms similar to what it publicly rejected from Anthropic, and raised concerns about unequal treatment and political influence.


Agents & coding: Codex traction, “agentic” tooling, and enterprise demand signals

Codex usage and momentum: demand spike + weekly growth signals

In the Squawk Pod discussion, Altman said demand is “rapidly growing” and cited Codex as having grown “like 30 something percent in the last week,” as an indicator of enterprise readiness. Separately, @swyx reported OpenAI Codex added 600k users in the last 3 weeks, reaching 1.6M WAU on Feb 27 (up from 1M WAU on Feb 4), and said it is >3× up from Jan 1 (including a Feb 2 app launch).

Why it matters: OpenAI is pointing to Codex as a near-term adoption barometer, while external tracking highlights unusually fast growth over a short window.

Brockman spotlights Codex 5.3’s software-engineering capability on a complex task

Greg Brockman highlighted Codex 5.3 for “complicated software engineering,” pointing to a description of the model “one-shotting” a task that involved bypassing HuggingFace’s KV cache abstraction, monkey-patching attention at the module level, handling M-RoPE, coordinating prompt-level memory state with KV cache state, and performing “granular surgical eviction with span tracking”.

Agent products continue to converge on “computer use” and long-running workflows

Matt Wolfe described new agent features across vendors, including Cursor agents that can control virtual computers and record videos of their actions, and Microsoft Copilot Tasks (waitlist) that aims to take plans/drafts into “completed tasks” such as building slide decks and booking appointments.

Andrej Karpathy, reacting to Cursor usage trends, described a shifting progression of workflows (“None → Tab → Agent → Parallel agents → Agent Teams (?) → ???”) and argued the practical balance is 80% work in reliable setups and 20% exploration of what’s next.


Research & infrastructure: RL training shift, faster tool-calling inference, and privacy architecture

Scale AI: reinforcement learning becomes the majority of training work (for agents)

Scale AI’s Chetan Rane said more than half of Scale’s training work now involves reinforcement learning (RL)—up from less than a quarter six months ago. The piece describes RL as goal-oriented training meant to push models from “responses” into doing things online, but notes limits around generalization and higher compute needs.

ContextCache: persistent KV caching for tool schemas claims large TTFT gains without quality loss

A paper and code release introduced ContextCache, a persistent KV cache system for tool-calling LLMs that caches prefill states for tool schemas indexed by a content hash and restores them on subsequent requests. The authors report ~200ms cached TTFT from 5 to 50 tools versus full prefill growing from 466ms to 5,625ms, with a 29× speedup at 50 tools (skipping 99% of prompt tokens per request) and “zero quality degradation” on listed metrics.

Paper/code: https://zenodo.org/records/18795189 and https://github.com/spranab/contextcache
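The core caching idea is easy to picture: key the expensive prefill work by a content hash of the tool schemas, so identical schemas on later requests restore state instead of recomputing. A generic sketch (the `prefill` callable stands in for the model’s real prefill; this is not the ContextCache implementation):

```python
import hashlib
import json

class SchemaPrefillCache:
    """Cache prefill results keyed by a content hash of the tool schemas."""

    def __init__(self, prefill):
        self.prefill = prefill   # expensive: runs the model over the schemas
        self.store = {}
        self.hits = 0

    def _key(self, schemas: list[dict]) -> str:
        # Canonical JSON so logically identical schemas hash identically.
        blob = json.dumps(schemas, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get(self, schemas: list[dict]):
        k = self._key(schemas)
        if k not in self.store:
            self.store[k] = self.prefill(schemas)   # pay the prefill cost once
        else:
            self.hits += 1                          # later requests restore state
        return self.store[k]
```

The reported TTFT gains come from the hit path: once a schema set has been prefilled, subsequent requests skip straight to the cached state.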

Open Anonymity: “unlinkable inference” as a VPN-like layer for AI prompts

The Open Anonymity project described an “unlinkable inference” layer—framed as a “VPN for AI inference”—to reduce longitudinal user tracking by routing requests through decentralized proxies and using blind authentication plus ephemeral keys. The project also released oa-chat, storing chat history locally and sending each query to OpenAI under a temporary key intended to be unlinkable to other queries; Percy Liang said he switched to it as a “convenient drop-in replacement”.

Project links: https://openanonymity.ai/ and blog: https://openanonymity.ai/blog/unlinkable-inference/


Labor + markets: layoffs attributed to AI, and competing interpretations

A Squawk Pod segment described Block’s plan to reduce headcount from 10,000+ to under 6,000, and quoted Jack Dorsey saying “AI tools have changed what it means to run a company” while predicting many companies will make similar structural changes. In contrast, Elad Gil argued many “layoffs due to AI” narratives are near-term corrections for 2020-era over-hiring rather than direct AI effects, adding that most AI productivity impact is “still around [the] bend”.

Why it matters: “AI-driven layoffs” is becoming a public storyline, but even AI investors and executives are split on how much is signal versus post-hiring-cycle adjustment.

Shadow PMs, behavioral evidence, and AI prototyping workflows reshaping product work
Feb 28
10 min read
88 docs
The community for ventures designed to scale rapidly | Read our rules before posting ❤️
Melissa Perri
Julie Zhuo
+6
This edition connects a set of fast-moving PM shifts: AI systems as “Shadow PMs” shaping expectations and acquisition, behavioral data as the grounding layer for better decisions, and agentic execution pushing teams to redesign discovery and decision-making. You’ll also get a concrete AI prototyping workflow, several recent case studies (JLR, The Economist, Ramp/Alloy, Just Eat, Perk), and career resources for building AI-native skills and interview leverage.

Big Ideas

1) Your product is being “decided” before users ever land on it (the Shadow PM)

AI systems increasingly act as the first product manager your users interact with—setting expectations, framing capabilities, and influencing which brand gets chosen. This coincides with a funnel shift where AI can pick a brand and drive a purchase in hours, compressing the traditional multi-week comparison journey.

Why it matters:

  • AI search traffic is described as growing 5x YoY, with 80% of B2B tech buyers using AI as much as traditional search during vendor research—often showing up as “unexplained” direct traffic.

How to apply:

  • Treat AI visibility and accuracy as a product + growth surface you manage explicitly (see Tactical Playbook).

“AI is now your first product manager your users talk to.”


2) AI doesn’t create value by itself; it amplifies the systems (and signals) you already have

Jaguar Land Rover’s Jim Kennedy argues that AI “amplifies the systems that you’ve built,” so weak signals and fragmented systems get amplified too. His practical shift: leaders move from managing backlogs to designing systems and asking better questions grounded in behavioral context and intent.

Why it matters:

  • “Behavioral context” is positioned as what makes AI useful—grounding predictions in real intent rather than averages.

How to apply:

  • Invest in instrumentation and behavioral insight that removes debate and drives prioritization from evidence, not opinion.

3) When execution accelerates, decision-making and discovery become the bottleneck

Multiple threads converge on the same operational shift:

  • Teams can ship far faster with agents (e.g., smaller project teams, rapid shipping cycles), making decision-making the constraint rather than execution speed.
  • In this environment, craft and “taste” still matter: knowing what “excellent” looks like, and whether what you built is important and will resonate with users.

Why it matters:

  • If you don’t redesign how decisions get made (and how discovery stays continuous), faster build cycles can just produce more churn, rework, or misaligned output.

How to apply:

  • Shift PM time from coordination toward user understanding, prioritization, and experiment design.

4) Advancing as a product leader increasingly means thinking like a GM (P&L + distribution + outcomes)

Mastercard’s CPO frames why CPOs rarely become CEOs: only 1% make it, in part because PMs often lack P&L ownership and are seen as a delivery function. Her prescription is to adopt GM behaviors inside the product role:

  • Put revenue in OKRs, and treat pricing and costs as design inputs (not afterthoughts).
  • Design for distribution channels, not “the product” in isolation (e.g., tailoring to platform channels vs. SME portals).
  • Translate roadmaps into outcome narratives for senior leadership.

Why it matters:

  • It’s a concrete path to “boardroom language” without waiting for a formal title change.

How to apply:

  • Reframe initiatives as outcome narratives with numbers (e.g., “improve authorization rates, reduce fraud, +3% transaction volume”).

5) AI ROI is constrained less by tooling and more by organizational “debt” (process, data, and access)

Just Eat Takeaway’s CPO describes AI as an “HD mirror” that exposes technical debt, data silos, and “human glue” workarounds—plus avoidance of root-cause fixes. Their stance: without structured and accessible data, strategy and ROI are “permanently limited.”

Why it matters:

  • They cite having 104 petabytes of data and 570M daily events, but still not being able to use it effectively until they restructured it around intent and business logic.

How to apply:

  • Audit where tribal knowledge substitutes for organizational intelligence, then codify and make context available to machines.

Tactical Playbook

1) “Manage the Shadow PM”: a practical AI visibility loop

A Semrush product leader outlines a three-part approach—prompts, sources, processes—to shape how AI describes and recommends your product.

Step-by-step

  1. Audit per model (not “AI” as one thing). Pick 4–5 platforms and track ~50 queries; check if you’re mentioned, whether it’s accurate, and who wins instead.
  2. Make your product crawlable and extractable. The talk claims most AI crawlers don’t render JavaScript; “JS walls” can drop AI visibility to zero.
  3. Fix docs like they’re churn drivers. Users may screenshot UI issues and ask ChatGPT for help; if docs are clear/crawlable, the user stays—if not, they may blame your product and leave.
  4. Structure content for AI consumption. “Short answer first” chunks + schema markup; the talk cites up to 30% higher citation rates for sites with structured data.
  5. Invest in earned media and third-party sources. The talk claims 77% of brand citations come from third parties (Reddit, G2, LinkedIn, YouTube, industry reviews).
  6. Close the loop with product outcomes. The proposed flywheel: shape AI perception → accurate recommendations → better activation/retention → customer advocacy → stronger AI signal.

Metrics to watch:

  • AI-referred visits are cited as having 27% lower bounce, 12% more engagement, and 5% higher conversion (attributed to arriving with the right expectations).
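Step 1’s per-model audit is easy to keep honest with a small tally script. A sketch under assumed logging conventions—the row fields (`model`, `mentioned`, `accurate`, `winner`) are illustrative, not from the talk:

```python
from collections import defaultdict

def audit_summary(rows):
    """Summarize an AI-visibility audit. Each row is a dict like
    {"model": ..., "query": ..., "mentioned": bool, "accurate": bool,
     "winner": str} -- a hypothetical logging shape for illustration."""
    per_model = defaultdict(lambda: {"queries": 0, "mentioned": 0, "accurate": 0})
    winners = defaultdict(int)
    for r in rows:
        m = per_model[r["model"]]
        m["queries"] += 1
        m["mentioned"] += r["mentioned"]            # bools count as 0/1
        m["accurate"] += r.get("accurate", False)
        if not r["mentioned"]:
            winners[r["winner"]] += 1               # who shows up instead of you
    return dict(per_model), dict(winners)
```

Running this per platform (rather than pooling everything into one “AI” bucket) is exactly the point of step 1: mention and accuracy rates can differ sharply across models.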

2) AI prototyping workflow (from “idea” to handoff with fewer open questions)

Nadav Abrahami (Wix co-founder, Dazl CEO) lays out an end-to-end workflow for AI prototyping.

Prerequisites (don’t skip):

  • Lock down the problem, user story, and rough shape of the solution in one paragraph before prototyping.

Workflow

  1. Start from your design system. Recreate an existing page from a screenshot as the template.
  2. Generate 3–4 divergent variants. Don’t move forward with only one option.
  3. Prompt for structure; visually edit for fine-tuning. Use manual edits for the last-mile polish.
  4. Build the full end-to-end flow. Connecting pages surfaces edge cases earlier.
  5. Test with your actual users (on video). Prefer users who requested the feature, not a testing platform.

PRD + prototype (new standard):

  • Prototype holds the core 90% flows; PRD covers edge cases, errors, tracking, rollout plan—and can live inside the prototype project for AI context.

Engineering handoff tactic:

  • Share the published prototype link (claimed to answer ~90% of questions), then use Cursor/Claude Code to port interactions into the production codebase.

3) Prevent AI-speed execution from outrunning planning (without “more meetings”)

A startup thread describes a common failure mode: implementation moves ~10x faster with tools like Cursor/Claude Code, but teams still miss edge cases, permissions, and “the thing discussed in Slack” because context capture didn’t speed up too.

Step-by-step

  1. Identify recurring “missed detail” categories (edge cases, permissions, workflow steps).
  2. Don’t solve it by defaulting to extra meetings (noted as a common but slowing response).
  3. Embed structured context capture at decision points (where decisions happen), rather than making documentation a separate ritual.

4) Use behavioral analytics to remove debate and ship what users actually complete

Jim Kennedy’s JLR examples emphasize behavioral insights (hesitations, retries, abandonments) that dashboards often don’t show. The claim: behavior data removed debate about intent and changed prioritization quickly.

Step-by-step

  1. Instrument journeys to see “moments” (not just outcomes).
  2. Use real behavior to decide what ships and what scales—require completion/engagement/value before scaling.
  3. Reframe design from pages to journeys and “joined-up moments.”

5) Make surveys behave like product funnels (so they stop bleeding completions)

A PM describes surveys that were treated like static forms, with poor completion despite good open rates. The fix was to analyze surveys like funnels.

Step-by-step

  1. Add per-question drop-off tracking to see where users abandon.
  2. Remove high-friction questions (e.g., make a demographic question optional) once you confirm it’s the main abandonment driver.
  3. Avoid grids/matrices on mobile; break into single-focus, tap-friendly steps.
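The funnel framing above can be sketched directly: given how many respondents reached each question in order, compute per-question drop-off to find the abandonment driver. The function name and data shape are illustrative assumptions, not a product feature.

```python
def question_dropoff(reached):
    """Per-question drop-off from an ordered list of
    (question_id, respondents_who_reached_it) pairs.
    The drop for a question is the share of people who saw it
    but never reached the next one."""
    out = []
    for (qid, n), (_, n_next) in zip(reached, reached[1:]):
        drop = (n - n_next) / n if n else 0.0
        out.append((qid, round(drop, 3)))
    return out
```

With `[("q1", 1000), ("q2", 940), ("q3_demographics", 910), ("q4", 520)]`, the demographics question shows a ~43% drop—the kind of evidence that justifies making it optional (step 2).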

Case Studies & Lessons

1) Jaguar Land Rover: from 58 apps to a unified experience system (and behavior-led shipping)

Starting point: when Kennedy joined, JLR had 58 disconnected apps, little analytics signal, and a 1.2 average store rating.

What changed

  • The program was framed less as tech consolidation and more as reducing customer cognitive load and decision friction.
  • They reframed work around customer experience, platform capability, and value creation—and moved from journeys to “joined-up moments.”

Homepage carousel lesson

  • A “conversational carousel” that looked great in design reviews failed in behavior: users scrolled to only one tile in beta testing.
  • They reprioritized what users saw first, personalized from prior sessions, simplified flow, and only scaled after behavior improved (deeper scrolls, higher completion).

Platform non-negotiables

  • Design systems over one-off designs
  • Configuration over code (release toggles for brands/services)
  • Platform services as modular backbone (brands express via theming/content)

Key takeaway: fragmentation was described as an operating model problem more than a tech problem.


2) The Economist: building trusted AI voice as an additive bridge (not a replacement)

Problem: Digital content is published Mon–Wed, but the narrated audio edition is created on Thursdays, creating a gap for audio-first users.

Solution: Use text-to-speech (TTS) to bridge the gap.

Outcome: Quantitative results indicated people listened earlier and consumed more audio (not less).

Trust lesson: Loyal subscribers complained about TTS mistakes (example: “6m” read as “6 meters”), reinforcing that TTS worked as a bridge/additive but wasn’t yet ready to replace human connection.

Design principle: Build an “audio identity” that is distinct and authoritative, anchored in trust (reliable, credible, intimate/relevant).


3) Vibe coding + cloud playgrounds: from prototype to pull request (and why visibility matters)

Alloy’s CEO describes a 2026 inflection where non-technical PMs prototype directly on real codebases, share prototypes broadly, and sometimes produce pull requests for engineering review.

Ramp example: Ramp built an internal “background agent” system; they reportedly went from 0% of PRs merged via that system to 70% in 2–3 months.

Key takeaway: unlocking collaboration is partly about moving beyond “localhost prototypes” so the rest of the company (and customers) can see and react to work in progress.


4) Perk: “radical urgency” requires explicit trade-offs and tight scope

Perk accelerated a spend product launch from January to November via tight scope and cross-company effort. Trade-offs included shipping with fewer features and concentrating specific team expertise on the milestone.

Their broader principle: speed to learning comes from getting to real users quickly for feedback.


5) Just Eat Takeaway: internal data agent to eliminate handoffs

They developed an internal conversational agent that allows PMs and others to query much of their data, reducing analysis handoffs from hours/calls into seconds/minutes and enabling rapid follow-up questioning cycles.

Key takeaway: data access and transparency were positioned as foundational to ROI and decision speed.


Career Corner

1) Strategy execution often fails in the middle: teach middle managers how their job changes

Melissa Perri argues that companies get stuck when middle management isn’t coached on how their role shifts: setting local strategy that ties back to global strategy (instead of pushing down solutions). She also emphasizes that success requires leadership teams to operate as “one team,” not fiefdoms.

How to apply:

  • Define 1–3 strategic intents (concrete outcomes) and force solution proposals to ladder up to them.

2) “Builder” skills are becoming a PM differentiator—but ship safely

Community posts describe PMs being asked to build AI-generated features and test directly with customers, gaining deeper engineering understanding and firsthand user feedback. A key caution: for complex systems, changes shouldn’t hit production without senior engineer review.

How to apply:

  • Use prototypes to learn and validate quickly, but formalize review gates for production codepaths.

3) Use AI to close the engineering communication gap (without pretending to be an engineer)

Aakash Gupta’s guidance: open the actual project, ask AI to explain architecture and components, and do this for weeks to close the communication gap faster than most alternatives.


4) Job search leverage: AI-native PM skills and interview systems are getting packaged

  • Lenny Rachitsky promoted free live workshops on “The AI-Native PM” across AI workflows, becoming more technical, and product sense & influence—with 75,000+ registrations. Sign-up link: http://bit.ly/ai-native-pm.
  • A free Claude Code–based interview coach (built from interviews with 30 job seekers) offers response scoring, mock interviews with pushback, company-specific prep, and negotiation scripts; v2 adds memory, higher challenge, and clearer next steps. Download: https://github.com/noamseg/interview-coach-skill.

Progress, permissions, and evidence: Stripe’s agentic-commerce framework, an AI narrative check, and a nutrigenomics breakthrough
Feb 28
4 min read
181 docs
All-In Podcast
Keith Rabois
David Sacks
+1
Today’s strongest cluster is “progress and its constraints”: a standout recommendation of Stripe’s 2025 Annual Letter for its agentic-commerce levels and regulatory framing, plus a companion “build fast” resource list. Also included: a check on AI overconfidence via Derek Thompson, a high-impact Arc/Cell nutrigenomics paper, and a history prompt from Keith Rabois.

Most compelling recommendation: Stripe’s 2025 Annual Letter — a concrete framework for “agentic commerce” (plus real payments data)

Stripe’s 2025 Annual Letter (annual letter) — Patrick and John Collison

  • Link/URL: https://x.com/stripe/status/2026294241450979364?s=20
  • Recommended by: Packy McCormick (Not Boring)
  • Key takeaway (as shared):
    • Packy calls it “one of the best pieces of economic writing you’ll read all year,” citing Stripe’s view into the internet economy via its data and writing.
    • He highlights Stripe’s “agentic commerce” framework: five levels, from agents that merely fill checkout forms to agents that anticipate needs and buy before you ask—while noting we’re “hovering between levels 1 and 2.”
    • He pulls out the “Republic of Permissions” idea: technology success depends on whether regulators, committees, and courts allow deployment.
  • Why it matters: This is a rare combination of a staged framework (how agent-driven buying might evolve) plus operational-market observations about the constraints that can block adoption.

A second layer in the same theme: building fast vs. a “Republic of Permissions”

“Fast” projects list (article/resource list) — Patrick Collison

  • Link/URL: https://patrickcollison.com/fast
  • Recommended by: Packy McCormick (Not Boring)
  • Key takeaway (as shared): Packy frames it as the “canonical list of projects from a time when we were able to build fast.”
  • Why it matters: Paired with the “Republic of Permissions” framing, it’s a useful contrast: what rapid building looked like vs. the modern environment where permissions can determine outcomes.

AI discourse: a pushback against overconfident forecasting

“Nobody Knows Anything” (article) — Derek Thompson

  • Link/URL: Not provided in the sources
  • Recommended by: David Sacks (on a YouTube podcast episode)
  • Key takeaway (as shared): Sacks points to Thompson’s argument that no one really knows what will happen with AI in two years (let alone 20), and that AI debate can become “science fiction writing masquerading as analysis” — a “marketplace of competing science fiction narratives” amid high uncertainty and limited real-time macro evidence.
  • Why it matters: This is a reading pick that implicitly raises the bar on evidence when evaluating big AI claims—and is useful as a lens for separating narrative from analysis.

Research: a nutrigenomics result worth reading end-to-end

“Vitamin B2 and B3 nutrigenomics reveals a therapy for NAXD disease” (research paper; Cell) — Arc Institute

  • Link/URL: https://www.cell.com/cell/fulltext/S0092-8674(26)00109-1
  • Recommended by: Packy McCormick, via Ulkar Aghayeva’s Scientific Breakthroughs curation
  • Key takeaway (as shared):
    • The work runs a genome-wide CRISPR screen (in K562 cancer cells) using vitamins B2 and B3 to identify genetic diseases responsive to vitamin supplementation; NAXD emerged as the top hit for vitamin B3.
    • The team reports that adding vitamin B3 to knockout mice food from birth increased lifespan more than 40-fold.
  • Why it matters: It’s a concrete example of how “vitamin biology” may still contain “unexplored territory in nutritional genomics,” and it raises the question of how many diseases might be addressable through similar supplementation strategies.

History (as a prompt, not a thesis)

“Secret History” (article/blog post) — Steve Blank

  • Link/URL: https://steveblank.com/secret-history/
  • Recommended by: Keith Rabois (X)
  • Key takeaway (as shared): Rabois frames it as “Time for some of you to learn history” and points directly to Blank’s post.
  • Why it matters: A direct nudge from an investor to revisit a specific historical account—useful when you want to anchor current debates in prior cycles rather than novelty alone.

Lower-signal but notable: a book recommendation framed as a gift

Running Down a Dream (book) — Bill Gurley

  • Link/URL: Not provided in the sources
  • Recommended by: Jason Calacanis (with agreement from Chamath) on a YouTube podcast episode
  • Key takeaway (as shared): Calacanis calls it an “amazing new book,” urges people to “buy three copies” to give to “two young people and a parent,” and describes it as “inspiring for kids”; Chamath adds: “this book is incredible.”
  • Why it matters: While the endorsement is strong, it’s light on specifics; the clearest signal here is intended audience and use: a book they see as worth distributing to younger readers and families.
Biofuels mandates near decision as drought and South American harvest risk steer ag markets
Feb 28
9 min read
171 docs
homesteading, farming, gardening, self sufficiency and country life
农业致富经 Agriculture And Farming
Successful Farming
+11
Grain and livestock markets stayed highly headline-driven, with drought and wheat risk premium, mixed export signals for corn and soybeans, and biofuels mandates moving toward a late-March decision. This digest also highlights practical technology and management advances—from new soybean traits and AI decision tools to robotics and livestock best practices—plus key regional supply and trade developments across the U.S., Brazil, and Argentina.

1) Market Movers

Grains & oilseeds (U.S. and global)

  • Wheat: Price strength was tied to weather risk and positioning—dryness/wind and drought across parts of the U.S. winter wheat belt (noted as ~50% in drought) and rain forecasts being removed supported a weather premium. Separate coverage also pointed to drought spreading in Kansas and Oklahoma as a driver behind a futures jump. Market commentary added that Chicago wheat posted new highs while KC lagged, with the move framed as a longer rally and a “weather scare.”

  • Corn: Corn followed wheat higher, but multiple sources emphasized the overhang of a ~2 billion bushel carryout as a restraint without new demand catalysts. Fresh export demand signals included:

    • A reported flash export sale to “unknown” in market commentary.
    • Private exporters reporting 257,000 MT of corn sold for delivery to unknown destinations (MY 2025/2026).
  • Soybeans: Soybeans hit a 2.5-month high, then pulled back after weekly export sales were reported down 49% versus the previous week. Trade uncertainty was part of the narrative, with a report saying negotiations toward a commodity deal hit a rough spot even as they progressed toward a March 31 summit.

  • Biofuels-driven demand (soybean oil, corn, sorghum): Several updates converged on biofuel policy as a key demand lever.

    • EPA’s RVO proposal was sent to the White House/OMB, with reporting that the rule was likely finalized by end of March.
    • Initial RVO proposals for 2026–2027 showed a sharp increase for biomass-based diesel, with discussion of ~5.25–5.61B gallons and framing that this could exceed the current 3.35B gallons by more than 2B gallons.
    • A Trump administration plan would require large refiners to cover at least 50% of blending volumes previously waived under small refinery exemptions, potentially increasing demand for blending credits.
    • USDA soybean oil use was discussed as potentially reaching 17 billion pounds under scenarios viewed as positive by industry voices.

Livestock (U.S. and Brazil)

  • U.S. cattle: Futures were sharply lower on the week (live cattle $243.29, April live cattle $232.23, March feeder cattle $355.43) while box beef moved higher (Choice $377.89, Select $370.79). Weekly cattle slaughter was 516k head (down 50k YoY) and YTD slaughter was down 10.1%. USDA leadership was cited as saying there were no plans anytime soon to reopen points of entry for live cattle imports from Mexico.

  • U.S. hogs: National base carcass price was $89.34 (up $3.03 WoW) and nearby lean hog futures were $95.73 (up $2.05 WoW). Hog slaughter was 2.516M head (up 23k WoW; down 9,170 YoY) with YTD slaughter down 2.1%.

  • Brazil poultry and eggs (São Paulo): Live broilers averaged R$5.04/kg (down 2.1% vs January; lowest real level since May 2024) and purchasing power versus corn/soy meal slipped month-over-month. Egg prices rose ~37%: R$147.98 (white, extra; 30-dozen box) and R$166.57 (red).

2) Innovation Spotlight

Crop protection & traits (U.S.)

  • BASF “Nemosphere” soybean trait (targeting soybean cyst nematode): Described as the first biotech trait to control soybean cyst nematode, with a 2028 market target. SCN was characterized as the “number one yield robber” for 52 years and taking at least $1.5B of value out of the market today. The trait was also described as bringing a fourth herbicide mode of action to soybeans (HPPD tolerance enabling mesotrione pre-emergence), and panel examples suggested potential 20–30 bu differences in affected fields.

  • Fungicide results: A BASF fungicide trial set involving 1,800 farmers was described as showing 20–40 bu/acre yield differences versus untreated comparisons, with emphasis on planned applications for disease pressure (e.g., southern rust, tar spot).

  • Decision support / verification tools (Xarvio): Growers were encouraged to work with local retailers to use Xarvio to capture existing practices (no-till, nitrogen stabilizers, cover crops) and qualify in a “five steps” process; the tool was also described as supporting fungicide timing alerts and seed/variety recommendations, noting that a wrong variety decision can cost 10–20 bushels.

Mechanization & robotics (global)

  • China agricultural robotics (field + greenhouse):

    • “小甜甜” (“Little Sweetie”) was described as a robot system capable of full-cycle, unmanned rice production (plowing through harvest), with 100+ robot models deployed across China and exported, including a cited procurement demand of 100,000+ units.
    • A separate example described a field robot achieving ~8× manual efficiency in harvesting a taro-like crop, along with reported electricity use of ~2 kWh/mu (1 mu ≈ 1/15 hectare) at ~0.5 RMB/kWh.
  • Swine production analytics (PIC): Digital imaging/AI was described as enabling behavior recording on “thousands and thousands” of pigs, with measurable and heritable behavior traits for potential genetic selection. Camera-based phenotyping of feet/legs was described as three times more accurate/heritable than humans and used to predict longevity in sow herds.

Equipment upgrades (U.S./Europe)

  • High-horsepower tractors:

    • John Deere highlighted the 8R 540 within a high-horsepower 8R lineup (440/490/540), framed around wider implements, faster speeds, and fixed-frame maneuverability with 4WD power.
    • New Holland rolled out the T7 XD series (T7.360 XD, T7.390 XD, T7.440 XD) delivering up to 435 horsepower for haulage, silage, planting, and tillage.
  • Dairy feeding system upgrade (UK): A farm moved from a 12m³ to a 20m³ Keenan mixer feeder to reduce overloading and shift toward a single larger cow mix rather than split mixes, with expectations of more milk from a better/more accurate mix.

3) Regional Developments

United States

  • USDA FY26 agricultural outlook (trade): The U.S. ag trade deficit was forecast at $29B, described as an improvement of $14.7B from FY25 and $8B versus December 2025 projections, tied to record export performance. Forecast record export components included:

    • Dairy: +15% by end FY25, led by demand for U.S. cheese and butter (growth cited in Mexico/Canada/EU).
    • Corn: +29% projected record volumes (supported by sustained global demand).
    • Ethanol: +11% forecast record exports (shipments cited to Canada/EU/UK/India).
  • Drought and wildfire backdrop: Coverage cited 74% of the lower 48 in drought, high winds fueling wildfires burning 400,000+ acres, and extremely low Midwest snowpack heading into spring planting.

South America

  • Brazil soy (West Bahia): The 2025–26 soybean season was described as nearing its final harvest phase, with expectations of >9M tons and dryland yields of 65–70 sacks/ha, while rain (mid-March) was cited as a risk to harvest execution.

  • Brazil soy (Mato Grosso): Soy harvest was reported at 66% complete, with the pace slow and behind last year due to more than 30 days of heavy rain; second-crop corn planting was reported 65% complete by Feb 20, also described as delayed, with producers citing losses from persistent rain.

  • Argentina corn: Notes cited a record corn production expectation of 62 million tons, 26% above last year, with harvest set to begin soon.

Europe–Mercosul trade lane (Brazil and Mercosul exporters)

  • Provisional EU–Mercosul application: The EU was reported to apply an interim trade deal eliminating roughly €4B in tariffs after 25 years of negotiations. Key implementation details and constraints included:
    • Meat quota barrier reduction: a 20% cut in quota barriers this year, with full quota expansion after 5 years (beef and poultry highlighted).
    • Tariffs: progressive reductions with agricultural products generally waiting ~4 years, and some lines up to 15 years.
    • Legal/political risk: safeguards demanded by Italy were described as unclear, and a court review could create uncertainty over 18–24 months; margins in European approvals were characterized as narrow.

4) Best Practices

Crop planning and execution

  • Use decision tools to document practices and reduce preventable yield loss: Xarvio was described as a way to document existing practices for scoring/qualification (no-till, nitrogen stabilizers, cover crops), and as providing seed/variety guidance where variety misfit can cost 10–20 bushels.

  • Plan disease control (don’t chase it): BASF’s fungicide messaging emphasized “planned application” versus catching up to disease, alongside cited yield differences versus untreated comparisons.

Livestock management (practical, field-level)

  • Piglets (post-weaning mortality reduction): A case study attributed high post-weaning losses to cold stress and an abrupt switch to fermented feed; recommended actions included insulation lamps/dry bedding and using starter feed for ~2 weeks with a gradual transition.

  • Predator losses in open-range sheep systems: Drone herding was described as providing rapid aerial oversight and noise deterrence, reducing annual losses from ~15% to ~5% without adding herders.

  • Aquaculture feed hygiene: In grass carp systems fed soaked fava beans for “crisp meat,” uneaten beans settling and fermenting were linked to reduced intake/quality; daily removal of leftover beans before new feeding was presented as the fix.

Storage and on-farm maintenance

  • Grain bin maintenance: A “13 grain bin checkup tips” resource was shared for keeping bins in condition for grain storage.

Soil and garden management (specialty/small-scale)

  • Black walnut juglone mitigation: Guidance noted juglone sensitivity in some crops (especially nightshades), with mitigation options including raised beds and root barriers and locating plantings outside the 15–20m root zone.

5) Input Markets

  • Input cost direction (U.S. forecasts): Fertilizer was forecast to decrease 1.4%, seeds 1.3%, fuel nearly 7%, and pesticides 8.3%.

  • Tariffs and fertilizer availability risk (U.S.): A new 10% U.S. tariff was reported under Section 122 authority, with exemptions including several fertilizer products (e.g., urea, ammonium nitrate, UAN, ammonium sulfate; DAP and MAP also cited in one version), while products like ammonia and sulfuric acid were described as not exempt unless imported under USMCA. Ag groups urged policy certainty and avoiding tariffs on agricultural inputs.

  • Biofuel policy uncertainty showing up in production: Iowa biodiesel production was reported down nearly 25% in 2025, with industry calling for policy certainty while plants awaited the RFS rule.

  • Bridge assistance program (U.S.): USDA described the Farmer Bridge Assistance Program as offering one-time bridge payments tied to temporary trade disruptions and higher production costs, with enrollment open through April 17, 2026 (details: http://fsa.usda.gov/fba).

6) Forward Outlook

  • Biofuels policy timeline (U.S.): With the RVO proposal already moved to White House/OMB review, reporting suggested a final rule could arrive by end of March, keeping soybean oil and corn demand expectations headline-sensitive into early spring.

  • Planting intentions sensitivity (U.S.): Commentary highlighted the RVO outcome and trade developments with China as key factors influencing acreage decisions and market tone, potentially leaving acreage clarity to later surveys if timing slips.

  • South American weather execution risk (Brazil): March rainfall was described as supportive for second-crop corn development but disruptive to fieldwork in several areas—above-average rainfall in Brazil’s Southeast was flagged as a challenge for producers who missed February second-crop corn planting windows, while short-term “windows” were emphasized for advancing soybean harvest before heavier rains return.

  • EU–Mercosul agreement planning caution: Multiple segments stressed that provisional application can support near-term commercial activity (tariffs/quotas), but exporters and producers may need to avoid planning that assumes permanence given legal reviews and narrow political margins.

Your time, back.

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers one verified daily brief.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multimedia sources

Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

Get your briefs in 3 steps

1. Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

  • Stay updated on space exploration and electric vehicle innovations
  • Daily newsletter on AI news and research
  • Track startup funding trends and venture capital insights
  • Latest research on longevity, health optimization, and wellness breakthroughs

2. Confirm your sources and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.

  • Sam Altman · Profile
  • 3Blue1Brown · Channel
  • Paul Graham · Account
  • The Pragmatic Engineer · Newsletter · Gergely Orosz
  • r/MachineLearning · Community
  • Naval Ravikant · Profile
  • AI High Signal · List
  • Stratechery · RSS · Ben Thompson

3. Receive verified daily briefs

Get concise, daily updates with precise citations directly in your inbox. You control the focus, style, and length.

Multi-agent reality check: worktree-based parallelism, new Claude Code skills, and Codex 5.3 low-level wins
Feb 28
6 min read
162 docs
Greg Brockman
eigenron
Peter Steinberger
+12
Today’s highest-signal theme: multi-agent setups break down on research rigor, even as raw coding capabilities keep climbing. You’ll get concrete tool updates (Claude Code /batch + /simplify, Remote Control rollout), replicable workflows (spec→async agent run→deploy, worktree-based parallelism), and two watchable clips on long-horizon loops and evaluation scaffolding.

🔥 TOP SIGNAL

Multi-agent coding looks very different when the task isn’t “implement this,” but “do research.” Andrej Karpathy tried running 8 agents (4 Claude + 4 Codex) in parallel on nanochat experiments (1 GPU each) and found the system “doesn’t work” largely because agents’ idea generation and experimental rigor are weak—they skip solid baselines/ablations and run nonsensical variations, even if they can implement well-scoped instructions quickly. His framing: the real target is “programming an organization”—prompts, skills, tools, and rituals (even “daily standup”) become the “org code,” and the eval is how fast that org makes progress on arbitrary tasks.

🛠️ TOOLS & MODELS

  • Claude Code (next version): new Skills /simplify + /batch

    • /simplify: run parallel agents to improve code quality, tune efficiency, and ensure CLAUDE.md compliance.
    • /batch: interactively plan migrations, then execute with dozens of isolated agents using git worktrees; each agent tests before opening a PR.
    • Intended use: automate much of the work to shepherd PRs to production and to do straightforward, parallelizable migrations.
  • Claude Code Remote Control: rolling out to Pro users

    • Rollout: 10% and ramping; Team/Enterprise “coming soon”.
    • Enablement checklist: update to claude v2.1.58+, log out/in, then run /remote-control.
  • GPT-5.3-Codex: “default choice” signals for automation

    • OpenAI’s Tibo Sottiaux: since release in the API, he’s “consistently hearing” at meetups that GPT-5.3-Codex is the model to use to “get actual work done,” and a “clear winner” for background agents / automation at scale.
    • Also notes it’s breaking through on raw coding ability and that “the secret is out” on best results per $.
    • Docs: https://developers.openai.com/api/docs/models/gpt-5.3-codex.
  • Codex 5.3-high: one-shot, low-level infra surgery

    • Reported “one-shotted” task: bypassed HuggingFace KV cache abstraction, monkey-patched attention at module level, handled M-RoPE, coordinated prompt-memory state with KV cache state, and performed granular eviction with span tracking.
    • Greg Brockman points to Codex 5.3 for “complicated software engineering”.
  • Cursor adoption lens (workflow evolution)

    • Karpathy’s sketch of the “optimal setup” evolution as capabilities improve: None → Tab → Agent → Parallel agents → Agent Teams (?) → ???.
    • His process heuristic: 80% of time on what reliably works, 20% exploring the next step up—even if it’s messy.
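
Both /batch and the parallel-agent setups above lean on git worktrees as the isolation primitive. A minimal sketch of that primitive driven from Python follows; the branch and directory naming is illustrative, and launching the agent process itself is deliberately elided:

```python
import subprocess
from pathlib import Path

def spawn_isolated_worktrees(repo: Path, tasks: list[str]) -> list[Path]:
    """Create one git worktree per task so parallel agents never touch
    each other's working copy. Each worktree gets its own branch
    (names here are illustrative, not any tool's convention)."""
    worktrees = []
    for i, task in enumerate(tasks):
        branch = f"agent/task-{i}"
        path = repo.parent / f"{repo.name}-wt-{i}"
        subprocess.run(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)],
            check=True,
        )
        # An agent process would now be launched with cwd=path, run its
        # tests there, and open a PR from `branch` when green.
        worktrees.append(path)
    return worktrees
```

Each worktree shares the repo's object store but has an independent checkout and index, which is why dozens of agents can edit, test, and commit concurrently without stepping on one another.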

💡 WORKFLOWS & TRICKS

  • Parallel agents with real isolation: git worktrees are emerging as the default primitive

    • Karpathy’s research-org simulation: each “research program” as a git branch, each scientist forks a feature branch, and git worktrees provide isolation; “simple files” handle comms.
    • Claude Code’s /batch mirrors this: each migration agent runs in full isolation via git worktrees, tests, then opens a PR.
  • “Research org” orchestration pattern (Karpathy): tmux as your control plane

    • One setup: a tmux window grid of interactive agent sessions so you can watch work, and “take over” when needed.
    • His finding: agents are strong at implementation, weak at experiment design (baselines, ablations, runtime/FLOPs controls), so expect humans to still provide taste + rigor.
  • Fast app-to-prod loop with the Codex app (from a live demo)

    • Romain Huet highlights a <30 min workflow: scaffold the app, use docs + Playwright MCP, add features with plan mode, then use skills for OpenAI image generation and Vercel deploy.
    • Demo link: https://x.com/kagigz/status/2027444590895063313.
  • Spec-first → async agent run against a real repo (Simon Willison)

  • Context-window hygiene via “stop-and-reset” loops (Ringo/OpenClaw example)

    • Ringo’s “RALPH loop” executes a task markdown file one step at a time, then stops so the next step starts with a fresh context window.
    • Practical takeaway: if your runs degrade over time, consider deliberately chunking work into restartable steps instead of trying to one-shot long horizons.
  • Safety guardrails for agentic tools with destructive capabilities (OpenClaw talk)

    • Patterns called out: mandatory confirmations for destructive actions, sandboxing/read-only modes, and using a separate phone number/SIM for the bot.
    • Failure mode to design around: rules stored only in the model’s working memory can be lost after context compaction—leading to destructive behavior.
  • Eval realism check: scaffolding juice is real, but overfit risk is too

    • METR’s Joel Becker describes harness/scaffold tuning for high performance on dev tasks while trying to avoid overfitting; they invest heavily in scaffolds to upper bound model capabilities for safety analysis.
    • He also notes how measuring productivity got harder: developers may refuse “AI-disallowed” randomization, and today’s concurrent workflows (multiple issues in parallel) don’t fit old study designs.
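
The stop-and-reset pattern above is easy to prototype outside any particular agent framework. A minimal sketch, assuming tasks live in a markdown checklist and `run_step` is a hypothetical callable that invokes an agent with a fresh context:

```python
def run_chunked(tasks_md: str, run_step) -> str:
    """Execute a markdown checklist one unchecked item at a time.
    Each call handles exactly one '- [ ]' item and then stops, so the
    next invocation starts clean — a stand-in for giving the agent a
    fresh context window per step instead of one long transcript."""
    lines = tasks_md.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("- [ ]"):
            item = line.split("]", 1)[1].strip()
            run_step(item)                          # fresh "context" per step
            lines[i] = line.replace("- [ ]", "- [x]", 1)
            break                                   # stop; caller re-invokes
    return "\n".join(lines)
```

A driver would simply call `run_chunked` in a loop until no unchecked boxes remain; because state lives in the file rather than the transcript, any step can be retried or inspected in isolation.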

👤 PEOPLE TO WATCH

  • Andrej Karpathy — concrete, instrumented look at why “agent research orgs” are still messy: implementation is easy; ideas + rigor are the bottleneck.
  • Boris Cherny (Claude Code) — shipping practical agent “skills” that encode repeatable team workflows: /simplify + /batch, plus Remote Control rollout details.
  • Romain Huet (OpenAI/Codex) — curating high-signal Codex workflows and capability examples (rapid app shipping; low-level infra tasks).
  • Max Woolf — detailed “skeptic tries agent coding” writeup; notable claim that Opus 4.6/Codex 5.3 feel “an order of magnitude better” for complex tasks than models from months earlier.
  • Simon Willison — repeatable “spec → async agent run → deploy” patterns with publicly inspectable artifacts.

🎬 WATCH & LISTEN

1) OpenClaw Manila — Ringo’s “idea → live prototype” loop (≈24:15–27:55)

How it works under the hood: a ReAct-style loop that writes a task file, executes one task per fresh context window, and uses infra integrations (GitHub/Cloudflare/etc.) to ship prototypes fast.

2) METR (Joel Becker) — harness/scaffold tuning and the overfit trap (≈56:25–57:35)

A grounded explanation of why different harnesses can swing results—and why METR invests in scaffolds to estimate “best possible” model capability without fooling themselves via overfitting.

📊 PROJECTS & REPOS


Editorial take: Raw coding is getting solved; the leverage is moving to orchestration + isolation + guardrails—and the hardest remaining gap is still tasteful, rigorous idea generation, not implementation.

OpenAI’s $110B raise, U.S. defense deployment deals, and DeepSeek V4 signals
Feb 28
10 min read
1015 docs
Sakana AI
Nous Research
Taalas Inc.
+43
OpenAI announces a $110B funding round and expanded infrastructure partnerships, then signs a classified-network deployment agreement with the Department of War as Anthropic faces a supply-chain risk designation and prepares a court challenge. Also: DeepSeek V4 timing signals and systems commits, plus notable advances in video generation, KV-cache efficiency, and privacy-preserving inference wrappers.

Top Stories

1) OpenAI’s $110B funding round reshapes the infra/partner map

Why it matters: This round ties OpenAI’s growth directly to specific cloud + chip roadmaps (AWS/Trainium, NVIDIA systems, Azure API exclusivity), and signals how competitive advantage is increasingly negotiated through infrastructure access and distribution.

OpenAI CEO Sam Altman said OpenAI raised a $110B round from Amazon, NVIDIA, and SoftBank. Reporting in the same cycle pegged the round at a $730B pre-money valuation, with amounts broken out as $50B (Amazon), $30B (SoftBank), $30B (NVIDIA). OpenAI’s own messaging framed this as scaling infrastructure “to bring AI to everyone,” supported by those partners.

Partnership details highlighted publicly include:

  • Amazon/AWS: New enterprise products including a stateful runtime environment and use of Trainium. A separate OpenAI-partner post described co-building a Stateful Runtime for agentic apps on Bedrock, scaling with 2GW of Trainium compute, and creating custom models for Amazon apps.
  • Microsoft: OpenAI said its stateless API will remain exclusive to Azure, alongside plans to build more capacity with Microsoft.
  • NVIDIA: OpenAI described NVIDIA chips as foundational, and said it’s excited to run NVIDIA systems in AWS. NVIDIA also said it’s entering a “next phase” with OpenAI to deploy 5GW on Vera Rubin for training and inference.

On finances, Epoch AI noted the round “nearly triples” OpenAI’s total raised so far and cited a projection (attributed to The Information) of $157B cash burn through 2028, saying this round plus $40B cash on hand roughly matches that projection.

2) OpenAI says it reached a classified-network agreement with the U.S. Department of War

Why it matters: Frontier-lab defense deployments are becoming contract + control-system design problems: what gets prohibited, who enforces it, and what technical safeguards accompany access.

Altman said OpenAI reached an agreement with the Department of War (DoW) to deploy models in its classified network. He said the agreement embeds OpenAI’s safety principles—prohibiting domestic mass surveillance and requiring human responsibility for use of force (including autonomous weapon systems)—and that DoW agrees and reflects these in law/policy and the agreement.

OpenAI also described additional deployment measures:

  • Technical safeguards to ensure model behavior
  • Field Deployed Engineers (FDEs) to help with the models and safety
  • Cloud networks only

Altman further said OpenAI is asking DoW to offer the “same terms to all AI companies,” and expressed a desire to de-escalate away from legal/government actions toward “reasonable agreements”.

A DoW official account characterized the OpenAI contract as flowing from a touchstone of “all lawful use”, referencing legal authorities and mutually agreed safety mechanisms (described as a compromise Anthropic was offered and rejected).

3) Anthropic faces a U.S. government crackdown; says it will fight a “supply chain risk” label in court

Why it matters: “Supply chain risk” designations and procurement rules can reshape the AI market indirectly—by forcing contractors and cloud ecosystems to pick sides.

Secretary of War Pete Hegseth’s account said the DoW is designating Anthropic a “Supply-Chain Risk to National Security”, and that “no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic,” while allowing Anthropic to provide services for no more than six months for transition. The same post tied this to the President’s directive for the federal government to cease all use of Anthropic’s technology.

Anthropic subsequently said it will challenge any supply chain risk designation in court, arguing the DoW cannot restrict customers’ use of Claude outside of DoW contract work.

Several posts highlighted second-order implications: Anthropic serves models via cloud providers including AWS (primary), Google Cloud, and Azure, and another post argued that since those providers do business with the U.S. military, a literal interpretation could block Anthropic from serving via them.

4) DeepSeek V4 countdown: release timeline + chip collaborations + major systems work

Why it matters: DeepSeek’s next release is being framed as both a capability event and a hardware-optimization event, alongside continued investment in the systems stack.

Multiple posts citing the Financial Times said DeepSeek is set to release DeepSeek-V4 next week, and is working with Chinese AI chipmakers Huawei and Cambricon to optimize V4 for their latest products. The same reporting said a brief technical note will accompany the release, followed by a more comprehensive report about a month later.

Separately, DeepSeek made a “major commit” to DeepGEMM, adding mHC integration, early support for NVIDIA Blackwell (SM100), and FP4 ultra-low precision computing.

There were also signals of a web update: a post claimed DeepSeek was updated with knowledge cutoff May 2025 and 1M token context length, “likely V4 (or V4 lite),” pushed to the web.

5) Kling 3.0 claims the top Text-to-Video spot (with and without audio)

Why it matters: Video generation is rapidly stratifying around quality tiers, audio integration, and leaderboard-driven iteration, with pricing now comparable across leading tools.

Artificial Analysis reported Kling 3.0 (1080p Pro) took #1 in Text-to-Video across both With Audio and Without Audio leaderboards, surpassing Grok Imagine, Runway Gen-4.5, and Veo 3.1. The release supports up to 15-second generations and native audio, with 1080p (Pro) and 720p (Standard) tiers.

Kling also released Kling 3.0 Omni, a unified multimodal model supporting image/video inputs, editing, and generation; Omni 1080p (Pro) placed #2 in Text-to-Video With Audio and #4 in No Audio. Pricing cited: ~$13/min (1080p Pro, no audio) and ~$20/min (with audio); 720p Standard ~$10/min (no audio) and ~$15/min (with audio).

Research & Innovation

Why it matters: Several releases this period target practical bottlenecks—long-context cost, KV-cache memory, and stable post-training—which increasingly determine what “agentic” systems can do in production.

  • Instant model customization via Doc-to-LoRA / Text-to-LoRA (Sakana AI): Sakana introduced hypernetwork methods that generate task- or document-specific LoRA adapters “on the fly,” turning customization into a single forward pass rather than fine-tuning or long prompts. Reported results include near-perfect needle-in-a-haystack performance on instances 5× longer than the base model’s context window and sub-second latency for rapid experimentation. A separate summary emphasized Doc-to-LoRA compressing long documents into adapters to avoid repeated context re-reading, improving memory/update latency and serving cost for long-document agents.

  • Self-managed KV cache (NVIDIA SideQuest): SideQuest has the reasoning model decide which tokens remain useful and “garbage collect” the rest, running this management as an auxiliary task so it doesn’t pollute the main context. Trained with 215 samples, it reduced peak token usage by up to 65% with minimal accuracy loss.

  • Off-policy RL for reasoning (Databricks OAPL): Databricks said its OAPL approach shows you don’t need strict on-policy training to improve reasoning. Reported metrics: matches/beats GRPO, remains stable with large policy lag, and uses ~3× fewer training generations.

  • Agentic inference systems (DeepSeek DualPath): A DeepSeek/THU/PKU paper summary described DualPath pooling otherwise-mismatched NIC bandwidth between prefill and decode to move KV cache more efficiently. Reported results: up to 1.87× speedup on DS-660B offline inference and positioning for higher concurrency/lower cost in multi-agent systems with repeated long-context KV-cache access.

  • Physics-aware image editing (PhysicEdit): PhysicEdit reframes editing as a physical state transition and distills transition priors from videos into a latent representation for more physically plausible edits. It introduced the PhysicTran38K dataset (38K video trajectories with reasoning traces) and reported benchmark improvements over prior approaches.

  • Long-term coherence eval (YC Bench): YC Bench simulates “running a startup” for three years to test long-horizon agent coherence. It reported GPT-5.2 (and sometimes Sonnet 4.6) “quickly goes bankrupt” and fails to beat a sub-optimal greedy baseline, while Gemini-3-Flash was described as matching the baseline via multi-stage strategy in the provided scratchpad.

Products & Launches

Why it matters: The ecosystem continues shifting from chat to systems that execute work—with privacy wrappers, agents that run while you’re away, and developer-grade infrastructure for evaluation and ranking.

  • Open Anonymity “unlinkable inference” (privacy wrapper for remote models): Open Anonymity described a “VPN for AI inference” layer that uses decentralized proxies and blind signatures to make requests hard to link back to users across time. It emphasized ephemeral keys per session/request to combat longitudinal tracking and shipped an open chat app, oa-chat, with local chat history and temporary keys for OpenAI calls. Resources: https://chat.openanonymity.ai/ and https://openanonymity.ai/blog/unlinkable-inference/.

  • Hermes Agent (NousResearch) adds OCR/document extraction skill: Hermes Agent is positioned as an open-source agent with multi-level memory and persistent dedicated machine access. A recent update added broad OCR/document extraction (PDFs, ePubs, DocX, PowerPoint, etc.).

  • Claude Code Remote Control: A rollout to Claude Code Pro users enables “remote control,” with instructions to update to v2.1.58+, log out/in for new flags, and use /remote-control.

  • Gemma on iOS via Google AI Edge Gallery: A post said the Google AI Edge Gallery app brings fully offline, on-device AI to iOS (chat, image Q&A, audio transcription/translation, voice commands), with an App Store link.

  • Perplexity embeddings open-sourced (bidirectional + context-aware variants): Perplexity open-sourced four bidirectional embedding models (0.6B and 4B parameters; standard and context-aware types). The “context-aware” version processes an entire document so chunks “know” the full document meaning. Collection: https://huggingface.co/collections/perplexity-ai/pplx-embed.

  • Arena-Rank (open-source leaderboard construction): Arena released Arena-Rank, a Python package for statistically grounded, reproducible leaderboards using pairwise comparison data. GitHub: https://github.com/lmarena/arena-rank.
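
The statistical core behind pairwise-comparison leaderboards of this kind is typically a Bradley–Terry-style model. The sketch below fits per-model strengths with the classic minorization-maximization update; it illustrates the general technique only and is not Arena-Rank's API:

```python
def bradley_terry(pairs, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.
    Uses the standard MM update: strength = wins / sum over games of
    1/(own strength + opponent strength), renormalized each round.
    Returns strengths sorted strongest-first."""
    models = {m for p in pairs for m in p}
    w = {m: 1.0 for m in models}                      # initial strengths
    wins = {m: sum(1 for a, _ in pairs if a == m) for m in models}
    for _ in range(iters):
        new = {}
        for m in models:
            denom = sum(1.0 / (w[m] + w[b if a == m else a])
                        for a, b in pairs if m in (a, b))
            new[m] = wins[m] / denom if denom else w[m]
        total = sum(new.values())                     # keep mean strength at 1
        w = {m: v * len(models) / total for m, v in new.items()}
    return dict(sorted(w.items(), key=lambda kv: -kv[1]))
```

Given raw head-to-head outcomes, the fitted strengths induce both a ranking and win-probability estimates (P(i beats j) = w_i / (w_i + w_j)), which is what makes such leaderboards reproducible from the underlying comparison data.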

Industry Moves

Why it matters: This week’s biggest competitive moves were about capital + distribution + compute—and about who controls the “control plane” (cloud distribution, identity, and evaluation infrastructure).

  • AWS frames the OpenAI partnership as distribution + runtime + Trainium adoption: Amazon’s CEO described a stateful runtime environment on Amazon Bedrock powered by OpenAI intelligence for developers running OpenAI services on AWS. He also said OpenAI is “going big” on Trainium, describing Trainium as 30–40% more price performant than comparable GPUs, and said AWS will be the exclusive third-party cloud distribution provider for OpenAI Frontier (agent teams).

  • Microsoft–OpenAI joint statement on AGI definition unchanged: A Microsoft/OpenAI joint statement was shared alongside commentary that the contractual definition and determination process for AGI remains unchanged despite new funding and partnerships. The AGI definition quoted: a system that can perform “most economically valuable tasks better than humans,” and is officially declared AGI by the OpenAI board.

  • Guidde raises $50M (agent training from screen-recordings): Guidde raised $50M to train AI agents on expert screen-recording videos rather than static documentation, claiming 41% reduction in video creation time and 34% fewer support tickets.

  • Taalas launches first product (models encoded into chips): Taalas said it launched its first product after $30M in development by 24 people, emphasizing specialization, speed, and power efficiency. A separate summary described “Hardcore Models” chips that store weights on-chip (mask ROM) and can reach 16–17k tokens/sec inference, with RAM for KV cache and small updates like LoRA.

  • OpenAI Codex usage growth: A post said Codex added 600k users in three weeks, moving from 1M WAU (Feb 4) to 1.6M WAU (Feb 27).

Policy & Regulation

Why it matters: AI governance is being operationalized through procurement, designations, and deployment constraints, not just principles—and the effects spill into cloud ecosystems and enterprise buyers.

  • Anthropic refuses to enable domestic mass surveillance or fully autonomous weapons: A post quoted Anthropic’s position: “threats do not change our position: we cannot in good conscience accede to their request,” framed as a moral line against enabling mass domestic surveillance and fully autonomous weapons.

  • DoW designates Anthropic a supply-chain risk; broad procurement restrictions announced: DoW’s directive said Anthropic will be designated a Supply-Chain Risk to National Security, and barred contractors/suppliers/partners doing business with the U.S. military from commercial activity with Anthropic, with a ≤6-month transition window. Anthropic says it will challenge the designation in court.

  • OpenAI–DoW agreement highlights “no domestic mass surveillance” and “human responsibility for force”: OpenAI’s agreement announcement reiterated these core principles as incorporated into the contract and reflected in DoW law/policy framing.

  • Pentagon cyber tooling (FT-cited) aims at mapping/exploiting vulnerabilities in Chinese infrastructure: A post citing the FT said the Pentagon is developing AI-powered cyber tools to map and exploit vulnerabilities in Chinese infrastructure (e.g., power grids and sensitive networks), automating reconnaissance and speeding targeting.

Quick Takes

Why it matters: These smaller items collectively show where momentum is compounding: usage scale, agent reliability, model-serving efficiency, and fast-moving leaderboards.

  • ChatGPT scale: ChatGPT crossed 900M weekly users and 50M paying subscribers.
  • ChatGPT Android: The Android app (v1.2026.055) mentions a “Naughty chats” setting for 18+ users.
  • GPT-5.3-Codex cost/throughput notes: Reported as 28% cheaper than GPT-5.2 (xhigh) on Artificial Analysis, with a post also calling it more token efficient than 5.2. Another post cited 400k context and “extra high thinking” in settings.
  • Open models (Feb Text Arena): Arena’s Top 3 open models were GLM-5 (1455), Qwen-3.5 397B A17B (1454), and Kimi-K2.5 Thinking (1452).
  • vLLM on AMD GPUs: vLLM described ROCm attention backends delivering up to 4.4× decode throughput on AMD GPUs, with model-specific benchmarks (e.g., Qwen3-235B MHA 2.7–4.4× TPS) and a one-env-var enablement path.
  • UI-agent click accuracy fix: Tzafon claimed scaling positional embeddings improved click accuracy from 40% to 80% with no retraining.
  • RF-DETR on Apple MLX: A post said RF-DETR on MLX runs at 100+ FPS on an M4 Pro Mac.

OpenAI’s $110B raise and AWS compute deal; classified-network deployment agreement with DoW safety terms
Feb 28
6 min read
192 docs
Greg Brockman
eigenron
Sam Altman
+16
OpenAI disclosed a $110B funding round and a multi-year AWS partnership centered on Trainium and a “stateful runtime environment,” while also announcing an agreement to deploy models in the DoW’s classified network with explicit safety terms. Elsewhere, Codex adoption and agent tooling kept accelerating, alongside notable research on RL training, tool-calling inference speedups, and privacy-preserving “unlinkable inference.”

OpenAI’s $110B raise + Amazon partnership (compute, chips, and “stateful runtime”)

OpenAI confirms a $110B funding round backed by Amazon, NVIDIA, and SoftBank

OpenAI leaders said the company raised a $110B round from Amazon, NVIDIA, and SoftBank. In a CNBC “Squawk Pod” segment, hosts and guests also described the round as valuing OpenAI at $730B, with Amazon as the largest investor committing $50B (structured as $15B upfront plus $35B tied to milestones).

Why it matters: the scale and structure (two tranches, milestone-based) is a major signal about how capital is being deployed to secure compute supply and strategic distribution for frontier AI.

AWS becomes a multi-year strategic partner, with Trainium and a “stateful runtime environment” in Bedrock

OpenAI and Amazon announced a multi-year strategic partnership. Reporting in the same segment said OpenAI will consume 2 gigawatts of training capacity through AWS infrastructure—some of it described as exclusive to Amazon.

The partnership centers on a “stateful runtime environment” powered by OpenAI GPT models that will be available in Amazon Bedrock. Amazon CEO Andy Jassy described it as enabling developers to access state (e.g., memory, identity) and call tools/compute “in a stateful way,” claiming “there’s nothing else like that today”.

OpenAI CEO Sam Altman also highlighted Amazon’s Trainium as part of the relationship, while Jassy referenced 30–40% better price from leveraging Trainium for training and said Amazon now has “the two largest AI labs… significantly betting on Trainium”.

OpenAI’s cloud positioning: AWS expansion alongside continued Azure exclusivity for the stateless API

Altman said OpenAI “continue[s] to have a great relationship with Microsoft,” and that its stateless API will remain exclusive to Azure, while OpenAI will “build out much more capacity” with Microsoft.

Why it matters: OpenAI is explicitly describing a split deployment model: Azure exclusivity for one interface, while simultaneously scaling via AWS for other advanced workloads.


Pentagon / “Department of War” flashpoint: OpenAI reaches a classified-network deployment agreement

OpenAI says it reached an agreement to deploy models on the DoW’s classified network—with explicit safety principles

Altman said OpenAI reached an agreement with the Department of War (DoW) to deploy models in its classified network. He said the agreement includes two core safety principles: prohibitions on domestic mass surveillance and human responsibility for the use of force, including autonomous weapon systems—principles he said the DoW agrees with and reflects in law/policy.

Altman also said OpenAI will implement technical safeguards, deploy FDEs, and deploy “on cloud networks only”. He added OpenAI is asking the DoW to offer “these same terms to all AI companies” and said OpenAI wants de-escalation “away from legal and governmental actions” toward “reasonable agreements”.

Context: Anthropic dispute + DPA pressure, and diverging interpretations

Commentary across sources framed OpenAI’s agreement against the backdrop of the Pentagon/DoW dispute with Anthropic and reported talk of using the Defense Production Act (DPA) to compel access. Matt Wolfe summarized the dispute as the Pentagon demanding Claude for “all lawful purposes,” while Anthropic refused to lift safeguards around mass domestic surveillance and fully autonomous weapons.

Some reactions praised the stance as aligned with Anthropic’s position (Ilya Sutskever: “It’s extremely good that Anthropic has not backed down, and it’s significant that OpenAI has taken a similar stance.”). Others described the outcome as confusing or politically fraught, with Nathan Lambert warning it “stands to fracture the AI community” and separately calling Altman’s move a “weird scooping up of the DoW contract”.

An account labeled @UnderSecretaryF argued the DoW contract approach is grounded in “all lawful use,” references legal authorities, and includes mutually agreed safety mechanisms—calling this a compromise that Anthropic “was offered, and rejected”.

Critics (including Gary Marcus) questioned whether the government effectively offered OpenAI terms similar to what it publicly rejected from Anthropic, and raised concerns about unequal treatment and political influence.


Agents & coding: Codex traction, “agentic” tooling, and enterprise demand signals

Codex usage and momentum: demand spike + weekly growth signals

In the Squawk Pod discussion, Altman said demand is “rapidly growing” and cited Codex as having grown “like 30 something percent in the last week,” as an indicator of enterprise readiness. Separately, @swyx reported OpenAI Codex added 600k users in the last 3 weeks, reaching 1.6M WAU on Feb 27 (up from 1M WAU on Feb 4), and said it is >3× up from Jan 1 (including a Feb 2 app launch).

Why it matters: OpenAI is pointing to Codex as a near-term adoption barometer, while external tracking highlights unusually fast growth over a short window.

Brockman spotlights Codex 5.3’s software-engineering capability on a complex task

Greg Brockman highlighted Codex 5.3 for “complicated software engineering”, pointing to a description of the model “one-shotting” a task that involved bypassing HuggingFace’s KV cache abstraction, monkey-patching attention at the module level, handling M-RoPE, coordinating prompt-level memory state with KV cache state, and performing “granular surgical eviction with span tracking”.

Agent products continue to converge on “computer use” and long-running workflows

Matt Wolfe described new agent features across vendors, including Cursor agents that can control virtual computers and record videos of their actions, and Microsoft Copilot Tasks (waitlist) that aims to take plans/drafts into “completed tasks” such as building slide decks and booking appointments.

Andrej Karpathy, reacting to Cursor usage trends, described a shifting progression of workflows (“None → Tab → Agent → Parallel agents → Agent Teams (?) → ???”) and argued the practical balance is 80% work in reliable setups and 20% exploration of what’s next.


Research & infrastructure: RL training shift, faster tool-calling inference, and privacy architecture

Scale AI: reinforcement learning becomes the majority of training work (for agents)

Scale AI’s Chetan Rane said more than half of Scale’s training work now involves reinforcement learning (RL)—up from less than a quarter six months ago . The piece describes RL as goal-oriented training meant to push models from “responses” into doing things online, but notes limits around generalization and higher compute needs .

ContextCache: persistent KV caching for tool schemas claims large TTFT gains without quality loss

A paper and code release introduced ContextCache, a persistent KV cache system for tool-calling LLMs that caches prefill states for tool schemas indexed by a content hash and restores them on subsequent requests . The authors report ~200ms cached TTFT from 5 to 50 tools versus full prefill growing from 466ms to 5,625ms, with a 29× speedup at 50 tools (skipping 99% of prompt tokens per request) and “zero quality degradation” on listed metrics .

Paper/code: https://zenodo.org/records/18795189 and https://github.com/spranab/contextcache
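
The core mechanism described above can be sketched in a few lines: key the cached prefill state by a content hash of the tool schemas, compute it once, and restore it on later requests instead of re-running the full prefill. This is an illustrative sketch, not the paper’s implementation; `SchemaKVCache`, `prefill_fn`, and the dict-based store are invented names, and the real system persists model KV tensors rather than Python objects.

```python
import hashlib
import json

class SchemaKVCache:
    """Toy content-hash-keyed cache for precomputed tool-schema prefill state."""
    def __init__(self):
        self._store = {}

    @staticmethod
    def key(tool_schemas):
        # Hash the canonical JSON of the schemas, so any schema change
        # produces a different key and invalidates stale entries.
        canonical = json.dumps(tool_schemas, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def get_or_compute(self, tool_schemas, prefill_fn):
        k = self.key(tool_schemas)
        if k not in self._store:
            # Expensive full prefill runs only on the first request.
            self._store[k] = prefill_fn(tool_schemas)
        # Subsequent requests restore the cached state instead of re-prefilling.
        return self._store[k]

# Usage: identical schema sets hit the cache; prefill_fn runs exactly once.
cache = SchemaKVCache()
calls = []
fake_prefill = lambda schemas: calls.append(1) or {"kv": len(schemas)}
a = cache.get_or_compute([{"name": "search"}], fake_prefill)
b = cache.get_or_compute([{"name": "search"}], fake_prefill)
assert a is b and len(calls) == 1
```

The reported numbers follow from this shape: once the schema prefill is skipped, cached TTFT stays roughly flat as the tool count grows, while full prefill cost grows with the schema tokens.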

Open Anonymity: “unlinkable inference” as a VPN-like layer for AI prompts

The Open Anonymity project described an “unlinkable inference” layer—framed as a “VPN for AI inference”—to reduce longitudinal user tracking by routing requests through decentralized proxies and using blind authentication plus ephemeral keys . The project also released oa-chat, storing chat history locally and sending each query to OpenAI under a temporary key intended to be unlinkable to other queries; Percy Liang said he switched to it as a “convenient drop-in replacement” .

Project links: https://openanonymity.ai/ and blog: https://openanonymity.ai/blog/unlinkable-inference/
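
The unlinkability idea can be illustrated with a toy sketch: mint a fresh ephemeral credential for every query, so the provider never sees a stable identifier to join requests into a profile. This is an assumption-laden illustration, not the project’s protocol; it omits blind authentication and the decentralized proxy routing, and `send_unlinkably` and `forward` are invented names.

```python
import secrets

def send_unlinkably(query, forward):
    """Send one query under a fresh ephemeral key that is never reused.

    `forward` stands in for the proxy hop that actually reaches the provider.
    """
    ephemeral_key = secrets.token_hex(16)  # new random credential per request
    return forward({"auth": ephemeral_key, "body": query})

# Simulated provider: records which credentials it observes.
seen_keys = set()
def proxy(request):
    seen_keys.add(request["auth"])
    return f"answer to: {request['body']}"

send_unlinkably("q1", proxy)
send_unlinkably("q2", proxy)
assert len(seen_keys) == 2  # two queries, two unlinkable credentials
```

The design point is that linkage happens through stable identifiers (account keys, sessions), so making the credential ephemeral removes the simplest join key even before any proxying.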


Labor + markets: layoffs attributed to AI, and competing interpretations

A Squawk Pod segment described Block’s plan to reduce headcount from 10,000+ to under 6,000, and quoted Jack Dorsey saying “AI tools have changed what it means to run a company” while predicting many companies will make similar structural changes . In contrast, Elad Gil argued many “layoffs due to AI” narratives are near-term corrections for 2020-era over-hiring rather than direct AI effects, adding that most AI productivity impact is “still around [the] bend” .

Why it matters: “AI-driven layoffs” is becoming a public storyline, but even AI investors and executives are split on how much is signal versus post-hiring-cycle adjustment.

Shadow PMs, behavioral evidence, and AI prototyping workflows reshaping product work
Feb 28
10 min read
88 docs
The community for ventures designed to scale rapidly | Read our rules before posting ❤️
Melissa Perri
Julie Zhuo
+6
This edition connects a set of fast-moving PM shifts: AI systems as “Shadow PMs” shaping expectations and acquisition, behavioral data as the grounding layer for better decisions, and agentic execution pushing teams to redesign discovery and decision-making. You’ll also get a concrete AI prototyping workflow, several recent case studies (JLR, The Economist, Ramp/Alloy, Just Eat, Perk), and career resources for building AI-native skills and interview leverage.

Big Ideas

1) Your product is being “decided” before users ever land on it (the Shadow PM)

AI systems increasingly act as the first product manager your users interact with—setting expectations, framing capabilities, and influencing which brand gets chosen . This coincides with a funnel shift where AI can pick a brand and drive a purchase in hours, compressing the traditional multi-week comparison journey .

Why it matters:

  • AI search traffic is described as growing 5x YoY, with 80% of B2B tech buyers using AI as much as traditional search during vendor research—often showing up as “unexplained” direct traffic .

How to apply:

  • Treat AI visibility and accuracy as a product + growth surface you manage explicitly (see Tactical Playbook).

“AI is now the first product manager your users talk to.”


2) AI doesn’t create value by itself; it amplifies the systems (and signals) you already have

Jaguar Land Rover’s Jim Kennedy argues that AI “amplifies the systems that you’ve built,” so weak signals and fragmented systems get amplified too . His practical shift: leaders move from managing backlogs to designing systems and asking better questions grounded in behavioral context and intent .

Why it matters:

  • “Behavioral context” is positioned as what makes AI useful—grounding predictions in real intent rather than averages .

How to apply:

  • Invest in instrumentation and behavioral insight that removes debate and drives prioritization from evidence, not opinion .

3) When execution accelerates, decision-making and discovery become the bottleneck

Multiple threads converge on the same operational shift:

  • Teams can ship far faster with agents (e.g., smaller project teams, rapid shipping cycles), making decision-making the constraint rather than execution speed .
  • In this environment, craft and “taste” still matter: knowing what “excellent” looks like, and whether what you built is important and will resonate with users .

Why it matters:

  • If you don’t redesign how decisions get made (and how discovery stays continuous), faster build cycles can just produce more churn, rework, or misaligned output.

How to apply:

  • Shift PM time from coordination toward user understanding, prioritization, and experiment design .

4) Advancing as a product leader increasingly means thinking like a GM (P&L + distribution + outcomes)

Mastercard’s CPO frames why CPOs rarely become CEOs: only 1% make it, in part because PMs often lack P&L ownership and are seen as a delivery function . Her prescription is to adopt GM behaviors inside the product role:

  • Put revenue in OKRs, and treat pricing and costs as design inputs (not afterthoughts) .
  • Design for distribution channels, not “the product” in isolation (e.g., tailoring to platform channels vs. SME portals) .
  • Translate roadmaps into outcome narratives for senior leadership .

Why it matters:

  • It’s a concrete path to “boardroom language” without waiting for a formal title change.

How to apply:

  • Reframe initiatives as outcome narratives with numbers (e.g., “improve authorization rates, reduce fraud, +3% transaction volume”) .

5) AI ROI is constrained less by tooling and more by organizational “debt” (process, data, and access)

Just Eat Takeaway’s CPO describes AI as an “HD mirror” that exposes technical debt, data silos, and “human glue” workarounds—plus avoidance of root-cause fixes . Their stance: without structured and accessible data, strategy and ROI are “permanently limited” .

Why it matters:

  • They cite having 104 petabytes of data and 570M daily events, but still not being able to use it effectively until they restructured it around intent and business logic .

How to apply:

  • Audit where tribal knowledge substitutes for organizational intelligence, then codify and make context available to machines .

Tactical Playbook

1) “Manage the Shadow PM”: a practical AI visibility loop

A Semrush product leader outlines a three-part approach—prompts, sources, processes—to shape how AI describes and recommends your product .

Step-by-step

  1. Audit per model (not “AI” as one thing). Pick 4–5 platforms and track ~50 queries; check if you’re mentioned, whether it’s accurate, and who wins instead .
  2. Make your product crawlable and extractable. The talk claims most AI crawlers don’t render JavaScript; “JS walls” can drop AI visibility to zero .
  3. Fix docs like they’re churn drivers. Users may screenshot UI issues and ask ChatGPT for help; if docs are clear/crawlable, the user stays—if not, they may blame your product and leave .
  4. Structure content for AI consumption. “Short answer first” chunks + schema markup; the talk cites up to 30% higher citation rates for sites with structured data .
  5. Invest in earned media and third-party sources. The talk claims 77% of brand citations come from third parties (Reddit, G2, LinkedIn, YouTube, industry reviews) .
  6. Close the loop with product outcomes. The proposed flywheel: shape AI perception → accurate recommendations → better activation/retention → customer advocacy → stronger AI signal .

Metrics to watch:

  • AI-referred visits are cited as having 27% lower bounce, 12% more engagement, and 5% higher conversion (attributed to arriving with the right expectations) .
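
Step 1 (audit per model) can be sketched as a small tracker: for each platform, tally how many tracked queries mention you and how many mentions are accurate, so you can see per-model gaps rather than treating “AI” as one surface. This is a hypothetical illustration of the bookkeeping, not tooling from the talk; all platform names, queries, and numbers are made up.

```python
from collections import defaultdict

def audit_summary(results):
    """Summarize audit rows of (platform, query, mentioned, accurate, winner)."""
    by_platform = defaultdict(lambda: {"checked": 0, "mentioned": 0, "accurate": 0})
    for platform, _query, mentioned, accurate, _winner in results:
        row = by_platform[platform]
        row["checked"] += 1
        row["mentioned"] += int(mentioned)
        # Only count accuracy when the product was actually mentioned.
        row["accurate"] += int(mentioned and accurate)
    return dict(by_platform)

# Made-up audit rows for two hypothetical models.
summary = audit_summary([
    ("model_a", "best expense tool", True, True, None),
    ("model_a", "expense tool for startups", True, False, "rival"),
    ("model_b", "best expense tool", False, False, "rival"),
])
assert summary["model_a"] == {"checked": 2, "mentioned": 2, "accurate": 1}
```

Running this over ~50 queries per platform turns the audit into comparable per-model numbers: mention rate, accuracy rate, and who wins when you don’t.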

2) AI prototyping workflow (from “idea” to handoff with fewer open questions)

Nadav Abrahami (Wix co-founder, Dazl CEO) lays out an end-to-end workflow for AI prototyping .

Prerequisites (don’t skip):

  • Lock down the problem, user story, and rough shape of the solution in one paragraph before prototyping .

Workflow

  1. Start from your design system. Recreate an existing page from a screenshot as the template .
  2. Generate 3–4 divergent variants. Don’t move forward with only one option .
  3. Prompt for structure; visually edit for fine-tuning. Use manual edits for the last-mile polish .
  4. Build the full end-to-end flow. Connecting pages surfaces edge cases earlier .
  5. Test with your actual users (on video). Prefer users who requested the feature, not a testing platform .

PRD + prototype (new standard):

  • Prototype holds the core 90% flows; PRD covers edge cases, errors, tracking, rollout plan—and can live inside the prototype project for AI context .

Engineering handoff tactic:

  • Share the published prototype link (claimed to answer ~90% of questions), then use Cursor/Claude Code to port interactions into the production codebase .

3) Prevent AI-speed execution from outrunning planning (without “more meetings”)

A startup thread describes a common failure mode: implementation moves ~10x faster with tools like Cursor/Claude Code, but teams still miss edge cases, permissions, and “the thing discussed in Slack” because context capture didn’t speed up too .

Step-by-step

  1. Identify recurring “missed detail” categories (edge cases, permissions, workflow steps) .
  2. Don’t solve it by defaulting to extra meetings (noted as a common but slowing response) .
  3. Embed structured context capture at decision points (where decisions happen), rather than making documentation a separate ritual .

4) Use behavioral analytics to remove debate and ship what users actually complete

Jim Kennedy’s JLR examples emphasize behavioral insights (hesitations, retries, abandonments) that dashboards often don’t show . The claim: behavior data removed debate about intent and changed prioritization quickly .

Step-by-step

  1. Instrument journeys to see “moments” (not just outcomes) .
  2. Use real behavior to decide what ships and what scales—require completion/engagement/value before scaling .
  3. Reframe design from pages to journeys and “joined-up moments” .

5) Make surveys behave like product funnels (so they stop bleeding completions)

A PM describes having treated surveys like static forms, which led to poor completion despite good open rates . The fix was to analyze surveys like funnels.

Step-by-step

  1. Add per-question drop-off tracking to see where users abandon .
  2. Remove high-friction questions (e.g., make a demographic question optional) once you confirm it’s the main abandonment driver .
  3. Avoid grids/matrices on mobile; break into single-focus, tap-friendly steps .
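
The per-question drop-off in step 1 is simple funnel arithmetic: given how many respondents reached each question, compute the percentage lost at each step and flag the worst one. A minimal sketch with made-up numbers, not the PM’s actual tooling:

```python
def drop_off(reached):
    """Per-step drop-off from counts of respondents who saw each question, in order."""
    return [
        {"question": i + 1,
         "drop_pct": round(100 * (reached[i] - reached[i + 1]) / reached[i], 1)}
        for i in range(len(reached) - 1)
    ]

# Hypothetical funnel: 1000 starts, sharp abandonment at question 3.
funnel = drop_off([1000, 940, 910, 520, 505])
worst = max(funnel, key=lambda r: r["drop_pct"])
assert worst["question"] == 3  # the step where most respondents abandon
```

Once the worst step is identified, the playbook above applies: confirm it is the abandonment driver, then soften it (make it optional, split it up, or move it later).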

Case Studies & Lessons

1) Jaguar Land Rover: from 58 apps to a unified experience system (and behavior-led shipping)

Starting point: when Kennedy joined, JLR had 58 disconnected apps, little analytics signal, and a 1.2 average store rating .

What changed

  • The program was framed less as tech consolidation and more as reducing customer cognitive load and decision friction .
  • They reframed work around customer experience, platform capability, and value creation—and moved from journeys to “joined-up moments” .

Homepage carousel lesson

  • A “conversational carousel” that looked great in design reviews failed in behavior: users scrolled to only one tile in beta testing .
  • They reprioritized what users saw first, personalized from prior sessions, simplified flow, and only scaled after behavior improved (deeper scrolls, higher completion) .

Platform non-negotiables

  • Design systems over one-off designs
  • Configuration over code (release toggles for brands/services)
  • Platform services as modular backbone (brands express via theming/content)

Key takeaway: fragmentation was described as an operating model problem more than a tech problem .


2) The Economist: building trusted AI voice as an additive bridge (not a replacement)

Problem: Digital content is published Mon–Wed, but the narrated audio edition is produced on Thursdays, leaving a gap for audio-first users .

Solution: Use text-to-speech (TTS) to bridge the gap .

Outcome: Quantitative results indicated people listened earlier and consumed more audio (not less) .

Trust lesson: Loyal subscribers complained about TTS mistakes (example: “6m” read as “6 meters”), reinforcing that TTS worked as a bridge/additive but wasn’t yet ready to replace human connection .

Design principle: Build an “audio identity” that is distinct and authoritative, anchored in trust (reliable, credible, intimate/relevant) .


3) Vibe coding + cloud playgrounds: from prototype to pull request (and why visibility matters)

Alloy’s CEO describes a 2026 inflection where non-technical PMs prototype directly on real codebases, share prototypes broadly, and sometimes produce pull requests for engineering review .

Ramp example: the company built an internal “background agent” system; the share of PRs merged via that system reportedly went from 0% to 70% in 2–3 months.

Key takeaway: unlocking collaboration is partly about moving beyond “localhost prototypes” so the rest of the company (and customers) can see and react to work in progress .


4) Perk: “radical urgency” requires explicit trade-offs and tight scope

Perk accelerated a spend product launch from January to November via tight scope and cross-company effort . Trade-offs included shipping with fewer features and concentrating specific team expertise on the milestone .

Their broader principle: speed to learning comes from getting to real users quickly for feedback .


5) Just Eat Takeaway: internal data agent to eliminate handoffs

They developed an internal conversational agent that allows PMs and others to query much of their data, reducing analysis handoffs from hours/calls into seconds/minutes and enabling rapid follow-up questioning cycles .

Key takeaway: data access and transparency were positioned as foundational to ROI and decision speed .


Career Corner

1) Strategy execution often fails in the middle: teach middle managers how their job changes

Melissa Perri argues that companies get stuck when middle management isn’t coached on how their role shifts: setting local strategy that ties back to global strategy (instead of pushing down solutions) . She also emphasizes success needs leadership teams to operate as “one team,” not fiefdoms .

How to apply:

  • Define 1–3 strategic intents (concrete outcomes) and force solution proposals to ladder up to them .

2) “Builder” skills are becoming a PM differentiator—but ship safely

Community posts describe PMs being asked to build AI-generated features and test directly with customers, gaining deeper engineering understanding and firsthand user feedback . A key caution: for complex systems, changes shouldn’t hit production without senior engineer review .

How to apply:

  • Use prototypes to learn and validate quickly, but formalize review gates for production codepaths .

3) Use AI to close the engineering communication gap (without pretending to be an engineer)

Aakash Gupta’s guidance: open the actual project, ask AI to explain architecture and components, and do this for weeks to close the communication gap faster than most alternatives .


4) Job search leverage: AI-native PM skills and interview systems are getting packaged

  • Lenny Rachitsky promoted free live workshops on “The AI-Native PM” across AI workflows, becoming more technical, and product sense & influence—with 75,000+ registrations. Sign-up link: http://bit.ly/ai-native-pm.
  • A free Claude Code–based interview coach (built from interviews with 30 job seekers) offers response scoring, mock interviews with pushback, company-specific prep, and negotiation scripts; v2 adds memory, higher challenge, and clearer next steps . Download: https://github.com/noamseg/interview-coach-skill.

Progress, permissions, and evidence: Stripe’s agentic-commerce framework, an AI narrative check, and a nutrigenomics breakthrough
Feb 28
4 min read
181 docs
All-In Podcast
Keith Rabois
David Sacks
+1
Today’s strongest cluster is “progress and its constraints”: a standout recommendation of Stripe’s 2025 Annual Letter for its agentic-commerce levels and regulatory framing, plus a companion “build fast” resource list. Also included: a check on AI overconfidence via Derek Thompson, a high-impact Arc/Cell nutrigenomics paper, and a history prompt from Keith Rabois.

Most compelling recommendation: Stripe’s 2025 Annual Letter — a concrete framework for “agentic commerce” (plus real payments data)

Stripe’s 2025 Annual Letter — Patrick and John Collison

  • Link/URL: https://x.com/stripe/status/2026294241450979364?s=20
  • Recommended by: Packy McCormick (Not Boring)
  • Key takeaway (as shared):
    • Packy calls it “one of the best pieces of economic writing you’ll read all year,” citing Stripe’s view into the internet economy via its data and writing .
    • He highlights Stripe’s “agentic commerce” framework: five levels, from agents that merely fill checkout forms to agents that anticipate needs and buy before you ask—while noting we’re “hovering between levels 1 and 2” .
    • He pulls out the “Republic of Permissions” idea: technology success depends on whether regulators, committees, and courts allow deployment .
  • Why it matters: This is a rare combination of a staged framework (how agent-driven buying might evolve) and real-world market observations about the constraints that can block adoption .

A second layer in the same theme: building fast vs. a “Republic of Permissions”

“Fast” projects list (article/resource list) — Patrick Collison

  • Link/URL: https://patrickcollison.com/fast
  • Recommended by: Packy McCormick (Not Boring)
  • Key takeaway (as shared): Packy frames it as the “canonical list of projects from a time when we were able to build fast” .
  • Why it matters: Paired with the “Republic of Permissions” framing, it’s a useful contrast: what rapid building looked like vs. the modern environment where permissions can determine outcomes .

AI discourse: a pushback against overconfident forecasting

“Nobody Knows Anything” (article) — Derek Thompson

  • Link/URL: Not provided in the sources
  • Recommended by: David Sacks (on a YouTube podcast episode)
  • Key takeaway (as shared): Sacks points to Thompson’s argument that no one really knows what will happen with AI in two years (let alone 20), and that AI debate can become “science fiction writing masquerading as analysis” — a “marketplace of competing science fiction narratives” amid high uncertainty and limited real-time macro evidence .
  • Why it matters: This is a reading pick that implicitly raises the bar on evidence when evaluating big AI claims—and is useful as a lens for separating narrative from analysis .

Research: a nutrigenomics result worth reading end-to-end

“Vitamin B2 and B3 nutrigenomics reveals a therapy for NAXD disease” (research paper; Cell) — Arc Institute

  • Link/URL: https://www.cell.com/cell/fulltext/S0092-8674(26)00109-1
  • Recommended by: Packy McCormick, via Ulkar Aghayeva’s Scientific Breakthroughs curation
  • Key takeaway (as shared):
    • The work runs a genome-wide CRISPR screen (in K562 cancer cells) using vitamins B2 and B3 to identify genetic diseases responsive to vitamin supplementation; NAXD emerged as the top hit for vitamin B3.
    • The team reports that adding vitamin B3 to knockout mice food from birth increased lifespan more than 40-fold.
  • Why it matters: It’s a concrete example of how “vitamin biology” may still contain “unexplored territory in nutritional genomics,” and it raises the question of how many diseases might be addressable through similar supplementation strategies .

History (as a prompt, not a thesis)

“Secret History” (article/blog post) — Steve Blank

  • Link/URL: https://steveblank.com/secret-history/
  • Recommended by: Keith Rabois (X)
  • Key takeaway (as shared): Rabois frames it as “Time for some of you to learn history” and points directly to Blank’s post .
  • Why it matters: A direct nudge from an investor to revisit a specific historical account—useful when you want to anchor current debates in prior cycles rather than novelty alone .

Lower-signal but notable: a book recommendation framed as a gift

Running Down a Dream (book) — Bill Gurley

  • Link/URL: Not provided in the sources
  • Recommended by: Jason Calacanis (with agreement from Chamath) on a YouTube podcast episode
  • Key takeaway (as shared): Calacanis calls it an “amazing new book,” urges people to “buy three copies” to give to “two young people and a parent,” and describes it as “inspiring for kids”; Chamath adds: “this book is incredible” .
  • Why it matters: While the endorsement is strong, it’s light on specifics; the clearest signal here is intended audience and use: a book they see as worth distributing to younger readers and families .

Biofuels mandates near decision as drought and South American harvest risk steer ag markets
Feb 28
9 min read
171 docs
homesteading, farming, gardening, self sufficiency and country life
农业致富经 Agriculture And Farming
Successful Farming
+11
Grain and livestock markets stayed highly headline-driven, with drought and wheat risk premium, mixed export signals for corn and soybeans, and biofuels mandates moving toward a late-March decision. This digest also highlights practical technology and management advances—from new soybean traits and AI decision tools to robotics and livestock best practices—plus key regional supply and trade developments across the U.S., Brazil, and Argentina.

1) Market Movers

Grains & oilseeds (U.S. and global)

  • Wheat: Price strength was tied to weather risk and positioning—dryness/wind and drought across parts of the U.S. winter wheat belt (noted as ~50% in drought) and rain forecasts being removed supported a weather premium . Separate coverage also pointed to drought spreading in Kansas and Oklahoma as a driver behind a futures jump . Market commentary added that Chicago wheat posted new highs while KC lagged, with the move framed as a longer rally and a “weather scare” .

  • Corn: Corn followed wheat higher, but multiple sources emphasized the overhang of a ~2 billion bushel carryout as a restraint without new demand catalysts . Fresh export demand signals included:

    • A reported flash export sale to “unknown” in market commentary .
    • Private exporters reporting 257,000 MT of corn sold for delivery to unknown destinations (MY 2025/2026) .
  • Soybeans: Soybeans hit a 2.5-month high, then pulled back after weekly export sales were reported down 49% versus the previous week . Trade uncertainty was part of the narrative, with a report saying negotiations toward a commodity deal hit a rough spot even as they progressed toward a March 31 summit.

  • Biofuels-driven demand (soybean oil, corn, sorghum): Several updates converged on biofuel policy as a key demand lever.

    • EPA’s RVO proposal was sent to the White House/OMB, with reporting that the rule was likely finalized by end of March.
    • Initial RVO proposals for 2026–2027 showed a sharp increase for biomass-based diesel, with discussion of ~5.25–5.61B gallons and framing that this could exceed the current 3.35B gallons by more than 2B gallons .
    • A Trump administration plan would require large refiners to cover at least 50% of blending volumes previously waived under small refinery exemptions, potentially increasing demand for blending credits .
    • USDA soybean oil use was discussed as potentially reaching 17 billion pounds under scenarios viewed as positive by industry voices .

Livestock (U.S. and Brazil)

  • U.S. cattle: Futures were sharply lower on the week (live cattle $243.29, April live cattle $232.23, March feeder cattle $355.43) while boxed beef moved higher (Choice $377.89, Select $370.79) . Weekly cattle slaughter was 516k head (down 50k YoY) and YTD slaughter was down 10.1%. USDA leadership was cited as saying there were no plans anytime soon to reopen points of entry for live cattle imports from Mexico .

  • U.S. hogs: National base carcass price was $89.34 (up $3.03 WoW) and nearby lean hog futures were $95.73 (up $2.05 WoW) . Hog slaughter was 2.516M head (up 23k WoW; down 9,170 YoY) with YTD slaughter down 2.1%.

  • Brazil poultry and eggs (São Paulo): Live broilers averaged R$5.04/kg (down 2.1% vs January; lowest real level since May 2024) and purchasing power versus corn/soy meal slipped month-over-month . Egg prices rose ~37%: R$147.98 (white, extra; 30-dozen box) and R$166.57 (red) .

2) Innovation Spotlight

Crop protection & traits (U.S.)

  • BASF “Nemosphere” soybean trait (targeting soybean cyst nematode): Described as the first biotech trait to control soybean cyst nematode, with a 2028 market target . SCN was characterized as the “number one yield robber” for 52 years and as taking at least $1.5B of value out of the market today . The trait was also described as bringing a fourth herbicide mode of action to soybeans (HPPD tolerance enabling mesotrione pre-emergence), and panel examples suggested potential 20–30 bu differences in affected fields .

  • Fungicide results: A BASF fungicide trial set involving 1,800 farmers was described as showing 20–40 bu/acre yield differences versus untreated comparisons, with emphasis on planned applications for disease pressure (e.g., southern rust, tar spot) .

  • Decision support / verification tools (Xarvio): Growers were encouraged to work with local retailers to use Xarvio to capture existing practices (no-till, nitrogen stabilizers, cover crops) and qualify in a “five steps” process; the tool was also described as supporting fungicide timing alerts and seed/variety recommendations, noting that a wrong variety decision can cost 10–20 bushels.

Mechanization & robotics (global)

  • China agricultural robotics (field + greenhouse):

    • “小甜甜” (“Little Sweetie”) was described as a robot system capable of full-cycle, unmanned rice production (plowing through harvest), with 100+ robot models deployed across China and exported, including a cited procurement demand of 100,000+ units.
    • A separate example described a field robot achieving ~8× manual efficiency in harvesting a taro-like crop, along with reported electricity use of ~2 kWh/mu at ~0.5 RMB/kWh (roughly 1 RMB/mu in energy cost).
  • Swine production analytics (PIC): Digital imaging/AI was described as enabling behavior recording on “thousands and thousands” of pigs, with measurable and heritable behavior traits for potential genetic selection . Camera-based phenotyping of feet/legs was described as three times more accurate/heritable than humans and used to predict longevity in sow herds .

Equipment upgrades (U.S./Europe)

  • High-horsepower tractors:

    • John Deere highlighted the 8R 540 within a high-horsepower 8R lineup (440/490/540), framed around wider implements, faster speeds, and fixed-frame maneuverability with 4WD power .
    • New Holland rolled out the T7 XD series (T7.360 XD, T7.390 XD, T7.440 XD) delivering up to 435 horsepower for haulage, silage, planting, and tillage .
  • Dairy feeding system upgrade (UK): A farm moved from a 12m³ to a 20m³ Keenan mixer feeder to reduce overloading and shift toward a single larger cow mix rather than split mixes, with expectations of more milk from a better, more accurate mix .

3) Regional Developments

United States

  • USDA FY26 agricultural outlook (trade): The U.S. ag trade deficit was forecast at $29B, described as an improvement of $14.7B from FY25 and $8B versus December 2025 projections, tied to record export performance . Forecast record export components included:

    • Dairy: +15% by end FY25, led by demand for U.S. cheese and butter (growth cited in Mexico/Canada/EU) .
    • Corn: +29% projected record volumes (supported by sustained global demand) .
    • Ethanol: +11% forecast record exports (shipments cited to Canada/EU/UK/India) .
  • Drought and wildfire backdrop: Coverage cited 74% of the lower 48 in drought, high winds fueling wildfires burning 400,000+ acres, and extremely low Midwest snowpack heading into spring planting .

South America

  • Brazil soy (West Bahia): The 2025–26 soybean season was described as nearing final harvest phase with expectations of >9M tons and dryland yields of 65–70 sacks/ha, while rain (mid-March) was cited as a risk to harvest execution .

  • Brazil soy (Mato Grosso): Soy harvest was reported at 66% complete with the pace slow and behind last year due to more than 30 days of heavy rain; second-crop corn planting was reported 65% complete by Feb 20, also described as delayed, with producers citing losses from persistent rain .

  • Argentina corn: Notes cited a record corn production expectation of 62 million tons, 26% above last year, with harvest set to begin soon .

Europe–Mercosul trade lane (Brazil and Mercosul exporters)

  • Provisional EU–Mercosul application: The EU was reported to apply an interim trade deal eliminating roughly €4B in tariffs after 25 years of negotiations . Key implementation details and constraints included:
    • Meat quota barrier reduction: a 20% cut in quota barriers this year, with full quota expansion after 5 years (beef and poultry highlighted) .
    • Tariffs: progressive reductions with agricultural products generally waiting ~4 years, and some lines up to 15 years.
    • Legal/political risk: safeguards demanded by Italy were described as unclear, and a court review could create uncertainty over 18–24 months; margins in European approvals were characterized as narrow .

4) Best Practices

Crop planning and execution

  • Use decision tools to document practices and reduce preventable yield loss: Xarvio was described as a way to document existing practices for scoring/qualification (no-till, nitrogen stabilizers, cover crops), and as providing seed/variety guidance where variety misfit can cost 10–20 bushels.

  • Plan disease control (don’t chase it): BASF’s fungicide messaging emphasized “planned application” versus catching up to disease, alongside cited yield differences versus untreated comparisons .

Livestock management (practical, field-level)

  • Piglets (post-weaning mortality reduction): A case study attributed high post-weaning losses to cold stress and an abrupt switch to fermented feed; recommended actions included insulation lamps/dry bedding and using starter feed for ~2 weeks with a gradual transition .

  • Predator losses in open-range sheep systems: Drone herding was described as providing rapid aerial oversight and noise deterrence, reducing annual losses from ~15% to ~5% without adding herders .

  • Aquaculture feed hygiene: In grass carp systems fed soaked fava beans for “crisp meat,” uneaten beans settling and fermenting were linked to reduced intake/quality; daily removal of leftover beans before new feeding was presented as the fix .

Storage and on-farm maintenance

  • Grain bin maintenance: A “13 grain bin checkup tips” resource was shared for keeping bins in condition for grain storage.

Soil and garden management (specialty/small-scale)

  • Black walnut juglone mitigation: Guidance noted juglone sensitivity in some crops (especially nightshades), with mitigation options including raised beds, root barriers, and locating plantings outside the 15–20 m root zone.

5) Input Markets

  • Input cost direction (U.S. forecasts): Fertilizer was forecast to decrease 1.4%, seeds 1.3%, fuel nearly 7%, and pesticides 8.3%.
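Applied to a hypothetical per-acre input budget, the forecast percentages above shake out like this (the dollar line items are assumptions; only the percentages come from the brief):

```python
# Apply the forecast percent changes to an assumed per-acre input budget.
# Budget dollar figures are illustrative, not from the source.
forecast_change = {           # forecast direction, as decimal fractions
    "fertilizer": -0.014,
    "seed": -0.013,
    "fuel": -0.07,            # "nearly 7%"
    "pesticides": -0.083,
}
budget = {"fertilizer": 150.0, "seed": 120.0, "fuel": 40.0, "pesticides": 60.0}

projected = {k: round(v * (1 + forecast_change[k]), 2) for k, v in budget.items()}
total_now = sum(budget.values())
total_next = sum(projected.values())
```

Note that the largest percentage cuts (fuel, pesticides) fall on the smaller line items in this sketch, so total relief depends heavily on each operation's actual cost mix.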

  • Tariffs and fertilizer availability risk (U.S.): A new 10% U.S. tariff was reported under Section 122 authority, with exemptions including several fertilizer products (e.g., urea, ammonium nitrate, UAN, ammonium sulfate; DAP and MAP also cited in one version), while products like ammonia and sulfuric acid were described as not exempt unless imported under USMCA. Ag groups urged policy certainty and avoiding tariffs on agricultural inputs.

  • Biofuel policy uncertainty showing up in production: Iowa biodiesel production was reported down nearly 25% in 2025, with industry calling for policy certainty while plants awaited the RFS rule.

  • Bridge assistance program (U.S.): USDA described the Farmer Bridge Assistance Program as offering one-time bridge payments tied to temporary trade disruptions and higher production costs, with enrollment open through April 17, 2026 (details: http://fsa.usda.gov/fba).

6) Forward Outlook

  • Biofuels policy timeline (U.S.): With the RVO proposal already moved to White House/OMB review, reporting suggested a final rule could arrive by end of March, keeping soybean oil and corn demand expectations headline-sensitive into early spring.

  • Planting intentions sensitivity (U.S.): Commentary highlighted the RVO outcome and trade developments with China as key factors influencing acreage decisions and market tone, potentially leaving acreage clarity to later surveys if timing slips.

  • South American weather execution risk (Brazil): March rainfall was described as supportive for second-crop corn development but disruptive to fieldwork in several areas — above-average rainfall in Brazil’s Southeast was flagged as a challenge for producers who missed February second-crop corn planting windows, while short-term “windows” were emphasized for advancing soybean harvest before heavier rains return.

  • EU–Mercosur agreement planning caution: Multiple segments stressed that provisional application can support near-term commercial activity (tariffs/quotas), but exporters and producers may need to avoid planning that assumes permanence, given legal reviews and narrow political margins.
