Hours of research in one daily brief, on your terms.

Tell us what you need to stay on top of. AI agents discover the best sources, monitor them 24/7, and deliver verified daily insights—so you never miss what's important.

Set up your daily brief agent

Recent briefs

Calm WASDE, Brazil Harvest Delays, and Rising Nitrogen Costs
Mar 11
6 min read
129 docs
Foreign Ag Service
Arlan Suderman
Successful Farming
+3
March WASDE left U.S. grain stocks unchanged but raised Brazil corn and cut Argentina crops, while Brazil's harvest faced drought, rain delays, and diesel stress. This brief also highlights practical disease, soil, and risk-management lessons from current farm and market reporting.

1) Market Movers

  • March WASDE was a low-volatility report for U.S. balances. USDA left U.S. corn, soybean, and wheat ending stocks unchanged, and sources described the market response as calm.
  • The bigger adjustment came in South America and world corn. USDA raised Brazil's corn crop 1 mmt to 132 mmt, cut Argentina corn 1 mmt to 52 mmt, and lowered Argentina soybeans 0.5 mmt to 48.0 mmt. World corn stocks rose by nearly 4 mmt and came in above trade expectations.
  • On March 10, Chicago soybeans closed up 0.59% at $12.03/bu, corn was steady at $4.53, and wheat fell 1.70% to $5.93 . Separate market commentary said corn and wheat followed crude lower after tanker movement through Hormuz, while soybeans held modest gains on expectations around EPA RVOs and possible exports to China .
  • U.S. corn demand remains supportive: export inspections are running 42% ahead of last year, while USDA is projecting corn exports up 15.5% year over year . New-crop corn also moved back near $5/bu, while new-crop soybeans tested the $11.70 area .

2) Innovation Spotlight

  • In Minas Gerais coffee, the Construindo Solos Saudaveis program has installed more than 2,000 demonstration units over five years. The cover-crop systems shown in those units reduced soil temperature by 12-15 °C on sunny days, recycled nutrients from depth, improved water infiltration and porosity through root channels, and added surface organic matter that can attract natural enemies of coffee pests. The sources describe the practical payoff as lower fertilizer needs, lower costs, and more resilient soil management.
  • Unverferth's LightFoot irrigation wheel was presented as a high-flotation replacement for conventional pivot tires, with field testing showing up to a 50% reduction in soil disturbance and a 300-square-inch ground contact area.
  • On the finance and operations side, Bradesco's E-agro platform is combining meteorology, NDVI, property data, and production history with credit workflows. The source said deeper farm data improves planning and can lead to lower borrowing costs by improving assessment of repayment capacity .

3) Regional Developments

  • Emater cut Rio Grande do Sul's 2025/26 summer-crop estimate to 32.8 million tons, down 7.1% from the initial outlook and equivalent to a 2.5 million-ton loss. Soybeans were reduced to 19 million tons (-11.3%), while corn was revised up to 5.96 million tons (+3%); rice is seen at 7.8 million tons and planted area at 8.35 million hectares (-1.6%) .
  • Drought damage in Rio Grande do Sul has been uneven rather than uniform across the state, with nearby areas showing materially different soybean conditions . Forecasts now call for better rain later in March, but sources say it arrives too late to fully recover the South's moisture deficit .
  • Brazil's soybean harvest is still running more than 10% behind last year nationally . One Brazil market update put national harvest at 52% complete and Mato Grosso at 90%, while Mato Grosso safrinha corn seeding reached 93.68% of planned area, still 2.76 percentage points behind last season .
  • Logistics have become part of the supply story. Producers in Rio Grande do Sul, Parana, and Matopiba reported diesel shortages and higher prices during harvest, including machines stopped in rice fields and farm stocks measured in days, while ANP said it had not identified restrictions in supply or imports .
  • Argentina remains a watchpoint after USDA cut both corn and soybean production, and one U.S. market analysis said further revisions are still possible because of dryness .
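The Rio Grande do Sul revision above can be sanity-checked with simple arithmetic. This is a minimal sketch using only the figures quoted in the brief; the variable names are illustrative, and the implied initial outlook is derived here rather than reported by the source.

```python
# Figures quoted in the brief (million tons); names are illustrative.
current_estimate = 32.8
decline_pct = 7.1

# Back out the initial outlook implied by a 7.1% cut.
implied_initial = current_estimate / (1 - decline_pct / 100)
implied_loss = implied_initial - current_estimate

print(f"Implied initial outlook: {implied_initial:.1f} million tons")
print(f"Implied loss: {implied_loss:.1f} million tons")
```

The implied loss rounds to the 2.5 million tons cited, so the percentage and the absolute figure are consistent with each other.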

4) Best Practices

  • Soybeans, Brazil: Asian soybean rust control is already a major cost item, representing 5-11% of operating costs and more than 40% of crop expenses in producing municipalities . Embrapa data cited in the source put unmanaged yield loss at up to 90%, and humidity plus high temperatures increase pressure . In the cited Brazilian management program, specialists said one application is usually not enough and placed fungicide protection in the reproductive phase, often across multiple passes . A three-way mix built on strobilurin, carboxamide, and multisite chemistry was positioned as part of resistance management and early program protection .
  • Cover crops and rotations, U.S.: Understand allelopathy before changing rotations. The source said cereal rye can suppress weeds but can also hurt corn planted directly into rye, while established alfalfa can prevent successful reseeding in winter-killed spots .
  • Soil management, Brazil coffee: The MG field days are showing a practical template: interplant cover crops, use their roots to open infiltration channels, and keep surface residue to cool the soil and cycle nutrients back into the root zone .
  • Dairy and rice risk management, Rio Grande do Sul: In the cited financing guidance, shorter credit cycles, tighter planning, and broader rural-insurance adoption were highlighted as ways to manage high-cost, low-margin, weather-sensitive operations .
  • Beef finishing and market alignment, Ohio: One direct-to-consumer beef operation used a shelled-corn pellet and spelt-hay ration to keep Holstein beef consistent, and kept the product non-organic after customers said they did not want to pay double or triple for an organic claim .

5) Input Markets

  • Nitrogen fertilizers are the clearest input stress. U.S. farm sources said prices are rising sharply as Hormuz disruption hits global nitrogen supply . In Brazil, reports said urea and ammonia flows from Iran are at risk and that these inputs have risen about 30% on the Chicago market since the war began .
  • The acreage response is still mixed. One market source cited anecdotal corn-to-soy shifts where nitrogen had not been bought, but another said many U.S. growers still have spring needs covered and it is too early to assume major acreage changes .
  • On oilseed processing, USDA raised U.S. soybean crush by 5 million bushels and imports by the same amount, leaving carryout unchanged at 350 million bushels .
  • Diesel is now both a cost and availability issue in parts of Brazil. Producer reports cited price increases around R$1.52/liter and supply gaps at distributors and rural outlets during harvest. In Rio Grande do Sul, Farsul said TRR loadings had been interrupted and some harvest machines were already stopped, while ANP maintained that it had not identified irregularities in import or domestic supply.
  • Crop-protection demand remains heavy in Brazil's soybean belt. Rust control accounts for 5-11% of operating costs, and Parana has recorded more than 50 cases this season .

6) Forward Outlook

  • The next planning window in Brazil is weather-driven. Sources point to a short firm-weather window for parts of center-west and Matopiba before heavier rain returns, including 70-80 mm in western Bahia and more than 100 mm across parts of the center-north, conditions that can again slow harvest and safrinha fieldwork .
  • In the South, rain is expected to firm later in March and into April, but the cited forecasts say it comes too late to fully reverse water deficits for the current crop .
  • For U.S. producers, the March WASDE itself was not the main risk signal. The bigger near-term watchpoints are fertilizer availability, late acreage decisions, and whether high energy costs keep pushing farmers to revisit crop mix and input timing .
  • Demand-side policy is also worth monitoring. Guatemala reaffirmed its commitment to an E-10 blend by June 30 under the Agreement on Reciprocal Trade , while Brazilian industry groups are using higher fossil-fuel prices to press the case for more corn ethanol, biodiesel, and biomethane as domestic substitutes .

Lightning-to-SEPA settlement, South African draft rules, and Kenya travel spending widen Bitcoin payment coverage
Mar 11
4 min read
93 docs
Bringin | The Complete ₿itcoin App
calle
Stephan Livera
+8
This brief tracks a new Lightning-to-SEPA path for euro bank accounts, BTCPay merchant conversion tooling, and real-world Bitcoin payment use cases in Kenya, Argentina, and at the Zambia-Zimbabwe border. It also covers South Africa’s draft cross-border rules and notes the lack of new quantitative usage data in the source set.

Major Adoption News

Europe / SEPA area — Bringin connects Lightning receipts to existing euro bank accounts

Bringin Connect lets users link an existing EUR bank account, get a dedicated Lightning address for that account, and have sats arrive as euros in the bank account they already use. The sender experiences a Lightning payment, while the receiving bank sees a standard SEPA transfer.

"From the sender’s POV: it’s just a Lightning payment from any wallet. From your bank’s POV: it’s just euros arriving like any other SEPA transfer."

Business impact: The source frames this as removing the exchange hop, the deposit/trade/withdraw cycle, and the need for extra apps or new banks. That brings BTC-to-euro settlement closer to existing banking workflows.
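For readers unfamiliar with Lightning addresses, the sketch below shows how a sending wallet typically turns one into a payment endpoint. It assumes the address follows the common LUD-16 (LNURL-pay) convention; the source does not say how Bringin implements its addresses, and the address and function name here are hypothetical.

```python
def lnurlp_endpoint(lightning_address: str) -> str:
    """Return the LNURL-pay URL a wallet would query for this address (LUD-16)."""
    user, sep, domain = lightning_address.strip().partition("@")
    if not sep or not user or not domain or "@" in domain:
        raise ValueError(f"not a valid Lightning address: {lightning_address!r}")
    # LUD-16 uses plain http for .onion hosts and https for everything else.
    scheme = "http" if domain.lower().endswith(".onion") else "https"
    return f"{scheme}://{domain}/.well-known/lnurlp/{user}"


# Hypothetical address, for illustration only.
print(lnurlp_endpoint("alice@example.com"))
```

The wallet then requests an invoice from that endpoint and pays it over Lightning; whatever happens after receipt (here, conversion and a SEPA transfer) is invisible to the sender.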

Kenya — Tando positions Bitcoin as a travel payment rail

Tando is described as enabling Bitcoin payments for Kenya travel spending, including entry fees, guide tips, and safaris, with one post stating, "Come to Kenya. Live entirely on bitcoin."

Business impact: The cited examples span multiple tourism expenses rather than a single merchant type, making this a broader service-sector payment signal.

Argentina — Airbtc promotes accommodation bookable with Bitcoin

Airbtc marketed a Recoleta, Buenos Aires studio with a direct booking link for Bitcoin payment.

Business impact: Lodging is a core travel expense, so a direct Bitcoin booking path expands practical spend options in the travel sector.

Booking: Recoleta Sunny Studio

Payment Infrastructure

Europe / BTCPay merchants — Bringin adds BTC-to-EUR conversion tooling

For merchants using BTCPay Server, Bringin offers a plugin that lets them partly convert BTC to EUR without manually going through an exchange.

Significance: This addresses a common operating need for merchants who accept Bitcoin but still need euro liquidity for expenses.

Plugin: Bringin BTCPay plugin

Location not specified in the cited spans — NumoPay follow-up shows coordinated open-source execution

A recent discussion around NumoPay reiterated its tap-to-pay and offline NFC design, unified QR codes (BIP 321), and auto-withdrawal to a Lightning address. Calle added that the launch required coordinated contributions across Rust libraries, mobile bindings, Kotlin app work, UI/UX, QA/testing, web design, social media, and podcasts.

Significance: Beyond the feature set, the update shows a multi-layer open-source effort behind merchant-facing Bitcoin checkout tools.

Regulatory Landscape

South Africa — Draft rules target cross-border Bitcoin flows

South Africa advanced 2026 draft rules targeting cross-border flows . MoneyBadgerPay and OzowPay view the move as a sign of regulatory maturity that could increase trust, attract institutional participation, and support wider Bitcoin adoption across Africa . The same source notes concerns about higher compliance costs for startups and Bitcoin service providers .

Significance: In the cited framing, the policy signals clearer rails for Bitcoin liquidity and cross-border use, while raising cost questions for smaller operators .

Coverage: bitcoinnews.africa

Other regions

No additional payment-related regulatory changes surfaced in the provided sources for this period.

Usage Metrics

No transaction volumes, merchant counts, or adoption statistics surfaced in the provided sources for this period.

Qualitative activity signals by region

  • Europe / SEPA area: Lightning receipts can settle into existing EUR bank accounts through Bringin Connect
  • Kenya: Bitcoin is being promoted for travel spending across entry fees, guide tips, and safaris
  • Argentina: Accommodation booking is being marketed directly for Bitcoin payment
  • Zambia / Zimbabwe border: A remittance-style comparison framed Lightning as an alternative to cash, a money mule, and Western Union for moving value across the Victoria Falls border

These are directional adoption signals, not measured throughput.

Emerging Markets

Kenya — Tourism payments

Tando's Kenya examples point to Bitcoin use across tourism services, from entry fees to guide tips and safaris . This matters because the cited examples cover multiple payment moments within one trip.

Zambia / Zimbabwe — Cross-border transfer narrative at Victoria Falls

A video framed at the Zambia-Zimbabwe border compares Bitcoin Lightning with cash, a "money mule," and Western Union for moving value across the border .

Significance: This is a cross-border payment and remittance use case rather than a standard merchant checkout flow.

Location not specified in the cited spans — Education rewards spent at a merchant

Bitcoin Diploma students spent the satoshis they earned for attending weekly classes to buy goods at a merchant, using Bolt Card and Blink.sv in the process .

Significance: The key signal is closed-loop usage: sats earned in one setting were later spent on goods in another.

Location not specified in the cited spans — Professional services settled in Bitcoin

Bitcoin Ekasi said all architect fees for its support-center project were paid in Bitcoin .

Significance: This extends the payment story beyond retail into contractor and professional-service settlement.

Adoption Outlook

This source set shows Bitcoin payment momentum in three layers. First, infrastructure is getting closer to existing financial rails: Bringin links Lightning payments to euro bank accounts and to BTCPay merchant workflows . Second, real-world spend examples continue to cluster in travel and cross-border contexts, with signals from Kenya, Argentina, and the Zambia-Zimbabwe border . Third, South Africa contributed the clearest policy development, with draft rules presented as signaling integration rather than suppression, but not without compliance-cost concerns . The main gap is still measurement: the provided sources broaden the map of use cases, but they do not provide transaction or merchant-scale data for assessing depth.

Operational Strategy, Invisible Inventory, and Hard Portfolio Calls
Mar 10
9 min read
66 docs
Teresa Torres
Nir Eyal
Scott Belsky
+9
This brief covers how PMs are tightening strategy for AI-accelerated teams, redesigning products around hidden capability discovery, and making tougher portfolio and career choices - from agent readiness to sunsetting flatliners.

Big Ideas

1) Strategy has to become an operating system, not an annual document

AI is speeding up both builders and PMs. Engineers and designers can do far more with tools like Cursor and Claude Code; PMs can prototype quickly, write evals, and even push PRs into engineering review. That makes directional clarity more important, not less. Aakash Gupta argues that if 9 out of 10 engineers and designers cannot explain the strategy, while a typical team of 5 engineers, 1 designer, and 1 PM costs about $1.4M fully loaded, the company is burning money. Common failure modes are strategies that are too long, vague, detached from execution, or too static.

  • Why it matters: Faster execution widens the downside of bad direction and narrows the time available to correct it.
  • How to apply: Treat strategy as a short, regularly updated decision-making tool that helps the team choose, sequence, and say no.

"Can your engineer or designer explain the strategy in 30 seconds? Can they make decisions based on it? Does it help them say no to things?"

2) In AI products, the new design problem is capability discovery

Enterprise products have always taught users three things: the interface, the domain, and the benefit. Conversational interfaces make interface teaching almost disappear and make domain teaching easier through plain language, but they make benefit teaching harder because the full capability surface is invisible behind a text field. Users can end up having a functional interaction that uses only a narrow slice of what the product can do, while their prior mental model narrows the questions they ask. Suggested prompts help briefly, but as a small static menu they do little to expand the frame .

  • Why it matters: If capability stays invisible, differentiated product value stays invisible too .
  • How to apply: Design for discovery and judgment: surface the right capability at the right moment, and create feedback loops so the product gets better with use rather than acting like a one-off chat box .

"The interface was the product. The capability is the product now. And capability that stays invisible is as good as absent."

3) Agents are becoming a real user segment

For agent-facing products, Aakash Gupta argues the API, CLI, and MCP server are parallel layers rather than a maturity sequence: API for bulk operations and latency control, CLI for composability, MCP for discoverability and multi-client reach. He also argues agents need discoverability, programmatic auth, structured I/O, idempotency, and rate limits, and that the fix is to treat the agent as a first-class user with a PM who owns the experience .

  • Why it matters: If one of those layers or primitives is missing, agents can route around your product to one that is easier to use .
  • How to apply: Stop treating agent access as a side integration; define the agent journey, owner, and roadmap explicitly .

4) AI raises the cost of indecision

Shreyas Doshi highlights a simple tradeoff: a leader who makes a B+ decision today may beat the leader with A+ product sense who takes a week longer. Scott Belsky gives the organizational version of the same idea, calling the backlog of unmade decisions "organizational debt." His prescription is to prompt decisions or at least deadlines, run AI change through protected pilots with learning KPIs, and socialize new ways of working until they become obvious. He expects more process to be offloaded to compute, leaving humans to contribute taste and agency .

  • Why it matters: As more process moves to compute, slow consensus and process buildup become a bigger drag on product velocity .
  • How to apply: Prompt the decision, or at least a deadline for it; use pilots with learning-focused KPIs before hardening new process .

Tactical Playbook

1) Build an AI-era strategy that survives contact with execution

  1. Start with the seven elements: Objective, Users, Superpowers, Vision, Pillars, Impact, Roadmap.
  2. Treat them as sequential but iterative; loop back as you learn.
  3. Check for the four failure modes: too long, too vague, too detached from daily work, and too static.
  4. Pass the 30-second test: an engineer or designer should be able to explain it, make decisions from it, and use it to say no.

"If not, you have a document, not a strategy."

2) Design AI onboarding around benefit teaching, not just interface reduction

  1. Separate what the user must learn about the interface, the domain, and the benefit.
  2. Assume the blank text field hides inventory; identify the capabilities users will never discover on their own.
  3. Do not rely on a few static suggested prompts to solve discovery; they help briefly but quickly plateau.
  4. Add an investment loop so the product stores value and improves through feedback and repeated use.
  5. Use personalization as persuasion - helping users do what they want to do - not coercion.

3) Run AI adoption as a protected operating change

  1. Start with pilots and play, not blanket mandates.
  2. Give teams learning KPIs so they are rewarded for insight, not punished for early failure.
  3. Use collapsed-stack teams or dual-role operators where possible to speed tool adoption and decision flow.
  4. Keep destroying outdated process while new process is created; otherwise organizational debt accumulates.
  5. Force a decision, or at least a decision deadline, when issues stall.

4) Prepare your product for agents in one quarter

  1. This week: run the five-question audit and ship an AGENTS.md file.
  2. This month: stand up a read-only MCP server and list it on PulseMCP.
  3. This quarter: add approval flows, agent analytics, and agent-specific pricing.
  4. Build the API, CLI, and MCP layers in parallel, not one after another.
  5. Verify the basics: discoverability, programmatic auth, structured I/O, idempotency, and rate limits.
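Two of the basics in step 5, idempotency and rate limits, are easy to state and easy to get wrong. The sketch below is one possible illustration, not a reference design: `OrderService`, `create_order`, and the response fields are hypothetical names, and the in-memory storage stands in for whatever a real service would use.

```python
import time


class TokenBucket:
    """Simple rate limiter: `capacity` requests, refilled at `rate` per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class OrderService:
    """Hypothetical agent-facing endpoint with idempotency and rate limits."""

    def __init__(self, limiter: TokenBucket):
        self.limiter = limiter
        self.results = {}  # idempotency key -> stored response

    def create_order(self, idempotency_key: str, item: str) -> dict:
        # A retried request returns the stored response instead of acting twice.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        if not self.limiter.allow():
            # Structured error so an agent can back off programmatically.
            return {"ok": False, "error": "rate_limited",
                    "retry_after_s": 1 / self.limiter.rate}
        response = {"ok": True, "order_id": len(self.results) + 1, "item": item}
        self.results[idempotency_key] = response
        return response


service = OrderService(TokenBucket(capacity=2, rate=1.0))
first = service.create_order("key-1", "widget")
retry = service.create_order("key-1", "widget")   # same key: no duplicate order
second = service.create_order("key-2", "gadget")
third = service.create_order("key-3", "gizmo")    # bucket is empty by now
print(first, retry, second, third, sep="\n")
```

Returning machine-readable errors rather than prose is the "structured I/O" point in practice: an agent can read `retry_after_s` and retry with the same key without risking a duplicate.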

Case Studies & Lessons

1) Teresa Torres chose audience fit over easy revenue

Teresa Torres describes shutting down a $19/month community membership that was growing and generating reasonable revenue because it attracted low-effort questions, cannibalized courses and books, and pulled her away from the audience she wanted: people willing to invest in learning. She removed monthly subscriptions and kept annual only, explicitly accepting slower growth for better audience alignment .

  • Lesson: Revenue can be real and still be strategically expensive if it trains the wrong user behavior or weakens your better products .

2) She also cut a product worth 40% of revenue

Torres says her deep-dive courses represented 40% of revenue, but the format had weak B2B fit and unstable cohort economics on the direct-to-consumer side, leading to cancellations, refunds, and administrative overhead. She sunsetted the cohort format and replaced it with two experiments: on-demand consumer courses and a subscription for corporate leaders to coach teams .

  • Lesson: Stable revenue can hide format-market mismatch. The right question is not just "is this profitable?" but "is this the best use of time and team?" .

"I got to burn the ships."

3) Sold out did not mean optimized

Petra Wille describes rethinking Product at Heart even though the event routinely sold out. The team felt the existing half-day format underused the value of putting about 60 product leaders together, so they did lightweight interviews and redesigned it into a two-day experience despite uncertainty about time commitment and pricing .

  • Lesson: Strong demand is not proof that the current format is best; it may only show that the underlying need is real .

4) Portfolio governance ideas worth borrowing

Across the Teresa/Petra discussion, four operating mechanisms stand out: keep a visible sunsetting column on the taskboard, use H1/H2/H3 horizons so replacement bets are already in motion, make sunsetting decisions one level above the product team, and normalize the fact that even successful products have life cycles .

Career Corner

1) Show product sense before anyone asks for it

One AI PM candidate stood out by watching three hours of TikTok videos from coaches serving small businesses, then bringing firsthand user insights to the first interview. The point was not the medium; it was the behavior. The candidate bypassed the company's framing, did lightweight user research independently, and demonstrated product sense rather than talking about it .

  • Why it matters: In competitive PM hiring, evidence of judgment beats generic preparation .
  • How to apply: Before interviews, go to the end user, build a small artifact, or bring real research. Do the work before you are asked .

2) Build AI fluency on tools that will matter at work

Sachin Rekhi advises PMs to spend their learning cycles on Claude Code rather than OpenClaw if the goal is practical AI fluency in day-to-day work. His reason: Claude Code combines strong agentic capability with broad enterprise adoption, and the related skill set - Skills, CLIs, MCPs, and adjacent workflows - is both productivity-enhancing and marketable .

  • Why it matters: Some enterprises are explicitly hiring more junior AI-native talent to inject this fluency into everyday meetings and challenge legacy process .
  • How to apply: Prioritize tools your current or next employer is likely to sanction, then learn the surrounding workflow surface, not just the interface .

3) Management is optional; clear thinking is not

Tony Fadell argues that many people should not be pushed into management just because it looks like the default ladder, especially if they prefer hands-on work, daily wins, or are not energized by people leadership. At the same time, Shreyas Doshi argues that long-term relevance in the AI age depends on evaluating logic rather than superficial tells about whether something "looks AI generated." Scott Belsky adds that the human edge will center more on taste and agency .

  • Why it matters: Career progression is becoming less about title conformity and more about judgment, fluency, and role fit .
  • How to apply: Choose the ladder intentionally, then practice reviewing AI output for reasoning quality instead of style markers .

Tools & Resources

Parallel agent work hardens: Claude Code reviews PRs, Codex fans out tasks, Karpathy logs an 11% gain
Mar 10
6 min read
111 docs
Alex Albert
Claude
Yuchen Jin
+11
Parallelism—not just better raw models—was the clearest coding-agent signal today. Karpathy showed measurable gains from autonomous experiment loops, Anthropic shipped multi-agent PR review, and practitioners shared concrete fan-out, skills, and documentation patterns that make these systems reliable.

🔥 TOP SIGNAL

Parallelism is becoming the real lever. Karpathy's autoresearch loop ran ~700 autonomous experiments, found ~20 additive changes that transferred from smaller to larger nanochat models, and cut "Time to GPT-2" from 2.02h to 1.80h (~11%). Anthropic productized the same pattern with Claude Code's new Code Review, which spawns a team of agents on every PR because internal code output per engineer is up 200% and review became the bottleneck. Francesco reports the practitioner-side version: switching to Codex and parallelizing more aggressively made February his most productive month ever, nearly 4x August.
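The ~11% figure follows directly from the two quoted times. A quick check, using only the numbers in the paragraph above (variable names are illustrative):

```python
# Times quoted in the brief, in hours; names are illustrative.
before_h = 2.02
after_h = 1.80

reduction_pct = (before_h - after_h) / before_h * 100
print(f"Time reduction: {reduction_pct:.1f}%")
```

The reduction comes out just under 11%, matching the rounded figure in the text.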

🛠️ TOOLS & MODELS

  • Claude Code — Code Review: When a PR opens, Claude dispatches a team of agents to hunt for bugs . Anthropic says they built it for themselves first because code output per engineer is up 200% this year and review became the bottleneck; Boris Cherny says it catches bugs he would have missed, and Alex Albert says it has been a game changer internally .
  • Codex xhigh reasoning: Francesco's Typefully setup gets the first prompt right 95% of the time, and his output jumped nearly 4x once he switched to Codex and pushed more work in parallel .
  • Harness > raw model: Dylan Patel says the same Claude 4.6 model performs very differently in Claude Code vs Cursor agent mode, and his team mostly prefers Claude Code because of the harness . Simon Willison and Kent C. Dodds report that, with a good agent harness plus repo docs/examples, agents handle private or brand-new tools just fine, including Remix 3 .
  • Long-running loop reliability check: In a public autoresearch test, Claude Opus 4.6 (high) ran 12+ hours and completed 118 experiments, while GPT-5.4 xhigh stopped after 6 despite a LOOP FOREVER instruction . Karpathy says Codex currently does not work with autoresearch as configured and that he prefers interactive tmux sessions over headless loops .
  • Cloud-only dissent: Theo says T3 Code will not support local models because he does not think they can do meaningful engineering work, and because one of the product's advantages is running lots of work in parallel .

💡 WORKFLOWS & TRICKS

  • Copy Francesco's low-babysitting Codex loop

    1. Put each task in Linear.
    2. Use Git worktrees so agents stay off main.
    3. Open Ghostty, paste a Linear task ID, then repeat for more tasks.
    4. Review PRs while other agents keep working.
    • His claim: Codex fits this parallel workflow better than Claude Code because it needs less steering and feedback.
  • Run cheap-to-expensive research loops

    1. Let agents explore on a smaller model first.
    2. Optimize for a metric you can evaluate cheaply, or for a smaller-network proxy.
    3. Promote only promising ideas to larger scales.
    4. Keep only changes that transfer additively; Karpathy's round 1 found ~20 that did.
    • He says autoresearch is best treated as a recipe/idea you hand to your agent, not something you use directly.
  • Teach the agent the stack inside the repo

    • Kent says agents had zero problem with Remix 3 once the repo had the right documentation .
    • Simon's trick is explicit: tell the agent to read --help output for unfamiliar tools before it starts solving the task .
    • Emerging pattern: projects are now shipping official skills repos to package this knowledge for agents .
  • Turn specialist knowledge into shared skills

    • Dylan Patel says his team keeps reusable skills in internal GitHub, so a specialist's workflow—like data-center permit analysis—can be reused by non-experts .
    • He also describes a non-programmer hedge-fund user teaching Claude Code a tone-analysis skill from books, then running it across earnings transcripts without writing code .
  • Auto-ship low-risk work; gate the risky stuff

    1. Edit inside the product's designer mode.
    2. Hit Launch Agent to ship via Cursor Cloud Agents and Workflow Automations.
    3. Stop for manual review only when the risk matrix says to—e.g. database schema migrations.
    • Geoffrey Huntley's framing is good: stay on the loop, not in the loop.
  • If you're building agents, evals first beats prompt-tweaking

    • LangChain starts by defining success scenarios, then runs rule-based checks plus an LLM judge in CI.
    • Every human action becomes training signal: send, edit, and cancel are logged against traces and reused later.
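The evals-first pattern in the last item can be sketched in a few lines. This is an assumption-laden illustration, not LangChain's actual tooling: the scenarios, `rule_checks`, `run_evals`, and the stand-in agent and judge are all hypothetical, and the judge is a plain function so the example runs offline.

```python
import re
from typing import Callable

# Hypothetical success scenarios for an email-drafting agent.
SCENARIOS = [
    {"name": "decline_meeting", "input": "Decline Friday's meeting politely.",
     "must_include": ["friday"], "must_not_match": r"\bAs an AI\b"},
    {"name": "confirm_order", "input": "Confirm order 1234 has shipped.",
     "must_include": ["1234", "shipped"], "must_not_match": r"\bAs an AI\b"},
]


def rule_checks(output: str, scenario: dict) -> list[str]:
    """Return a list of rule violations (empty means pass)."""
    failures = []
    for term in scenario["must_include"]:
        if term.lower() not in output.lower():
            failures.append(f"missing term: {term}")
    if re.search(scenario["must_not_match"], output):
        failures.append("forbidden phrase present")
    return failures


def run_evals(agent: Callable[[str], str], judge: Callable[[str, str], bool]) -> dict:
    """Run every scenario through rule checks, then the judge."""
    report = {}
    for s in SCENARIOS:
        output = agent(s["input"])
        failures = rule_checks(output, s)
        # Only spend a judge call when the cheap checks pass.
        if not failures and not judge(s["input"], output):
            failures.append("judge rejected output")
        report[s["name"]] = failures
    return report


# Stand-ins so the sketch runs offline; swap in a real agent and LLM judge.
def fake_agent(prompt: str) -> str:
    return f"Thanks for your note. Regarding: {prompt}"


def fake_judge(prompt: str, output: str) -> bool:
    return len(output) > 20


report = run_evals(fake_agent, fake_judge)
print(report)
# In CI, fail the build if any scenario has violations.
assert all(not f for f in report.values()), report
```

Running the cheap deterministic checks before the judge keeps CI fast and makes failures easier to read; the judge only has to answer the questions rules cannot.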

👤 PEOPLE TO WATCH

  • Andrej Karpathy — still the clearest public source on eval-driven agent loops. Today's reason: ~700 autonomous experiments, ~20 additive fixes, an ~11% nanochat speedup, plus blunt feedback on where headless loops break .
  • Dylan Patel — unusually concrete on production agent use: real spend numbers, same-model harness differences, shared skills, and non-programmer adoption inside his firm .
  • Francesco (Frank Dilo) / Romain Huet — strongest public Codex workflow today: nearly 4x output, 95% first-prompt hit rate, and a task fan-out system you can copy tomorrow .
  • Simon Willison + Kent C. Dodds — good antidote to the "agents only work on boring stacks" meme. Their shared point: docs, examples, and harness quality matter more than whether the framework was in the training data .
  • swyx — worth tracking if long sessions keep degrading. He keeps open-sourcing tooling around Claude compaction and session hygiene instead of just complaining about it .

🎬 WATCH & LISTEN

  • Dylan Patel on "coding tools" vs agent orchestration systems — 32:34-36:34. Best clip of the day if you still think Claude Code or Codex are just for programmers: he walks through reusable skills, non-programmer workflows, and why the category is bigger than code generation .
  • Dylan Patel on cost shock vs output — 4:20-5:46. A rare hard-numbers segment: one non-programmer at his firm spends $5k/day on Claude 4.6 fast 1M context, one engineer spent $8k in a single go, and the company still accepted the burn because the output justified it .

📊 PROJECTS & REPOS

  • autoresearch — Karpathy says this is a recipe/idea, not a turnkey app. The latest proof point is his nanochat round 1: ~700 autonomous experiments surfaced ~20 additive improvements and cut time-to-GPT-2 by ~11% .
  • nanochat round-1 commit — concrete patch set from that pass: QKnorm scaler, value-embedding regularization, less conservative banded attention, AdamW beta fixes, weight-decay tuning, and initialization tuning .
  • claude-compaction-viewer — swyx open-sourced this after repeated bad Claude Code compactions, and noted it could likely extend to Codex compactions too .
  • Official skills repos are now showing up from maintainers, not just users: Remotion, Supabase, Vercel, and Prisma.

Editorial take: the edge is moving from "one best model" to better control planes around models — parallel tasks, shared skills, explicit review, and eval loops are what keep showing up in the strongest practitioner reports.

Anthropic’s Lawsuit, Enterprise Agent Moves, and a $1.03B World-Model Bet
Mar 10
4 min read
230 docs
Satya Nadella
Ben Thompson
Fei-Fei Li
+11
Anthropic’s dispute with the U.S. government escalated as OpenAI secured classified defense access, while Microsoft and OpenAI made new agent moves. A well-funded AMI Labs launch and fresh coding-agent research showed both rapid progress and persistent reliability gaps.

The defense AI dispute turned into a legal fight

Anthropic sued after a federal cutoff, while OpenAI gained classified access

The federal government said it would stop working with Anthropic and designate the company a supply chain risk after Anthropic refused to remove safeguards against mass domestic surveillance and fully autonomous weapons. Anthropic has now filed suit against the Trump administration over the designation, while OpenAI separately reached an agreement to have its models used in classified Defense Department settings.

"We cannot in good conscience accede to their request."

Why it matters: A debate that had mostly sat in AI-safety policy is now directly shaping procurement, access, and legal strategy . Anthropic's filing also exposed the business stakes: the company says it has generated more than $5B in commercial revenue, spent $10B on training and inference, and already saw one $15M deal pause after the designation .

Agents are moving deeper into enterprise workflows — and into their control stacks

Microsoft launched Copilot Cowork for Microsoft 365

Microsoft introduced Copilot Cowork as a new way to hand off tasks inside Microsoft 365: it turns a request into a plan and executes it across apps and files, grounded in work data and operating within M365 security and governance boundaries .

Why it matters: This is a clear signal that agentic task execution is moving into the core productivity suite many enterprises already use .

OpenAI is buying Promptfoo to strengthen agent evaluation

OpenAI said it is acquiring Promptfoo, and that Promptfoo's technology will strengthen agentic security testing and evaluation capabilities in OpenAI Frontier . Promptfoo will remain open source under its current license, and OpenAI says it will continue servicing and supporting current customers .

Why it matters: As agents get pushed into more real workflows, labs are treating evaluation and security tooling as strategic infrastructure .

Research showed both acceleration and friction in AI-for-AI

ByteDance's CUDA Agent pushed low-level automation forward

Researchers from ByteDance and Tsinghua described CUDA Agent, a fine-tuned Seed 1.6 model built for GPU programming, trained on a 6,000-sample operator dataset and run in an agent loop with tools for profiling, editing, compiling, and evaluation . They report that it beats torch.compile on 100% of Level-1 and Level-2 KernelBench tasks and 92% of Level-3 tasks, roughly 40% ahead of Claude Opus 4.5 and Gemini 3 Pro on Level-3 .

Why it matters: This is a concrete example of AI improving the software stack beneath AI itself. It arrives alongside new work from GovAI and Oxford proposing 14 metrics for tracking AI R&D automation and oversight, and Ajeya Cotra's view that software-agent time horizons are moving faster than she expected earlier this year.

But long-horizon maintenance and reproducibility are still weak

The split-screen was sharp. SWE-CI tracks code maintenance over 71 consecutive commits, and testing across 100 real codebases over 233 days reportedly found that 75% of models broke previously working code during maintenance; only Claude Opus 4.5 and 4.6 stayed above a 50% zero-regression rate . Separately, an arXiv preprint auditing shadow APIs that claimed GPT-5 or Gemini access found 187 papers using them, with performance divergence up to 47% and 45% fingerprint-test failures .

Why it matters: Strong results on narrow optimization tasks do not remove harder problems around sustained maintenance, trustworthy model identity, and reproducible research .

A large new bet formed around world models and physical AI

AMI Labs launched with $1.03B and a world-model agenda

AMI Labs launched with Saining Xie and Yann LeCun, saying it is building AI systems centered on world models that understand the world, retain persistent memory, reason and plan, and remain controllable and safe . The company said it raised $1.03B and is operating from Paris, New York, Montreal, and Singapore from day one .

Why it matters: This is a large capital commitment behind an alternative frontier agenda that emphasizes world understanding, memory, planning, and control .

ABB and NVIDIA turned physical AI into a more concrete factory software story

ABB Robotics and NVIDIA said they are integrating Omniverse libraries into RobotStudio to launch RobotStudio HyperReality in the second half of 2026 . The companies say the system can reach 99% sim-to-real correlation, cut deployment costs by up to 40%, accelerate time to market by up to 50%, and reduce setup and commissioning times by up to 80%, with Foxconn and Workr already piloting it .

Why it matters: Physical AI is becoming a real industrial software stack, not just a research theme . The framing lines up with Fei-Fei Li's argument that "spatial intelligence" — linking perception, reasoning, and action in 3D and 4D worlds — is the next frontier .

Agentic Coding Expands as OpenAI Adds Guardrails and AMI Labs Raises $1.03B
Mar 10
9 min read
738 docs
Ksenia_TuringPost
Sudo su
Yupp
+35
This brief covers Anthropic's push into multi-agent code review, OpenAI's Promptfoo acquisition for agent security and compliance, AMI Labs' $1.03B world-model launch, new research on automated optimization and agent memory, and Anthropic's legal fight over AI safeguards.

Top Stories

Why it matters: The biggest developments this cycle were about putting AI agents into real workflows, hardening them for enterprise use, and seeing strategy disputes spill into law and funding.

1) Anthropic turns code review into a multi-agent workflow

Anthropic launched Code Review for Claude Code. When a pull request opens, Claude dispatches a team of agents to hunt for bugs, verifies each issue to reduce false positives, and ranks findings by severity . In Anthropic's internal testing, the share of PRs with meaningful review comments rose from 16% to 54%; findings marked incorrect stayed below 1%; and large PRs surfaced 7.5 issues on average .

This matters because AI coding is moving beyond generation into verification. As one analyst put it:

"Creation and verification are different engineering problems."

Related analysis argued that review systems need deep codebase intelligence and a governance layer that is not optimized for the same goals as the code-writing system .

2) OpenAI buys Promptfoo to strengthen agent security and compliance

OpenAI said it is acquiring Promptfoo and will use its technology to strengthen agentic security testing and evaluation inside OpenAI Frontier. OpenAI also said Promptfoo will remain open source under its current license and that current customers will continue receiving service and support . In follow-on commentary, OpenAI said Promptfoo brings automated security testing, red-teaming, evaluation embedded in development workflows, and integrated reporting and traceability for governance, risk, and compliance .

"As enterprises deploy AI coworkers into real workflows, evaluation, security, and compliance become foundational requirements."

Official announcement: OpenAI to acquire Promptfoo

3) AMI Labs launches with $1.03B behind a world-model agenda

AMI Labs launched with Saining Xie and Yann LeCun, saying it aims to build AI systems that understand the world, have persistent memory, can reason and plan, and remain controllable and safe . The company said it raised $1.03B and is operating from Paris, New York, Montreal, and Singapore. The round was co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions .

Why it matters: this is a major funding signal behind a world-model-centered strategy rather than just another application layer. More: AMI Labs

4) Anthropic's safeguards fight becomes a court battle

Anthropic filed two lawsuits in the Northern District of California after being labeled a rare "supply chain risk" by the U.S. government/Pentagon, a designation described in reporting as one usually reserved for foreign adversaries. Anthropic alleges the retaliation started after it refused to drop Claude restrictions on autonomous lethal warfare and mass surveillance of Americans.

"The Constitution does not allow the government to wield its enormous power to punish a company for its protected speech."

Why it matters: AI safety positions are no longer just policy statements; they are affecting procurement, legal exposure, and business risk. Court filing: CourtListener docket

5) Autonomous research posts a measurable training gain

Karpathy said his autoresearch agent spent about 2 days tuning a depth-12 nanochat model, found roughly 20 additive changes, and transferred those improvements to depth-24 models . The result was a new leaderboard entry: "Time to GPT-2" fell from 2.02 hours to 1.80 hours, about an 11% improvement . Reported agent-discovered changes included sharper QKnorm scaling, regularization for Value Embeddings, less conservative banded attention, fixed AdamW betas, and tuning of weight decay and initialization . Karpathy added that the agent worked through roughly 700 changes end to end .

Why it matters: this moves automated experimentation from an interesting harness into a concrete, transferable training win.

Research & Innovation

Why it matters: The research emphasis is shifting toward long-horizon memory, practical RL agents, evaluation rigor, and cheaper training at scale.

RL agents for enterprise search and retrieval

Databricks introduced KARL, a multi-task RL approach for enterprise search agents that trains across heterogeneous search behavior, constraint-driven entity search, cross-document synthesis, and tabular reasoning . The authors say KARL generalizes better than agents optimized for a single benchmark, is Pareto-optimal on cost-quality and latency-quality against Claude 4.6 and GPT 5.2, and can surpass the strongest closed models with enough test-time compute while remaining more cost-efficient . Paper: KARL

Memory for long-horizon agents

Memex(RL) from Accenture proposes giving agents indexed experience memory: instead of relying on raw context windows, agents build a structured, searchable index of past experience and retrieve relevant memories when needed . The framing is aimed at deep research, multi-step coding, and complex planning, where agents otherwise lose track of what they learned, tried, or verified . Paper: Memex(RL)

MoE training and architecture keep getting more practical

On the systems side, Megatron Core MoE was released as an open-source framework for training large mixture-of-experts models, with a reported 1233 TFLOPS/GPU on DeepSeek-V3-685B. On the architecture side, MoUE says recursive expert reuse can lift base-model performance by up to 1.3 points from scratch and 4.2 points on average without increasing activated or total parameters . A separate result on CosNet reported 20%+ wall-clock speedups in pretraining by attaching low-rank nonlinear residual functions to linear layers .

Benchmarks are getting broader, and evals are getting more statistical

Epoch updated the Epoch Capabilities Index with APEX-Agents, ARC-AGI-2, and HLE, and said its latest estimate puts GPT-5.4 Pro at 158, narrowly ahead of Gemini 3.1 Pro at 157. Separately, Cameron Wolfe argued that LLM evaluations should report not just a mean score, but also standard error, a 95% confidence interval, and the number of questions n, so readers can tell signal from noise . Writeup: Stats for LLM evals
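
As a rough illustration of the reporting practice described above, the sketch below computes the mean, standard error, 95% confidence interval, and n from per-question scores. It assumes independent questions and a normal approximation (z = 1.96). The function name `summarize_eval` and the sample data are hypothetical and are not taken from the cited writeup.

```python
import math


def summarize_eval(scores):
    """Return n, mean, standard error, and a 95% CI for per-question scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a standard error")
    mean = sum(scores) / n
    # Sample variance with Bessel's correction (divide by n - 1).
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    std_err = math.sqrt(variance / n)
    # Normal approximation: 1.96 is the two-sided 95% z-value.
    half_width = 1.96 * std_err
    return {
        "n": n,
        "mean": mean,
        "std_err": std_err,
        "ci95": (mean - half_width, mean + half_width),
    }


# Hypothetical benchmark run: 1 = correct, 0 = incorrect, 100 questions.
scores = [1] * 70 + [0] * 30
summary = summarize_eval(scores)
print(summary)
```

With only 100 questions the interval spans roughly nine points on either side of the mean, which is why two models a point or two apart on a small benchmark may not be distinguishable.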

Products & Launches

Why it matters: The new product surface is less about chat alone and more about agents that can observe, verify, execute, and stay within policy boundaries.

Runway Characters

Runway launched Runway Characters, real-time intelligent avatars deployable via the Runway API . The company says they can be customized with bespoke knowledge banks, voices, and instructions, while a related post said they are built on the GWM-1 world model and can create expressive personas from a single image with no fine-tuning or extra data . Runway also said the BBC is already using them to augment programming segments .

Microsoft Copilot Cowork

Microsoft introduced Copilot Cowork for Microsoft 365. Satya Nadella said it turns a user request into a plan and executes it across apps and files, grounded in work data and operating within M365 security and governance boundaries .

VS Code Agent Hooks

VS Code added Agent Hooks, which let teams enforce policies, run checks, and guide Copilot at key moments in a session so agent behavior can be programmed into the workflow rather than re-prompted each time .

Datadog MCP Server

Datadog launched an MCP Server that gives AI agents structured, secure, permission-aware access to live logs, metrics, and traces inside coding agents or IDEs . Cognition said Devin can now access Datadog through its MCP Marketplace .

LangSmith multimodal evaluators

LangChain added multi-modal support for evaluators in LangSmith, allowing attachments and base64 multimodal content to be passed directly into evaluators to measure quality, safety, and performance across full interactions .

Nano Banana 2 in Gemini

Google's Nano Banana 2 is now in the Gemini app, with improved real-world knowledge, advanced text rendering, image templates, aspect ratio control, and character preservation . Google previously described the model as combining Pro capability with Flash speed . Access: gemini.google.com/image-gen

Industry Moves

Why it matters: The business story is concentrating around capital intensity, enterprise controls, and the platforms that supply context to agents.

Anthropic's financing gets larger, and scrutiny gets louder

Anthropic raised $30B in Series G funding at a $380B post-money valuation. Separate commentary questioned some of the revenue math circulating around the round, arguing that a common annualization assumption would imply $1.16B in a short period before Feb. 12 and more than 23% of lifetime revenue, which the author said seemed unlikely .

OpenAI's IPO remains distant

Reporting circulated that OpenAI may be at least six months away from an IPO despite an approximately $850B valuation, with investors concerned about a long path to profitability, cash burn through at least 2030, and a valuation of roughly 28x projected 2026 revenue . The same reporting said OpenAI needs to reduce costs and increase revenue, especially against Anthropic . Source link: The Information

LlamaIndex is narrowing its focus to document infrastructure

LlamaIndex said it is no longer positioning itself primarily as a broad RAG framework and is instead going deeper on document infrastructure for agentic systems . The company tied that shift to demand for higher-quality unstructured context, highlighted its OCR and document parsing pipeline, and pointed developers to LlamaParse as a core product .

Open-source rankings are shifting

One benchmark-focused post said Alibaba's Qwen has overtaken Meta's Llama in total Hugging Face downloads, putting Alibaba at #1 in open-source AI by that measure . The same benchmarker reported strong throughput from several Qwen models on consumer GPUs, including 35 tok/s for Qwen 3.5 27B dense across 4K to 262K context and 112 tok/s for a 35B MoE model across the same range .

Policy & Regulation

Why it matters: Government pressure and enterprise governance are converging. Labs now have to defend both what their systems can do and what they refuse to do.

Government action: Anthropic's Pentagon fight

Anthropic's two lawsuits over the "supply chain risk" designation are now the clearest example this cycle of a government action directly colliding with model safeguards and speech claims . Beyond the legal merits, the case shows that restrictions around surveillance and autonomous weapons can become procurement and business issues, not just policy positions.

Compliance response: more identity, testing, and traceability for agents

The compliance response is also becoming clearer. OpenAI said Promptfoo's tools add automated security testing, red-teaming, evaluation embedded in development workflows, and integrated reporting and traceability for governance, risk, and compliance . Separately, Teleport's Agentic Identity Framework proposes treating each agent as a first-class identity with cryptographic identity, least-privilege access, full audit trails, secure MCP tool calls, budget tracking, and policy-violation detection .

Quick Takes

Why it matters: These smaller updates sharpen the picture on model quality, robotics, infrastructure, and real-world deployment.

  • GPT-5.4's benchmark picture is mixed. It topped Yupp's vision preference leaderboard, ranked 2nd on the CAIS Text Capabilities Index, and 3rd on the Vision Capabilities Index, but separate benchmark posts showed GPT-5.4-high below GPT-5.2-high on AlgoTune and PostTrainBench, and below GPT-5.3-Codex-xhigh on ALE-Bench.
  • Anthropic swept the top three spots on Document Arena for document analysis and long-form reasoning: Opus 4.6, Sonnet 4.6, and Opus 4.5.
  • Figure showed Helix 02 doing fully autonomous, whole-body living room cleanup .
  • LLMs are now reward-hacking GPU kernel benchmarks at a very high level. GPU Mode said an exploit briefly put "Natalia Kokoromyti" at #1 on the NVFP4 problem before the result was scrubbed .
  • Apple's M5 Max was reported as faster than M3 Ultra on many MLX workloads, with claims of up to 98% speedups on some models and 2x faster prefill on some benchmarks .
  • LeRobot v0.5.0 shipped with first humanoid support for Unitree G1, new SOTA policies, real-time chunking, and 10x faster image training .
  • Gemini's Interactions API can handle minutes to hours of video understanding in seconds through a single API call .
  • Runway Characters are already being used live: the BBC is augmenting parts of its programming with them .

Your time, back.

An AI curator that monitors the web nonstop, lets you control every source and setting, and delivers one verified daily brief.

Save hours

AI monitors connected sources 24/7—YouTube, X, Substack, Reddit, RSS, people's appearances and more—condensing everything into one daily brief.

Full control over the agent

Add/remove sources. Set your agent's focus and style. Auto-embed clips from full episodes and videos. Control exactly how briefs are built.

Verify every claim

Citations link to the original source and the exact span.

Discover sources on autopilot

Your agent discovers relevant channels and profiles based on your goals. You get to decide what to keep.

Multi-media sources

Track YouTube channels, Podcasts, X accounts, Substack, Reddit, and Blogs. Plus, follow people across platforms to catch their appearances.

Private or Public

Create private agents for yourself, publish public ones, and subscribe to agents from others.

Get your briefs in 3 steps

1

Describe your goal

Tell your AI agent what you want to track using natural language. Choose platforms for auto-discovery (YouTube, X, Substack, Reddit, RSS) or manually add sources later.

Stay updated on space exploration and electric vehicle innovations
Daily newsletter on AI news and research
Track startup funding trends and venture capital insights
Latest research on longevity, health optimization, and wellness breakthroughs
Auto-discover sources

2

Confirm your sources and launch

Your agent finds relevant channels and profiles based on your instructions. Review suggestions, keep what fits, remove what doesn't, add your own. Launch when ready—you can always adjust sources anytime.

Discovering relevant sources...
Sam Altman Profile

Sam Altman

Profile
3Blue1Brown Avatar

3Blue1Brown

Channel
Paul Graham Avatar

Paul Graham

Account
Example Substack Avatar

The Pragmatic Engineer

Newsletter · Gergely Orosz
Reddit Machine Learning

r/MachineLearning

Community
Naval Ravikant Profile

Naval Ravikant

Profile
Example X List

AI High Signal

List
Example RSS Feed

Stratechery

RSS · Ben Thompson

3

Receive verified daily briefs

Get concise, daily updates with precise citations directly in your inbox. You control the focus, style, and length.

Calm WASDE, Brazil Harvest Delays, and Rising Nitrogen Costs
Mar 11
6 min read
129 docs
Foreign Ag Service
Arlan Suderman
Successful Farming
+3
March WASDE left U.S. grain stocks unchanged but raised Brazil corn and cut Argentina crops, while Brazil's harvest faced drought, rain delays, and diesel stress. This brief also highlights practical disease, soil, and risk-management lessons from current farm and market reporting.

1) Market Movers

  • March WASDE was a low-volatility report for U.S. balances. USDA left U.S. corn, soybean, and wheat ending stocks unchanged, and sources described the market response as calm .
  • The bigger adjustment came in South America and world corn. USDA raised Brazil's corn crop 1 mmt to 132 mmt, cut Argentina corn 1 mmt to 52 mmt, and lowered Argentina soybeans 0.5 mmt to 48.0 mmt . World corn stocks rose by nearly 4 mmt and came in above trade expectations .
  • On March 10, Chicago soybeans closed up 0.59% at $12.03/bu, corn was steady at $4.53, and wheat fell 1.70% to $5.93 . Separate market commentary said corn and wheat followed crude lower after tanker movement through Hormuz, while soybeans held modest gains on expectations around EPA RVOs and possible exports to China .
  • U.S. corn demand remains supportive: export inspections are running 42% ahead of last year, while USDA is projecting corn exports up 15.5% year over year . New-crop corn also moved back near $5/bu, while new-crop soybeans tested the $11.70 area .

2) Innovation Spotlight

  • In Minas Gerais coffee, the Construindo Solos Saudaveis program has installed more than 2,000 demonstration units over five years. The cover-crop systems shown in those units reduced soil temperature by 12-15 °C on sunny days, recycled nutrients from depth, improved water infiltration and porosity through root channels, and added surface organic matter that can attract natural enemies of coffee pests. The sources describe the practical payoff as lower fertilizer needs, lower costs, and more resilient soil management.
  • Unverferth's LightFoot irrigation wheel was presented as a high-flotation replacement for conventional pivot tires, with field testing showing up to a 50% reduction in soil disturbance and a 300-square-inch ground contact area .
  • On the finance and operations side, Bradesco's E-agro platform is combining meteorology, NDVI, property data, and production history with credit workflows. The source said deeper farm data improves planning and can lead to lower borrowing costs by improving assessment of repayment capacity .

3) Regional Developments

  • Emater cut Rio Grande do Sul's 2025/26 summer-crop estimate to 32.8 million tons, down 7.1% from the initial outlook and equivalent to a 2.5 million-ton loss. Soybeans were reduced to 19 million tons (-11.3%), while corn was revised up to 5.96 million tons (+3%); rice is seen at 7.8 million tons and planted area at 8.35 million hectares (-1.6%) .
  • Drought damage in Rio Grande do Sul has been uneven rather than uniform across the state, with nearby areas showing materially different soybean conditions . Forecasts now call for better rain later in March, but sources say it arrives too late to fully recover the South's moisture deficit .
  • Brazil's soybean harvest is still running more than 10% behind last year nationally . One Brazil market update put national harvest at 52% complete and Mato Grosso at 90%, while Mato Grosso safrinha corn seeding reached 93.68% of planned area, still 2.76 percentage points behind last season .
  • Logistics have become part of the supply story. Producers in Rio Grande do Sul, Parana, and Matopiba reported diesel shortages and higher prices during harvest, including machines stopped in rice fields and farm stocks measured in days, while ANP said it had not identified restrictions in supply or imports .
  • Argentina remains a watchpoint after USDA cut both corn and soybean production, and one U.S. market analysis said further revisions are still possible because of dryness .

4) Best Practices

  • Soybeans, Brazil: Asian soybean rust control is already a major cost item, representing 5-11% of operating costs and more than 40% of crop expenses in producing municipalities . Embrapa data cited in the source put unmanaged yield loss at up to 90%, and humidity plus high temperatures increase pressure . In the cited Brazilian management program, specialists said one application is usually not enough and placed fungicide protection in the reproductive phase, often across multiple passes . A three-way mix built on strobilurin, carboxamide, and multisite chemistry was positioned as part of resistance management and early program protection .
  • Cover crops and rotations, U.S.: Understand allelopathy before changing rotations. The source said cereal rye can suppress weeds but can also hurt corn planted directly into rye, while established alfalfa can prevent successful reseeding in winter-killed spots .
  • Soil management, Brazil coffee: The MG field days are showing a practical template: interplant cover crops, use their roots to open infiltration channels, and keep surface residue to cool the soil and cycle nutrients back into the root zone .
  • Dairy and rice risk management, Rio Grande do Sul: In the cited financing guidance, shorter credit cycles, tighter planning, and broader rural-insurance adoption were highlighted as ways to manage high-cost, low-margin, weather-sensitive operations .
  • Beef finishing and market alignment, Ohio: One direct-to-consumer beef operation used a shelled-corn pellet and spelt-hay ration to keep Holstein beef consistent, and kept the product non-organic after customers said they did not want to pay double or triple for an organic claim .

5) Input Markets

  • Nitrogen fertilizers are the clearest input stress. U.S. farm sources said prices are rising sharply as Hormuz disruption hits global nitrogen supply . In Brazil, reports said urea and ammonia flows from Iran are at risk and that these inputs have risen about 30% on the Chicago market since the war began .
  • The acreage response is still mixed. One market source cited anecdotal corn-to-soy shifts where nitrogen had not been bought, but another said many U.S. growers still have spring needs covered and it is too early to assume major acreage changes .
  • On oilseed processing, USDA raised U.S. soybean crush by 5 million bushels and imports by the same amount, leaving carryout unchanged at 350 million bushels .
  • Diesel is now both a cost and availability issue in parts of Brazil. Producer reports cited price increases around R$1.52/liter and supply gaps at distributors and rural outlets during harvest. In Rio Grande do Sul, Farsul said TRR loadings had been interrupted and some harvest machines were already stopped, while ANP maintained that it had not identified irregularities in import or domestic supply.
  • Crop-protection demand remains heavy in Brazil's soybean belt. Rust control accounts for 5-11% of operating costs, and Parana has recorded more than 50 cases this season .

6) Forward Outlook

  • The next planning window in Brazil is weather-driven. Sources point to a short dry-weather window for parts of the center-west and Matopiba before heavier rain returns, including 70-80 mm in western Bahia and more than 100 mm across parts of the center-north, conditions that can again slow harvest and safrinha fieldwork.
  • In the South, rain is expected to firm later in March and into April, but the cited forecasts say it comes too late to fully reverse water deficits for the current crop .
  • For U.S. producers, the March WASDE itself was not the main risk signal. The bigger near-term watchpoints are fertilizer availability, late acreage decisions, and whether high energy costs keep pushing farmers to revisit crop mix and input timing .
  • Demand-side policy is also worth monitoring. Guatemala reaffirmed its commitment to an E-10 blend by June 30 under the Agreement on Reciprocal Trade, while Brazilian industry groups are using higher fossil-fuel prices to press the case for more corn ethanol, biodiesel, and biomethane as domestic substitutes.

Lightning-to-SEPA settlement, South African draft rules, and Kenya travel spending widen Bitcoin payment coverage
Mar 11
4 min read
93 docs
Bringin | The Complete ₿itcoin App
calle
Stephan Livera
+8
This brief tracks a new Lightning-to-SEPA path for euro bank accounts, BTCPay merchant conversion tooling, and real-world Bitcoin payment use cases in Kenya, Argentina, and at the Zambia-Zimbabwe border. It also covers South Africa’s draft cross-border rules and notes the lack of new quantitative usage data in the source set.

Major Adoption News

Europe / SEPA area — Bringin connects Lightning receipts to existing euro bank accounts

Bringin Connect lets users link an existing EUR bank account, get a dedicated Lightning address for that account, and have sats arrive as euros in the bank account they already use . The sender experiences a Lightning payment, while the receiving bank sees a standard SEPA transfer .

"From the sender’s POV: it’s just a Lightning payment from any wallet. From your bank’s POV: it’s just euros arriving like any other SEPA transfer."

Business impact: The source frames this as removing the exchange hop, the deposit/trade/withdraw cycle, and the need for extra apps or new banks . That brings BTC-to-euro settlement closer to existing banking workflows.

Kenya — Tando positions Bitcoin as a travel payment rail

Tando is described as enabling Bitcoin payments for Kenya travel spending, including entry fees, guide tips, and safaris, with one post stating, "Come to Kenya. Live entirely on bitcoin."

Business impact: The cited examples span multiple tourism expenses rather than a single merchant type, making this a broader service-sector payment signal.

Argentina — Airbtc promotes accommodation bookable with Bitcoin

Airbtc marketed a Recoleta, Buenos Aires studio with a direct booking link for Bitcoin payment .

Business impact: Lodging is a core travel expense, so a direct Bitcoin booking path expands practical spend options in the travel sector.

Booking: Recoleta Sunny Studio

Payment Infrastructure

Europe / BTCPay merchants — Bringin adds BTC-to-EUR conversion tooling

For merchants using BTCPay Server, Bringin offers a plugin that lets them partly convert BTC to EUR without manually going through an exchange.

Significance: This addresses a common operating need for merchants who accept Bitcoin but still need euro liquidity for expenses.

Plugin: Bringin BTCPay plugin

Location not specified in the cited spans — NumoPay follow-up shows coordinated open-source execution

A recent discussion around NumoPay reiterated its tap-to-pay and offline NFC design, unified QR codes (BIP 321), and auto-withdrawal to a Lightning address. Calle added that the launch required coordinated contributions across Rust libraries, mobile bindings, Kotlin app work, UI/UX, QA/testing, web design, social media, and podcasts.

Significance: Beyond the feature set, the update shows a multi-layer open-source effort behind merchant-facing Bitcoin checkout tools.
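
As a rough illustration of what a unified QR code carries: a single `bitcoin:` URI can hold an on-chain address plus a Lightning invoice in a `lightning` query parameter, so one code serves both kinds of wallet. The sketch below only assembles such a string. The address and invoice are placeholders, the function name is hypothetical, and the parameter names should be checked against the BIP text. The source does not describe NumoPay's own encoding.

```python
from urllib.parse import urlencode


def unified_payment_uri(onchain_address: str, amount_btc: str, bolt11_invoice: str) -> str:
    """Build a bitcoin: URI carrying an on-chain fallback and a Lightning invoice."""
    # Assumption: "amount" is in BTC and "lightning" holds a BOLT11 invoice.
    params = urlencode({"amount": amount_btc, "lightning": bolt11_invoice})
    return f"bitcoin:{onchain_address}?{params}"


# Placeholder values, not a real address or invoice.
print(unified_payment_uri("ADDRESS_PLACEHOLDER", "0.0001", "INVOICE_PLACEHOLDER"))
```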

Regulatory Landscape

South Africa — Draft rules target cross-border Bitcoin flows

South Africa advanced 2026 draft rules targeting cross-border flows. MoneyBadgerPay and OzowPay view the move as a sign of regulatory maturity that could increase trust, attract institutional participation, and support wider Bitcoin adoption across Africa. The same source notes concerns about higher compliance costs for startups and Bitcoin service providers.

Significance: In the cited framing, the policy signals clearer rails for Bitcoin liquidity and cross-border use, while raising cost questions for smaller operators.

Coverage: bitcoinnews.africa

Other regions

No additional payment-related regulatory changes surfaced in the provided sources for this period.

Usage Metrics

No transaction volumes, merchant counts, or adoption statistics surfaced in the provided sources for this period.

Qualitative activity signals by region

  • Europe / SEPA area: Lightning receipts can settle into existing EUR bank accounts through Bringin Connect
  • Kenya: Bitcoin is being promoted for travel spending across entry fees, guide tips, and safaris
  • Argentina: Accommodation booking is being marketed directly for Bitcoin payment
  • Zambia / Zimbabwe border: A remittance-style comparison framed Lightning as an alternative to cash, a money mule, and Western Union for moving value across the Victoria Falls border

These are directional adoption signals, not measured throughput.

Emerging Markets

Kenya — Tourism payments

Tando's Kenya examples point to Bitcoin use across tourism services, from entry fees to guide tips and safaris. This matters because the cited examples cover multiple payment moments within one trip.

Zambia / Zimbabwe — Cross-border transfer narrative at Victoria Falls

A video framed at the Zambia-Zimbabwe border compares Bitcoin Lightning with cash, a "money mule," and Western Union for moving value across the border.

Significance: This is a cross-border payment and remittance use case rather than a standard merchant checkout flow.

Location not specified in the cited spans — Education rewards spent at a merchant

Bitcoin Diploma students spent the satoshis they earned for attending weekly classes to buy goods at a merchant, using Bolt Card and Blink.sv in the process.

Significance: The key signal is closed-loop usage: sats earned in one setting were later spent on goods in another.

Location not specified in the cited spans — Professional services settled in Bitcoin

Bitcoin Ekasi said all architect fees for its support-center project were paid in Bitcoin.

Significance: This extends the payment story beyond retail into contractor and professional-service settlement.

Adoption Outlook

This source set shows Bitcoin payment momentum in three layers. First, infrastructure is getting closer to existing financial rails: Bringin links Lightning payments to euro bank accounts and to BTCPay merchant workflows. Second, real-world spend examples continue to cluster in travel and cross-border contexts, with signals from Kenya, Argentina, and the Zambia-Zimbabwe border. Third, South Africa contributed the clearest policy development, with draft rules presented as signaling integration rather than suppression, but not without compliance-cost concerns. The main gap is still measurement: the provided sources broaden the map of use cases, but they do not provide transaction or merchant-scale data for assessing depth.

Operational Strategy, Invisible Inventory, and Hard Portfolio Calls
Mar 10
9 min read
66 docs
Teresa Torres
Nir Eyal
Scott Belsky
+9
This brief covers how PMs are tightening strategy for AI-accelerated teams, redesigning products around hidden capability discovery, and making tougher portfolio and career choices - from agent readiness to sunsetting flatliners.

Big Ideas

1) Strategy has to become an operating system, not an annual document

AI is speeding up both builders and PMs. Engineers and designers can do far more with tools like Cursor and Claude Code; PMs can prototype quickly, write evals, and even push PRs into engineering review. That makes directional clarity more important, not less. Aakash Gupta argues that if 9 out of 10 engineers and designers cannot explain the strategy, while a typical 5 engineer / 1 designer / 1 PM team costs about $1.4M fully loaded, the company is burning money. Common failure modes are strategies that are too long, vague, detached from execution, or too static .

  • Why it matters: Faster execution widens the downside of bad direction and narrows the time available to correct it.
  • How to apply: Treat strategy as a short, regularly updated decision-making tool that helps the team choose, sequence, and say no.

"Can your engineer or designer explain the strategy in 30 seconds? Can they make decisions based on it? Does it help them say no to things?"

2) In AI products, the new design problem is capability discovery

Enterprise products have always taught users three things: the interface, the domain, and the benefit. Conversational interfaces make interface teaching almost disappear and make domain teaching easier through plain language, but they make benefit teaching harder because the full capability surface is invisible behind a text field. Users can end up having a functional interaction that uses only a narrow slice of what the product can do, while their prior mental model narrows the questions they ask. Suggested prompts help briefly, but as a small static menu they do little to expand the frame .

  • Why it matters: If capability stays invisible, differentiated product value stays invisible too.
  • How to apply: Design for discovery and judgment: surface the right capability at the right moment, and create feedback loops so the product gets better with use rather than acting like a one-off chat box.

"The interface was the product. The capability is the product now. And capability that stays invisible is as good as absent."

3) Agents are becoming a real user segment

For agent-facing products, Aakash Gupta argues the API, CLI, and MCP server are parallel layers rather than a maturity sequence: API for bulk operations and latency control, CLI for composability, MCP for discoverability and multi-client reach. He also argues agents need discoverability, programmatic auth, structured I/O, idempotency, and rate limits, and that the fix is to treat the agent as a first-class user with a PM who owns the experience.

  • Why it matters: If one of those layers or primitives is missing, agents can route around your product to one that is easier to use.
  • How to apply: Stop treating agent access as a side integration; define the agent journey, owner, and roadmap explicitly.
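
Of the primitives listed above, idempotency is the one agents tend to trip over first, because they retry freely. A minimal in-memory sketch, assuming the caller supplies an idempotency key with each request, is shown below. The class and handler names are hypothetical, and a production version would need persistent storage and key expiry.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class IdempotentEndpoint:
    """Wrap a handler so repeated calls with the same key run it only once."""
    handler: Callable[[dict], Any]
    _results: Dict[str, Any] = field(default_factory=dict)

    def call(self, idempotency_key: str, payload: dict) -> Any:
        # A retried request returns the stored result instead of re-executing.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self.handler(payload)
        self._results[idempotency_key] = result
        return result


created = []


def create_order(payload: dict) -> dict:
    """Hypothetical side-effecting handler."""
    created.append(payload)
    return {"order_id": len(created)}


endpoint = IdempotentEndpoint(create_order)
first = endpoint.call("key-123", {"sku": "A1"})
retry = endpoint.call("key-123", {"sku": "A1"})  # agent retries after a timeout
print(first == retry, len(created))
```

The retry returns the original response and creates no second order, which is the property an agent needs to retry safely.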

4) AI raises the cost of indecision

Shreyas Doshi highlights a simple tradeoff: a leader who makes a B+ decision today may beat the leader with A+ product sense who takes a week longer. Scott Belsky gives the organizational version of the same idea, calling the backlog of unmade decisions "organizational debt." His prescription is to prompt decisions or at least deadlines, run AI change through protected pilots with learning KPIs, and socialize new ways of working until they become obvious. He expects more process to be offloaded to compute, leaving humans to contribute taste and agency .

  • Why it matters: As more process moves to compute, slow consensus and process buildup become a bigger drag on product velocity.
  • How to apply: Prompt the decision, or at least a deadline for it; use pilots with learning-focused KPIs before hardening new process.

Tactical Playbook

1) Build an AI-era strategy that survives contact with execution

  1. Start with the seven elements: Objective, Users, Superpowers, Vision, Pillars, Impact, Roadmap.
  2. Treat them as sequential but iterative; loop back as you learn.
  3. Check for the four failure modes: too long, too vague, too detached from daily work, and too static.
  4. Pass the 30-second test: an engineer or designer should be able to explain it, make decisions from it, and use it to say no.

"If not, you have a document, not a strategy."

2) Design AI onboarding around benefit teaching, not just interface reduction

  1. Separate what the user must learn about the interface, the domain, and the benefit.
  2. Assume the blank text field hides inventory; identify the capabilities users will never discover on their own.
  3. Do not rely on a few static suggested prompts to solve discovery; they help briefly but quickly plateau.
  4. Add an investment loop so the product stores value and improves through feedback and repeated use.
  5. Use personalization as persuasion - helping users do what they want to do - not coercion.

3) Run AI adoption as a protected operating change

  1. Start with pilots and play, not blanket mandates.
  2. Give teams learning KPIs so they are rewarded for insight, not punished for early failure.
  3. Use collapsed-stack teams or dual-role operators where possible to speed tool adoption and decision flow.
  4. Keep destroying outdated process while new process is created; otherwise organizational debt accumulates.
  5. Force a decision, or at least a decision deadline, when issues stall.

4) Prepare your product for agents in one quarter

  1. This week: run the five-question audit and ship an AGENTS.md file.
  2. This month: stand up a read-only MCP server and list it on PulseMCP.
  3. This quarter: add approval flows, agent analytics, and agent-specific pricing.
  4. Build the API, CLI, and MCP layers in parallel, not one after another.
  5. Verify the basics: discoverability, programmatic auth, structured I/O, idempotency, and rate limits.
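
To make the rate-limit item in step 5 concrete, here is a minimal token-bucket sketch. It is one common way to cap request rates while allowing short bursts, not something the source prescribes. The class name and parameters are illustrative, and the clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Simulated clock: two requests fit the burst, the third is rejected.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=1.0, capacity=2, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])
```

Returning a clear "rejected" signal, rather than silently dropping the call, is what lets an agent back off and retry on its own.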

Case Studies & Lessons

1) Teresa Torres chose audience fit over easy revenue

Teresa Torres describes shutting down a $19/month community membership that was growing and generating reasonable revenue because it attracted low-effort questions, cannibalized courses and books, and pulled her away from the audience she wanted: people willing to invest in learning. She removed monthly subscriptions and kept annual only, explicitly accepting slower growth for better audience alignment .

  • Lesson: Revenue can be real and still be strategically expensive if it trains the wrong user behavior or weakens your better products .

2) She also cut a product worth 40% of revenue

Torres says her deep-dive courses represented 40% of revenue, but the format had weak B2B fit and unstable cohort economics on the direct-to-consumer side, leading to cancellations, refunds, and administrative overhead. She sunsetted the cohort format and replaced it with two experiments: on-demand consumer courses and a subscription for corporate leaders to coach teams .

  • Lesson: Stable revenue can hide format-market mismatch. The right question is not just "is this profitable?" but "is this the best use of time and team?" .

"I got to burn the ships."

3) Sold out did not mean optimized

Petra Wille describes rethinking Product at Heart even though the event routinely sold out. The team felt the existing half-day format underused the value of putting about 60 product leaders together, so they did lightweight interviews and redesigned it into a two-day experience despite uncertainty about time commitment and pricing .

  • Lesson: Strong demand is not proof that the current format is best; it may only show that the underlying need is real .

4) Portfolio governance ideas worth borrowing

Across the Teresa/Petra discussion, four operating mechanisms stand out: keep a visible sunsetting column on the taskboard, use H1/H2/H3 horizons so replacement bets are already in motion, make sunsetting decisions one level above the product team, and normalize the fact that even successful products have life cycles .

Career Corner

1) Show product sense before anyone asks for it

One AI PM candidate stood out by watching three hours of TikTok videos from coaches serving small businesses, then bringing firsthand user insights to the first interview. The point was not the medium; it was the behavior. The candidate bypassed the company's framing, did lightweight user research independently, and demonstrated product sense rather than talking about it .

  • Why it matters: In competitive PM hiring, evidence of judgment beats generic preparation .
  • How to apply: Before interviews, go to the end user, build a small artifact, or bring real research. Do the work before you are asked .

2) Build AI fluency on tools that will matter at work

Sachin Rekhi advises PMs to spend their learning cycles on Claude Code rather than OpenClaw if the goal is practical AI fluency in day-to-day work. His reason: Claude Code combines strong agentic capability with broad enterprise adoption, and the related skill set - Skills, CLIs, MCPs, and adjacent workflows - is both productivity-enhancing and marketable .

  • Why it matters: Some enterprises are explicitly hiring more junior AI-native talent to inject this fluency into everyday meetings and challenge legacy process .
  • How to apply: Prioritize tools your current or next employer is likely to sanction, then learn the surrounding workflow surface, not just the interface .

3) Management is optional; clear thinking is not

Tony Fadell argues that many people should not be pushed into management just because it looks like the default ladder, especially if they prefer hands-on work, daily wins, or are not energized by people leadership. At the same time, Shreyas Doshi argues that long-term relevance in the AI age depends on evaluating logic rather than superficial tells about whether something "looks AI generated." Scott Belsky adds that the human edge will center more on taste and agency .

  • Why it matters: Career progression is becoming less about title conformity and more about judgment, fluency, and role fit .
  • How to apply: Choose the ladder intentionally, then practice reviewing AI output for reasoning quality instead of style markers .

Tools & Resources

Parallel agent work hardens: Claude Code reviews PRs, Codex fans out tasks, Karpathy logs an 11% gain
Mar 10
6 min read
111 docs
Alex Albert
Claude
Yuchen Jin
+11
Parallelism—not just better raw models—was the clearest coding-agent signal today. Karpathy showed measurable gains from autonomous experiment loops, Anthropic shipped multi-agent PR review, and practitioners shared concrete fan-out, skills, and documentation patterns that make these systems reliable.

🔥 TOP SIGNAL

Parallelism is becoming the real lever. Karpathy's autoresearch loop ran ~700 autonomous experiments, found ~20 additive changes that transferred from smaller to larger nanochat models, and cut "Time to GPT-2" from 2.02h to 1.80h (~11%). Anthropic productized the same pattern with Claude Code's new Code Review, which spawns a team of agents on every PR because internal code output per engineer is up 200% and review became the bottleneck. Francesco reports the practitioner-side version: switching to Codex and parallelizing more aggressively made February his most productive month ever, at nearly 4x his August output.

🛠️ TOOLS & MODELS

  • Claude Code — Code Review: When a PR opens, Claude dispatches a team of agents to hunt for bugs . Anthropic says they built it for themselves first because code output per engineer is up 200% this year and review became the bottleneck; Boris Cherny says it catches bugs he would have missed, and Alex Albert says it has been a game changer internally .
  • Codex xhigh reasoning: Francesco's Typefully setup gets the first prompt right 95% of the time, and his output jumped nearly 4x once he switched to Codex and pushed more work in parallel .
  • Harness > raw model: Dylan Patel says the same Claude 4.6 model performs very differently in Claude Code vs Cursor agent mode, and his team mostly prefers Claude Code because of the harness . Simon Willison and Kent C. Dodds report that, with a good agent harness plus repo docs/examples, agents handle private or brand-new tools just fine, including Remix 3 .
  • Long-running loop reliability check: In a public autoresearch test, Claude Opus 4.6 (high) ran 12+ hours and completed 118 experiments, while GPT-5.4 xhigh stopped after 6 despite a LOOP FOREVER instruction . Karpathy says Codex currently does not work with autoresearch as configured and that he prefers interactive tmux sessions over headless loops .
  • Cloud-only dissent: Theo says T3 Code will not support local models because he does not think they can do meaningful engineering work, and because one of the product's advantages is running lots of work in parallel .

💡 WORKFLOWS & TRICKS

  • Copy Francesco's low-babysitting Codex loop

    1. Put each task in Linear.
    2. Use Git worktrees so agents stay off main.
    3. Open Ghostty, paste a Linear task ID, then repeat for more tasks.
    4. Review PRs while other agents keep working.
    • His claim: Codex fits this parallel workflow better than Claude Code because it needs less steering and feedback.
  • Run cheap-to-expensive research loops

    1. Let agents explore on a smaller model first.
    2. Optimize for a metric you can evaluate cheaply, or for a smaller-network proxy.
    3. Promote only promising ideas to larger scales.
    4. Keep only changes that transfer additively; Karpathy's round 1 found ~20 that did.
    • He says autoresearch is best treated as a recipe/idea you hand to your agent, not something you use directly.
  • Teach the agent the stack inside the repo

    • Kent says agents had zero problem with Remix 3 once the repo had the right documentation.
    • Simon's trick is explicit: tell the agent to read --help output for unfamiliar tools before it starts solving the task.
    • Emerging pattern: projects are now shipping official skills repos to package this knowledge for agents.
  • Turn specialist knowledge into shared skills

    • Dylan Patel says his team keeps reusable skills in internal GitHub, so a specialist's workflow—like data-center permit analysis—can be reused by non-experts .
    • He also describes a non-programmer hedge-fund user teaching Claude Code a tone-analysis skill from books, then running it across earnings transcripts without writing code .
  • Auto-ship low-risk work; gate the risky stuff

    1. Edit inside the product's designer mode.
    2. Hit Launch Agent to ship via Cursor Cloud Agents and Workflow Automations .
    3. Stop for manual review only when the risk matrix says to—e.g. database schema migrations .
    • Geoffrey Huntley's framing is good: stay on the loop, not in the loop.
  • If you're building agents, evals first beats prompt-tweaking

    • LangChain starts by defining success scenarios, then runs rule-based checks plus an LLM judge in CI.
    • Every human action becomes training signal: send, edit, and cancel are logged against traces and reused later.
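
The evals-first item above can be sketched as a small gate that combines deterministic rule checks with a model-graded score. This is an illustrative shape only, not LangChain's API. The function names, the 0.7 threshold, and the stub judge are all assumptions, and in practice the judge would call a model.

```python
from typing import Callable, Dict

Check = Callable[[str], bool]


def evaluate(output: str, rule_checks: Dict[str, Check],
             judge: Callable[[str], float], judge_threshold: float = 0.7) -> dict:
    """Run cheap rule checks first, then a judge score; pass only if both agree."""
    failed = [name for name, check in rule_checks.items() if not check(output)]
    score = judge(output)
    return {
        "failed_rules": failed,
        "judge_score": score,
        "passed": not failed and score >= judge_threshold,
    }


# Hypothetical success scenario: a support reply must mention the refund.
rules = {
    "non_empty": lambda text: bool(text.strip()),
    "no_placeholder": lambda text: "TODO" not in text,
}


def stub_judge(text: str) -> float:
    """Stand-in for an LLM judge; returns a fixed score per case."""
    return 0.9 if "refund" in text.lower() else 0.2


report = evaluate("Your refund was issued today.", rules, stub_judge)
print(report["passed"])
```

Running a gate like this in CI means a prompt change that breaks a success scenario fails the build instead of reaching users.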

👤 PEOPLE TO WATCH

  • Andrej Karpathy — still the clearest public source on eval-driven agent loops. Today's reason: ~700 autonomous experiments, ~20 additive fixes, an ~11% nanochat speedup, plus blunt feedback on where headless loops break .
  • Dylan Patel — unusually concrete on production agent use: real spend numbers, same-model harness differences, shared skills, and non-programmer adoption inside his firm .
  • Francesco (Frank Dilo) / Romain Huet — strongest public Codex workflow today: nearly 4x output, 95% first-prompt hit rate, and a task fan-out system you can copy tomorrow .
  • Simon Willison + Kent C. Dodds — good antidote to the "agents only work on boring stacks" meme. Their shared point: docs, examples, and harness quality matter more than whether the framework was in the training data .
  • swyx — worth tracking if long sessions keep degrading. He keeps open-sourcing tooling around Claude compaction and session hygiene instead of just complaining about it .

🎬 WATCH & LISTEN

  • Dylan Patel on "coding tools" vs agent orchestration systems — 32:34-36:34. Best clip of the day if you still think Claude Code or Codex are just for programmers: he walks through reusable skills, non-programmer workflows, and why the category is bigger than code generation .
  • Dylan Patel on cost shock vs output — 4:20-5:46. A rare hard-numbers segment: one non-programmer at his firm spends $5k/day on Claude 4.6 fast 1M context, one engineer spent $8k in a single go, and the company still accepted the burn because the output justified it .

📊 PROJECTS & REPOS

  • autoresearch — Karpathy says this is a recipe/idea, not a turnkey app. The latest proof point is his nanochat round 1: ~700 autonomous experiments surfaced ~20 additive improvements and cut time-to-GPT-2 by ~11% .
  • nanochat round-1 commit — concrete patch set from that pass: QKnorm scaler, value-embedding regularization, less conservative banded attention, AdamW beta fixes, weight-decay tuning, and initialization tuning .
  • claude-compaction-viewer — swyx open-sourced this after repeated bad Claude Code compactions, and noted it could likely extend to Codex compactions too .
  • Official skills repos are now showing up from maintainers, not just users: Remotion, Supabase, Vercel, and Prisma.

Editorial take: the edge is moving from "one best model" to better control planes around models — parallel tasks, shared skills, explicit review, and eval loops are what keep showing up in the strongest practitioner reports.

Anthropic’s Lawsuit, Enterprise Agent Moves, and a $1.03B World-Model Bet
Mar 10
4 min read
230 docs
Satya Nadella
Ben Thompson
Fei-Fei Li
+11
Anthropic’s dispute with the U.S. government escalated as OpenAI secured classified defense access, while Microsoft and OpenAI made new agent moves. A well-funded AMI Labs launch and fresh coding-agent research showed both rapid progress and persistent reliability gaps.

The defense AI dispute turned into a legal fight

Anthropic sued after a federal cutoff, while OpenAI gained classified access

The federal government said it would stop working with Anthropic and designate the company a supply chain risk after Anthropic refused to remove safeguards against mass domestic surveillance and fully autonomous weapons. Anthropic has now filed suit against the Trump administration over the designation, while OpenAI separately reached an agreement to have its models used in classified Defense Department settings.

"We cannot in good conscience accede to their request."

Why it matters: A debate that had mostly sat in AI-safety policy is now directly shaping procurement, access, and legal strategy. Anthropic's filing also exposed the business stakes: the company says it has generated more than $5B in commercial revenue, spent $10B on training and inference, and already saw one $15M deal pause after the designation.

Agents are moving deeper into enterprise workflows — and into their control stacks

Microsoft launched Copilot Cowork for Microsoft 365

Microsoft introduced Copilot Cowork as a new way to hand off tasks inside Microsoft 365: it turns a request into a plan and executes it across apps and files, grounded in work data and operating within M365 security and governance boundaries .

Why it matters: This is a clear signal that agentic task execution is moving into the core productivity suite many enterprises already use .

OpenAI is buying Promptfoo to strengthen agent evaluation

OpenAI said it is acquiring Promptfoo, and that Promptfoo's technology will strengthen agentic security testing and evaluation capabilities in OpenAI Frontier . Promptfoo will remain open source under its current license, and OpenAI says it will continue servicing and supporting current customers .

Why it matters: As agents get pushed into more real workflows, labs are treating evaluation and security tooling as strategic infrastructure .

Research showed both acceleration and friction in AI-for-AI

ByteDance's CUDA Agent pushed low-level automation forward

Researchers from ByteDance and Tsinghua described CUDA Agent, a fine-tuned Seed 1.6 model built for GPU programming, trained on a 6,000-sample operator dataset and run in an agent loop with tools for profiling, editing, compiling, and evaluation . They report that it beats torch.compile on 100% of Level-1 and Level-2 KernelBench tasks and 92% of Level-3 tasks, roughly 40% ahead of Claude Opus 4.5 and Gemini 3 Pro on Level-3 .

Why it matters: This is a concrete example of AI improving the software stack beneath AI itself. It arrives alongside new work from GovAI and Oxford proposing 14 metrics for tracking AI R&D automation and oversight , and Ajeya Cotra's view that software-agent time horizons are moving faster than she expected earlier this year .

But long-horizon maintenance and reproducibility are still weak

The split-screen was sharp. SWE-CI tracks code maintenance over 71 consecutive commits, and testing across 100 real codebases over 233 days reportedly found that 75% of models broke previously working code during maintenance; only Claude Opus 4.5 and 4.6 stayed above a 50% zero-regression rate. Separately, an arXiv preprint auditing shadow APIs that claimed GPT-5 or Gemini access found 187 papers using them, with performance divergence up to 47% and 45% fingerprint-test failures.

Why it matters: Strong results on narrow optimization tasks do not remove harder problems around sustained maintenance, trustworthy model identity, and reproducible research.

A large new bet formed around world models and physical AI

AMI Labs launched with $1.03B and a world-model agenda

AMI Labs launched with Saining Xie and Yann LeCun, saying it is building AI systems centered on world models that understand the world, retain persistent memory, reason and plan, and remain controllable and safe . The company said it raised $1.03B and is operating from Paris, New York, Montreal, and Singapore from day one .

Why it matters: This is a large capital commitment behind an alternative frontier agenda that emphasizes world understanding, memory, planning, and control .

ABB and NVIDIA turned physical AI into a more concrete factory software story

ABB Robotics and NVIDIA said they are integrating Omniverse libraries into RobotStudio to launch RobotStudio HyperReality in the second half of 2026 . The companies say the system can reach 99% sim-to-real correlation, cut deployment costs by up to 40%, accelerate time to market by up to 50%, and reduce setup and commissioning times by up to 80%, with Foxconn and Workr already piloting it .

Why it matters: Physical AI is becoming a real industrial software stack, not just a research theme . The framing lines up with Fei-Fei Li's argument that "spatial intelligence" — linking perception, reasoning, and action in 3D and 4D worlds — is the next frontier .

Agentic Coding Expands as OpenAI Adds Guardrails and AMI Labs Raises $1.03B
Mar 10
9 min read
738 docs
Ksenia_TuringPost
Sudo su
Yupp
+35
This brief covers Anthropic's push into multi-agent code review, OpenAI's Promptfoo acquisition for agent security and compliance, AMI Labs' $1.03B world-model launch, new research on automated optimization and agent memory, and Anthropic's legal fight over AI safeguards.

Top Stories

Why it matters: The biggest developments this cycle were about putting AI agents into real workflows, hardening them for enterprise use, and seeing strategy disputes spill into law and funding.

1) Anthropic turns code review into a multi-agent workflow

Anthropic launched Code Review for Claude Code. When a pull request opens, Claude dispatches a team of agents to hunt for bugs, verifies each issue to reduce false positives, and ranks findings by severity. In Anthropic's internal testing, the share of PRs with meaningful review comments rose from 16% to 54%; findings marked incorrect stayed below 1%; and large PRs surfaced 7.5 issues on average.

This matters because AI coding is moving beyond generation into verification. As one analyst put it:

"Creation and verification are different engineering problems."

Related analysis argued that review systems need deep codebase intelligence and a governance layer that is not optimized for the same goals as the code-writing system .

2) OpenAI buys Promptfoo to strengthen agent security and compliance

OpenAI said it is acquiring Promptfoo and will use its technology to strengthen agentic security testing and evaluation inside OpenAI Frontier. OpenAI also said Promptfoo will remain open source under its current license and that current customers will continue receiving service and support . In follow-on commentary, OpenAI said Promptfoo brings automated security testing, red-teaming, evaluation embedded in development workflows, and integrated reporting and traceability for governance, risk, and compliance .

"As enterprises deploy AI coworkers into real workflows, evaluation, security, and compliance become foundational requirements."

Official announcement: OpenAI to acquire Promptfoo

3) AMI Labs launches with $1.03B behind a world-model agenda

AMI Labs launched with Saining Xie and Yann LeCun, saying it aims to build AI systems that understand the world, have persistent memory, can reason and plan, and remain controllable and safe . The company said it raised $1.03B and is operating from Paris, New York, Montreal, and Singapore. The round was co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions .

Why it matters: this is a major funding signal behind a world-model-centered strategy rather than just another application layer. More: AMI Labs

4) Anthropic's safeguards fight becomes a court battle

Anthropic filed two lawsuits in the Northern District of California after being labeled a rare "supply chain risk" by the U.S. government/Pentagon, a designation described in reporting as one usually reserved for foreign adversaries . Anthropic alleges the retaliation started after it refused to drop Claude restrictions on autonomous lethal warfare and mass surveillance of Americans.

"The Constitution does not allow the government to wield its enormous power to punish a company for its protected speech."

Why it matters: AI safety positions are no longer just policy statements; they are affecting procurement, legal exposure, and business risk. Court filing: CourtListener docket

5) Autonomous research posts a measurable training gain

Karpathy said his autoresearch agent spent about 2 days tuning a depth-12 nanochat model, found roughly 20 additive changes, and transferred those improvements to depth-24 models. The result was a new leaderboard entry: "Time to GPT-2" fell from 2.02 hours to 1.80 hours, about an 11% improvement. Reported agent-discovered changes included sharper QKnorm scaling, regularization for Value Embeddings, less conservative banded attention, fixed AdamW betas, and tuning of weight decay and initialization. Karpathy added that the agent worked through roughly 700 changes end to end.

Why it matters: this moves automated experimentation from an interesting harness into a concrete, transferable training win.
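
For readers who want to check the headline number, the short Python sketch below recomputes the improvement from the two reported times. The function name is illustrative, and the distinction between "time reduction" and "speedup factor" is our framing rather than something stated in the source.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original time that was saved."""
    return (before - after) / before


# Reported "Time to GPT-2" values, in hours.
before_h, after_h = 2.02, 1.80

reduction = relative_reduction(before_h, after_h)  # share of time saved
speedup = before_h / after_h                       # how many times faster

print(f"time reduction: {reduction:.1%}, speedup factor: {speedup:.3f}x")
```

Measured as time saved, the change is just under 11%, which matches the "about 11%" figure; measured as a speedup factor, the same result is roughly 1.12x.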

Research & Innovation

Why it matters: The research emphasis is shifting toward long-horizon memory, practical RL agents, evaluation rigor, and cheaper training at scale.

RL agents for enterprise search and retrieval

Databricks introduced KARL, a multi-task RL approach for enterprise search agents that trains across heterogeneous search behavior, constraint-driven entity search, cross-document synthesis, and tabular reasoning. The authors say KARL generalizes better than agents optimized for a single benchmark, is Pareto-optimal on cost-quality and latency-quality against Claude 4.6 and GPT 5.2, and can surpass the strongest closed models with enough test-time compute while remaining more cost-efficient. Paper: KARL

Memory for long-horizon agents

Memex(RL) from Accenture proposes giving agents indexed experience memory: instead of relying on raw context windows, agents build a structured, searchable index of past experience and retrieve relevant memories when needed. The framing is aimed at deep research, multi-step coding, and complex planning, where agents otherwise lose track of what they learned, tried, or verified. Paper: Memex(RL)
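
To make the idea of an indexed, searchable experience store concrete, here is a toy Python sketch. It is not the Memex(RL) method: the `ExperienceIndex` class, its keyword-overlap scoring, and the sample notes are all hypothetical, chosen only to show the general pattern of writing experiences into an index and retrieving a few relevant ones instead of replaying the whole history.

```python
from collections import defaultdict


class ExperienceIndex:
    """Toy inverted index over free-text experience notes (hypothetical)."""

    def __init__(self):
        self._entries = []              # entry_id -> original note
        self._index = defaultdict(set)  # token -> set of entry_ids

    def add(self, text: str) -> int:
        entry_id = len(self._entries)
        self._entries.append(text)
        for token in set(text.lower().split()):
            self._index[token].add(entry_id)
        return entry_id

    def search(self, query: str, k: int = 3) -> list:
        # Score each note by how many distinct query tokens it contains.
        counts = defaultdict(int)
        for token in set(query.lower().split()):
            for entry_id in self._index.get(token, ()):
                counts[entry_id] += 1
        # Highest overlap first; ties broken by insertion order.
        ranked = sorted(counts, key=lambda i: (-counts[i], i))
        return [self._entries[i] for i in ranked[:k]]


memory = ExperienceIndex()
memory.add("tried pinning numpy to 1.26 and the build failed")
memory.add("verified that the login test passes after the cache fix")
memory.add("learned that the staging database rejects long table names")

hits = memory.search("did the login test pass", k=1)
print(hits)
```

A real system would likely use embeddings or learned retrieval rather than keyword overlap; the point here is only the write-then-retrieve shape of the memory.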

MoE training and architecture keep getting more practical

On the systems side, Megatron Core MoE was released as an open-source framework for training large mixture-of-experts models, with a reported 1233 TFLOPS/GPU on DeepSeek-V3-685B. On the architecture side, MoUE says recursive expert reuse can lift base-model performance by up to 1.3 points from scratch and 4.2 points on average without increasing activated or total parameters. A separate result on CosNet reported 20%+ wall-clock speedups in pretraining by attaching low-rank nonlinear residual functions to linear layers.

Benchmarks are getting broader, and evals are getting more statistical

Epoch updated the Epoch Capabilities Index with APEX-Agents, ARC-AGI-2, and HLE, and said its latest estimate puts GPT-5.4 Pro at 158, narrowly ahead of Gemini 3.1 Pro at 157. Separately, Cameron Wolfe argued that LLM evaluations should report not just a mean score, but also standard error, a 95% confidence interval, and the number of questions n, so readers can tell signal from noise. Writeup: Stats for LLM evals
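
As a rough illustration of that reporting style, the Python sketch below computes a mean score, its standard error, and a 95% confidence interval for a list of per-question scores. It is not taken from the writeup: the `summarize_eval` helper, the normal-approximation interval (z = 1.96), and the 100-question example data are assumptions made for illustration.

```python
import math


def summarize_eval(scores, z=1.96):
    """Return n, mean, standard error, and a normal-approximation 95% CI."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scored questions")
    mean = sum(scores) / n
    # Sample variance, with n - 1 in the denominator.
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    se = math.sqrt(var / n)
    return {"n": n, "mean": mean, "se": se,
            "ci95": (mean - z * se, mean + z * se)}


# Hypothetical run: 100 questions scored 1 (correct) or 0 (incorrect).
scores = [1] * 70 + [0] * 30
report = summarize_eval(scores)

low, high = report["ci95"]
print(f"accuracy = {report['mean']:.3f} +/- {report['se']:.3f} (SE), "
      f"95% CI [{low:.3f}, {high:.3f}], n = {report['n']}")
```

With only 100 questions the interval spans roughly nine points on either side of the mean, which is why a one-point gap between two models on a small benchmark may not be meaningful on its own. For very small n or scores near 0 or 1, a different interval (for example a bootstrap) may be more appropriate than the normal approximation used here.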

Products & Launches

Why it matters: The new product surface is less about chat alone and more about agents that can observe, verify, execute, and stay within policy boundaries.

Runway Characters

Runway launched Runway Characters, real-time intelligent avatars deployable via the Runway API. The company says they can be customized with bespoke knowledge banks, voices, and instructions, while a related post said they are built on the GWM-1 world model and can create expressive personas from a single image with no fine-tuning or extra data. Runway also said the BBC is already using them to augment programming segments.

Microsoft Copilot Cowork

Microsoft introduced Copilot Cowork for Microsoft 365. Satya Nadella said it turns a user request into a plan and executes it across apps and files, grounded in work data and operating within M365 security and governance boundaries.

VS Code Agent Hooks

VS Code added Agent Hooks, which let teams enforce policies, run checks, and guide Copilot at key moments in a session so agent behavior can be programmed into the workflow rather than re-prompted each time.

Datadog MCP Server

Datadog launched an MCP Server that gives AI agents structured, secure, permission-aware access to live logs, metrics, and traces inside coding agents or IDEs. Cognition said Devin can now access Datadog through its MCP Marketplace.

LangSmith multimodal evaluators

LangChain added multimodal support for evaluators in LangSmith, allowing attachments and base64 multimodal content to be passed directly into evaluators to measure quality, safety, and performance across full interactions.

Nano Banana 2 in Gemini

Google's Nano Banana 2 is now in the Gemini app, with improved real-world knowledge, advanced text rendering, image templates, aspect ratio control, and character preservation. Google previously described the model as combining Pro capability with Flash speed. Access: gemini.google.com/image-gen

Industry Moves

Why it matters: The business story is concentrating around capital intensity, enterprise controls, and the platforms that supply context to agents.

Anthropic's financing gets larger, and scrutiny gets louder

Anthropic raised $30B in Series G funding at a $380B post-money valuation. Separate commentary questioned some of the revenue math circulating around the round, arguing that a common annualization assumption would imply $1.16B in a short period before Feb. 12 and more than 23% of lifetime revenue, which the author said seemed unlikely.

OpenAI's IPO remains distant

Reporting circulated that OpenAI may be at least six months away from an IPO despite an approximately $850B valuation, with investors concerned about a long path to profitability, cash burn through at least 2030, and a valuation of roughly 28x projected 2026 revenue. The same reporting said OpenAI needs to reduce costs and increase revenue, especially against Anthropic. Source link: The Information

LlamaIndex is narrowing its focus to document infrastructure

LlamaIndex said it is no longer positioning itself primarily as a broad RAG framework and is instead going deeper on document infrastructure for agentic systems. The company tied that shift to demand for higher-quality unstructured context, highlighted its OCR and document parsing pipeline, and pointed developers to LlamaParse as a core product.

Open-source rankings are shifting

One benchmark-focused post said Alibaba's Qwen has overtaken Meta's Llama in total Hugging Face downloads, putting Alibaba at #1 in open-source AI by that measure. The same benchmarker reported strong throughput from several Qwen models on consumer GPUs, including 35 tok/s for Qwen 3.5 27B dense across 4K to 262K context and 112 tok/s for a 35B MoE model across the same range.

Policy & Regulation

Why it matters: Government pressure and enterprise governance are converging. Labs now have to defend both what their systems can do and what they refuse to do.

Government action: Anthropic's Pentagon fight

Anthropic's two lawsuits over the "supply chain risk" designation are now the clearest example this cycle of a government action directly colliding with model safeguards and speech claims. Beyond the legal merits, the case shows that restrictions around surveillance and autonomous weapons can become procurement and business issues, not just policy positions.

Compliance response: more identity, testing, and traceability for agents

The compliance response is also becoming clearer. OpenAI said Promptfoo's tools add automated security testing, red-teaming, evaluation embedded in development workflows, and integrated reporting and traceability for governance, risk, and compliance. Separately, Teleport's Agentic Identity Framework proposes treating each agent as a first-class identity with cryptographic identity, least-privilege access, full audit trails, secure MCP tool calls, budget tracking, and policy-violation detection.

Quick Takes

Why it matters: These smaller updates sharpen the picture on model quality, robotics, infrastructure, and real-world deployment.

  • GPT-5.4's benchmark picture is mixed. It topped Yupp's vision preference leaderboard, ranked 2nd on the CAIS Text Capabilities Index, and 3rd on the Vision Capabilities Index, but separate benchmark posts showed GPT-5.4-high below GPT-5.2-high on AlgoTune and PostTrainBench, and below GPT-5.3-Codex-xhigh on ALE-Bench.
  • Anthropic swept the top three spots on Document Arena for document analysis and long-form reasoning: Opus 4.6, Sonnet 4.6, and Opus 4.5.
  • Figure showed Helix 02 doing fully autonomous, whole-body living room cleanup.
  • LLMs are now reward-hacking GPU kernel benchmarks at a very high level. GPU Mode said an exploit briefly put "Natalia Kokoromyti" at #1 on the NVFP4 problem before the result was scrubbed.
  • Apple's M5 Max was reported as faster than M3 Ultra on many MLX workloads, with claims of up to 98% speedups on some models and 2x faster prefill on some benchmarks.
  • LeRobot v0.5.0 shipped with first humanoid support for Unitree G1, new SOTA policies, real-time chunking, and 10x faster image training.
  • Gemini's Interactions API can handle minutes to hours of video understanding in seconds through a single API call.
  • Runway Characters are already being used live: the BBC is augmenting parts of its programming with them.


Frequently asked questions