# White House Slows GPT-5.6 as Agents Move Deeper Into Work

*By AI News Digest • June 26, 2026*

A reported White House intervention in GPT-5.6 access made frontier-model governance the day’s central story. Elsewhere, agents became more operational, new research showed reliability is still the bottleneck, and capital continued flowing to world models and open infrastructure.

## Frontier access becomes a live policy lever

### White House reportedly slows GPT-5.6 and moves to customer-by-customer approvals

The Trump administration reportedly asked OpenAI to stagger GPT-5.6 over security concerns, with access approved customer by customer during a preview period. Gary Marcus and Nathan Lambert both argued the larger issue is the emergence of opaque, ad hoc frontier-model governance without transparent criteria or clear next steps. Ethan Mollick added that the US government could effectively block open-weight models at the company level even if individuals could still download weights, and Marc Andreessen agreed. [^1][^2][^3][^4][^5][^6][^7]

> "What we really need is a bipartisan committee—with transparent criteria and the judgements of independent scientists—and not just snap judgements from the White House." [^3]

**Why it matters:** The decision landed the same day 35 nations signed a "pro-growth, pro-innovation" AI statement centered on more energy, compute, chips, talent, and private investment, making the tension between capacity-building rhetoric and opaque access controls unusually visible. [^8][^9]

## Agents move deeper into real work

### Gemini gets native computer use as Codex turns into a background worker

Google DeepMind says Gemini 3.5 Flash now supports native computer use, letting developers build agents that can see and act across browser, mobile, and desktop interfaces. OpenAI says Codex is already changing work across the company by handling more complex, longer-running, and cross-functional tasks, and a new DigitalOcean plugin lets it spin up persistent cloud development environments that keep working after the user steps away. Jeff Dean also said 75% of code at Google is now written by agents and coding models, up from 50% last year, with models now able to write modules and tests for multi-hour tasks autonomously. [^10][^11][^12][^13]

**Why it matters:** The center of gravity is moving from chat interfaces to agents that can act inside software, persist across sessions, and take on longer-running work. [^10][^11][^12][^13]

### The harder problem now is operating agents, not just demoing them

LangChain CEO Harrison Chase described the lifecycle top organizations are using to ship reliable agents at scale as build, test, deploy, monitor, and govern, with traces at the center of understanding and improving behavior. In Thomas Wolf's week-long open experiment, more than 100 agents collaborating on Gemma 4 inference speed achieved a 5x improvement, but they also had to invent shared playbooks, quota-pooling norms, and self-policing against private side channels and invalid verification shortcuts. [^14][^15]

**Why it matters:** Durable execution, evaluation, observability, human approval, and governance are becoming part of the agent product itself rather than back-office plumbing. [^14]

## Reliability and safety stay central

### A large document-Q&A study says hallucinations still worsen with long context

A study covering 172B tokens found that no tested model fully avoided fabrication in document-based question answering: the best model still hallucinated 1.19% of the time at 32K context, strong models more typically landed around 5-7%, and at 200K context every model fabricated at least 10% of the time. The writeup argues this is not just a retrieval failure, because a model can find real facts and still answer too confidently when the requested fact is absent. [^16]

**Why it matters:** This helps explain why reliability is attracting dedicated capital. Scaled Cognition announced a $100M Series A focused on solving AI reliability, and Vinod Khosla said some applications "just can't afford hallucinations." [^17][^18]

### Bengio's "scientist AI" thesis is paired with a warning about self-reports

Yoshua Bengio said frontier-model training can produce emergent self-preservation, deception, and power-seeking, and argued that new mathematical results now make it possible to design neural nets with guarantees of good behavior. His Law Zero group is building "scientist AI": an honest predictive model trained to explain observations rather than pursue unchosen goals, and to act as a guardrail layer that can flag or block harmful agent behavior. [^19]

> "The takeaway: LLM self-reports should not be treated as context-free behavioral diagnostics." [^20]

A separate ICML oral paper found that self-report–behavior coherence in LLMs is selective, can reach human-level intention–behavior baselines when self-reports and behavior happen in the same conversation, and often collapses across separate conversations. [^20]

**Why it matters:** Both threads point in the same direction: safety work is shifting toward behavioral validation, monitoring, and independent guardrails rather than trusting what a model says about itself. [^19][^20]

## Capital and commercial traction

### Capital kept flowing to world models and open infrastructure

General Intuition raised a $320M Series A at a $2.3B valuation and said it is training large action foundation models on billions of action-labeled gameplay clips from Medal's 17M monthly active users to build world models and generate infinite training environments. Separately, Hugging Face crossed $100M annual run-rate while saying it still keeps the platform free and open-source for 97% of users and stores and serves hundreds of petabytes of models and datasets. [^21][^22]

**Why it matters:** The day's industry signals were not limited to closed frontier labs: one big bet targeted action and world models, while another showed that open model and dataset infrastructure can support a sizable business. [^23][^22]

---

### Sources

[^1]: [𝕏 post by @steph_palazzolo](https://x.com/steph_palazzolo/status/2070241787180966279)
[^2]: [𝕏 post by @amir](https://x.com/amir/status/2070240190338371994)
[^3]: [𝕏 post by @GaryMarcus](https://x.com/GaryMarcus/status/2070257533076582550)
[^4]: [𝕏 post by @natolambert](https://x.com/natolambert/status/2070288961449411057)
[^5]: [𝕏 post by @natolambert](https://x.com/natolambert/status/2070291713466519809)
[^6]: [𝕏 post by @emollick](https://x.com/emollick/status/2070308786032578766)
[^7]: [𝕏 post by @pmarca](https://x.com/pmarca/status/2070314415589847107)
[^8]: [𝕏 post by @UnderSecE](https://x.com/UnderSecE/status/2070220039622365642)
[^9]: [𝕏 post by @GaryMarcus](https://x.com/GaryMarcus/status/2070258844186669109)
[^10]: [𝕏 post by @GoogleDeepMind](https://x.com/GoogleDeepMind/status/2070180509523546481)
[^11]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2070196105745518913)
[^12]: [𝕏 post by @OpenAIDevs](https://x.com/OpenAIDevs/status/2070261549391024403)
[^13]: [Behind Google's strategic bets in AI: A conversation with the hosts of Acquired](https://www.youtube.com/watch?v=BpnJYJmbXcM)
[^14]: [The Agent Development Lifecycle 101 by Harrison Chase](https://www.youtube.com/watch?v=5faasPgyxwE)
[^15]: [𝕏 post by @Thom_Wolf](https://x.com/Thom_Wolf/status/2070134136304517284)
[^16]: [𝕏 post by @rohanpaul_ai](https://x.com/rohanpaul_ai/status/2070193617554264542)
[^17]: [𝕏 post by @ScaledCognition](https://x.com/ScaledCognition/status/2070179898673537074)
[^18]: [𝕏 post by @vkhosla](https://x.com/vkhosla/status/2070189184036540726)
[^19]: [I Used to Think We Couldn't Control AI. New Math Changed My Mind - Yoshua Bengio](https://www.youtube.com/watch?v=4A43Y2fNtZo)
[^20]: [𝕏 post by @AnimaAnandkumar](https://x.com/AnimaAnandkumar/status/2070038153679847845)
[^21]: [𝕏 post by @gen_intuition](https://x.com/gen_intuition/status/2070177308539818005)
[^22]: [𝕏 post by @ClementDelangue](https://x.com/ClementDelangue/status/2070104323481104674)
[^23]: [𝕏 post by @vkhosla](https://x.com/vkhosla/status/2070190800546435347)