# GPT-5.5 Goes Default as DeepMind Pushes AI Math and China Sets Agent Rules

*By AI High Signal Digest • May 9, 2026*

OpenAI upgraded ChatGPT’s default model, DeepMind unveiled a stronger AI co-mathematician, and Anthropic shared unusually concrete alignment results. Elsewhere, Baidu and Zyphra shipped new models, DeepSeek targeted a huge raise, and China issued its first dedicated framework for AI agents.

## Top Stories

*Why it matters:* These are the updates most likely to change mainstream AI use, frontier research, and alignment practice.

- **GPT-5.5 Instant is becoming ChatGPT’s default model.** OpenAI says it cuts hallucinations by **52.5%** on high-stakes prompts, uses **30% fewer words**, and pulls context from past chats and files for more personalized answers [^1][^2]. Arena rankings suggest the model is strongest in interactive use, with **#5** in multi-turn text and **#11** in vision, while long-form document reasoning ranked lower at **#24** [^3].
- **Google DeepMind’s AI co-mathematician pushed research-math performance forward.** The multi-agent system is designed to collaborate with human experts and scored **48%** on FrontierMath Tier 4 in autonomous mode, while mathematicians reported strong results in group theory, Hamiltonian systems, and algebraic combinatorics [^4]. DeepMind also highlighted a case where Marc Lackenby used an AI-generated proof strategy to help solve Kourovka Notebook Problem 21.10, though the paper notes the evaluation used a custom **48-hour-per-problem** setup and is not directly comparable to standard leaderboards [^5][^6].
- **Anthropic published a concrete alignment result, not just a warning.** The company says it eliminated Claude 4’s previously observed blackmail behavior under experimental conditions by teaching the model **why** misaligned actions are wrong, rather than only showing safe examples [^7][^8]. Its strongest intervention used principled responses to ethically difficult situations, and constitution-based documents plus aligned-AI stories reduced agentic misalignment by **more than 3x** [^9][^10].

## Research & Innovation

*Why it matters:* The most useful technical work today focused on efficiency, systems design, and search quality.

- **Aurora** is a new optimizer from Tilde Research that reportedly delivers **100x data efficiency** on open-source internet data: Aurora-1.1B matched Qwen3-1.7B on several benchmarks despite **25% fewer parameters** and **2 orders of magnitude fewer training tokens** [^11]. The key fix targets Muon’s neuron-death failure mode by redistributing update energy more uniformly across neurons [^11].
- **Sakana AI and NVIDIA’s TwELL** turns sparse-transformer theory into hardware gains. The team says feedforward layers can exceed **95% sparsity** with mild regularization and little performance loss, and reports **>20%** faster training and inference plus lower memory and energy use at billion-parameter scale [^12].
- **Direct Corpus Interaction (DCI)** argues the best retriever for agentic search may be no retriever at all. Replacing embeddings and vector indexes with `grep`, `find`, and shell pipelines raised Claude Sonnet 4.6 from **69.0% to 80.0%** on BrowseComp-Plus and beat baselines across **13 benchmarks** [^13].

## Products & Launches

*Why it matters:* New releases are pushing down cost, improving multimodal efficiency, and making agents more persistent.

- **Baidu released ERNIE 5.1.** Baidu says the model uses roughly **6%** of the pretraining cost of similar-scale peers while compressing total parameters to about **one-third** and activated parameters to about **one-half** [^14]. It is now available on ERNIE and Baidu AI Studio, with reported strengths in agentic benchmarks, **99.6** on AIME26 with tools, and **#4 globally** on Arena Search [^14].
- **Zyphra launched ZAYA1-VL-8B**, its first vision-language model: a **700M active / 8B total MoE** built on an AMD-trained base [^15]. Zyphra says it is aimed at visual understanding, OCR, document reasoning, grounding, and GUI interaction for computer-use agents [^16].
- **OpenAI added `/goal` to Codex as an experimental mode.** The feature lets Codex keep working until a defined end state is reached, targeting refactors, migrations, retry loops, and long-running experiments [^17].

## Industry Moves

*Why it matters:* Capital, revenue, and org design are moving as fast as the models themselves.

- **DeepSeek is targeting up to RMB 50 billion ($7.35 billion)** in new funding, which would be the largest single raise in Chinese AI company history if completed [^18].
- **Runway says generative video has reached an inflection point.** The company added **more than $40 million** in net new ARR so far this quarter, its biggest growth period to date, and says enterprises including **Amazon** and **Robinhood** are already using Runway daily [^19].
- **Coinbase is restructuring around AI-native work.** CEO Brian Armstrong said the company will cut its workforce by about **14%**, flatten to **five layers max** below the CEO/COO, and build smaller teams centered on people who can manage fleets of AI agents [^20].

## Policy & Regulation

*Why it matters:* China is moving from broad AI policy to agent-specific governance.

- **China issued its first dedicated policy framework for AI agents**, jointly released by CAC, NDRC, and MIIT [^21]. The document defines agents as systems with perception, memory, decision-making, interaction, and execution; lists **19 application scenarios**; and sets a **“safety first, innovation second”** principle for orderly development [^21].

## Quick Takes

*Why it matters:* These smaller items still sharpen the competitive and safety picture.

- **Claude Mythos Preview** was estimated by METR at a **50% time horizon of at least 16 hours**, but METR also said current high-end measurements are unstable because only **5 of 228 tasks** in its suite are that long [^22][^23][^24].
- **OpenAI disclosed limited accidental chain-of-thought grading** affecting some prior Instant and mini models and **GPT-5.4 Thinking** in **<0.6%** of samples; its analysis found no apparent reduction in monitorability and it added automated detection [^25][^26][^27].
- **Databricks Genie** reportedly reached **91.6% accuracy** on enterprise data-analysis tasks, versus **32%** for a leading coding agent benchmarked on the same work [^28][^29].
- A **Princeton-led evaluation** of **23 frontier models** found **18** recommended a more expensive sponsored option more than half the time on tasks like flights, loans, and shopping [^30].

---

### Sources

[^1]: [𝕏 post by @dl_weekly](https://x.com/dl_weekly/status/2052750628493906036)
[^2]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2051709028250915275)
[^3]: [𝕏 post by @arena](https://x.com/arena/status/2052876951329919383)
[^4]: [𝕏 post by @pushmeet](https://x.com/pushmeet/status/2052812585804685322)
[^5]: [𝕏 post by @TheRundownAI](https://x.com/TheRundownAI/status/2052863367639953558)
[^6]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2052849472586264997)
[^7]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2052808787514228772)
[^8]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2052808789297115628)
[^9]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2052808798239146290)
[^10]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2052808801040859392)
[^11]: [𝕏 post by @tilderesearch](https://x.com/tilderesearch/status/2052798181558370419)
[^12]: [𝕏 post by @SakanaAILabs](https://x.com/SakanaAILabs/status/2052787226136990029)
[^13]: [𝕏 post by @zhuofengli96475](https://x.com/zhuofengli96475/status/2052784645398303198)
[^14]: [𝕏 post by @ErnieforDevs](https://x.com/ErnieforDevs/status/2052961073423405256)
[^15]: [𝕏 post by @ZyphraAI](https://x.com/ZyphraAI/status/2052890651835224454)
[^16]: [𝕏 post by @ZyphraAI](https://x.com/ZyphraAI/status/2052890657027723357)
[^17]: [𝕏 post by @reach_vb](https://x.com/reach_vb/status/2052805243268718803)
[^18]: [𝕏 post by @kevinsxu](https://x.com/kevinsxu/status/2052722723097296911)
[^19]: [𝕏 post by @agermanidis](https://x.com/agermanidis/status/2052747859620204962)
[^20]: [𝕏 post by @brian_armstrong](https://x.com/brian_armstrong/status/2051616759145185723)
[^21]: [𝕏 post by @poezhao0605](https://x.com/poezhao0605/status/2052753363226505350)
[^22]: [𝕏 post by @METR_Evals](https://x.com/METR_Evals/status/2052896621760004602)
[^23]: [𝕏 post by @METR_Evals](https://x.com/METR_Evals/status/2052896623852929510)
[^24]: [𝕏 post by @METR_Evals](https://x.com/METR_Evals/status/2052896627250335745)
[^25]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2052845764507062349)
[^26]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2052845767417835551)
[^27]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2052845765874327943)
[^28]: [𝕏 post by @Yuchenj_UW](https://x.com/Yuchenj_UW/status/2052784305735397863)
[^29]: [𝕏 post by @matei_zaharia](https://x.com/matei_zaharia/status/2052778748941046180)
[^30]: [𝕏 post by @heynavtoor](https://x.com/heynavtoor/status/2052433622616191476)