# Math Breakthroughs, Google’s Scale, and the Agent Benchmark Shift

*By AI High Signal Digest • May 27, 2026*

Frontier models showed fresh scientific reasoning range, Google disclosed the scale of its Gemini distribution, and agentic coding competition moved closer to real-world workflows. This brief also covers new multimodal products, inference economics, and the latest policy signals.

## Top Stories

*Why it matters: today’s biggest signals were about scientific reasoning, distribution scale, and the shift from chat models to full agent systems.*

- **Frontier models posted another meaningful math result.** Claude Mythos solved the decades-old Erdős unit distance problem, and one account said it found a cleaner proof than the known OpenAI approach while running air-gapped, with no internet access [^1]. Sebastien Bubeck added that, with the right harness, both Mythos and GPT-5.5 can now reproduce the one-shot unit distance solution [^2]. The broader takeaway from today’s research discussion is that more capability is showing up through model-plus-harness design, not just new base models [^3].

- **Google showed the scale of its Gemini footprint.** Google later clarified that Gemini now has 900M monthly active users in the Gemini app alone, excluding AI Overviews and other Gemini-powered surfaces [^4]. One analysis of the company’s recent disclosures also noted token processing rising from 480T last year to 3.2 quadrillion now, with TPUs central to serving AI at that scale [^5].

- **Agentic coding is being measured more like real work.** DeepSWE launched as a benchmark meant to show where top models actually diverge in day-to-day developer workflows, and GPT-5.5 was cited as #1 on it [^6][^7]. Alibaba also shipped Qwen3.7-Max as a flagship for the Agent Era, with end-to-end coding, office workflows, 35-hour autonomy on a kernel task, and a #4 debut in Code Arena: Frontend [^8][^9].

## Research & Innovation

*Why it matters: the most interesting research today focused on memory, biology, and inference efficiency.*

- **Language Models Need Sleep proposed a low-latency path for long-horizon agents.** The idea is to periodically consolidate recent context through offline recurrent passes, write the result into persistent fast weights, and clear the KV cache; gains rose with longer “sleep” on deeper reasoning tasks [^10].

- **Carbon pushed open DNA modeling much closer to whole-genome scale.** The model is described as 275x faster than the previous state of the art at its size and can process the full human genome on a single GPU in under two days [^11]. Its tokenizer splits DNA into 6-base chunks while preserving single-base resolution [^11].

- **Shard targeted one of inference’s costliest bottlenecks.** On Llama-3.1-8B, it reported 10x KV-cache compression with zero quality loss, including 11.2x at 32K context and near-zero LongBench delta versus FP16 [^12].

## Products & Launches

*Why it matters: launches centered on easier content creation, more accessible open models, and higher-end image generation.*

- **Gemini Omni extended conversational editing to video.** Demonstrations showed users reshaping scenes, swapping objects, changing viewpoints, translating audio while keeping background music, and zooming into images while preserving character and physics consistency [^13][^14][^15][^16].

- **Cohere open-sourced Command A+.** Cohere called it its most powerful LLM yet, optimized to run on minimal hardware, while co-founder Nick Frosst framed the release as part of a push toward more empowering AI access [^17][^18].

- **Microsoft’s MAI-Image-2.5 entered the top tier of text-to-image.** The model debuted at #3 on the Arena leaderboard with a score of 1,254, a 72-point gain over MAI-Image-2, and marked the first time a lab outside Google DeepMind and OpenAI entered the top five [^19]. Public early access is live on Arena [^20].

## Industry Moves

*Why it matters: capital is flowing into inference and authenticity tooling even as model pricing keeps collapsing.*

- **Baseten is reportedly raising at a sharply higher valuation.** The inference provider is said to be raising $1B at an $11B valuation after growing from $200M to $600M ARR in Q1 [^21].

- **DeepMind expanded SynthID into a broader industry coalition.** It said SynthID has watermarked more than 100B pieces of content, is adding partners including OpenAI, ElevenLabs, and Kakao, and has already seen 50M+ Gemini verification checks, with Search and Chrome next [^22][^23].

- **API price competition accelerated again.** Xiaomi said MiMo-V2.5 pricing is permanently reduced by up to 99%, with unified pricing across context lengths and token plans upgraded to 5–8x more usable tokens [^24]. Another note said MiMo 2.5 Pro now matches DeepSeek V4 Pro pricing [^25].

## Policy & Regulation

*Why it matters: governments are moving from general AI debate to concrete controls and review processes.*

- **China is restricting overseas travel for top AI professionals at private firms including Alibaba and DeepSeek** [^26].
- **The FDA launched a pilot to review AI-generated evidence in drug submissions.** Over 200 AI-designed drugs are already in clinical trials worldwide, but none have FDA approval; the new pilot selects 10 companies for expedited review [^27].

## Quick Takes

- Runway launched Project Luxo, saying AI-generated video has crossed the uncanny valley; one 10-minute film was made by a single person in under a month [^28].
- Anthropic’s new security-guidance plugin for Claude Code cut security-related PR comments by 30-40% in internal rollout and benchmarks [^29][^30].
- Figure signed a commercial agreement with Catalyst Brands to deploy humanoid robots at scale, starting in Reno, Nevada [^31].
- Google introduced Daily Brief, a personalized morning-digest agent designed to be a daily first stop [^32].

---

### Sources

[^1]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2059311386820289013)
[^2]: [𝕏 post by @SebastienBubeck](https://x.com/SebastienBubeck/status/2059343132991623186)
[^3]: [𝕏 post by @dair_ai](https://x.com/dair_ai/status/2059294269698199929)
[^4]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2059334220225323045)
[^5]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2059295549988192660)
[^6]: [𝕏 post by @serenaa_ge](https://x.com/serenaa_ge/status/2059308218564890875)
[^7]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2059390325077004549)
[^8]: [𝕏 post by @Alibaba_Qwen](https://x.com/Alibaba_Qwen/status/2057450220708147250)
[^9]: [𝕏 post by @arena](https://x.com/arena/status/2059297720079393107)
[^10]: [𝕏 post by @dair_ai](https://x.com/dair_ai/status/2059333792775745619)
[^11]: [𝕏 post by @lvwerra](https://x.com/lvwerra/status/2056774820872831234)
[^12]: [𝕏 post by @krishgarg](https://x.com/krishgarg/status/2059041521576648980)
[^13]: [𝕏 post by @rourke_heath](https://x.com/rourke_heath/status/2058994443022385376)
[^14]: [𝕏 post by @laszlogaal_](https://x.com/laszlogaal_/status/2058852177163272458)
[^15]: [𝕏 post by @bilawalsidhu](https://x.com/bilawalsidhu/status/2058593677094609228)
[^16]: [𝕏 post by @alexanderchen](https://x.com/alexanderchen/status/2058330610574221672)
[^17]: [𝕏 post by @cohere](https://x.com/cohere/status/2057120818551734589)
[^18]: [𝕏 post by @nickfrosst](https://x.com/nickfrosst/status/2057132425310851104)
[^19]: [𝕏 post by @arena](https://x.com/arena/status/2059346024632820146)
[^20]: [𝕏 post by @arena](https://x.com/arena/status/2059346032157462668)
[^21]: [𝕏 post by @steph_palazzolo](https://x.com/steph_palazzolo/status/2059422732995932405)
[^22]: [𝕏 post by @GoogleDeepMind](https://x.com/GoogleDeepMind/status/2059235181274202500)
[^23]: [𝕏 post by @GoogleDeepMind](https://x.com/GoogleDeepMind/status/2059235184130535436)
[^24]: [𝕏 post by @XiaomiMiMo](https://x.com/XiaomiMiMo/status/2059314052892099070)
[^25]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2059354372643975490)
[^26]: [𝕏 post by @DeItaone](https://x.com/DeItaone/status/2059188403002916984)
[^27]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2059189028549738594)
[^28]: [𝕏 post by @runwayml](https://x.com/runwayml/status/2059279505009615293)
[^29]: [𝕏 post by @ClaudeDevs](https://x.com/ClaudeDevs/status/2059385239781384341)
[^30]: [𝕏 post by @ClaudeDevs](https://x.com/ClaudeDevs/status/2059385242319012188)
[^31]: [𝕏 post by @adcock_brett](https://x.com/adcock_brett/status/2059351113984557262)
[^32]: [𝕏 post by @Google](https://x.com/Google/status/2059302072705585389)