# Codex Goes Mobile, Figure Extends Humanoid Runtime, and Autonomous Agents Beat a Human Baseline

*By AI High Signal Digest • May 15, 2026*

Codex went mobile, Figure extended humanoid runtime past a full day, and PrimeIntellect showed autonomous coding agents beating a human nanoGPT baseline. The brief also covers diffusion decoding speedups, time-series scaling laws, enterprise data agents, Anthropic’s Gates partnership, and the latest U.S.-China compute tensions.

## Top Stories

*Why it matters: today’s strongest signal is that AI agents are becoming more persistent, more physical, and more capable at open-ended technical work.*

- **OpenAI put Codex on the phone.** Codex is now in preview inside the ChatGPT mobile app, letting users start work, review outputs, steer execution, and approve next steps from iOS and Android while jobs keep running on a laptop, Mac mini, or devbox; OpenAI also made Remote SSH generally available for managed remote environments [^1][^2][^3]. Commentators called it a major unlock for remote agent work and broader day-to-day agent usage [^4][^5].

- **Figure pushed humanoid uptime from a shift demo to around-the-clock operation.** Figure said its F.03 robots moved from an original 8-hour target to more than 24 hours of continuous autonomous package sorting without failure, and later crossed 30 hours with no downtime [^6][^7]. The company says the robots are now around human parity at roughly 3 seconds per package, run entirely onboard via Helix-02 with no teleoperation, and have processed more than 38,000 packages [^6][^7].

- **Autonomous coding agents beat the human baseline on nanoGPT optimization.** PrimeIntellect let Claude Code (Opus 4.7) and Codex (GPT-5.5) run autonomously on the nanoGPT speedrun optimizer track using idle compute, totaling about 10,000 runs, 14,000 H200 hours, and 23.9B tokens [^8][^9]. Opus reached 2930 steps and Codex 2950, both under the 2990-step human baseline (lower is better); PrimeIntellect framed the work as a step toward automating AI research [^8][^9].

## Research & Innovation

*Why it matters: the most notable technical updates were about cheaper inference, clearer scaling laws, and better understanding of what models are doing internally.*

- **Zyphra’s diffusion language model targets the decoding bottleneck.** ZAYA1-8B-Diffusion-Preview, trained on AMD hardware, claims a 4.6-7.7x decoding speedup over autoregressive LLMs with minimal quality degradation by generating 16-token blocks in parallel [^10][^11][^12]. The company argues this matters because autoregressive inference is memory-bandwidth bound, while diffusion removes that bottleneck [^13].
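The memory-bandwidth argument can be sketched with a toy cost model (this is an illustration of the general idea, not Zyphra's code or actual numbers): autoregressive decoding streams the full weight set from memory once per emitted token, while block diffusion amortizes each weight read across a 16-token block, at the cost of a few refinement passes per block.

```python
# Toy back-of-envelope model of decoding cost in "full weight reads",
# the quantity that dominates when inference is memory-bandwidth bound.

def weight_reads(num_tokens: int, block_size: int = 1, passes_per_block: int = 1) -> int:
    """Full passes over the weights needed to emit num_tokens."""
    blocks = -(-num_tokens // block_size)  # ceiling division
    return blocks * passes_per_block

tokens = 1024
ar = weight_reads(tokens, block_size=1)                       # one read per token
diff = weight_reads(tokens, block_size=16, passes_per_block=3)  # hypothetical 3 denoising passes

print(ar)          # 1024
print(diff)        # 192
print(ar / diff)   # ~5.3x, inside the claimed 4.6-7.7x range
```

The 16-token block size is from the announcement; the 3 refinement passes per block are an assumed value chosen only to show why the realized speedup lands below the 16x upper bound.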

- **Datadog’s Toto 2.0 makes the case that time-series models scale cleanly.** The open-weights family ranges from 4M to 2.5B parameters, with each size outperforming the previous one under a single hyperparameter configuration and leading BOOM, GIFT-Eval, and TIME [^14]. Datadog’s framing is that time series now shows the kind of predictable scaling behavior long seen in language and vision [^14].
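"Predictable scaling" in this sense means loss versus parameter count follows a power law L(N) = a * N^(-b), a straight line in log-log space whose slope lets you forecast the next model's loss before training it. A minimal sketch with synthetic numbers (not Toto 2.0 results):

```python
import math

# Hypothetical power-law loss data at the Toto 2.0 size points (4M to 2.5B params).
sizes = [4e6, 40e6, 400e6, 2.5e9]
losses = [2.0 * n ** -0.05 for n in sizes]  # synthetic: a=2.0, b=0.05

# Recover the exponent b as the slope of log(loss) vs. log(size), via least squares.
xs = [math.log(n) for n in sizes]
ys = [math.log(l) for l in losses]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

print(round(b, 6))  # 0.05: the fit recovers the exponent exactly on clean data
```

On real training runs the points scatter around the line; the scaling-law claim is that the scatter is small enough for the fit to extrapolate reliably.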

- **Goodfire found a “geometric calculator” inside Llama models.** The mechanism encodes numbers as positions on multiple circles, handles arithmetic as well as weekday and month reasoning, and was tested by steering the circles and watching answers change [^15][^16][^17]. Goodfire says this kind of neural-geometry work could improve debugging, control, and model design [^18].
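The circle mechanism is easy to state concretely: represent an integer as a point on a circle with some period, and addition becomes rotation, which handles both plain modular arithmetic and weekday/month wraparound. A hypothetical illustration of the geometry (not Goodfire's probing code):

```python
import math

def encode(n: int, period: int) -> tuple[float, float]:
    """Place integer n on the unit circle with the given period."""
    theta = 2 * math.pi * (n % period) / period
    return (math.cos(theta), math.sin(theta))

def add_k(point: tuple[float, float], k: int, period: int) -> tuple[float, float]:
    """Adding k is a rotation by k steps around the circle."""
    theta = 2 * math.pi * k / period
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y, s * x + c * y)

# Weekday reasoning on a period-7 circle (0 = Sunday): Wednesday + 11 days.
wed = encode(3, 7)
result = add_k(wed, 11, 7)
expected = encode((3 + 11) % 7, 7)  # lands back on Sunday

print(all(abs(a - b) < 1e-9 for a, b in zip(result, expected)))  # True
```

"Steering the circles" in the Goodfire experiments corresponds to moving the internal point along such a circle and observing the model's answer shift accordingly.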

## Products & Launches

*Why it matters: new tools keep turning agents from isolated assistants into systems that can work across design, data, and the browser itself.*

- **MagicPath 2.0** is now a multiplayer canvas for humans and agents such as Codex and Claude Code, with real-time shared context and fully functional browser-based prototypes built from real code [^19][^20][^21]. It also supports design-to-repo and repo-to-design round trips through external agents [^22].

- **Perplexity Computer now connects to Snowflake.** The product can run end-to-end workflows on live warehouse data and return answers with SQL, source tables, filters, and metrics, while admins retain control over access and shared business logic [^23][^24].

- **Kimi Web Bridge brings browser actions to major agent stacks.** The extension lets agents search, scroll, click, type, fill spreadsheets, and turn repeated browser work into reusable skills; it supports Kimi Code CLI, Claude Code, Cursor, Codex, Hermes, and more [^25][^26][^27].

## Industry Moves

*Why it matters: major firms are pairing frontier models with real distribution, public-interest deployment, and international expansion.*

- **Anthropic partnered with the Gates Foundation on a $200M package** of grants, Claude credits, and technical support across global health, life sciences, education, agriculture, and economic mobility [^28].

- **Runway is expanding to Japan with a Tokyo base.** The company says Japan is already its third-largest market, its fastest-growing self-serve market in Asia, and has seen 300% enterprise customer growth over the last 12 months [^29][^30].

## Policy & Regulation

*Why it matters: AI geopolitics still turns on compute, and approvals matter less than actual hardware movement.*

- **U.S.-China chip controls remain unresolved in practice.** Reuters-reported approvals cover roughly 10 Chinese firms buying Nvidia H200s, but no chips have shipped yet [^31]. Separate analysis this week argued Chinese labs remain compute-constrained and continue renting or smuggling Nvidia-designed chips from third countries, so the real signal is deliveries, not approvals [^32][^33].

## Quick Takes

*Why it matters: these smaller updates point to where the next wave of tooling, governance, and specialty models is heading.*

- Ahead of Google I/O, a leak described **Gemini Spark** as an always-on agent with access to Gmail, Calendar, location, tasks, and personal context [^34].
- **arXiv** now imposes a **one-year submission ban** for hallucinated references [^35].
- **Baseten** says it serves **Qwen3-TTS** on **vLLM-Omni** at **$3 per 1M characters**, about **90% lower** than comparable closed-source TTS APIs [^36][^37].
- **Intern-S2-Preview**, a **35B** open scientific multimodal model, claims performance comparable to the trillion-scale Intern-S1-Pro on core scientific tasks and launched with day-0 vLLM support [^38][^39].

---

### Sources

[^1]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2055016850849993072)
[^2]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2055016852133417389)
[^3]: [𝕏 post by @OpenAIDevs](https://x.com/OpenAIDevs/status/2055016938217377945)
[^4]: [𝕏 post by @gdb](https://x.com/gdb/status/2055034165968384099)
[^5]: [𝕏 post by @thursdai_pod](https://x.com/thursdai_pod/status/2055080979551916073)
[^6]: [𝕏 post by @adcock_brett](https://x.com/adcock_brett/status/2054973511572271172)
[^7]: [𝕏 post by @adcock_brett](https://x.com/adcock_brett/status/2055075231002407417)
[^8]: [𝕏 post by @PrimeIntellect](https://x.com/PrimeIntellect/status/2055056380881744365)
[^9]: [𝕏 post by @eliebakouch](https://x.com/eliebakouch/status/2055059154738278851)
[^10]: [𝕏 post by @ZyphraAI](https://x.com/ZyphraAI/status/2055038845809480113)
[^11]: [𝕏 post by @ZyphraAI](https://x.com/ZyphraAI/status/2055038850226086238)
[^12]: [𝕏 post by @ZyphraAI](https://x.com/ZyphraAI/status/2055038853430542773)
[^13]: [𝕏 post by @ZyphraAI](https://x.com/ZyphraAI/status/2055038851572547779)
[^14]: [𝕏 post by @ClementDelangue](https://x.com/ClementDelangue/status/2054991352295731619)
[^15]: [𝕏 post by @GoodfireAI](https://x.com/GoodfireAI/status/2054962258225357024)
[^16]: [𝕏 post by @GoodfireAI](https://x.com/GoodfireAI/status/2054962332749758569)
[^17]: [𝕏 post by @GoodfireAI](https://x.com/GoodfireAI/status/2054962344464453897)
[^18]: [𝕏 post by @GoodfireAI](https://x.com/GoodfireAI/status/2054962356162363599)
[^19]: [𝕏 post by @skirano](https://x.com/skirano/status/2054975534539370708)
[^20]: [𝕏 post by @skirano](https://x.com/skirano/status/2054975555053703361)
[^21]: [𝕏 post by @skirano](https://x.com/skirano/status/2054975552654639180)
[^22]: [𝕏 post by @skirano](https://x.com/skirano/status/2054975547227111671)
[^23]: [𝕏 post by @perplexity_ai](https://x.com/perplexity_ai/status/2054945872451129506)
[^24]: [𝕏 post by @perplexity_ai](https://x.com/perplexity_ai/status/2054945874523095527)
[^25]: [𝕏 post by @Kimi_Moonshot](https://x.com/Kimi_Moonshot/status/2054918374837322140)
[^26]: [𝕏 post by @Kimi_Moonshot](https://x.com/Kimi_Moonshot/status/2054918377978908933)
[^27]: [𝕏 post by @Kimi_Moonshot](https://x.com/Kimi_Moonshot/status/2054918384475832597)
[^28]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2054941901900611787)
[^29]: [𝕏 post by @nikkei](https://x.com/nikkei/status/2055017220766359944)
[^30]: [𝕏 post by @c_valenzuelab](https://x.com/c_valenzuelab/status/2055069698816090213)
[^31]: [𝕏 post by @dnystedt](https://x.com/dnystedt/status/2054811340267733033)
[^32]: [𝕏 post by @RyanFedasiuk](https://x.com/RyanFedasiuk/status/2055097086312624481)
[^33]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2054868338229309624)
[^34]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2054855742247584231)
[^35]: [𝕏 post by @stuhlmueller](https://x.com/stuhlmueller/status/2055045918991450529)
[^36]: [𝕏 post by @baseten](https://x.com/baseten/status/2054976770789769513)
[^37]: [𝕏 post by @vllm_project](https://x.com/vllm_project/status/2055136943550427242)
[^38]: [𝕏 post by @intern_lm](https://x.com/intern_lm/status/2055146106799976798)
[^39]: [𝕏 post by @vllm_project](https://x.com/vllm_project/status/2055148034124894395)