# FutureSim Debuts, Mistral Targets 1GW Capacity, and Malta Rolls Out ChatGPT Plus

*By AI High Signal Digest • May 17, 2026*

A new forecasting benchmark gives frontier agents a tougher continual-learning test, Mistral lays out an independence-first compute buildout, and Malta ties nationwide ChatGPT Plus access to AI literacy. The brief also covers new work on benchmark validity, long-context efficiency, tool use, and agent products from Pinecone, OpenAI, and xAI.

## Top Stories

*Why it matters: today’s biggest developments touched evaluation, infrastructure, and labor impact.*

- **FutureSim introduced a tougher benchmark for agentic forecasting.** It was designed to address the lack of realistic continual-learning evaluations by replaying the web day by day from Jan. 1, 2026, with date-gated access to about 244,000 real news articles and forecasts on events resolving over the next 90 days [^1][^2]. In native harnesses, GPT-5.5 led at 25% accuracy, ahead of Opus 4.6 at 20%, DeepSeek V4 Pro at 13%, GLM 5.1 at 10%, and Qwen3.6 Plus at 5%; on some parallel Polymarket questions, including Super Bowl LX, GPT-5.5 beat the crowd aggregate [^2]. The benchmark is meant to test adaptation, memory across 1,000+ tool calls, search, and inference scaling, and one observer described future-prediction benchmarks as scalable and hard to saturate [^2][^3].

- **Mistral laid out an independence-first compute strategy.** CEO Arthur Mensch said the company rejects acquisition offers because its mission is to remain independent [^4]. Notes from the same discussion put Mistral above €1B in R&D spend this year and targeting 1GW of datacenter capacity by 2029, with current clusters at 40MW in France and 25MW in Sweden and another 80MW planned in France next year [^4].

- **Anthropic CEO Dario Amodei warned that AI could pair very high GDP growth with very high unemployment and inequality.** He said the unemployment rate could potentially reach 10% [^5].
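
The date-gated replay described in the FutureSim item can be pictured with a minimal sketch. This is not the benchmark's code: the `Article` type, the `replay` helper, and the placeholder headlines are all assumptions made for illustration. The only idea taken from the item is that an agent on a simulated day sees nothing published after that day.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Article:
    published: date
    title: str


def visible_articles(corpus, today):
    """Return only the articles published on or before the simulated day."""
    return [a for a in corpus if a.published <= today]


def replay(corpus, start, days):
    """Yield (simulated_day, visible_articles) one day at a time."""
    for offset in range(days):
        today = start + timedelta(days=offset)
        yield today, visible_articles(corpus, today)


# Hypothetical two-article corpus; the real benchmark reportedly uses ~244,000.
corpus = [
    Article(date(2026, 1, 1), "placeholder headline A"),
    Article(date(2026, 1, 3), "placeholder headline B"),
]

for day, seen in replay(corpus, date(2026, 1, 1), 3):
    print(day.isoformat(), [a.title for a in seen])
```

In this toy run the second article only becomes visible on the third simulated day, which is the property that keeps a forecasting agent from reading ahead.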

## Research & Innovation

*Why it matters: the most useful papers today were about whether agents are being measured and optimized correctly.*

- ***The Evaluation Trap* argues many AI evals test proxy behaviors rather than underlying capabilities.** The paper says most benchmarks bake in implicit theories, and that many agent leaderboards are not measuring what people think they are [^6].

- **Meta’s SP-KV targets long-context efficiency.** The method uses a small utility predictor to decide which older key-value pairs to keep while preserving a local sliding window, reducing KV cache size by about 3x-10x, speeding up decoding, and lowering memory-bandwidth demand [^7].

- **A new interpretability paper isolates a tool-use failure mode.** Researchers found models often recognize they should call a tool but fail to do so, with mismatch rates of 26%-54% concentrated in the cognition-to-action transition [^8]. The authors say late-layer representations rotate the signal away from the final action, which may help explain why prompting alone hits a stubborn ceiling on tool use [^8].
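
The keep-or-evict idea in the SP-KV item can be sketched in a few lines. This is an illustration under stated assumptions, not Meta's implementation: the `utilities` list stands in for the output of the small predictor, and `select_kv_positions`, `window`, and `keep_old` are hypothetical names. The sketch always keeps the most recent `window` positions and, among older ones, keeps only the highest-scoring `keep_old`.

```python
def select_kv_positions(utilities, window, keep_old):
    """Pick which cache positions to retain.

    utilities: one predicted usefulness score per cached position
               (a stand-in for the learned utility predictor).
    window:    number of most recent positions that are always kept.
    keep_old:  how many older positions to keep, chosen by score.
    """
    n = len(utilities)
    recent_start = max(0, n - window)
    recent = list(range(recent_start, n))
    older = range(recent_start)
    top_older = sorted(older, key=lambda i: utilities[i], reverse=True)[:keep_old]
    # Return positions in their original order so the cache stays sequential.
    return sorted(top_older) + recent


scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.7, 0.4, 0.5]
kept = select_kv_positions(scores, window=3, keep_old=2)
print(kept)                      # [0, 2, 5, 6, 7]
print(len(scores) / len(kept))   # compression ratio for this toy case: 1.6
```

The toy ratio here is far below the reported 3x-10x; the gap between `keep_old` and the number of older positions is what drives the savings on long contexts.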

## Products & Launches

*Why it matters: the main product updates aimed at cheaper retrieval, faster coding workflows, and broader agent access.*

- **Pinecone launched Nexus, a knowledge-engine layer for agents.** It claims up to 90% lower token use by compiling task-optimized artifacts before query time instead of sending raw files to agents, then indexing those artifacts for semantic, sparse, and full-text search [^9].

- **OpenAI shipped a meaningful Codex UX and performance pass.** Updates include customizable shortcuts, Git actions restored to the review flow, cleaner thread and local server panels, roughly 75% less re-rendering on thread switches, and 10x-50x faster Git operations in large repos [^10][^11][^12][^13][^14].

- **xAI widened Hermes Agent distribution.** X Premium+ and SuperGrok subscribers can now access Grok, X Search, image and video generation, and voice through Hermes Agent, with X Search available to agents that use Grok OAuth login [^15][^16].

## Industry Moves

*Why it matters: these updates point to where labs are trying to extend distribution and control surfaces.*

- **Posts this week described OpenAI expanding Codex into a multi-device control plane.** A reported *Locked Use* setting would let Codex invoke Computer Use on other machines from a main device, creating a personal Codex network across Macs, workstations, and older PCs [^17].

- **Claude Mythos appeared in Google Cloud Console, but the launch path is unclear.** One post noted the preview label is gone and compared the pattern to Opus 4.7’s pre-release appearance, while another argued Anthropic’s prior statements about Mythos risk make a public release unlikely [^18][^19].

## Policy & Regulation

*Why it matters: this is a country-scale public AI access program tied to mandatory literacy training.*

- **Malta became the first country to offer ChatGPT Plus free to every citizen for one year.** Access requires completing an AI literacy course built by the University of Malta rather than OpenAI, framing the program around basic AI education with tool access as the incentive [^20].

## Quick Takes

*Why it matters: these smaller updates still show progress in robotics, open-source tooling, and model compression.*

- Figure said its F.03 humanoids reached **Day 4** of a nonstop, 24/7 autonomous run that continues until failure [^21][^22].
- Eric Jang said a strong AlphaGo-style system can now be trained from scratch for **a few thousand dollars** of rented compute, with tutorial, code, and a playable bot released publicly [^23].
- Antirez released per-layer quantized **DeepSeek V4** models on Hugging Face, keeping attention, shared experts, and output layers at Q8 to protect quality-critical weights and using 2-bit quantization elsewhere [^24].
- **Khala 1.0**, a music model from Beijing’s Central Conservatory of Music, launched with paper, code, weights, and demo all open-sourced [^25].
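
The per-layer quantization scheme in the DeepSeek V4 item amounts to a lookup from tensor role to precision. The sketch below is only an illustration of that mapping: the tensor names, the substring markers, and the `pick_quant` helper are assumptions, not the naming or tooling used in the released models.

```python
# Substrings that mark quality-critical tensors (hypothetical naming).
HIGH_PRECISION_MARKERS = ("attn", "shared_expert", "output")


def pick_quant(tensor_name):
    """Return "Q8" for attention, shared-expert, and output tensors, else "2-bit"."""
    if any(marker in tensor_name for marker in HIGH_PRECISION_MARKERS):
        return "Q8"
    return "2-bit"


# Hypothetical tensor names, chosen only to exercise each branch.
names = [
    "blk.0.attn_q.weight",
    "blk.0.shared_expert.up.weight",
    "blk.0.routed_expert.7.up.weight",
    "output.weight",
]
plan = {name: pick_quant(name) for name in names}
for name, quant in plan.items():
    print(f"{name:36s} {quant}")
```

Only the routed-expert tensor falls through to 2-bit here, which reflects the trade described in the item: most of the parameter count sits in the aggressively compressed layers while the smaller, sensitive layers stay at Q8.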

---

### Sources

[^1]: [𝕏 post by @ShashwatGoel7](https://x.com/ShashwatGoel7/status/2055336064378720412)
[^2]: [𝕏 post by @arvindh__a](https://x.com/arvindh__a/status/2055336266322039045)
[^3]: [𝕏 post by @teortaxesTex](https://x.com/teortaxesTex/status/2055737704340435188)
[^4]: [𝕏 post by @eliebakouch](https://x.com/eliebakouch/status/2055636477447389583)
[^5]: [𝕏 post by @TheChiefNerd](https://x.com/TheChiefNerd/status/2055632971789480132)
[^6]: [𝕏 post by @dair_ai](https://x.com/dair_ai/status/2055747638381920342)
[^7]: [𝕏 post by @TheTuringPost](https://x.com/TheTuringPost/status/2055828260542644463)
[^8]: [𝕏 post by @omarsar0](https://x.com/omarsar0/status/2055750162526715926)
[^9]: [𝕏 post by @TheTuringPost](https://x.com/TheTuringPost/status/2055807882650903000)
[^10]: [𝕏 post by @OpenAIDevs](https://x.com/OpenAIDevs/status/2055717793841221796)
[^11]: [𝕏 post by @OpenAIDevs](https://x.com/OpenAIDevs/status/2055717859058495596)
[^12]: [𝕏 post by @OpenAIDevs](https://x.com/OpenAIDevs/status/2055717927064846802)
[^13]: [𝕏 post by @OpenAIDevs](https://x.com/OpenAIDevs/status/2055717993460732191)
[^14]: [𝕏 post by @OpenAIDevs](https://x.com/OpenAIDevs/status/2055718005309575431)
[^15]: [𝕏 post by @NousResearch](https://x.com/NousResearch/status/2055748546679472322)
[^16]: [𝕏 post by @Teknium](https://x.com/Teknium/status/2055749507535835331)
[^17]: [𝕏 post by @testingcatalog](https://x.com/testingcatalog/status/2055708109343994335)
[^18]: [𝕏 post by @AiBattle_](https://x.com/AiBattle_/status/2055762242373558477)
[^19]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2055767238636933494)
[^20]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2055718739581305021)
[^21]: [𝕏 post by @Figure_robot](https://x.com/Figure_robot/status/2055695818984976697)
[^22]: [𝕏 post by @adcock_brett](https://x.com/adcock_brett/status/2055695727985352722)
[^23]: [𝕏 post by @ericjang11](https://x.com/ericjang11/status/2055359839371772356)
[^24]: [𝕏 post by @witcheer](https://x.com/witcheer/status/2055401320178204766)
[^25]: [𝕏 post by @junmingong](https://x.com/junmingong/status/2055452194632302640)