# Research Returns as DeepSeek Gains Momentum and Agent Tools Expand

*By AI High Signal Digest • May 4, 2026*

Ilya Sutskever’s call for a return to original research, DeepSeek V4’s efficiency-driven momentum, and a wave of agent infrastructure launches lead today’s brief. Also included: Sakana’s latest orchestration work, concrete enterprise deployments, and new signals on the compute bottleneck.

## Top Stories

*Why it matters:* The clearest signals today were that easy scaling is weakening, open-model economics are improving fast, and compute remains the hard constraint.

- **Sutskever says AI is back to research.** He said pre-training will run out of data and that the field is returning to an “age of research,” where original ideas matter more than scaling the old recipe [^1]. NandoDF added that building a top-20 LLM now looks more like recipe plus capital (roughly $0.5B for chips) than a pure research problem, pushing the competitive edge toward innovation beyond scale [^2].
- **DeepSeek V4 is driving the open-model conversation.** Posts this weekend described it as a new open-source leader on quality and price; separate users highlighted low long-context cost, days-long cache economics, and stronger tool use once harness bugs were fixed [^3][^4][^5][^6]. The practical signal is that open-model competition is shifting toward efficiency and harness design, not only raw scores.
- **Compute remains bottlenecked and geopolitically messy.** One post relaying remarks from Jensen Huang said Nvidia’s China market share has fallen to zero under export controls, while another thread argued Chinese frontier models still trail the US frontier by about eight months as the compute gap widens [^7][^8]. At the same time, most 2026 GPU supply is reportedly already spoken for even as xAI’s fleet is said to be running at roughly 11% utilization [^9][^10].

## Research & Innovation

*Why it matters:* The most interesting research updates pushed on orchestration, real-time speech, and generative efficiency.

- **Sakana’s 7B Conductor** uses RL to orchestrate frontier models by choosing workers, subtasks, and context, and reportedly set records on LiveCodeBench and GPQA-Diamond while beating more expensive multi-agent baselines [^11].
- **KAME** tackles speech latency with a tandem design: a speech-to-speech frontend starts replying immediately while a backend LLM injects knowledge asynchronously, aiming to move from “think, then speak” to “speak while thinking” [^12].
- **FD-loss** improved one-step pixel-space generation FID from 0.9 to 0.75, according to Jiawei Yang, by optimizing FID directly rather than treating it only as an evaluation metric [^13].
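KAME’s “speak while thinking” idea can be illustrated with a toy asyncio sketch. All names here are hypothetical stand-ins, not KAME’s actual components: a fast frontend begins emitting words immediately, while a slower backend pushes a fact into a queue that the frontend drains mid-reply.

```python
import asyncio

async def fast_frontend(queue: asyncio.Queue) -> list:
    # Stand-in for the speech-to-speech frontend: starts replying
    # immediately, weaving in backend facts as they arrive.
    spoken = []
    for word in ["Sure,", "let", "me", "answer..."]:
        spoken.append(word)
        await asyncio.sleep(0.01)          # simulated speaking time
        while not queue.empty():           # inject any knowledge that arrived
            spoken.append(queue.get_nowait())
    return spoken

async def slow_backend(queue: asyncio.Queue) -> None:
    # Stand-in for the backend LLM: slower, injects knowledge asynchronously.
    await asyncio.sleep(0.02)              # simulated LLM latency
    await queue.put("[fact: Tokyo]")

async def tandem() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    spoken, _ = await asyncio.gather(fast_frontend(queue), slow_backend(queue))
    return spoken

result = asyncio.run(tandem())
print(result)  # reply starts before the fact arrives; fact lands mid-utterance
```

The point of the pattern is that first-token latency is set by the fast path, while the slow path only affects when knowledge appears, not when speech starts.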

## Products & Launches

*Why it matters:* New launches were mostly about agent infrastructure rather than single-model demos.

- **OpenAI Agents SDK** is an open orchestration layer for multi-agent workflows, with sessions, human-in-the-loop support, tracing, voice agents, sandboxed execution, and compatibility with 100+ models [^14].
- **Sakana Fugu** entered beta as a multi-agent orchestration system with SOTA claims on SWE-Pro, GPQA-D, and ALE-Bench, exposed through an OpenAI-compatible API with Mini and Ultra variants [^15].
- **Codex Security plugin** packages five AppSec workflows—security scan, threat model, finding discovery, validation, and attack-path analysis—into a review pipeline from threat model to report [^16].
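The common pattern behind these launches, a router that dispatches subtasks to named workers while recording a trace, can be sketched in plain Python. This is a hypothetical illustration of the orchestration-layer idea, not the Agents SDK or Fugu API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Worker:
    name: str
    handle: Callable[[str], str]   # the worker's task handler

@dataclass
class Orchestrator:
    workers: Dict[str, Worker]
    trace: List[Tuple[str, str, str]] = field(default_factory=list)

    def route(self, task: str) -> str:
        # Toy routing rule: keyword match. Real systems use a model
        # (or, as in Sakana's Conductor, an RL policy) to pick the
        # worker, the subtask, and the context it receives.
        name = "coder" if "code" in task else "writer"
        result = self.workers[name].handle(task)
        self.trace.append((name, task, result))  # tracing for observability
        return result

orc = Orchestrator(workers={
    "coder": Worker("coder", lambda t: f"[patch for: {t}]"),
    "writer": Worker("writer", lambda t: f"[draft for: {t}]"),
})
out = orc.route("write code for parser")
print(out)  # dispatched to the "coder" worker
```

Sessions, human-in-the-loop gates, and sandboxing in the shipped SDKs all hang off this same dispatch-and-trace loop.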

## Industry Moves

*Why it matters:* The strongest commercial signals came from enterprise deployment and clearer visibility into training scale.

- **Sakana and SMBC** deployed a proposal-generation application at Sumitomo Mitsui Bank. The system uses multiple AI agents for information gathering, hypothesis building, and proposal structuring, with proposal turnaround expected to fall from 1–2 weeks to tens of minutes or hours [^17][^18].
- **Poolside disclosed large training runs.** One model used 6–8K H200s for a 225B-total, 23B-active system, while a 30B-total, 3B-active model reached 33T tokens in about 20 days on 2K GPUs [^19][^20].
- **Ricoh says its 70B Japanese LLM is already automating financial tasks** such as loan approvals, a sign that domain-specific enterprise models are moving into regulated workflows [^21].
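As a rough sanity check on Poolside’s disclosed 30B-total, 3B-active run (the figures are approximate, so this is back-of-envelope only), the implied per-GPU throughput works out to roughly 9.5K tokens per second:

```python
# Implied training throughput from the disclosed numbers:
# 33T tokens over ~20 days on 2,000 GPUs.
tokens = 33e12
seconds = 20 * 86400            # ~20 days in seconds
gpus = 2000
tokens_per_gpu_per_sec = tokens / (seconds * gpus)
print(f"{tokens_per_gpu_per_sec:,.0f} tokens/s per GPU")  # ~9,549
```

That is a plausible rate for a 3B-active MoE on modern accelerators, which is the useful cross-check: the disclosed token count, duration, and fleet size are at least mutually consistent.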

## Quick Takes

*Why it matters:* Smaller updates still added useful signal on tooling, safety, and deployment gaps.

- **vLLM v0.20.1** shipped 10+ fixes and optimizations for running DeepSeek V4 in production [^22].
- **PDF parsing remains a major agent bottleneck,** because PDFs are built for display rather than clean semantic extraction; Jerry Liu pointed to VLM-based approaches such as LlamaParse and ParseBench [^23].
- **A safety paper suggests multi-agent alignment is harder than single-agent alignment:** teams of individually aligned agents can still produce less ethical but more effective solutions [^24].
- **OpenRouter launched free response caching,** aimed at lowering the cost of tests and agent retries; Hermes Agent now supports it [^25][^26].
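The economics of response caching are easy to see with a client-side memoization sketch. This is a generic illustration of the idea, not OpenRouter’s actual API: repeated identical requests, such as test reruns and agent retries, hit the local store instead of the paid backend.

```python
import hashlib

def cache_key(model: str, prompt: str) -> str:
    # Key on the exact (model, prompt) pair; any change busts the cache.
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

class CachingClient:
    def __init__(self, backend):
        self.backend = backend   # callable (model, prompt) -> response
        self.store = {}
        self.hits = 0

    def complete(self, model: str, prompt: str) -> str:
        key = cache_key(model, prompt)
        if key in self.store:
            self.hits += 1       # served locally: zero backend cost
            return self.store[key]
        result = self.backend(model, prompt)
        self.store[key] = result
        return result

# Fake backend for illustration; counts how many paid calls were made.
calls = []
def fake_backend(model, prompt):
    calls.append(prompt)
    return f"echo:{prompt}"

client = CachingClient(fake_backend)
a = client.complete("some-model", "hi")
b = client.complete("some-model", "hi")  # retry: served from cache
print(a == b, len(calls), client.hits)
```

A hosted cache like OpenRouter’s applies the same principle server-side, so the savings accrue even across processes and machines.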

---

### Sources

[^1]: [𝕏 post by @r0ck3t23](https://x.com/r0ck3t23/status/2050771197105680710)
[^2]: [𝕏 post by @NandoDF](https://x.com/NandoDF/status/2050889627406315602)
[^3]: [𝕏 post by @bindureddy](https://x.com/bindureddy/status/2051142381752517114)
[^4]: [𝕏 post by @jbhuang0604](https://x.com/jbhuang0604/status/2050686882653065496)
[^5]: [𝕏 post by @teortaxesTex](https://x.com/teortaxesTex/status/2051175787294769346)
[^6]: [𝕏 post by @MrAhmadAwais](https://x.com/MrAhmadAwais/status/2050956678502420612)
[^7]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2050960942914023648)
[^8]: [𝕏 post by @fleetingbits](https://x.com/fleetingbits/status/2051166013929214079)
[^9]: [𝕏 post by @Yuchenj_UW](https://x.com/Yuchenj_UW/status/2051154420692623821)
[^10]: [𝕏 post by @theinformation](https://x.com/theinformation/status/2050606311440531809)
[^11]: [𝕏 post by @SakanaAILabs](https://x.com/SakanaAILabs/status/2048777689763639741)
[^12]: [𝕏 post by @SakanaAILabs](https://x.com/SakanaAILabs/status/2049544945233764755)
[^13]: [𝕏 post by @JiaweiYang118](https://x.com/JiaweiYang118/status/2050032859155734743)
[^14]: [𝕏 post by @TheTuringPost](https://x.com/TheTuringPost/status/2050903494010499113)
[^15]: [𝕏 post by @SakanaAILabs](https://x.com/SakanaAILabs/status/2047479445209145785)
[^16]: [𝕏 post by @reach_vb](https://x.com/reach_vb/status/2051019108028969251)
[^17]: [𝕏 post by @SakanaAILabs](https://x.com/SakanaAILabs/status/2049655589027123402)
[^18]: [𝕏 post by @SakanaAILabs](https://x.com/SakanaAILabs/status/2049666328664097091)
[^19]: [𝕏 post by @teortaxesTex](https://x.com/teortaxesTex/status/2050944580116168936)
[^20]: [𝕏 post by @eisokant](https://x.com/eisokant/status/2049400822186803308)
[^21]: [𝕏 post by @nikkeibpITpro](https://x.com/nikkeibpITpro/status/2050779920671092899)
[^22]: [𝕏 post by @vllm_project](https://x.com/vllm_project/status/2050961077769494830)
[^23]: [𝕏 post by @jerryjliu0](https://x.com/jerryjliu0/status/2050961097642086427)
[^24]: [𝕏 post by @dl_weekly](https://x.com/dl_weekly/status/2050983946695897150)
[^25]: [𝕏 post by @OpenRouter](https://x.com/OpenRouter/status/2050616590098473445)
[^26]: [𝕏 post by @Teknium](https://x.com/Teknium/status/2050863343917752476)