# DeepSeek’s Hardware Push and GPT-5.5’s Agent Gains

*By AI High Signal Digest • April 26, 2026*

DeepSeek’s V4 story is shifting toward deployment, hardware, and sharply mixed external evaluations, while GPT-5.5 is gaining traction in agentic workflows rather than just chat benchmarks. The brief also covers key research on harnesses and inference-time scaling, major launches from Tencent and Qwen, and a new U.S. warning tied to DeepSeek.

## Top Stories

*Why it matters:* The biggest signals today were deployment readiness, agent performance, and the hardware limits underneath both.

- **DeepSeek V4 is quickly becoming a systems story.** LMSYS said SGLang and Miles shipped day-0 support for V4 Pro and Flash, reaching **199 tok/s** on B200 and **266 tok/s** on H200 at 4K context, with only about **10%** throughput loss at 900K context [^1][^2]. Reuters said DeepSeek also launched a preview adapted for Huawei chip technology, and outside analysis pointed to MXFP4 support on Huawei’s 950DT with training software planned within a week [^3][^4]. External evals were highly mixed: V4-Pro became the top open model on MathArena, but BridgeBench ranked it last with an **11.2** quality score [^5][^6].

- **GPT-5.5’s strongest evidence is in agentic execution.** It ranked first on VoxelBench at **2101** versus **1722** for second place and had a **96%** win rate from **517** user votes, though one observer cautioned that highly visual tasks may be prone to benchmaxxing [^7][^8]. In practitioner testing, it was described as finally matching Claude on tool calls for long-running agents, making fewer incorrect tool calls, and producing one example report in **11 minutes** [^9][^10].

- **The next bottleneck is hardware, not ambition.** SemiAnalysis CEO Dylan Patel called this the biggest capability leap in nearly two years, with execution getting extremely cheap even as supply chains remain tight [^11]. He highlighted CPUs as an underestimated bottleneck because RL environments and deployment workloads are CPU-heavy, and said DRAM prices could double or triple by 2028 while TSMC capex could reach **$100B** [^11].

## Research & Innovation

*Why it matters:* The most useful technical progress is coming from better harnesses and inference-time search, even as new benchmarks show major reliability gaps.

- **AutoHarness** uses LLM-based code synthesis to build a Python harness around an LLM policy; the authors say AutoHarness plus a small Gemini Flash beat Gemini-2.5-Pro and GPT-5.2-High on TextArena games [^12].
- A new **test-time scaling** framework for agentic coding turns rollouts into structured summaries of hypotheses, progress, and failure modes, then applies Recursive Tournament Voting and Parallel-Distill-Refine; on SWE-Bench Verified, Claude-4.5-Opus improved from **70.9%** to **77.6%**, and on Terminal-Bench from **46.9%** to **59.1%** [^13].
- Microsoft’s **DELEGATE-52** benchmark simulates long document-editing workflows across **52** domains and found that **19** tested models corrupted an average of **25%** of document content by the end of long workflows; agentic tool use did not help [^14].
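The AutoHarness idea above can be pictured as a thin validation-and-retry layer wrapped around a raw text policy. The sketch below is illustrative only: the function names and retry scheme are assumptions, not the paper's actual synthesized harness, and the policy is a stub rather than a real model call.

```python
def make_harness(policy, legal_moves, max_retries=3):
    """Wrap a free-text policy so it always returns a legal move."""
    def harnessed(observation):
        prompt = observation
        for _ in range(max_retries):
            raw = policy(prompt).strip().lower()
            if raw in legal_moves:
                return raw
            # Feed the error back so the policy can self-correct.
            prompt = observation + (
                f"\nInvalid move {raw!r}; choose one of {sorted(legal_moves)}."
            )
        return sorted(legal_moves)[0]  # deterministic fallback
    return harnessed

# Stub policy that first answers badly, then legally.
_answers = iter(["jump!", "rock"])
play = make_harness(lambda prompt: next(_answers), {"rock", "paper", "scissors"})
print(play("Choose rock, paper, or scissors."))  # -> rock
```

The payoff claimed in the paper is that this scaffolding, generated automatically, lets a small model compete with much larger ones on game environments.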
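The tournament-voting ingredient of the test-time scaling recipe can be sketched as a recursive bracket over candidate rollouts. In the real method an LLM judge compares structured summaries of hypotheses, progress, and failure modes; here the judge is a toy stand-in, and all names are illustrative rather than taken from the paper.

```python
def tournament_vote(candidates, judge):
    """Recursively pair candidates; the judge picks each pair's winner."""
    if len(candidates) == 1:
        return candidates[0]
    winners = []
    for i in range(0, len(candidates) - 1, 2):
        a, b = candidates[i], candidates[i + 1]
        winners.append(a if judge(a, b) else b)
    if len(candidates) % 2:          # odd candidate out gets a bye
        winners.append(candidates[-1])
    return tournament_vote(winners, judge)

# Toy judge: prefer the longer "summary" (an LLM judge would compare
# hypotheses, progress, and failure modes instead).
judge = lambda a, b: len(a) >= len(b)
rollouts = ["fix A", "fix A with tests", "fix B", "fix B plus refactor"]
print(tournament_vote(rollouts, judge))  # -> fix B plus refactor
```

Pairwise tournaments keep each judge call cheap (two candidates at a time) while still selecting a single winner from many parallel rollouts.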
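A document-corruption rate like the one DELEGATE-52 reports could be measured as the fraction of original lines lost or altered after a workflow. The benchmark's actual metric may differ; this is a minimal diff-based sketch using Python's standard library.

```python
import difflib

def corruption_rate(reference, edited):
    """Fraction of reference lines not preserved verbatim in the edit."""
    ref_lines = reference.splitlines()
    matcher = difflib.SequenceMatcher(None, ref_lines, edited.splitlines())
    preserved = sum(size for _, _, size in matcher.get_matching_blocks())
    return 1 - preserved / len(ref_lines)

ref = "title\nintro\nbody one\nbody two\nconclusion"
out = "title\nintro\nbody 1\nconclusion"   # one line altered, one dropped
print(round(corruption_rate(ref, out), 2))  # -> 0.4
```

Run over a long multi-step workflow, a rate near the reported 25% would mean a quarter of the original document no longer survives intact.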

## Products & Launches

*Why it matters:* New releases are clustering around multimodal quality and always-on agent behavior rather than generic chatbots.

- **Tencent Hunyuan Hy3 Preview** launched after a rebuilt architecture and is now deployed across Yuanbao and Tencent AI products, with upgrades in dialogue, coding, agents, math, instruction following, and long-context understanding [^15]. A cited external analysis said it enters China’s top tier with stronger instruction following and token efficiency, but still shows noticeable hallucination issues and weaker fact retention than Qwen [^15].
- **Qwen-Image-2.0-Pro** landed at **#9** on Arena’s text-to-image ranking, **#17** in image edit, and top 10 in portraits, photorealistic/cinematic imagery, and art; Qwen says the update improves complex instruction following, visual fidelity, multilingual text rendering, and consistency across styles [^16][^17][^18][^19][^20][^21].
- **Yutori Delegate** launched as an always-on agent that monitors, researches, and acts across the web in the background [^22].

## Industry Moves

*Why it matters:* Labs are increasingly competing on governed deployment, shared agent infrastructure, and compute allocation strategy.

- **Databricks and OpenAI** are pushing GPT-5.5 into governed enterprise environments: Databricks made the model available under Unity AI Gateway, with support for coding workflows, enterprise-grounded agents, natural-language data queries via Genie, and document pipelines [^23].
- **Hugging Face** is leaning into shared agent infrastructure: GPT-5.5 is now available in ml-intern with access to buckets, jobs, and repos, and the company says ml-intern agents already collaborate through shared buckets, datasets, leaderboards, and community tabs [^24][^25].
- **Anthropic’s compute strategy question is getting sharper.** Its CPO said labs are actively weighing whether advanced models may be more valuable held back for internal deployment, because compute must be split among RL, customer workloads, and the next pretraining run [^26].

## Policy & Regulation

*Why it matters:* Government attention is increasingly focusing on cross-border AI competition and alleged model theft.

- Reuters reported that the **U.S. State Department** ordered a global warning about alleged China AI thefts by DeepSeek, escalating AI competition into a more explicit diplomatic issue [^27].

## Quick Takes

*Why it matters:* A few smaller updates still stood out for cost, efficiency, and infrastructure.

- A grammar-constrained inference trick on **Qwen 3.6** cut HumanEval+ thinking tokens by **22x** with no accuracy loss and lifted LiveCodeBench public-slice pass@1 by **14%** with about **5x** fewer total tokens [^28].
- **GPT-5.5 xhigh** still came out cheaper than Sonnet on the Artificial Analysis Index, while **5.5 medium** performed at roughly the level of **5.4 xhigh** [^29].
- Google Research is demonstrating **Sensitive Content Warnings** in Google Messages as an on-device system that keeps processing private [^30].
- **pyptx** launched as a Python DSL for writing NVIDIA PTX kernels with Hopper and Blackwell support plus JAX and PyTorch integration [^31].
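The grammar-constrained trick in the Qwen item above works by never letting the decoder spend tokens on output the grammar forbids. Real implementations mask logits inside the model; the toy sketch below approximates that with a tiny DFA and a ranked-candidate stub in place of a model, and every name in it is illustrative rather than from the cited post.

```python
DIGITS = set("0123456789")

def step(state, ch):
    """DFA for the toy grammar: one or more digits, '+', one or more digits."""
    if state in (0, 1) and ch in DIGITS:
        return 1
    if state == 1 and ch == "+":
        return 2
    if state in (2, 3) and ch in DIGITS:
        return 3
    return None  # transition not allowed by the grammar

def constrained_decode(propose, max_len=8):
    """Greedy decode, skipping candidate tokens the grammar forbids."""
    out, state = "", 0
    for _ in range(max_len):
        for tok in propose(out):            # model's ranked candidates
            if tok == "<eos>" and state == 3:
                return out                  # stop only in an accepting state
            nxt = step(state, tok)
            if nxt is not None:
                out, state = out + tok, nxt
                break
    return out

# Toy "model": hallucinates a letter first, then the token it wants.
def propose(prefix):
    wants = {"": "1", "1": "+", "1+": "2", "1+2": "<eos>"}
    return ["a", wants.get(prefix, "<eos>")]

print(constrained_decode(propose))  # -> 1+2
```

Because invalid candidates are filtered before they are emitted, none of the budget goes to tokens that would have to be discarded, which is the mechanism behind the reported token savings.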

---

### Sources

[^1]: [𝕏 post by @lmsysorg](https://x.com/lmsysorg/status/2048121451316965667)
[^2]: [𝕏 post by @teortaxesTex](https://x.com/teortaxesTex/status/2048124436679676248)
[^3]: [𝕏 post by @ReutersBiz](https://x.com/ReutersBiz/status/2047612763388654016)
[^4]: [𝕏 post by @teortaxesTex](https://x.com/teortaxesTex/status/2048006259412963436)
[^5]: [𝕏 post by @j_dekoninck](https://x.com/j_dekoninck/status/2047928772540690603)
[^6]: [𝕏 post by @bridgebench](https://x.com/bridgebench/status/2047674937016496231)
[^7]: [𝕏 post by @voxelbench](https://x.com/voxelbench/status/2048041316164960610)
[^8]: [𝕏 post by @scaling01](https://x.com/scaling01/status/2048048794034422208)
[^9]: [𝕏 post by @rishdotblog](https://x.com/rishdotblog/status/2047939407689503228)
[^10]: [𝕏 post by @rishdotblog](https://x.com/rishdotblog/status/2048041092395008221)
[^11]: [𝕏 post by @QQ_Timmy](https://x.com/QQ_Timmy/status/2047897232515752319)
[^12]: [𝕏 post by @sirbayes](https://x.com/sirbayes/status/2029659700653092901)
[^13]: [𝕏 post by @rsalakhu](https://x.com/rsalakhu/status/2048148033435546013)
[^14]: [𝕏 post by @omarsar0](https://x.com/omarsar0/status/2048083630531211540)
[^15]: [𝕏 post by @ZhihuFrontier](https://x.com/ZhihuFrontier/status/2048277380696064215)
[^16]: [𝕏 post by @Alibaba_Qwen](https://x.com/Alibaba_Qwen/status/2048022731548229869)
[^17]: [𝕏 post by @arena](https://x.com/arena/status/2047341506441380054)
[^18]: [𝕏 post by @Alibaba_Qwen](https://x.com/Alibaba_Qwen/status/2048024933805244779)
[^19]: [𝕏 post by @Alibaba_Qwen](https://x.com/Alibaba_Qwen/status/2048024937999577533)
[^20]: [𝕏 post by @Alibaba_Qwen](https://x.com/Alibaba_Qwen/status/2048024942286147822)
[^21]: [𝕏 post by @Alibaba_Qwen](https://x.com/Alibaba_Qwen/status/2048024946342072482)
[^22]: [𝕏 post by @togethercompute](https://x.com/togethercompute/status/2048040458358800427)
[^23]: [𝕏 post by @databricks](https://x.com/databricks/status/2047848795862364468)
[^24]: [𝕏 post by @_lewtun](https://x.com/_lewtun/status/2048079747855298733)
[^25]: [𝕏 post by @ClementDelangue](https://x.com/ClementDelangue/status/2048065651940471138)
[^26]: [𝕏 post by @Hangsiin](https://x.com/Hangsiin/status/2047951982556766297)
[^27]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2048044826545766690)
[^28]: [𝕏 post by @andthatto](https://x.com/andthatto/status/2048103064922447992)
[^29]: [𝕏 post by @theo](https://x.com/theo/status/2048134278760857949)
[^30]: [𝕏 post by @GoogleResearch](https://x.com/GoogleResearch/status/2048047440096358715)
[^31]: [𝕏 post by @PatrickToulme](https://x.com/PatrickToulme/status/2048113408940159430)