# DeepSeek Cuts Inference Costs as Web Agents and Open Coding Advance

*By AI High Signal Digest • June 28, 2026*

DeepSeek led the cycle with a major inference optimization and new serving economics, while long-horizon web agents and open-source coding models showed clear progress. The brief also covers important research on ensembling and evaluation, fresh document AI tools, and a policy update on Anthropic model access.

## Top Stories

*Why it matters: the clearest signals today were about cheaper inference, more capable agents, and stronger open-source specialization.*

- **DeepSeek turned inference into the story.** It released **DSpark**, a semi-parallel speculative decoding method, and said production DSV4 saw roughly **50% throughput/latency gains**, with up to **~80% latency improvement**. It also open-sourced the **DeepSpec** training/evaluation stack and disclosed V4-Pro serving economics that indicate serving **at least 3x cheaper** than prior benchmarks and inference roughly **5x cheaper** at 50 TPS. *Impact:* frontier competition is increasingly about token delivery and serving efficiency, not just better base models. [^1][^2][^3][^4]

- **Web agents are getting more real-world.** Google DeepMind’s CUA team took **#1 on Odysseys** with a vision-only Gemini 3.5 Flash agent; the benchmark focuses on **multi-hour** web workflows that require planning, memory, reasoning, and verification across many sites and tools. ViDA’s open-source **BrowserBC** turns one recorded human browser flow into reusable skills and improved **WebArena-Hard** success from **60% to 81%** while cutting tool calls **27%**. *Impact:* progress is shifting from short browser demos to reusable, long-horizon workflows. [^5][^6][^7]

- **Open-source coding models kept moving upstack.** **Ornith-1.0** launched as an MIT-licensed family for agentic coding in sizes from **9B** to **397B MoE**, using an RL-based self-improving strategy that jointly optimizes scaffolds and solutions. The team reports state-of-the-art open-source results on benchmarks including **Terminal-Bench 2.1** and **SWE-Bench Verified**. *Impact:* self-hosted coding stacks are becoming more capable and more commercially usable. [^8]
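
For readers unfamiliar with the technique behind the DeepSeek item, the sketch below shows generic greedy draft-and-verify speculative decoding. It is not DSpark: the semi-parallel details are not described in this brief, and the toy models, function names, and parameter `k` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], max_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Generic greedy speculative decoding (hypothetical sketch, not DSpark).

    Returns the tokens and the number of verification rounds. In a real
    system each round would be a single batched forward pass of the target
    model; here the target is called position by position for clarity.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2) The target checks the proposal and keeps the longest matching prefix.
        rounds += 1
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) != tok:
                break
            accepted += 1
        out.extend(proposal[:accepted])

        # 3) The target always contributes one token: the correction at the
        #    first mismatch, or a bonus token if everything was accepted.
        out.append(target(out))
    return out[: len(prompt) + max_new], rounds


# Toy models (assumptions): the target counts upward mod 10; the draft
# agrees except that it guesses wrong right after a 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_draft, [3], max_new=6)
    print(tokens, "verification rounds:", rounds)
```

Because every kept token is one the target would have chosen itself, the output matches plain greedy decoding; any speedup comes from needing fewer target rounds than generated tokens when the draft is usually right.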

## Research & Innovation

*Why it matters: several new papers challenged common assumptions about ensembling, evaluation, and AI readiness in medicine.*

- **Model ensembling got a reality check.** A new paper argues that any router, voting system, or mixture-of-agents setup that must return one member model’s answer is capped at **1 − β**, where **β** is the fraction of queries that every candidate model gets wrong. It also argues that low pairwise error correlation does **not** reveal that ceiling. [^9]

- **BINEVAL made LLM judging more inspectable.** It breaks each evaluation criterion into atomic yes/no questions and aggregates the results into calibrated multidimensional scores; across **SummEval, Topical-Chat, and QAGS**, it matched or beat **UniEval** and **G-Eval**, with especially strong factual-consistency results. [^10]

- **Medical AI showed both promise and limits.** One ECG model was reported to flag sudden-cardiac-death risk and, with a generative explainability model, reveal a new biomarker. Separately, **GPT-5.5 Pro** improved radiology interpretation scores to **79/100** from **69/100** on older models, but the evaluation still found it short of reliable clinical use. [^11][^12]
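
The ensembling ceiling in the first item above can be made concrete with a small calculation. This is a minimal sketch on made-up correctness data (the model names and the ten-query matrix are hypothetical, not taken from the paper): it computes β as the share of queries every model gets wrong, and the resulting upper bound 1 − β for any system that must return one member's answer.

```python
from typing import Dict, List


def all_wrong_fraction(correct: Dict[str, List[bool]]) -> float:
    """beta: fraction of queries on which every candidate model is wrong."""
    rows = list(correct.values())
    n_queries = len(rows[0])
    all_wrong = sum(
        1 for q in range(n_queries) if not any(row[q] for row in rows)
    )
    return all_wrong / n_queries


def selection_ceiling(correct: Dict[str, List[bool]]) -> float:
    """Upper bound (1 - beta) for any router or vote that returns a member's answer."""
    return 1.0 - all_wrong_fraction(correct)


def best_single_accuracy(correct: Dict[str, List[bool]]) -> float:
    """Accuracy of the strongest individual model, for comparison."""
    return max(sum(row) / len(row) for row in correct.values())


# Hypothetical per-query correctness for three models on ten queries.
# Queries 8 and 9 are missed by every model.
T, F = True, False
CORRECT = {
    "model_a": [T, T, T, T, T, T, F, F, F, F],
    "model_b": [T, T, T, F, F, F, T, T, F, F],
    "model_c": [F, F, F, T, T, F, T, F, F, F],
}

if __name__ == "__main__":
    print("beta:", all_wrong_fraction(CORRECT))
    print("ceiling:", selection_ceiling(CORRECT))
    print("best single model:", best_single_accuracy(CORRECT))
```

In this toy data a perfect router could lift accuracy above the best single model, but never past the ceiling, because no selection rule can recover the queries that every member misses.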

## Products & Launches

*Why it matters: the strongest launches focused on practical infrastructure for documents and agents.*

- **Mistral OCR 4** is a self-hostable document-intelligence model with bounding boxes, block classification, and confidence scores; one roundup said it beat competitors in human-preference testing and topped **OlmOCRBench**. [^13]

- **LiteParse** was highlighted as an open-source parser with **~3 ms average page latency**, support for **50+ formats**, basic bounding boxes, and top results on **OpenDataLoader-Bench, OlmOCR-Bench, and ParseBench**. [^14][^15]

- **Project Think** said its next version lets agents make **read-only fetch requests** with SSRF hardening, explicit allowlists, markdown-first responses, and separate caps for downloads versus model context. [^16]
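
As a rough illustration of the fetch guardrails described in the Project Think item, here is a minimal policy sketch. It is not Project Think's API: the function names, allowlist entries, and cap values are all hypothetical, and it performs no network access.

```python
import ipaddress
from urllib.parse import urlsplit

# All values below are hypothetical placeholders, not Project Think defaults.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}  # explicit allowlist
READ_ONLY_METHODS = {"GET", "HEAD"}                  # no writes
MAX_DOWNLOAD_BYTES = 5_000_000                       # cap on bytes pulled off the wire
MAX_CONTEXT_CHARS = 20_000                           # separate, smaller cap for the model


def is_fetch_allowed(method: str, url: str) -> bool:
    """Return True only for read-only HTTPS requests to allowlisted hostnames."""
    if method.upper() not in READ_ONLY_METHODS:
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https" or parts.hostname is None:
        return False
    host = parts.hostname.lower()
    try:
        ipaddress.ip_address(host)
        return False  # reject raw IP literals (loopback, private ranges, etc.)
    except ValueError:
        pass  # not an IP literal, so treat it as a hostname
    return host in ALLOWED_HOSTS


def clip_for_context(body: bytes) -> str:
    """Apply the download cap first, then the tighter model-context cap."""
    text = body[:MAX_DOWNLOAD_BYTES].decode("utf-8", errors="replace")
    return text[:MAX_CONTEXT_CHARS]


if __name__ == "__main__":
    print(is_fetch_allowed("GET", "https://example.com/page"))
    print(is_fetch_allowed("POST", "https://example.com/page"))
    print(is_fetch_allowed("GET", "https://127.0.0.1/admin"))
```

A URL check like this covers only part of SSRF hardening. A production version would also need to validate the resolved IP address and re-check every redirect hop, which this sketch leaves out.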

## Industry Moves

*Why it matters: strategy and org structure are starting to matter almost as much as raw model quality.*

- **Microsoft made a leadership bet on Copilot.** Reporting says Satya Nadella handed Copilot to **Jacob Andreou**, 33, as part of Microsoft’s push to regain AI momentum. [^17]

- **Sakana AI is pushing orchestration and sovereign deployment.** The company said Japanese megabanks are moving AI workflows from **PoC into production**, argued that orchestrating many models may beat relying on one giant frontier model, and framed sovereign AI as the ability to develop, adapt, and run AI domestically inside global supply chains. [^18]

## Policy & Regulation

*Why it matters: access to top models is increasingly a regulatory decision, not just a product rollout.*

- **Anthropic said the US government cleared Mythos 5 for a narrow return.** The company said its strongest cybersecurity model can be redeployed to a set of US organizations that operate and defend critical infrastructure, while broader Mythos and Fable availability is still being worked through with the government. [^19]

## Quick Takes

*Why it matters: these smaller updates still point to where performance, adoption, and tooling are moving next.*

- OpenAI says **750 tokens/sec** is coming to **5.6 Sol** in July. [^20][^21]
- A **GMAC** survey of 600+ recruiters found **1 in 3 employers** replacing entry-level jobs with AI; tech was highest at **40%**. [^22]
- **Seed Audio 1.0** was highlighted for scene-level audio generation, including multi-character dialogue and delivery from a single prompt. [^23]
- **Datalab** said its balanced extraction mode hit **95.9%** on an internal 225-document benchmark, above Reducto Deep Extract at less than half the price. [^24]

---

### Sources

[^1]: [𝕏 post by @eliebakouch](https://x.com/eliebakouch/status/2070762049362370602)
[^2]: [𝕏 post by @scaling01](https://x.com/scaling01/status/2070739300853907830)
[^3]: [𝕏 post by @teortaxesTex](https://x.com/teortaxesTex/status/2070832301005537627)
[^4]: [𝕏 post by @scaling01](https://x.com/scaling01/status/2070846579217519069)
[^5]: [𝕏 post by @rsalakhu](https://x.com/rsalakhu/status/2070888315725717551)
[^6]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2070986798092702034)
[^7]: [𝕏 post by @vida_agent](https://x.com/vida_agent/status/2070921732459024492)
[^8]: [𝕏 post by @ornith_](https://x.com/ornith_/status/2070148887067963854)
[^9]: [𝕏 post by @dair_ai](https://x.com/dair_ai/status/2070984020155158636)
[^10]: [𝕏 post by @omarsar0](https://x.com/omarsar0/status/2070942495832470001)
[^11]: [𝕏 post by @iScienceLuvr](https://x.com/iScienceLuvr/status/2070789091944526102)
[^12]: [𝕏 post by @yishan](https://x.com/yishan/status/2070742742133780960)
[^13]: [𝕏 post by @dl_weekly](https://x.com/dl_weekly/status/2070854833934876745)
[^14]: [𝕏 post by @jerryjliu0](https://x.com/jerryjliu0/status/2070905612238758217)
[^15]: [𝕏 post by @llama_index](https://x.com/llama_index/status/2070181882411561100)
[^16]: [𝕏 post by @threepointone](https://x.com/threepointone/status/2070944460230508625)
[^17]: [𝕏 post by @SebasAHerrera](https://x.com/SebasAHerrera/status/2070871433874677847)
[^18]: [𝕏 post by @SakanaAILabs](https://x.com/SakanaAILabs/status/2071048213390598323)
[^19]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2070665903440871779)
[^20]: [𝕏 post by @sama](https://x.com/sama/status/2070609922631537024)
[^21]: [𝕏 post by @stevenheidel](https://x.com/stevenheidel/status/2070970771351171378)
[^22]: [𝕏 post by @kimmonismus](https://x.com/kimmonismus/status/2070764396431708405)
[^23]: [𝕏 post by @TomLikesRobots](https://x.com/TomLikesRobots/status/2070923534449119424)
[^24]: [𝕏 post by @VikParuchuri](https://x.com/VikParuchuri/status/2070939803982426510)