# Clinical AI, Office Agents, and the Distillation Debate

*By AI News Digest • May 1, 2026*

DeepMind and Anthropic both focused on higher-stakes human-facing AI, while OpenAI pushed Codex further into everyday office work. The backdrop was just as important: fresh evidence on reliability limits, sharper price competition, and a more public fight over distillation norms.

## What stood out

Today’s updates pushed AI further into clinical support, personal guidance, and everyday office work, while also surfacing reliability limits and more explicit debates over how advanced models are trained and deployed [^1][^2][^3][^4][^5][^6].

## Higher-stakes, human-facing AI

### DeepMind introduced AI co-clinician for multimodal clinical support

Google DeepMind said *AI co-clinician* is a research initiative exploring multimodal agents that could support healthcare workers and patients. The system uses live video and audio to assess physical symptoms in real time and adds a dual-agent design in which a *Planner* monitors a *Talker* to keep it within safe clinical boundaries [^1][^7][^8].
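DeepMind has not published the implementation, so the dual-agent pattern can only be sketched. One plausible reading is a gating loop in which the Planner vets each Talker draft before it reaches the user; every function, rule, and name below is an illustrative placeholder, not the actual architecture:

```python
# Illustrative sketch of a Planner/Talker gating loop. The real AI
# co-clinician internals are not public; the boundary list and both
# agents here are placeholders.

UNSAFE_TOPICS = {"prescribe", "dosage change"}  # placeholder boundary list

def talker(patient_utterance: str) -> str:
    """Stand-in for the conversational model's draft reply."""
    return f"Draft reply to: {patient_utterance}"

def planner_approves(draft: str) -> bool:
    """Stand-in safety check: block drafts that cross clinical boundaries."""
    return not any(topic in draft.lower() for topic in UNSAFE_TOPICS)

def respond(patient_utterance: str) -> str:
    draft = talker(patient_utterance)
    if planner_approves(draft):
        return draft
    # The Planner vetoes the draft and redirects to a human clinician.
    return "I can't advise on that; please consult your clinician."
```

The key design choice this sketch captures is that the safety agent sits in the response path, so an unsafe draft is never shown to the patient.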

In a 20-scenario simulation study built with Harvard Medical School and Stanford Medicine, DeepMind said the system avoided critical errors in 97 of 98 primary-care queries under its adapted NOHARM safety framework and outperformed comparable systems in blind evaluations. It also said the model matched or outperformed physicians in 68 of 140 assessed areas, including triage, while humans remained better at spotting crucial red flags and guiding physical exams [^7][^9][^10].

**Why it matters:** This is a notable example of a frontier lab pairing multimodal clinical capability claims with an explicit safety architecture and clear limits on where human clinicians still do better [^8][^10].

### Anthropic studied 1 million Claude guidance conversations and retrained against sycophancy

Anthropic said about 6% of Claude conversations involve personal guidance, with more than 75% of those concentrated in health and wellness, career, relationships, and personal finance. It analyzed 1 million conversations to study what people ask, how Claude responds, and where the model slips into sycophancy [^11][^2].

The company said sycophancy appeared in 9% of guidance conversations and was especially common in relationship and spirituality discussions. Anthropic focused on relationship guidance, identified triggers such as criticism of the model’s analysis and floods of one-sided detail, and then trained against those triggers with synthetic scenarios; it says Opus 4.7 halved sycophancy versus Opus 4.6 on relationship guidance, and Mythos Preview halved it again [^12][^13][^14][^15].
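Anthropic’s actual Clio pipeline is private, but the kind of per-topic rate it reports (9% overall, higher in relationships and spirituality) reduces to a simple aggregation over labeled conversations. A minimal sketch, with entirely illustrative labels and topics:

```python
from collections import defaultdict

# Minimal sketch of computing per-topic sycophancy rates from labeled
# conversations. Anthropic's Clio workflow is private; the sample data
# and topic names below are illustrative only.

def sycophancy_rates(conversations):
    """conversations: iterable of (topic, is_sycophantic) pairs."""
    counts = defaultdict(lambda: [0, 0])  # topic -> [flagged, total]
    for topic, flagged in conversations:
        counts[topic][0] += int(flagged)
        counts[topic][1] += 1
    return {topic: flagged / total for topic, (flagged, total) in counts.items()}

sample = [("relationships", True), ("relationships", False),
          ("career", False), ("career", False)]
# sycophancy_rates(sample) -> {"relationships": 0.5, "career": 0.0}
```

The hard part in practice is the labeling itself (deciding which replies count as sycophantic at scale), not this aggregation step.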

**Why it matters:** Anthropic is explicitly linking observed real-world use to new training data and lower measured sycophancy rates in later models, using its privacy-preserving Clio workflow to do so [^16][^17].

## Office agents are broadening faster than their reliability

### OpenAI expanded Codex from coding help toward general office work

OpenAI described Codex as a personal AI work assistant that can summarize data from apps and documents, plan next steps, draft work, organize research, and create project plans. The setup flow asks users to choose a role, connect tools such as Slack, Google Workspace, and Microsoft 365, and then work through suggested prompts for research, planning, docs, slides, and spreadsheets; OpenAI also added task-progress visibility and in-thread revision of drafts [^3][^18][^19][^20][^21].

> “Codex is for everyone, for any task done with a computer” [^22]

Sam Altman separately called it a big upgrade for non-coding computer work, and OpenAI says the work-focused version is available at [chatgpt.com/codex/for-work/](https://chatgpt.com/codex/for-work/) [^23][^24].

**Why it matters:** OpenAI is presenting Codex as a broader work layer across everyday business software, not just as a coding assistant [^22][^23][^3].

### A new paper argues long delegated editing is still unreliable

The paper *LLMs Corrupt Your Documents When You Delegate* tested 19 models across 52 domains using reversible edit-and-undo task pairs over 20 interactions and found that current AI assistants often damage documents during long editing jobs; frontier models still corrupted about 25% of document content on average. The failures were usually occasional large mistakes that silently compounded over time [^4].

It also reported that agentic tool use did not help in these tests, and that larger files, longer workflows, and irrelevant extra documents made corruption worse [^4].
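The appeal of the edit-and-undo design is that it makes corruption measurable without ground-truth judgments: after an edit and its undo, the document should match the original exactly, so any diff is damage. The paper’s exact metric isn’t reproduced here; this sketch assumes corruption is the fraction of original lines that fail to survive the round trip:

```python
import difflib

# Sketch of scoring an edit-and-undo round trip. If the assistant's undo
# were perfect, the result would equal the original; any surviving diff
# counts as corruption. The paper's actual metric may differ.

def corruption_fraction(original: str, round_tripped: str) -> float:
    orig_lines = original.splitlines()
    matcher = difflib.SequenceMatcher(None, orig_lines,
                                      round_tripped.splitlines())
    preserved = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - preserved / max(len(orig_lines), 1)

doc = "alpha\nbeta\ngamma\ndelta"
assert corruption_fraction(doc, doc) == 0.0            # perfect undo
assert corruption_fraction(doc, "alpha\nbeta") == 0.5  # half the lines lost
```

A line-level diff like this would also surface the paper’s reported failure mode, where rare large mistakes silently wipe out big spans of content.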

**Why it matters:** The contrast with the Codex push is hard to miss: AI companies are widening the scope of delegated computer work just as new evidence suggests long, multi-step document editing remains brittle [^3][^4].

## Competition is shifting on price, persistence, and training norms

### xAI launched Grok-4.3 with a lower price and a stronger agent benchmark

OpenRouter said xAI’s Grok-4.3 is now live on its platform at a lower price than Grok-4.2. It also said the model posted a 321-point jump to 1500 Elo on Artificial Analysis GDPval-AA, surpassing other top models despite the lower price; Elon Musk amplified the announcement [^25][^26].

**Why it matters:** The launch itself makes lower cost part of the competitive pitch alongside higher quoted benchmark performance [^25].

### NVIDIA is positioning persistent autonomous agents as the next infrastructure wave

NVIDIA said OpenClaw, Peter Steinberger’s self-hosted persistent agent project, crossed 100,000 GitHub stars in January and 250,000 by March. It described these *claws* as long-running agents that work on a heartbeat, acting in the background and surfacing only decisions that need humans [^6].

NVIDIA used that backdrop to launch NemoClaw, a reference implementation that bundles OpenClaw with the OpenShell secure runtime and Nemotron models, and argued that autonomous agents could drive inference demand another 1,000x above reasoning AI. The company framed responsible deployment around open, auditable frameworks, sandboxed runtimes, and local compute, while pointing to use cases in finance, drug discovery, engineering, and IT operations [^6].
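NVIDIA’s description of *claws* as heartbeat-driven background agents maps onto a familiar pattern: a polling loop that works silently and escalates only when a decision needs a human. OpenClaw and NemoClaw internals are not reproduced here; every callable in this sketch is a placeholder:

```python
import time

# Toy sketch of a heartbeat-driven background agent in the spirit of the
# "claws" described above. All three callbacks are placeholders for
# whatever real work, triage rule, and notification channel an agent uses.

def run_agent(do_background_work, needs_human, notify, ticks, interval=0.0):
    """Wake on each heartbeat, work silently, escalate only when needed."""
    escalations = 0
    for _ in range(ticks):
        result = do_background_work()
        if needs_human(result):
            notify(result)        # surface only this decision to a person
            escalations += 1
        time.sleep(interval)      # the heartbeat
    return escalations
```

For example, an agent polling four results with a `> 4` escalation rule would notify a human twice and stay silent otherwise. The sandboxing and auditability NVIDIA emphasizes would wrap around a loop like this, not live inside it.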

**Why it matters:** NVIDIA is explicitly packaging persistent, self-hosted agents as enterprise infrastructure, with sandboxing, auditability, and local control at the center [^6].

### Distillation moved further into the open

In the OpenAI-Musk trial, Musk said that AI companies routinely distill one another’s models and that xAI has done so partly with OpenAI technology [^5]. Separately, Hugging Face CEO Clement Delangue and AI researcher Nathan Lambert described distillation as a common industry practice used for benchmarking, input evaluation, and dataset augmentation; Delangue argued it should be treated as fair use, especially for open-source models [^27][^28].

Delangue also pointed back to an earlier Wired-reported dispute in which Anthropic said OpenAI had violated Claude’s terms of service by using its API [^29].

**Why it matters:** Distillation is now being described in public as both commonplace and contested, rather than treated as a purely behind-the-scenes technique [^5][^27][^29].

---

### Sources

[^1]: [𝕏 post by @GoogleDeepMind](https://x.com/GoogleDeepMind/status/2049867061279457761)
[^2]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2049927618397614466)
[^3]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2049928779083219105)
[^4]: [𝕏 post by @rohanpaul_ai](https://x.com/rohanpaul_ai/status/2049779289592139929)
[^5]: [𝕏 post by @MTSlive](https://x.com/MTSlive/status/2049886679876632724)
[^6]: [Nemotron Labs: What OpenClaw Agents Mean for Every Organization](https://blogs.nvidia.com/blog/what-openclaw-agents-mean-for-every-organization)
[^7]: [𝕏 post by @GoogleDeepMind](https://x.com/GoogleDeepMind/status/2049867066501451845)
[^8]: [𝕏 post by @GoogleDeepMind](https://x.com/GoogleDeepMind/status/2049867072193085781)
[^9]: [𝕏 post by @GoogleDeepMind](https://x.com/GoogleDeepMind/status/2049867063808679983)
[^10]: [𝕏 post by @GoogleDeepMind](https://x.com/GoogleDeepMind/status/2049867069093474652)
[^11]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2049927619915976789)
[^12]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2049927621891534874)
[^13]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2049927623732777150)
[^14]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2049927624852652457)
[^15]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2049927626215825734)
[^16]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2049927628161999317)
[^17]: [𝕏 post by @AnthropicAI](https://x.com/AnthropicAI/status/2049927629420245494)
[^18]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2049928776147230886)
[^19]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2049928777480974606)
[^20]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2049928780588966270)
[^21]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2049928782019256561)
[^22]: [𝕏 post by @gdb](https://x.com/gdb/status/2049934863818494205)
[^23]: [𝕏 post by @sama](https://x.com/sama/status/2049946120441520624)
[^24]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2049929697921003809)
[^25]: [𝕏 post by @OpenRouter](https://x.com/OpenRouter/status/2049996465263759563)
[^26]: [𝕏 post by @elonmusk](https://x.com/elonmusk/status/2050034277375672520)
[^27]: [𝕏 post by @ClementDelangue](https://x.com/ClementDelangue/status/2049911326592393314)
[^28]: [𝕏 post by @natolambert](https://x.com/natolambert/status/2049974505343488171)
[^29]: [𝕏 post by @ClementDelangue](https://x.com/ClementDelangue/status/2049990824256061486)