# Agents Move Into Workflows as Deployment and Evaluation Stacks Mature

*By AI News Digest • June 24, 2026*

AI moved further out of the chat window today: Google, Claude, and NVIDIA each advanced different layers of the agent stack. A stricter coding benchmark, a deeper NVIDIA-AWS infrastructure push, and Yann LeCun’s critique of autoregressive models rounded out the day.

## Agents moved further into production

Today’s strongest theme was AI moving out of the standalone chat box and into APIs, runtimes, and team workflows [^1][^2][^3].

### Google makes its Gemini agent API generally available

Google’s Interactions API is now generally available as a single API for Gemini models and agents, with async `background=True`, multimodal tool use and combination, an isolated remote Linux sandbox via Antigravity Agent, and dedicated coding skills [^1]. Separately, Google says users created more than 1,000,000 native Android apps directly in AI Studio in the last month [^4].

**Why it matters:** This looks like a deliberate push to make agent building a mainstream application workflow rather than a specialized demo stack.

### Claude joins Slack as a teammate

Claude Tag lets Claude join Slack teams with access to the channels and tools a company selects, so teams can tag it into tasks and let it work asynchronously inside existing conversations [^2].

> "Imo this is the 3rd major redesign of LLM UIUX ... a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans." [^5]

In separate commentary on enterprise coding systems, Andrej Karpathy said deeply integrated, multiplayer AI can make it feel like "everyone is a manager" and even "I work from Slack now" [^6].

**Why it matters:** The interface frontier is shifting from standalone chat surfaces to systems embedded in the team’s operating environment.

### Enterprise runtimes are shipping with more structure—and more guardrails

NVIDIA launched an open, modular Agent Toolkit combining Nemotron open models, NemoClaw blueprints for safer behavior, and the OpenShell runtime, with examples spanning life sciences, cybersecurity, and chip design workflows [^3]. Google DeepMind researcher Nenad Tomašev said the field is concentrating heavily on coding agents because software is easier to verify, but argued human oversight is still necessary because agents are not 100% accurate and face automation bias, prompt-injection-style "agentic traps," and cognitive monoculture risks [^7].

**Why it matters:** Enterprise agent adoption is increasingly about the surrounding harness—runtime, permissions, verification, and safety—not just the base model.

## Coding agents are getting harder to evaluate

### DeepSWE aims at real software work, not benchmark contamination

DeepSWE is a new contamination-free benchmark with tasks written from scratch across 91 repositories and five languages, using hand-written verifiers that test software behavior rather than implementation details [^8]. Its tasks require roughly 5.5x more code and about 2x more output tokens than prior benchmarks, and the project is open-source on [GitHub](https://github.com/datacurve-ai/deep-swe) [^8].

**Why it matters:** Better benchmarks are becoming necessary as coding agents move into production. In parallel, François Chollet warned that unnecessary code can mechanically compound in agentic coding, turning complexity into a tax on every future change [^9][^10].

## The deployment stack keeps consolidating

### NVIDIA and AWS push further into production-scale AI

NVIDIA and AWS expanded their partnership across Amazon OpenSearch and EC2, targeting low-latency inference, vector search, and scalable GPU infrastructure for production deployments [^11]. In AWS, OpenSearch Serverless now defaults to GPU-accelerated vector indexing via NVIDIA cuVS with up to 10x faster indexing at one-quarter the cost, while new EC2 G7 instances promise up to 4.6x AI inference performance versus G6 and AWS has reached NVIDIA Exemplar Cloud status for GB300 training workloads [^11].

**Why it matters:** This sits inside a broader concentration of infrastructure advantage: NVIDIA technology now powers more than 400 of the TOP500 supercomputers—81% of the list—and nearly 90% of new entries [^12].

## One research debate worth watching

### LeCun argues real-world AI will need something beyond next-token prediction

Yann LeCun said GPT-style autoregressive systems work well on discrete symbols like language and code but run into a fundamental problem when asked to predict video or physical states, because the space of possible futures becomes mathematically intractable [^13]. He presented JEPA as a non-generative architecture based on abstract representations, and said world models built on top of it could enable objective-driven planning for industrial systems such as power plants, jet engines, and patient treatment planning [^13].

**Why it matters:** Even as current agent products mature, influential researchers are still arguing that the long-term path to real-world AI may require a different architecture. In separate commentary, François Chollet said today’s stack remains several orders of magnitude inefficient and argued symbolic learning is the route to near-optimal AI [^14].

---

### Sources

[^1]: [𝕏 post by @_philschmid](https://x.com/_philschmid/status/2069108134044467487)
[^2]: [𝕏 post by @claudeai](https://x.com/claudeai/status/2069468693017268244)
[^3]: [How Businesses Are Building Specialized AI They Can Trust](https://blogs.nvidia.com/blog/nvidia-agent-toolkit-open-models-tools-skills-secure-runtime-ai-agents)
[^4]: [𝕏 post by @OfficialLoganK](https://x.com/OfficialLoganK/status/2069443708790132996)
[^5]: [𝕏 post by @karpathy](https://x.com/karpathy/status/2069547676849557725)
[^6]: [𝕏 post by @karpathy](https://x.com/karpathy/status/2069601818540392669)
[^7]: [When millions of AI agents meet](https://www.youtube.com/watch?v=V04bm-3d6EQ)
[^8]: [r/MachineLearning post by u/we_are_mammals](https://www.reddit.com/r/MachineLearning/comments/1ue0hlp/)
[^9]: [𝕏 post by @fchollet](https://x.com/fchollet/status/2069483683955028265)
[^10]: [𝕏 post by @fchollet](https://x.com/fchollet/status/2069483131447738768)
[^11]: [NVIDIA and AWS Collaborate to Bring AI to Production at Scale](https://blogs.nvidia.com/blog/nvidia-aws-ai-production-scale)
[^12]: [NVIDIA Powers Over 400 of the World’s 500 Fastest Supercomputers](https://blogs.nvidia.com/blog/top500-green500-supercomputers-isc-2026)
[^13]: [Yann LeCun — Fireside Chat on Open Source & AI | UN Open Source Week 2026 \(Part 1/3\)](https://www.youtube.com/watch?v=Mtu0rZqW7hk)
[^14]: [𝕏 post by @fchollet](https://x.com/fchollet/status/2069457210456019363)