# Math Breakthroughs, Factory AI, and the Move Beyond Brute-Force Scaling

*By AI News Digest • April 29, 2026*

Today’s digest centers on applied capability: OpenAI described rapid progress in math research, NVIDIA pushed open multimodal AI deeper into factories, and new benchmark work argued that correctness matters more than JSON validity alone. DeepMind’s Korea partnership and more practical image-generation workflows rounded out the picture.

## Today’s signal

Much of today’s news pointed the same way: AI progress is being judged less by raw scale and more by **useful work**, meaning solving harder math, staying correct in structured tasks, handling multiple modalities in real systems, and producing assets people can use immediately [^1][^2][^3][^4][^5].

### OpenAI says math models are crossing into research work

OpenAI said GPT-5.4 Pro helped solve a 60-year-old Erdős problem. Researchers on the OpenAI Podcast described a sharp jump from routine failures in early 2025 to gold-medal performance at the International Mathematical Olympiad, day-to-day help for Fields Medalists, and more than 10 genuinely new combinatorics results that they described as publishable in top journals [^6][^2]. Ernest Ryu also said he resolved a 42-year-old optimization question after about 12 hours of back-and-forth with ChatGPT, with the model proposing ideas and Ryu acting as verifier and guide [^2].

**Why it matters:** OpenAI is presenting math as a proving ground for longer reasoning horizons. The podcast framed current progress as a move toward systems that can already think for days and will eventually think for weeks or months, in support of an *automated researcher* model [^2].

### NVIDIA pushes multimodal AI closer to production environments

NVIDIA launched Nemotron 3 Nano Omni, an open multimodal model spanning video, audio, image, and text, saying it tops six leaderboards and can deliver up to 9x higher throughput than comparable open omni models through its 30B-A3B hybrid mixture-of-experts design [^4]. NVIDIA also argued that manufacturing has entered a simulation-first phase, with high-fidelity synthetic data enabling production-grade physical AI; it cited ABB reaching 99% sim-to-real accuracy and cutting commissioning time by up to 80%, while JLR reduced a four-hour aerodynamics simulation step to one minute [^7].

**Why it matters:** The notable shift is not just a new model release. It is the combination of open multimodal agent tooling with concrete deployment paths in computer-use agents, document intelligence, audio-video workflows, and factory operations [^4][^7].

### A new benchmark argues that valid JSON is not enough

The Structured Output Benchmark proposes measuring exact leaf-value accuracy, faithfulness, and perfect-response rates, rather than treating schema validity and type safety as the main success criteria [^3]. Its early results report that most models clear JSON pass rates above 90% but still drop sharply on value accuracy, and the release says the open-source GLM 4.7 ranks second behind GPT-5.4 [^3].
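To make the gap between "valid JSON" and "correct JSON" concrete, here is a minimal Python sketch of how leaf-value accuracy and a perfect-response flag could be computed for a single response. The benchmark's actual scoring rules are not spelled out in the post, so the helper names (`flatten_leaves`, `score_response`), the strict type-and-value comparison, and the sample invoice data are all assumptions made for illustration. Faithfulness is not modeled here.

```python
import json


def flatten_leaves(value, path=()):
    """Yield (path, leaf) pairs for every leaf in a nested JSON value.

    Empty dicts and lists are treated as leaves so they are still compared.
    """
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from flatten_leaves(child, path + (key,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from flatten_leaves(child, path + (index,))
    else:
        yield path, value


def same_leaf(a, b):
    """Strict comparison (an assumption): type and value must both match."""
    return type(a) is type(b) and a == b


def score_response(raw_output, expected):
    """Return (parses, leaf_accuracy, perfect) for one model response.

    Hypothetical scoring, not the benchmark's published implementation.
    """
    try:
        predicted = json.loads(raw_output)
    except json.JSONDecodeError:
        return False, 0.0, False

    expected_leaves = dict(flatten_leaves(expected))
    predicted_leaves = dict(flatten_leaves(predicted))

    # Count expected leaves that appear at the same path with the same value.
    correct = sum(
        1
        for path, leaf in expected_leaves.items()
        if path in predicted_leaves and same_leaf(predicted_leaves[path], leaf)
    )
    accuracy = correct / len(expected_leaves)

    # Perfect means every expected leaf is right and nothing extra was added.
    perfect = (
        correct == len(expected_leaves)
        and predicted_leaves.keys() == expected_leaves.keys()
    )
    return True, accuracy, perfect


# Made-up example: the output parses cleanly but one of four values is wrong.
expected = {"invoice": {"total": 120.5, "currency": "EUR"}, "items": ["pen", "ink"]}
raw = '{"invoice": {"total": 125.0, "currency": "EUR"}, "items": ["pen", "ink"]}'

print(score_response(raw, expected))  # (True, 0.75, False)
```

The sample response would pass a JSON validity check and a type check, yet it scores 0.75 on leaf accuracy and is not a perfect response, which is the kind of gap the benchmark is described as measuring.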

**Why it matters:** This lines up with a broader shift in how experts are talking about progress. Sara Hooker argued that recent returns on compute look better in post-training, alignment, data targeting, and gradient-free learning than in brute-force model growth alone [^1].

> "It is the slow death of brute force scaling alone. innovation now lies in how a model interacts with the world." [^1]

### DeepMind’s Korea push ties AI progress to science, safety, and robotics

Demis Hassabis said Google DeepMind is partnering with Korea on three fronts: AI-for-science work, including materials science and weather prediction; youth education; and international safety standards. He framed this as building on Korea’s role in hosting last year’s AI summit [^8]. In the same interview, he said Gemini’s multimodality puts physical AI on the threshold of major breakthroughs in factories, automotive settings, homes, and automated labs, and pointed to ongoing ties with Samsung, Hyundai, and SK Hynix [^8].

**Why it matters:** This looks like more than a ceremonial visit. It connects frontier AI work to a country that Hassabis described as well positioned in robotics, manufacturing, mobile devices, and chips, and he separately said Korea has a leading part to play in AI safety and AI for science [^8][^9].

### Image generation looks more like a work tool than a novelty feature

OpenAI’s ChatGPT Images 2.0 was described as materially more useful for practical tasks such as slide decks, multi-image carousels, storyboards, content calendars, and accurate visual explainers [^5]. Matt Wolfe showed it pulling context from URLs to build ads, real-estate flyers, and infographics from source pages, while Greg Brockman highlighted product ideas being shared internally through image generation and a one-shot Codex app screen mockup [^5][^10].

**Why it matters:** The emerging use case is less about standalone art and more about fast design, marketing, and product-spec work that can move from prompt to working asset in one step [^5][^10].

---

### Sources

[^1]: [𝕏 post by @sarahookr](https://x.com/sarahookr/status/2049332831512641872)
[^2]: [What happens now that AI is good at math? — the OpenAI Podcast Ep. 17](https://www.youtube.com/watch?v=9-TVwv6wtGQ)
[^3]: [r/MachineLearning post by u/404llm](https://www.reddit.com/r/MachineLearning/comments/1syepnz/)
[^4]: [NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents](https://blogs.nvidia.com/blog/nemotron-3-nano-omni-multimodal-ai-agents)
[^5]: [ChatGPT Images Just Got Way Better \(Here's Why\)](https://www.youtube.com/watch?v=2u_T68P9H_U)
[^6]: [𝕏 post by @OpenAI](https://x.com/OpenAI/status/2049182118069358967)
[^7]: [Into the Omniverse: Manufacturing’s Simulation-First Era Has Arrived](https://blogs.nvidia.com/blog/manufacturing-simulation-first)
[^8]: ["우리 애들은 AI 초능력자 될 것" 하사비스 KBS 독점 인터뷰 풀영상 / KBS 2026.04.29.](https://www.youtube.com/watch?v=kSNGVVnW1uw)
[^9]: [𝕏 post by @demishassabis](https://x.com/demishassabis/status/2049291051266142574)
[^10]: [𝕏 post by @TheRohanVarma](https://x.com/TheRohanVarma/status/2048985585000563009)