OpenAI Launches GPT-5.4, Pushing Agentic AI

OpenAI just dropped GPT-5.4, a new flagship model squarely aimed at enterprise and professional work. It's reportedly outperforming rivals like Anthropic's Opus 4.6 and Google's Gemini 3.1 Pro on key benchmarks. The biggest leap is in its "agentic" capability, designed to power autonomous multi-step workflows, signaling a major shift from single-task AI to full-pipeline automation.

OpenAI's latest release is the company's first general-purpose model with native computer-use capabilities, allowing it to operate applications by issuing keyboard and mouse commands. This is a significant step in agentic AI, enabling the model to move beyond generating content to autonomously executing tasks across a user's machine. The "agentic" shift is from passive, single-prompt responses to proactive, goal-driven systems that can plan and execute multi-step solutions. These AI agents can interact with external tools and software, manage tasks over extended periods, and adapt their strategy based on new information. For newsrooms, this could mean automating workflows like sourcing information, generating summaries, and preparing clips for various platforms.

On key benchmarks, GPT-5.4 "Thinking" shows strong performance, scoring 75.0% on OSWorld-Verified for desktop navigation, which surpasses the reported human performance of 72.4%. It also achieved 83.0% on the GDPval benchmark, which tests capabilities across 44 professional occupations, outperforming Anthropic's Opus 4.6.

The model also introduces a "tool search" feature in its API, designed to lower costs and latency in complex agentic systems. By allowing the model to look up tool definitions as needed rather than loading them all upfront, OpenAI claims a 47% reduction in token usage in its tests.

This new agentic capability has direct implications for video processing. AI agents can now be orchestrated to manage an entire video production pipeline, from analyzing raw footage and generating a narrative to creating assets and assembling the final product. Frameworks such as NVIDIA's are emerging that use AI agents for multi-step reasoning over video streams.

However, scaling this level of AI-driven video processing introduces significant infrastructure challenges. AI agents perform non-linear, high-throughput data access, which can create bottlenecks if storage isn't optimized for it. This requires a shift from traditional storage to systems that prioritize high-speed frame access and rich metadata to prevent "GPU starvation," where expensive processors sit idle waiting for data.
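
To make the "tool search" idea described above more concrete, here is a minimal Python sketch of deferred tool loading. It is not OpenAI's API: the tool names, the `search_tools` helper, and the character-count proxy for tokens are all hypothetical, chosen only to show why looking up definitions on demand can shrink the context an agent carries.

```python
import json

# Hypothetical tool catalog; names and schemas are invented for illustration.
TOOL_DEFINITIONS = {
    "transcribe_clip": {
        "description": "Transcribe the audio track of a video clip.",
        "parameters": {"clip_id": "string", "language": "string"},
    },
    "summarize_text": {
        "description": "Condense a block of text into a short summary.",
        "parameters": {"text": "string", "max_words": "integer"},
    },
    "search_archive": {
        "description": "Find archived articles matching a keyword query.",
        "parameters": {"query": "string", "limit": "integer"},
    },
    "trim_video": {
        "description": "Cut a video clip to a start and end timestamp.",
        "parameters": {"clip_id": "string", "start": "number", "end": "number"},
    },
}


def rough_size(obj) -> int:
    """Crude stand-in for a token count: characters in the JSON serialization."""
    return len(json.dumps(obj))


def upfront_context() -> list[dict]:
    """Baseline: place every full tool definition in the context."""
    return [{"name": name, **spec} for name, spec in TOOL_DEFINITIONS.items()]


def search_tools(query: str) -> list[dict]:
    """Return full definitions only for tools whose name or description mentions the query."""
    q = query.lower()
    return [
        {"name": name, **spec}
        for name, spec in TOOL_DEFINITIONS.items()
        if q in name.lower() or q in spec["description"].lower()
    ]


def deferred_context(queries: list[str]) -> list[dict]:
    """Start with tool names only, then append each definition the agent looks up."""
    context = [{"name": name} for name in TOOL_DEFINITIONS]
    loaded = set()
    for query in queries:
        for tool in search_tools(query):
            if tool["name"] not in loaded:  # avoid loading the same definition twice
                loaded.add(tool["name"])
                context.append(tool)
    return context


if __name__ == "__main__":
    print("upfront size: ", rough_size(upfront_context()))
    print("deferred size:", rough_size(deferred_context(["transcribe"])))
```

In this toy setup, the deferred context holds four short name stubs plus the one definition the agent actually needed, so it is smaller than the upfront context. The real savings depend on how many tools exist and how many a task touches; the 47% figure above is OpenAI's own claim and is not reproduced by this sketch.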

