OpenAI Unveils Sora 2 Text-to-Video Model
OpenAI has introduced Sora 2, a new generative AI model that creates realistic videos from detailed text prompts, with internal versions reaching a minute in length. The model is positioned as a foundation for AI-powered digital storytelling and shows significant improvements in temporal consistency, motion, and prompt adherence. Sora 2 is seen as setting a new benchmark in the text-to-video space, with capabilities that could affect content creation, marketing, and user-generated video applications.
- A developer API for Sora 2 was launched on October 7, 2025, operating on a per-second billing model with prices starting at $0.10 per second for video generation. Access to the API is currently limited, requiring developers to apply and join a waitlist.
- A key technical advancement over its predecessor is the integration of synchronized audio generation, allowing the model to create dialogue, sound effects, and ambient noise that match the video content. The model also demonstrates a much-improved understanding of real-world physics and object permanence.
- The public version of Sora 2 can generate videos up to 25 seconds long at 1080p resolution, a significant increase from the original model's shorter clip limit. While internal versions have generated minute-long videos, this is not yet a public-facing feature.
- A new "Cameo" feature allows users to insert their own likeness into videos, though it requires a one-time identity verification process as a safety measure. This is part of OpenAI's response to concerns about the model's potential for generating deepfakes.
- The model's release prompted immediate ethical and legal concerns regarding deepfakes, copyright, and misinformation. In response, OpenAI embedded provenance signals, including visible watermarks and C2PA metadata, to help identify AI-generated content.
- In direct comparisons with competitors like Google's Veo 3.1, Sora 2 is noted for its ability to create realistic ambiance, while Veo has demonstrated superior performance in modeling how sound behaves realistically across different environments.
- The model is already being positioned to disrupt traditional video production workflows by drastically reducing costs for tasks like creating storyboards, promotional materials, and educational content. One case study reported a 95%+ reduction in production costs compared to traditional methods.
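To make the per-second billing concrete, the following is a minimal sketch of the arithmetic, assuming a flat rate equal to the quoted starting price of $0.10 per second and the 25-second public clip limit. The function name `estimate_cost_usd` and both constants are hypothetical and are not part of any OpenAI SDK. Actual pricing may vary by resolution or model tier.

```python
from decimal import Decimal

# Starting rate quoted in the article; treated here as a flat rate (assumption).
PRICE_PER_SECOND_USD = Decimal("0.10")
# Public clip-length limit quoted in the article.
MAX_PUBLIC_CLIP_SECONDS = 25


def estimate_cost_usd(duration_seconds: int, clips: int = 1) -> Decimal:
    """Estimate the cost of generating `clips` videos of `duration_seconds` each.

    Hypothetical helper for illustration only; it performs no API calls.
    """
    if duration_seconds <= 0 or clips <= 0:
        raise ValueError("duration_seconds and clips must be positive")
    if duration_seconds > MAX_PUBLIC_CLIP_SECONDS:
        raise ValueError("duration exceeds the 25-second public limit")
    return PRICE_PER_SECOND_USD * duration_seconds * clips


if __name__ == "__main__":
    print(estimate_cost_usd(25))            # one maximum-length clip: 2.50
    print(estimate_cost_usd(10, clips=12))  # twelve 10-second clips: 12.00
```

Under these assumptions, a single maximum-length public clip would cost about $2.50, which gives a rough sense of scale for the cost reductions claimed over traditional production.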