OpenAI Releases Sora 2 Video Generation Model
OpenAI has released Sora 2, an upgraded text-to-video model capable of generating high-fidelity, temporally consistent videos. The model is described as a "physics engine for the world" because it can simulate plausible physical dynamics and interactions. Sora 2 is now accessible via an API and a new standalone social application, signaling a move toward broader availability of multimodal AI.
- Sora 2's architecture is a diffusion transformer (DiT), which processes video by breaking it down into spatiotemporal patches, analogous to how language models use tokens. This approach replaces the U-Net architecture common in earlier diffusion models and allows for better scaling and temporal consistency.
- A key advancement in Sora 2 is the integration of synchronized audio generation, allowing the model to create dialogue that matches lip movements, along with sound effects and ambient noise that align with the video content.
- The new "Sora" iOS and Android application incorporates a social media feed and a "cameo" feature, which lets users insert their own face and voice into generated videos after a one-time identity verification and likeness capture.
- Video length has been extended from the initial version; free users can now generate clips up to 15 seconds, while "Sora 2 Pro" subscribers can create videos up to 25 seconds long.
- The model was released on September 30, 2025, with initial access rolling out in the United States and Canada via the new mobile app and a web interface.
- OpenAI is offering multiple tiers, including a free version, a higher-quality "Sora 2 Pro" for ChatGPT Pro subscribers, and the faster "sora-2" model available through the API, which is optimized for rapid iteration.
- The release of Sora 2 places OpenAI in direct competition with other major text-to-video models like Google's Veo 3, Runway, and Luma Labs' Dream Machine, which are also rapidly advancing in capabilities.
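
To make the spatiotemporal patching mentioned above more concrete, here is a minimal sketch of how a video volume can be divided into patches that each become one "token" for a transformer. The patch dimensions, the `PatchSize` type, and the helper names are hypothetical and chosen only for illustration; the patch sizes and internals actually used by Sora 2 are not given in this article, and a real model would typically patchify a compressed latent rather than raw pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchSize:
    """Extent of one spatiotemporal patch (illustrative values only)."""
    frames: int  # number of consecutive frames covered by one patch
    height: int  # patch height in pixels
    width: int   # patch width in pixels


def patch_grid(num_frames, height, width, patch):
    """Return how many patches fit along (time, height, width)."""
    dims = (num_frames, height, width)
    sizes = (patch.frames, patch.height, patch.width)
    for dim, size in zip(dims, sizes):
        if dim % size != 0:
            raise ValueError(f"dimension {dim} is not divisible by patch size {size}")
    return tuple(dim // size for dim, size in zip(dims, sizes))


def token_count(num_frames, height, width, patch):
    """Sequence length the transformer would see for this video."""
    grid_t, grid_h, grid_w = patch_grid(num_frames, height, width, patch)
    return grid_t * grid_h * grid_w


def token_index(frame, y, x, grid, patch):
    """Map a pixel coordinate to the index of the patch (token) containing it.

    Tokens are numbered in row-major order over (time, height, width).
    """
    _, grid_h, grid_w = grid
    pt = frame // patch.frames
    py = y // patch.height
    px = x // patch.width
    return (pt * grid_h + py) * grid_w + px


if __name__ == "__main__":
    patch = PatchSize(frames=4, height=16, width=16)  # assumed, not Sora 2's real values
    grid = patch_grid(16, 64, 64, patch)
    print("patch grid (t, h, w):", grid)
    print("tokens:", token_count(16, 64, 64, patch))
    print("token for pixel (frame=4, y=0, x=16):", token_index(4, 0, 16, grid, patch))
```

Because each patch spans several frames as well as a spatial region, attention between tokens relates content across time and space at once, which is the intuition behind the temporal-consistency claim. Under these assumed sizes, a 16-frame, 64x64 clip yields a 4x4x4 grid of patches.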