OpenAI Unveils Sora 2 Text-to-Video Model

Published by The Daily Scout

What happened

OpenAI has unveiled Sora 2, its next-generation text-to-video model capable of generating high-quality videos up to one minute long from natural language prompts. The new model features enhanced scene realism, support for multi-character interactions, and complex visual storytelling. Sora 2's capabilities are positioned to compete with or surpass existing tools like Runway, with integrations planned via both API and a browser-based interface.

Why it matters

- The initial version of Sora was first made available to a "red team" of experts to identify potential misuse and harms before a gradual public release to ChatGPT Plus and Pro users in the U.S. and Canada on December 9, 2024. - Sora 2's underlying technology is a diffusion transformer, which generates video by creating and then cleaning up 3D blocks of visual data called "patches," a technique evolved from the architecture of the DALL-E 3 image generation model. - A significant advancement in Sora 2 is the integration of synchronized audio, allowing it to generate not just video but also corresponding dialogue, sound effects, and background music. - OpenAI released an API for Sora 2 on October 7, 2025, moving to a per-second billing model. The "sora-2" standard model is priced at $0.10 per second, while the higher-resolution "sora-2-pro" model costs between $0.30 and $0.50 per second. - To combat misuse, all generated videos contain a visible, moving digital watermark, though reports indicate third-party programs to remove it became prevalent shortly after release. - The model underwent adversarial testing by a specialized "red team" tasked with assessing risks related to misinformation and bias before its public release. - Content moderation for Sora 2 involves a three-layer process that scans text prompts for sensitive terms, reviews uploaded media for copyrighted material, and analyzes individual video frames for policy violations before the final output is available. - In the competitive landscape, Runway is a primary alternative. While Sora 2 is noted for its high degree of photorealism, Runway's platform is often highlighted for offering more direct user control, editing flexibility, and multiple input methods, including text-to-video, image-to-video, and video-to-video transformations.

Key numbers

  • OpenAI has unveiled Sora 2, its next-generation text-to-video model capable of generating high-quality videos up to one minute long from natural language prompts.
  • Sora 2's capabilities are positioned to compete with or surpass existing tools like Runway, with integrations planned via both API and a browser-based interface.
  • Sora 2's underlying technology is a diffusion transformer, which generates video by creating and then cleaning up 3D blocks of visual data called "patches," a technique evolved from the architecture of the DALL-E 3 image generation model.
  • A significant advancement in Sora 2 is the integration of synchronized audio, allowing it to generate not just video but also corresponding dialogue, sound effects, and background music.

What happens next

  • OpenAI has unveiled Sora 2, its next-generation text-to-video model capable of generating high-quality videos up to one minute long from natural language prompts.

Quick answers

What happened in OpenAI Unveils Sora 2 Text-to-Video Model?

OpenAI has unveiled Sora 2, its next-generation text-to-video model capable of generating high-quality videos up to one minute long from natural language prompts. The new model features enhanced scene realism, support for multi-character interactions, and complex visual storytelling. Sora 2's capabilities are positioned to compete with or surpass existing tools like Runway, with integrations planned via both API and a browser-based interface.

Why does OpenAI Unveils Sora 2 Text-to-Video Model matter?

The initial version of Sora was first made available to a "red team" of experts to identify potential misuse and harms before a gradual public release to ChatGPT Plus and Pro users in the U.S. and Canada on December 9, 2024. Sora 2's underlying technology is a diffusion transformer, which generates video by creating and then cleaning up 3D blocks of visual data called "patches," a technique evolved from the architecture of the DALL-E 3 image generation model. A significant advancement in Sora 2 is the integration of synchronized audio, allowing it to generate not just video but also corresponding dialogue, sound effects, and background music. OpenAI released an API for Sora 2 on October 7, 2025, moving to a per-second billing model. The "sora-2" standard model is priced at $0.10 per second, while the higher-resolution "sora-2-pro" model costs between $0.30 and $0.50 per second. To combat misuse, all generated videos contain a visible, moving digital watermark, though reports indicate third-party programs to remove it became prevalent shortly after release. The model underwent adversarial testing by a specialized "red team" tasked with assessing risks related to misinformation and bias before its public release. Content moderation for Sora 2 involves a three-layer process that scans text prompts for sensitive terms, reviews uploaded media for copyrighted material, and analyzes individual video frames for policy violations before the final output is available. In the competitive landscape, Runway is a primary alternative. While Sora 2 is noted for its high degree of photorealism, Runway's platform is often highlighted for offering more direct user control, editing flexibility, and multiple input methods, including text-to-video, image-to-video, and video-to-video transformations.

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Published by The Daily Scout - Be the smartest in the room.