Open-Source Model Rivals Commercial Video AI

A new open-source model named JavisDiT++ has been developed for generating semantically aligned audio and video from text prompts. The model's performance is reportedly competitive with leading commercial text-to-video models, signaling a rapid advancement in open-source AI capabilities.

- JavisDiT++ is built on a Diffusion Transformer (DiT) architecture and introduces a "modality-specific mixture-of-experts" design. This lets the model process audio and video data separately while still allowing the two modalities to interact, which improves generation quality for both.
- To synchronize audio and video precisely, the model uses Temporal-Aligned RoPE (TA-RoPE), a variant of rotary position embedding that aligns audio and video tokens at the frame level.
- A key innovation in JavisDiT++ is Audio-Video Direct Preference Optimization (AV-DPO), a training method that aligns the model's outputs with human preferences for quality, consistency, and synchronization.
- The model was trained on roughly 1.1 million public data entries: 780,000 diverse audio-text pairs and 360,000 high-quality sounding videos. Its creators claim it significantly outperforms previous open-source methods.
- The project is part of a broader trend of open-source models rapidly catching up to closed, commercial models such as Google's Veo 3.
- JavisDiT++ was developed by researchers from several institutions, including Zhejiang University, the National University of Singapore, and the University of Toronto.
- The complete resources for JavisDiT++, including the code, the pre-trained model, and the training dataset, have been released publicly to encourage further research and development.
- JavisDiT++ builds on an earlier model, JavisDiT, which introduced JavisBench, a benchmark of more than 10,000 high-quality text-captioned videos for evaluating joint audio-video generation.
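The brief does not describe how TA-RoPE works internally. As a rough illustration only, assuming the idea is to place audio and video tokens on one shared timeline before applying rotary position embeddings, the sketch below maps every token to a position measured in video frames and then rotates it with standard 1-D RoPE. The function names and token rates (one video token per frame at 24 fps, 96 audio tokens per second) are hypothetical, and only the temporal axis is modeled; a real video model would also encode spatial positions.

```python
import math


def temporal_positions(num_tokens: int, tokens_per_second: float, video_fps: float) -> list[float]:
    """Place each token on a shared timeline measured in video frames (hypothetical helper).

    Token i starts at i / tokens_per_second seconds; multiplying by video_fps
    converts that time into (possibly fractional) video-frame units, so audio
    and video tokens occurring at the same instant receive the same position.
    """
    return [i * video_fps / tokens_per_second for i in range(num_tokens)]


def rope_rotate(vec: list[float], pos: float, base: float = 10000.0) -> list[float]:
    """Apply a standard 1-D rotary position embedding to an even-length vector."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even embedding dimension")
    out = [0.0] * d
    for k in range(d // 2):
        # Each (2k, 2k+1) pair is rotated by an angle proportional to pos.
        theta = pos * base ** (-2 * k / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * k], vec[2 * k + 1]
        out[2 * k] = x * c - y * s
        out[2 * k + 1] = x * s + y * c
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical rates: 1 video token per frame at 24 fps, 96 audio tokens per second.
    video_pos = temporal_positions(4, tokens_per_second=24, video_fps=24)
    audio_pos = temporal_positions(16, tokens_per_second=96, video_fps=24)
    print(video_pos)      # [0.0, 1.0, 2.0, 3.0]
    print(audio_pos[:5])  # [0.0, 0.25, 0.5, 0.75, 1.0]

    # Audio token 4 and video token 1 share position 1.0, so the rotations cancel
    # in the attention dot product: the two tokens see zero temporal offset.
    q, k = [1.0, 2.0, 0.5, 0.5], [0.5, 1.0, 1.0, -0.5]
    same = dot(rope_rotate(q, audio_pos[4]), rope_rotate(k, video_pos[1]))
    print(math.isclose(same, dot(q, k), abs_tol=1e-12))  # True
```

The design point the sketch tries to capture is that RoPE makes attention depend only on relative position, so once both modalities share a frame-based timeline, an audio token and the video frame it belongs to are treated as temporally coincident.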
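AV-DPO is described as a form of Direct Preference Optimization, but the brief does not give its exact objective. For diffusion models, published DPO variants usually replace exact log-likelihoods with per-sample denoising errors, and how JavisDiT++ scores or pairs audio-video samples is not stated here. The sketch below therefore shows only the standard pairwise DPO loss on scalar log-probabilities; the function names, the beta value, and all numbers in the example are assumptions for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    The implicit reward of a sample is beta * (policy log-prob - reference log-prob).
    The loss is -log(sigmoid(reward_chosen - reward_rejected)), i.e. softplus(-margin).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return softplus(-margin)


if __name__ == "__main__":
    # Made-up log-probabilities for one preferred and one rejected audio-video sample.
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))                # log 2 (about 0.693): no preference yet
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0))  # about 0.127: policy already favors the chosen sample
```

Minimizing this loss pushes the model to raise the likelihood of human-preferred outputs relative to a frozen reference model, without training a separate reward model.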

Shared from Scout - Be the smartest in the room.