OpenAI announced a new generative text-to-video model named Sora in a press release yesterday. The diffusion model can turn brief written descriptions into one-minute high-definition video clips.
Takeaway points:
- OpenAI released a new generative text-to-video model, Sora.
- Sora is a diffusion model that can convert a brief written description into a one-minute high-definition video clip.
- However, the model has constraints: it has trouble simulating complex scenes accurately and can fail to understand some instances of cause and effect, among other weaknesses.
- OpenAI is currently working with a team of red teamers to test the model prior to making Sora available to OpenAI users.
Sora Diffusion Model
The Sora diffusion model starts with a video that looks like static noise and gradually transforms it over many steps as the noise is removed.
OpenAI has addressed the difficult problem of keeping a subject consistent, even when it temporarily goes out of view, by giving the model foresight of many frames at once.
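To make the idea concrete, here is a minimal sketch of such a denoising loop. OpenAI has not published Sora’s architecture, so the toy denoiser, tensor sizes, and update rule below are illustrative assumptions only, not Sora’s actual implementation.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for the network that predicts the noise in a clip."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, t: int) -> torch.Tensor:
        # A real model would also condition on the timestep t and the text prompt.
        return self.net(x)

def generate_clip(denoiser: nn.Module, frames: int = 16, size: int = 64, steps: int = 50) -> torch.Tensor:
    """Start from static-like noise and remove it step by step.

    The whole clip is denoised as one (batch, channels, frames, height, width)
    tensor, so every step "sees" all frames at once -- one way a model can keep
    a subject consistent even when it briefly leaves the view.
    """
    x = torch.randn(1, 3, frames, size, size)   # pure noise, like TV static
    for t in reversed(range(steps)):
        predicted_noise = denoiser(x, t)
        x = x - predicted_noise / steps          # toy update; real samplers differ
    return x

clip = generate_clip(ToyDenoiser())
print(clip.shape)  # torch.Size([1, 3, 16, 64, 64])
```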
Images and videos are represented as patches, collections of smaller units of data, which lets OpenAI train its diffusion transformers on data with varying durations and resolutions.
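The sketch below illustrates the patch idea: a clip is cut into small space-time blocks that a transformer can treat as tokens. The patch sizes used here are arbitrary assumptions, not values OpenAI has disclosed.

```python
import torch

def video_to_patches(video: torch.Tensor, pt: int = 4, ph: int = 16, pw: int = 16) -> torch.Tensor:
    """Split a (frames, height, width, channels) clip into flat patch tokens."""
    f, h, w, c = video.shape
    patches = video.reshape(f // pt, pt, h // ph, ph, w // pw, pw, c)
    patches = patches.permute(0, 2, 4, 1, 3, 5, 6)   # group the patch grid first
    return patches.reshape(-1, pt * ph * pw * c)     # one row per patch token

clip = torch.rand(16, 64, 64, 3)   # a tiny 16-frame RGB clip
tokens = video_to_patches(clip)
print(tokens.shape)                # torch.Size([64, 3072]) -> 64 tokens of 3072 values each
```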
Because Sora uses the recaptioning technique from DALL·E 3, the model follows the user’s text instructions closely.
Additionally, OpenAI’s Sora can produce complex scenes with multiple characters, specific types of motion, and accurate details of both the subject and the background.
“The model understands not only what the user has asked for in the prompt but also how those things exist in the physical world,” OpenAI states.
Beyond generating videos from text, Sora can also animate the components of a still image, fill in missing frames within an existing video, and extend videos that have already been generated.
Constraints of Sora
OpenAI stated that the current model has known weaknesses, such as:
- Trouble simulating complex scenes accurately
- Failure to understand some instances of cause and effect
- Confusion about the spatial details of a prompt
- Difficulty accurately describing events that unfold over time
Safety Considerations of Sora
In the press release, OpenAI stated that it will not only apply the safety methods developed for the release of DALL·E 3 but also go a step further and build tools to detect misleading content, including a detection classifier that can identify a video generated by Sora.
OpenAI is currently working with a team of red teamers to test the model before making Sora available to OpenAI users. These red teamers are subject-matter specialists in areas such as bias, hate speech, and misinformation.
In addition, once the model is released in OpenAI’s products, C2PA metadata will be included in generated videos, and OpenAI’s text and image classifiers will be applied: input prompts that violate the usage policy will be rejected, and video outputs will be reviewed frame by frame.
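A purely illustrative sketch of that kind of release pipeline is shown below: reject violating prompts, review generated frames, and attach C2PA provenance metadata. Every function name here is a hypothetical placeholder; OpenAI has not published an API for any of these steps.

```python
BLOCKED_TERMS = {"extreme violence", "celebrity likeness"}  # assumed example policy terms

def prompt_violates_policy(prompt: str) -> bool:
    # Stand-in for a text classifier that checks prompts against the usage policy.
    return any(term in prompt.lower() for term in BLOCKED_TERMS)

def frame_violates_policy(frame) -> bool:
    # Stand-in for an image classifier that reviews each generated frame.
    return False

def attach_c2pa_metadata(frames):
    # Stand-in for embedding C2PA provenance metadata in the final video.
    return {"frames": frames, "c2pa": {"generator": "text-to-video model"}}

def moderate_and_publish(prompt: str, generate_video):
    if prompt_violates_policy(prompt):
        return {"status": "rejected", "reason": "prompt violates usage policy"}
    frames = generate_video(prompt)
    if any(frame_violates_policy(f) for f in frames):
        return {"status": "rejected", "reason": "output failed frame-by-frame review"}
    return {"status": "published", "video": attach_c2pa_metadata(frames)}

print(moderate_and_publish("a corgi surfing a wave", lambda p: ["frame-1", "frame-2"]))
```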
OpenAI also said it will engage legislators, educators, and artists to understand their concerns and pinpoint use cases for the model.
“We are working with red teamers – domain experts in areas like misinformation, hateful content, and bias – who will be adversarially testing the model. We’re also building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora,” the company said.
Meanwhile, OpenAI’s new release has sparked conversations on social media. While some people applaud the development, others think it still needs improvement.