AI Video Generator

Keyframes (optional): upload a start frame and/or an end frame as JPG, PNG, or WEBP images up to 10MB, with a minimum width and height of 300px.


Wan 2.5 Preview Audio-Driven AI Video Generator

Experience Alibaba's next-gen Wan 2.5 Preview, now faster and with enhanced motion. It turns text, images, and audio into videos of up to 1080p with precise lip-sync and cinematic quality. Let your voice drive the story on Cuty.ai. Try free!

Key Features

Discover what makes Wan 2.5 Preview exceptional

Groundbreaking Audio-Driven Lip-Sync

Provide an audio clip, and Wan 2.5 Preview animates a static character image so that it speaks with realistic, natural facial expressions and mouth movements. It transforms workflows for narration, dialogue, virtual presenters, and digital humans.

True Multimodal Input Flexibility

Wan 2.5 Preview supports text, image, and audio inputs for true multimodal creation. Generate video from text descriptions or images, or use an audio clip as your starting point. You are free to start creating with whatever assets you have on hand.

Cinematic Realism & Enhanced Motion

Wan 2.5 Preview aims for cinematic realism with enhanced motion dynamics and stability. Subjects stay highly consistent, without distortion or jitter. It also interprets complex prompts more accurately, including cinematic camera moves such as panning, zooming, and focus shifts.

Up to 10s Duration & Multi-Resolution

Wan 2.5 Preview generates videos up to 10 seconds long, enough for more complete narratives, and offers 480p, 720p, and 1080p output to match each platform's needs. Choose the right clarity for your project on Cuty.ai. For pure motion transfer, see Wan 2.2.

Frequently Asked Questions

Everything you need to know about Wan 2.5 Preview

Wan 2.5 Preview is Alibaba's next-gen multimodal AI video model. Its key breakthrough is audio-driven video generation: it creates realistic videos of up to 1080p showing characters speaking with tightly synchronized lip movements and natural facial expressions.

Wan 2.5 Preview also features enhanced motion dynamics for more fluid movement, improved contextual understanding of complex prompts, richer visual detail in scene composition, and, in many cases, faster processing than earlier general-purpose video models.

Wan 2.2 focuses on motion transfer (animation/replacement) from a reference video. Wan 2.5 Preview focuses on lip-sync and animation driven by a reference audio file. Use 2.2 to make characters dance; use 2.5 to make them talk.

Making a static character image talk is the core use case for Wan 2.5 Preview. Provide the character image and an audio clip of their speech, and the model generates a video of up to 1080p with realistic expressions and accurate lip-sync.

Upload standard audio clips (e.g., MP3, WAV) containing narration, dialogue, or any human voice. Wan 2.5 Preview uses this audio as the driver to animate the character's facial expressions and mouth movements from your image.

Wan 2.5 Preview generates videos up to 10 seconds long, ideal for short-form content, product narrations, and social media. It supports 480p, 720p, and 1080p resolutions, all easily accessible on Cuty.ai.

You can try Wan 2.5 Preview's lip-sync feature on Cuty.ai with our free trial credits. To generate longer videos, render at 1080p, or access other premium features, upgrade to one of our subscription plans.

Ready to create with Wan 2.5 Preview?

Start generating amazing content with our powerful AI models. Try it free today!