Wan-AI

Wan | v2.2 A14B | Text to Video | Turbo


"Wan 2.2 A14B Text-to-Video turns detailed text prompts into 5-second 720p videos at 24 fps, delivering cinematic motion and coherent scenes. Built on a diffusion-transformer with a highly compressed VAE, it supports both text-to-video and image-to-video in one workflow and can run on consumer GPUs (e.g., RTX 4090) with memory optimizations. Expect multi-object scenes, temporal consistency, and flexible aspect ratios. For best results, write specific prompts that describe subjects, lighting, motion, and composition. Single-GPU inference may take around 9 minutes; multi-GPU setups accelerate significantly. If VRAM is limited, use offloading and dtype conversion, or try the smaller 5B variant."

GPU-Optimized Inference
Diffusion Transformer Clips
720p Output

Output Example

Used Prompt

A hero bursts through a metal door, sprinting forward as a massive explosion erupts behind him, fire and debris blasting outward. The camera follows in dynamic motion, showing dust and sparks flying as the blast lights up the scene. In slow motion, the hero dives forward while the fiery glow illuminates his silhouette, creating an intense cinematic escape moment.