Wan | v2.2 A14B | Text to Video | Turbo - AI Model

Wan 2.2 A14B Text-to-Video turns detailed text prompts into 5-second 720p videos at 24 fps, with cinematic motion and coherent scenes. Built on a diffusion transformer with a highly compressed VAE, it supports both text-to-video and image-to-video in one workflow and can run on consumer GPUs (e.g., RTX 4090) with memory optimizations. It handles multi-object scenes, maintains temporal consistency, and supports flexible aspect ratios. For best results, write specific prompts that describe subjects, lighting, motion, and composition. Single-GPU inference may take around 9 minutes; multi-GPU setups are significantly faster. If VRAM is limited, use offloading and dtype conversion, or try the smaller 5B variant.
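
The prompt advice above (cover subject, motion, lighting, and composition) can be sketched as a small helper. This is only an illustrative sketch: `PromptParts`, `build_prompt`, and `frame_count` are hypothetical names, not part of any Wan tooling, and the frame count is plain duration-times-fps arithmetic, which may differ from the exact number of frames the model emits.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """Hypothetical container for the four elements a specific prompt should cover."""
    subject: str
    motion: str
    lighting: str
    composition: str


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty parts into a single prompt, one sentence per part."""
    pieces = [parts.subject, parts.motion, parts.lighting, parts.composition]
    sentences = [p.strip().rstrip(".") + "." for p in pieces if p.strip()]
    return " ".join(sentences)


def frame_count(seconds: float, fps: int) -> int:
    """Nominal frame count for a clip: duration multiplied by frame rate."""
    return round(seconds * fps)


parts = PromptParts(
    subject="A hero bursts through a metal door",
    motion="The camera follows in dynamic motion as he sprints forward",
    lighting="A fiery glow illuminates his silhouette",
    composition="Wide shot with dust and sparks in the foreground",
)
prompt = build_prompt(parts)
print(prompt)
print(frame_count(5, 24))  # 120 nominal frames for a 5-second clip at 24 fps
```

Filling in each field separately makes it easy to spot a prompt that names a subject but says nothing about lighting or camera movement.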

Output Example

Used Prompt

A hero bursts through a metal door, sprinting forward as a massive explosion erupts behind him, fire and debris blasting outward. The camera follows in dynamic motion, showing dust and sparks flying as the blast lights up the scene. In slow motion, the hero dives forward while the fiery glow illuminates his silhouette, creating an intense cinematic escape moment.