Wan-AI: Wan 2.2 A14B Text-to-Video turns detailed text prompts into 5-second 720p videos at 24 fps, with cinematic motion and coherent scenes. Built on a diffusion transformer with a highly compressed VAE, it supports both text-to-video and image-to-video in one workflow and can run on consumer GPUs (e.g., RTX 4090) when memory optimizations are enabled. Expect multi-object scenes, temporal consistency, and flexible aspect ratios. For best results, write specific prompts that describe subjects, lighting, motion, and composition. Single-GPU inference can take around 9 minutes per clip; multi-GPU setups are significantly faster. If VRAM is limited, use offloading and dtype conversion, or try the smaller 5B variant.
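
As a rough illustration of the prompt advice above, the sketch below assembles a prompt from the four elements the description names (subject, lighting, motion, composition) and computes the nominal frame count for a 5-second clip at 24 fps. The names `ShotSpec`, `build_prompt`, and `frame_count` are hypothetical helpers invented for this example; they are not part of Wan 2.2 or any library, and the model's actual frame count may differ from the nominal figure.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    """Hypothetical container for the four prompt elements named above."""
    subject: str
    lighting: str
    motion: str
    composition: str


def build_prompt(spec: ShotSpec) -> str:
    """Join the non-empty elements into one comma-separated prompt string."""
    parts = [spec.subject, spec.lighting, spec.motion, spec.composition]
    # Drop blank fields and trailing periods so the joined prompt reads cleanly.
    cleaned = [p.strip().rstrip(".") for p in parts if p.strip()]
    return ", ".join(cleaned)


def frame_count(seconds: float, fps: int) -> int:
    """Nominal frame count: clip duration multiplied by frame rate."""
    return round(seconds * fps)


if __name__ == "__main__":
    spec = ShotSpec(
        subject="a red fox trotting across a snowy meadow",
        lighting="low golden-hour sunlight with long shadows",
        motion="slow tracking shot following the fox from the side",
        composition="wide shot, fox in the left third, pine forest behind",
    )
    print(build_prompt(spec))
    print(frame_count(5, 24))  # 5 s x 24 fps = 120 nominal frames
```

Keeping each element in its own field makes it easy to check that none of the four is left vague before the prompt is sent to the model.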
