Vidu

Vidu Q1 | Text to Video

Video
Text to Video

This text-to-video model turns clear prompts (and optional reference images) into short, polished clips with natural motion and strong scene consistency. It excels at quick generation for social content, ads, and animation, producing 2–8 second videos at up to 1080p. You can guide characters, backgrounds, and styles (including anime) while keeping details coherent across frames. Multimodal support lets you add background music and sound effects for immersive results. Use concise, descriptive prompts and reference images to lock in appearance and pacing. To iterate quickly, prototype at a lower resolution, then render the final clip at a higher one. Break complex narratives into shorter shots and stitch them together.
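The advice to break a longer narrative into shorter shots can be sketched as a small planning step. This is only an illustration in plain Python: `Shot` and `plan_shots` are hypothetical names, not part of any Vidu API, and the only fact taken from the description is the 2–8 second clip range. The helper spreads a total running time as evenly as possible across one prompt per shot and rejects plans whose clips fall outside that range.

```python
from dataclasses import dataclass

# Clip length limits taken from the description above (2-8 second videos).
MIN_CLIP_SECONDS = 2
MAX_CLIP_SECONDS = 8


@dataclass
class Shot:
    """One clip to generate: hypothetical helper type, not a Vidu API object."""
    prompt: str
    seconds: int


def plan_shots(prompts: list[str], total_seconds: int) -> list[Shot]:
    """Assign near-equal whole-second durations to each shot prompt."""
    if not prompts:
        raise ValueError("at least one shot prompt is required")
    base, extra = divmod(total_seconds, len(prompts))
    # The first `extra` shots get one additional second each.
    lengths = [base + 1] * extra + [base] * (len(prompts) - extra)
    if min(lengths) < MIN_CLIP_SECONDS or max(lengths) > MAX_CLIP_SECONDS:
        raise ValueError(
            "each shot must last 2-8 seconds; add or remove shots, "
            "or change the total duration"
        )
    return [Shot(prompt, seconds) for prompt, seconds in zip(prompts, lengths)]


if __name__ == "__main__":
    storyboard = [
        "Camera rushes along a rain-soaked forest path",
        "Fog clears to reveal moss-covered rocks",
        "Slow push-in on a single dripping leaf",
    ]
    for shot in plan_shots(storyboard, total_seconds=20):
        print(f"{shot.seconds}s: {shot.prompt}")
```

Each planned shot would then be generated as its own clip and the results stitched together in an editor of your choice; how the clips are submitted and joined is outside this sketch.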

1080P Rapid Generation
Multimodal Generation
Character Consistency

Output Example

Used Prompt

The camera moves very quickly forward along a forest path where it is pouring rain. Leaves on plants shudder from the drops, as the camera moves the fog clears, plants, rocks, moss become clear. High contrast lighting, cinematic ultra realistic style, 8K