LTX-2 | Text-to-Video AI Model

LTX-2 is an open-source video foundation model from Lightricks, built on a Diffusion Transformer (DiT) architecture. It generates synchronized audio and video at up to native 4K resolution and 50 fps. Designed for production use, it accepts multimodal inputs (text, image, audio, depth, and reference video) and generates 10-second clips in real time or near real time. Advanced controls, including multi-keyframe conditioning, 3D camera logic, and LoRA fine-tuning, enable precise motion and shot design and keep style consistent across projects. The model runs efficiently on consumer GPUs, with compute costs reportedly about 50% lower than comparable models.

For best results, start in Fast mode for previews, then switch to Pro or Ultra for the final render. Clear prompts and multimodal guidance improve coherence, but expect occasional audio-video alignment issues in edge cases.

Output Example

Prompt Used

A lone fisherman sits quietly in a small wooden boat on a calm sea at sunrise. The camera remains mostly steady, focused on the gentle movement of the water and the fisherman’s slow, deliberate actions. He casts his line into the still water, ripples spreading softly across the golden surface. A few seagulls glide past in the distance. The air is hazy with morning light, warm pink and orange tones reflecting on the waves. The fisherman waits patiently, the sound of water and light breeze creating a peaceful rhythm. Minimal camera motion, cinematic lighting, ultra-realistic 4K visuals, natural and contemplative mood.
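The example prompt follows a recognizable structure: subject and setting, action, camera behavior, atmosphere and lighting, sound (relevant because LTX-2 also generates audio), then style keywords. As a minimal sketch, assuming you want to keep prompts consistent across several shots, the hypothetical `ShotPrompt` helper below assembles those parts into a single prompt string. It is plain Python and does not call LTX-2; the class and field names are illustrative assumptions, not part of any Lightricks API.

```python
from dataclasses import dataclass, field


@dataclass
class ShotPrompt:
    """Hypothetical container for the parts of a text-to-video prompt.

    Field names are illustrative only; they are not an LTX-2 API.
    """
    subject: str
    action: str
    camera: str
    atmosphere: str
    sound: str
    style: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Emit each non-empty descriptive part as one sentence, in a fixed order.
        parts = [self.subject, self.action, self.camera, self.atmosphere, self.sound]
        sentences = [p.strip().rstrip(".") + "." for p in parts if p.strip()]
        # Append the style keywords as a final comma-separated sentence.
        if self.style:
            sentences.append(", ".join(self.style) + ".")
        return " ".join(sentences)


# Usage: rebuild a prompt similar to the fisherman example above.
prompt = ShotPrompt(
    subject="A lone fisherman sits quietly in a small wooden boat on a calm sea at sunrise",
    action="He casts his line into the still water, ripples spreading softly across the golden surface",
    camera="The camera remains mostly steady, focused on the gentle movement of the water",
    atmosphere="The air is hazy with morning light, warm pink and orange tones reflecting on the waves",
    sound="The sound of water and a light breeze creates a peaceful rhythm",
    style=["Minimal camera motion", "cinematic lighting",
           "ultra-realistic 4K visuals", "natural and contemplative mood"],
).render()
print(prompt)
```

Keeping the parts in fixed fields makes it easy to vary one element (for example, the camera line) between shots while holding the rest of the description constant.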