This experimental model targets ultra-long inputs with a fine-grained sparse attention method (DSA) that preserves output quality while cutting compute and serving cost. It is designed to handle large documents, sizable codebases, and multi-step workflows in a single call, with a context window of roughly 128K tokens. Prompt templates can switch between a "thinking" mode for deeper reasoning and a fast-response mode, making the model flexible for reasoning, code generation, tool calls, and agent tasks. Although DSA reduces per-token attention overhead, long contexts still demand careful memory and KV-cache planning. Open weights and code under an MIT license let teams self-host and fine-tune the model for research or cost-sensitive production systems.
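The text above does not specify how DSA selects which tokens attend to which. As a rough, illustrative sketch of the general idea behind fine-grained sparse attention (not DSA's actual mechanism), each query can be restricted to its `top_k` highest-scoring keys, so attention cost scales with `top_k` rather than the full sequence length; all shapes and the selection rule below are assumptions for illustration:

```python
import math

def topk_sparse_attention(q, k, v, top_k=2):
    """Toy sparse attention: each query attends only to its top_k
    highest-scoring keys instead of the full key sequence.
    q, k, v: lists of equal-length float vectors.
    Illustrative sketch only, not the model's real DSA kernel."""
    d = len(q[0])
    out = []
    for qi in q:
        # Full scaled dot-product scores for this query against every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Keep only the top_k highest-scoring key positions.
        keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k]
        # Stable softmax over the selected positions only.
        m = max(scores[j] for j in keep)
        w = {j: math.exp(scores[j] - m) for j in keep}
        z = sum(w.values())
        # Weighted sum over the selected value vectors only.
        out.append([sum(w[j] * v[j][t] for j in keep) / z for t in range(d)])
    return out

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = topk_sparse_attention(q, k, v, top_k=2)
print(len(out), len(out[0]))  # 2 2
```

A production kernel would avoid computing the full score matrix in the first place; this sketch only shows the selection-then-softmax structure.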
