About MiMo-V2.5
MiMo-V2.5 is a native omnimodal model from Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding tasks. Its 1M-token context window can hold complete documents, extended conversations, and complex task contexts in a single pass. This makes it well suited to agent frameworks where strong reasoning, rich perception, and cost efficiency all matter.
Specifications
- Provider: Xiaomi
- Context Length: 1,048,576 tokens
- Input Types: text, audio, image, video
- Output Types: text
- Category: Other
- Added: 4/22/2026
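As a rough illustration of how the specifications above might be used when wiring the model into an agent framework, the sketch below encodes the listed context length and input types as constants and checks a request against them. The helper names (`unsupported_inputs`, `remaining_tokens`) are hypothetical and not part of any Xiaomi API, and the assumption that prompt and output tokens share a single window is not stated in the listing.

```python
# Values copied from the specification list; helper names are hypothetical.
CONTEXT_LENGTH = 1_048_576  # tokens, i.e. 2**20
INPUT_TYPES = {"text", "audio", "image", "video"}
OUTPUT_TYPES = {"text"}


def unsupported_inputs(requested):
    """Return the requested input types that the listing does not include, sorted."""
    return sorted(set(requested) - INPUT_TYPES)


def remaining_tokens(prompt_tokens, reserved_output_tokens=0):
    """Tokens left in the window after the prompt and a reserved reply budget.

    Assumption: prompt and output share one context window.
    A negative result means the request would not fit.
    """
    return CONTEXT_LENGTH - prompt_tokens - reserved_output_tokens
```

For example, `remaining_tokens(1_000_000, 32_000)` returns `16576`, and `unsupported_inputs(["text", "pdf"])` returns `["pdf"]`. Token counts depend on the model's tokenizer, which the listing does not describe, so `prompt_tokens` has to come from whatever counting method the serving provider exposes.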