About Kimi VL A3B Thinking
Kimi-VL is a lightweight Mixture-of-Experts vision-language model that activates only 2.8B parameters per step while delivering strong performance on multimodal reasoning and long-context tasks. The Kimi-VL-A3B-Thinking variant, fine-tuned with chain-of-thought supervision and reinforcement learning, excels on math and visual reasoning benchmarks such as MathVision, MMMU, and MathVista, rivaling models with far more active parameters such as Qwen2.5-VL-7B and Gemma-3-12B. It supports a 128K-token context and high-resolution image input via its MoonViT encoder.
Specifications
- Provider: Moonshotai
- Context Length: 131,072 tokens
- Input Types: image, text
- Output Types: text
- Category: Other
- Added: 4/10/2025
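
The context length listed above can be related to the "128K" figure in the description with a little arithmetic. The following is a minimal sketch, not part of any official SDK: the helper name `remaining_output_budget` is hypothetical, and it assumes that prompt tokens (including any tokens produced from images) and generated tokens share the same context window, which this page does not state.

```python
# Context length taken from the specification list.
CONTEXT_LENGTH = 131_072  # tokens; 128 * 1024, i.e. "128K"


def remaining_output_budget(prompt_tokens: int,
                            context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens are left for output after the prompt.

    Hypothetical helper: assumes prompt and output share one window.
    Returns 0 when the prompt alone fills or exceeds the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


if __name__ == "__main__":
    # A 100,000-token prompt leaves 31,072 tokens under this assumption.
    print(remaining_output_budget(100_000))
```

How many tokens an image consumes depends on its resolution and on the encoder, so the prompt token count should come from the actual tokenizer or API response rather than an estimate.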