About LLaVA 13B
LLaVA is a large multimodal model that combines a vision encoder with the Vicuna LLM for general-purpose visual and language understanding. It achieves impressive chat capabilities and set a new state-of-the-art accuracy on ScienceQA.
#multimodal
Specifications
- Provider: liuhaotian
- Context Length: 2,048 tokens
- Input Types: text, image
- Output Types: text
- Category: Llama2
- Added: November 16, 2023