My research focuses on developing advanced multimodal models that can enhance their intelligence through interactions with humans and the real world.
News! Check out our Vision Arena demo on Hugging Face! You can chat with or compare large multimodal models (GPT-4V, Gemini Pro Vision, LLaVA-NeXT 34B, Qwen-VL-Chat, etc.) side by side!