Official code for the paper "MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments", accepted at CVPR 2025. Operating rooms (ORs) are ...
Even the most advanced MLLMs (e.g., Gemini-1.5) still struggle to comprehend multimodal documents. All MLLMs perform poorly on image needles. MLLMs fail to recognize the exact number of images in ...