Official code for the paper "MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments", accepted at CVPR 2025. Operating rooms (ORs) are ...
The most advanced MLLMs (e.g. Gemini-1.5) still struggle to comprehend multimodal documents. All MLLMs exhibit poor performance on image needles. MLLMs fail to recognize the exact number of images in ...