This codebase provides the benchmark code for audio-visual sound event localization and detection (SELD) in STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations ...