Abstract: Audio-visual speaker diarization (AVSD) is a critical technique that segments audio-visual signals and assigns them to multiple speakers in practical scenarios. Thus, how to efficiently ...