Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
https://arxiv.org/abs/2503.01174
https://x.com/Sid_Arora_18/status/1897315720205328593
As usual, the baseline cascaded system is intentionally weak. Whisper tiny as a baseline???
