Ultimate Voice Remover took 4m 40s to process a 5m 24s audio track using CPU only on a Mac M1. Once I checked the GPU option, it took 57s. So GPU is the only way to go: more than 5x realtime, compared to about 1.2x realtime on CPU.
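The realtime figures are just track length divided by processing time. A quick sketch of that arithmetic, assuming awk is available; the helper name rt_factor is made up for illustration:

```shell
# Realtime factor = track length / processing time (both in seconds).
# rt_factor is a hypothetical helper, not part of any tool mentioned here.
rt_factor() {
  awk -v track="$1" -v proc="$2" 'BEGIN { printf "%.1f\n", track / proc }'
}

rt_factor 324 280   # CPU only: 5m 24s track, 4m 40s processing -> 1.2
rt_factor 324 57    # GPU: same track, 57s processing -> 5.7
```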
To download video only and audio only from YouTube, first list the available formats with yt-dlp:
yt-dlp --list-formats "someyoutubeURL"
Download 720p video only:
yt-dlp -f 136 "https://www.youtube.com/watch?v=SOMEVIDEO"
Download m4a 128k audio only:
yt-dlp -f 140 "https://www.youtube.com/watch?v=SOMEVIDEO"
Now, after removing the music from the audio, just mux the processed audio with the original video:
ffmpeg -i "video_only.mp4" -i "audio_(Vocals).mp3" -c:v copy -c:a copy -map_metadata 0 -shortest output.mp4
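If you do this often, the download and mux steps could be wrapped in a small script. This is only a rough sketch: the function names, the DRY_RUN switch and the audio_only.m4a file name are my own assumptions, and format codes 136 and 140 may not exist for every video, so check --list-formats first.

```shell
#!/bin/sh
# Sketch only: function names, DRY_RUN and output file names are illustrative.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# fetch_streams URL: download the 720p video-only (136) and m4a audio-only (140) streams.
fetch_streams() {
  run yt-dlp -f 136 -o "video_only.mp4" "$1"
  run yt-dlp -f 140 -o "audio_only.m4a" "$1"
}

# mux_streams VIDEO AUDIO OUTPUT: copy both streams into one file without re-encoding.
mux_streams() {
  run ffmpeg -i "$1" -i "$2" -c:v copy -c:a copy -map_metadata 0 -shortest "$3"
}
```

Run fetch_streams with the video URL, process the downloaded audio in Ultimate Voice Remover, then call mux_streams "video_only.mp4" "audio_(Vocals).mp3" output.mp4. Setting DRY_RUN=1 beforehand only prints the commands, which is handy for checking them before anything is downloaded.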
