https://github.com/fluxions-ai/vui
https://huggingface.co/fluxions/vui
vui got some attention recently. It's a multispeaker TTS model with context (like Dia), 100M params.
Dia vs vui:
- vui is 16x smaller (100M params vs Dia's 1.6B)
- Unlimited render length; Dia tops out at 30 seconds
- vui has 150ms latency (time to first byte)
- vui runs in <5GB VRAM
- 4x faster codec
- Built by 1/2 the number of people
- Dia was built with Google Cloud TPUs, vui with two 4090s in a basement
- 7x faster RTF (real-time factor)
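
For anyone who wants to check the RTF claim themselves, here is a rough sketch of how real-time factor is usually measured: time spent generating divided by seconds of audio produced, so lower is faster. The `synthesize` callable and the `measure` helper are hypothetical stand-ins, not the vui or Dia API; plug in whatever inference call you actually use.

```python
import time


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = generation time / duration of generated audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure(synthesize, text: str, sample_rate: int):
    """Time a hypothetical synthesize(text) -> sequence of mono samples.

    Returns (rtf, audio_seconds). `synthesize` is an assumption for
    illustration, not a function from either project.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return real_time_factor(elapsed, audio_seconds), audio_seconds
```

Run it on the same text and the same GPU for both models, and throw away the first call so model loading and warm-up don't skew the number.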
