Speech Technology | United States America (US)

This talk was posted here before, but I watched it again recently and recommend revisiting it.

Hearing the AGI: From GMM-HMM to GPT-4o, Yu Zhang
November 15th LTI Colloquium Speaker

https://www.youtube.com/watch?v=pRUrO0x637A

Highly recommended:

1. Importance of scale
2. Importance of self-supervised learning for training on dirty data
3. A very tricky case involving the dither seed and self-supervised learning
4. Voice search data is useless
5. Importance of multi-objective training (again)
6. Why readable transcripts (Whisper) are better than good WER (RNN-T)
7. Discussion of audio and text data factors for audio LLM training
8. Size of the decoder and size of the encoder

Not always relevant for us GPU-poor guys, but very nice overall.


