Speech Technology | United States America (US)

As part of our mission to create open-source datasets for low-resource African languages, Digital Umuganda has released 2,250 hours of open-source Kinyarwanda speech data. To accompany this release, we launched an ASR hackathon on Kaggle, inviting the community to build models and help shape the future of low-resource language technologies.

Our goal is to collect 10,000 hours each of Kinyarwanda and Swahili speech data. This hackathon is a crucial step in that journey. The feedback will help us refine our data collection strategy for the remaining hours and ensure the datasets meet the needs of developers, researchers, and language advocates across the region.

We would greatly appreciate it if you could share this initiative with your network and help us reach more contributors passionate about language, technology, and open data.

The hackathon consists of three tracks:

Track A – Small: 540 hours of fully transcribed Kinyarwanda speech.

Track B – Medium: 1,180 hours of fully transcribed Kinyarwanda speech.

Track C – Large: 1,180 hours of transcribed speech plus 1,170 hours of unlabeled Kinyarwanda audio.

For more information, see the hackathon website: https://digital-umuganda.github.io/kasr_hackathon/
