Mozilla has released an open speech dataset and a TensorFlow-based transcription engine. Mozilla floated “Project Common Voice” back in July 2017, when it called for volunteers to either submit samples of their speech or check machine transcriptions of others’ utterances. The project has since collected 500 hours of samples, comprising 400,000 recordings made by 20,000 people; in the longer term, Common Voice wants 10,000 hours. The project’s Michael Henretty wrote that “most of us only have access to fairly limited collection of voice data; an essential component for creating high-quality speech recognition engines”. Even limited non-free data sets, he added, cost “upwards of tens of thousands of dollars”. Mozilla’s Sean White wrote that work on extending Common Voice beyond English will begin in the first half of 2018. Common Voice is available for download here, and for developers who need more open source speech datasets, Mozilla links four others it was able to identify: LibriSpeech, the TED-LIUM Corpus, VoxForge, and Tatoeba. Mozilla also announced an associated transcription effort based on Baidu’s Deep Speech speech recognition project. Read more here…

thumbnail courtesy of theregister.co.uk