Improving AI for Atypical Voices is a Win for All of Us

One of the things I love about working in the AI space is the positive impact it can have on the day-to-day lives of individuals. I recently came across an episode of the Wall Street Journal podcast “The Future of Everything” that discussed ways AI voice assistants can help people with atypical voices (e.g., those with conditions like Parkinson’s disease and muscular dystrophy) better use devices like Apple’s Siri and Amazon’s Alexa. The ultimate question the podcast raises is: “What will it take to create voice assistants that work for everyone right out of the box?”

Engaging with the World

One of the guests on the podcast is Michael Cash, a 49-year-old with cerebral palsy. He has never had trouble communicating with friends and family, but he says it took many years of speech therapy to feel confident speaking with new people.

One recurring frustration he describes is that Siri and Alexa have significant trouble understanding him. While Google claims the error rate of its automatic speech recognition can be under 10%, for people with neurological or voice-specific conditions it can easily reach 50%, or even 90%.
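
To put those percentages in context: speech recognition accuracy is usually reported as word error rate, the fraction of reference words a system gets wrong through substitutions, insertions, or deletions. Below is a minimal sketch of that calculation in Python; the sample sentences are made up for illustration and aren't drawn from any real system.

```python
# Illustrative sketch of word error rate (WER), the metric behind figures
# like "under 10%" vs. "50% or higher". It counts the minimum number of word
# substitutions, insertions, and deletions needed to turn the recognizer's
# transcript into the reference, divided by the number of reference words.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein (edit) distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Made-up example: a command heard perfectly vs. badly misheard.
print(word_error_rate("turn on the kitchen lights", "turn on the kitchen lights"))  # 0.0
print(word_error_rate("turn on the kitchen lights", "turn of a kitten light"))      # 0.8
```

At a 50% error rate, roughly every other word comes out wrong, which is why a voice assistant becomes effectively unusable at that level.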

Evolving Methodology

According to Mark Hasegawa-Johnson, a professor of Electrical and Computer Engineering at the University of Illinois Urbana-Champaign, it wasn’t until 2014 that the first commercially viable end-to-end neural network automatic speech recognizer was published. This enabled recognizers to predict the speech sounds coming up based on the speech sounds that came before. That advance was paired with LibriSpeech, a database curated at Johns Hopkins containing hundreds of hours of amateur audiobook recordings.
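
As a rough illustration of that "predict what comes next" idea, here is a toy counting model, not anything resembling a production end-to-end neural recognizer: it learns from a small made-up dataset which phoneme-like token tends to follow another, which is the same kind of conditional prediction a neural model learns with far richer context and acoustic features.

```python
# Toy illustration only: learn "which sound tends to come next" from
# phoneme-like token sequences, then predict the most likely next sound.
# The training data below is invented for the example.

from collections import Counter, defaultdict

def train_bigram(phoneme_sequences):
    """Count how often each phoneme follows each other phoneme."""
    counts = defaultdict(Counter)
    for seq in phoneme_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequently observed phoneme after `prev`, if any."""
    if not counts[prev]:
        return None
    return counts[prev].most_common(1)[0][0]

# Made-up phoneme-like tokens for a few short words.
data = [["h", "eh", "l", "ow"], ["h", "eh", "l", "p"], ["y", "eh", "s"]]
model = train_bigram(data)
print(predict_next(model, "eh"))  # prints "l", the sound most often following "eh" here
```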

However, while larger datasets like LibriSpeech and end-to-end neural network training systems brought this technology into public use, the algorithms still had difficulty understanding the speech patterns of people with atypical voices, much like the difficulty humans often have. Companies have therefore had to seek new ways to mitigate this issue and make the tech accessible and usable for everyone.

New Approaches 

Michael Cash, referenced above, works at a company called Voiceitt. In August 2023, Voiceitt released a product called Voiceitt2 that allows users to train a voice assistant themselves in order to unlock a whole array of functionality. This includes transcription and dictation, meaning users can write notes, documents, and emails with their voice, and they can also interact with ChatGPT through the app.

But most exciting is that Voiceitt can now create real-time transcripts of what people are saying and integrate them into workplace software like Webex. Sara Smolley, co-founder of Voiceitt, puts it this way: “what a ramp was to an office building, Voiceitt is to today’s remote workplace.”

Voice AI That’s More Accessible for Everyone 

One thing that Google, Amazon, and Voiceitt all agree on is that advancing the tech in terms of accessibility will result in lower error rates and improved functionality for all users. But even setting aside that universal benefit, this is essential work for users with disabilities, which makes it essential work for us as a society. According to the UN, an estimated 15% of the world’s population, or 1 billion people, live with a disability. And quoting from a 2020 paper published by the National Institutes of Health,

“People suffering from motor and cognitive impairments would significantly benefit from the possibility of controlling home appliances and personal devices remotely. Voice assistants hold the potential to enable individuals with disabilities to govern their houses without the need to constantly depend on caregivers.”

I see this as a call to action for all of us to keep pushing the frontiers of accessibility. It’s one of the many reasons I’m proud to work at Vapi. Our continued ML work around improving models for conversational intelligence is enabling better and better speech recognition, and we already support 70+ languages, with more in the pipeline. I sincerely believe, despite the naysayers, that AI can make, is making, and will continue to make a positive impact on people’s lives.