Why the human ear is still better at transcribing than Siri

December 5, 2018 • General, Top Stories


Despite AI reducing its transcription errors by up to 80% over the last decade, human-driven transcription solutions are still more accurate than AI solutions.

In recent years, the quality of artificial intelligence-driven transcription services has improved considerably. Between 2016 and 2017, Microsoft reported it had reduced its error rate by 12%, which currently puts the accuracy of its automated transcriptions at 94.9%.

Automated transcription is the conversion of a video or audio file into a text file by voice recognition software. AI-driven apps such as Trint, Otter, and Wreally are growing in popularity due to the improvements made in automated speech-to-text technology. Despite their emergence, both language and tech experts believe that automated transcription services will not replace human-driven services at any time in the near future.

The biggest advantage that a human transcriber has over their automated counterpart is that the human ear is more attuned to a series of external factors, which makes it less likely to misunderstand, omit, or completely skip words.

This is due to a number of variables, starting with the fact that humans are more adept at filtering out background noise. AI-driven services, meanwhile, can produce a transcript with an error rate of 12% even from a clean audio recording. Humans, having grown up around different cultural contexts and languages, have also been shown to be more adept than machines at identifying different accents.

Earlier in 2018, a study found that Amazon Alexa and Google Assistant had problems with accuracy when identifying different accents, irrespective of how fluent the speaker’s English might be: accuracy dropped by 2.6% for speakers with a Chinese accent and by as much as 4.2% for speakers with a Spanish accent.

The difficulties experienced by AI-driven services come from the fact that they have a dictionary-based vocabulary, which means they are only able to understand a series of short commands and a limited number of words. As a result, they struggle to recognize different accents, overlapping speech, or colloquial and slang terms. This puts them at a significant disadvantage to human-driven competitors, who are able to achieve accuracy rates of between 99% and 100%.

“We cannot doubt the fact that the advancements AI has made in the transcription sphere in recent years is phenomenal,” commented Peter Trebek, CEO of freelancer-driven transcription service provider GoTranscript. “However, with error rates over 5%, there is still some considerable improvements to be made until AI-driven solutions are considered to be on par with human transcribers.”

Without doubt, AI-driven solutions have made significant gains, with experts claiming a reduction of up to 80% in transcription errors over the last decade. However, programmers will need to work more closely with language experts to eliminate the current linguistic deficits and bring these solutions on par with human transcribers.

Edited by Daniëlle Kruger