It’s a common question, especially with rapid improvements from various technologies EVERY week!
And I understand.
Since we see technology create efficiencies and streamline processes every day, it’s common to think technology will “take over” the transcription industry too.
Yes, artificial intelligence and voice-to-text exist. They're even used by companies. But they will never replace human transcription, and here are the top six reasons why.
It must be trained to recognize one voice, and it doesn't even do that accurately
Artificial intelligence is called that for a reason. It can learn over time. If you use Alexa or Google Home or Siri, you know what I mean. The more you use it, the better it gets at recognizing your voice patterns. However, if you’ve ever spent time yelling at the little machine because it’s not hearing or understanding you, you also know there’s a long way to go.
Most transcribers don’t type jobs for a single speaker, and only that one speaker, forever. We need to hear accents and work amidst background noise or the crosstalk of multiple speakers. AI can’t parse it the way a human can. Additionally, AI takes one pass and that’s it, whereas a human will play a recording over and over and use different headsets and audio tools to get the cleanest possible transcript.
It can't ID speakers
Not being able to identify speakers is hugely problematic because much transcription work involves more than one speaker.
I have seen systems that can separate a “male speaker” from a “female speaker,” but given the wide range of voice types, that’s not always accurate, and it does not distinguish between multiple female or male voices in the same file.
It can't format a document
While every voice-to-text system has a standard template it uses to generate the transcript, it cannot format a document to a client’s specifications.
Formatting can include everything from indentations to headers/footers to use of bold, italics, and quotation marks.
It can't punctuate properly
If your smartphone generates a transcript every time someone leaves you a voicemail, you know that not only does the AI do a terrible (albeit sometimes hilarious) job of transcribing what was said, it also does a lousy job of punctuating. Punctuation is an art form.
Yes, we have guidelines, but even punctuating written text requires an artful approach. Punctuating the spoken word requires even more deliberation and consideration, which is something AI just can’t do.
It can't research unfamiliar words and phrases
Artificial intelligence hears what it hears. Or more specifically, what it thinks it hears. Think of all the industry-specific terms and acronyms that can come up during a chunk of audio, particularly medical and drug terms.
An auto-generated transcript will often render an acronym as ordinary words, and the result won’t make any sense.
Research is a huge and necessary part of a transcriber’s job.
It doesn't understand context
Similar to what I said above regarding words and phrases, context means a lot when transcribing.
For example:
“Let’s eat grandma” and “Let’s eat, Grandma” mean two COMPLETELY different things and require totally different punctuation choices.
As a transcriber, you type the words you hear. But you also need to place them in context, which means listening for comprehension. Context determines whether you need to do further research (are they saying an acronym instead of a word?) and provides insight into punctuation choices.
Conclusion
There are companies that hire transcribers and proofreaders to clean up auto-generated transcripts. And it’s extremely tedious work, work that could be done faster and more accurately by a human transcriber from the start.