When to Use Voice Recognition


Relying on YouTube’s voice recognition technology resulted in this unfortunate caption.
The real message? “I don’t have to settle on my smartphone for all my internet needs
when I’m out and about.” Yikes. 

Since the early ’90s, people have been asking whether we use voice recognition apps for captioning, subtitling, and transcription.

While we love a new tool, the truth is that voice recognition hasn’t worked for us, although it might for you. Below I’ve listed a few of its limitations, and then shown why they don’t generally jibe with the work we do.


  • Voice recognition applications learn one speaker’s voice and work best on that voice alone.

We transcribe hundreds of different speakers every week.

  • Voice recognition applications struggle with background noise, such as music, sound effects, or other speakers.

Nearly everything we work on involves these elements.

  • Voice recognition output requires loads of hands-on editing and proofing.

We figure that since we’ve already got our hands on the keyboard, we might as well transcribe (but if you aren’t already a proficient transcriber, you might give voice recognition a try).


If you do decide to try voice recognition for transcription, it’s best to use it for a single speaker, ideally one speaking directly to camera. Someone will need to check the output very carefully for errors, some of which can be very funky. There are obvious concerns, like whether any unusual nouns are spelled correctly, but also subtler ones, like whether the software confused common words such as ARE and OR. They do sound similar.
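
If you're comfortable with a little scripting, a short script can point your proofreader at the likeliest trouble spots. The Python sketch below is only an illustration, not a tool we use: the watch list and the function name flag_for_review are made up for this example, and you would swap in whatever words your own software tends to mix up. It does not replace a careful read by a person.

```python
import re

# Hypothetical watch list: sound-alike words that are easy to mix up.
# Replace these with the words your own software tends to confuse.
CONFUSABLE = {"are", "or", "our", "hour"}


def flag_for_review(transcript, confusable=CONFUSABLE):
    """Return (line_number, word) pairs a human proofreader should check."""
    flagged = []
    for line_number, line in enumerate(transcript.splitlines(), start=1):
        # Pull out whole words so "or" does not match inside "word".
        for word in re.findall(r"[A-Za-z']+", line):
            if word.lower() in confusable:
                flagged.append((line_number, word))
    return flagged


sample = "We are going to the studio\nCall me or send the file"
for line_number, word in flag_for_review(sample):
    print(f"line {line_number}: check '{word}'")
```

A script like this only tells you where to look. Whether ARE or OR is the right word in a given spot is still a call for a human listening to the audio.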

Pro Tip: For best results with voice recognition apps, pretend that you are a simultaneous translator at the UN. Repeat each word you hear into your microphone to ensure optimal clarity and to give the application the chance to learn your voice alone.

And, as always, if you need any help or have questions, just give us a call.