Transcribe Speech To Text With Linux And Google


Sometimes you need to turn a voice recording into a text document. Perhaps it's an interview for a news publication, or perhaps you need to transcribe a lecture from school. On Windows and OS X, there are a number of software programs that can help with this. For Linux users, the options feel a bit sparse by comparison.

Today’s tip addresses this issue. I’ll show you how to combine Google’s Web Speech API with PulseAudio, the Linux sound server, so that audio playing on your computer is fed to the speech recognizer as if it were coming from a microphone.

Ready to get started? Great, here’s what you’re going to do:

1) Install pavucontrol (PulseAudio Volume Control). It’s available from most software repositories.

2) Open pavucontrol and click the Input Devices tab. At the bottom, set Show to Monitors. Select the monitor for the audio device you’ll be playing the recording through by clicking the box next to the padlock on the right side. In my case, this was the USB speakers.

3) Now go to the Output Devices tab and make sure the matching output device is selected by clicking the box next to the padlock on the right side. Leave this app open for troubleshooting.


4) Install Chrome if you don’t already have it, open it, and browse to Google’s Web Speech API Demonstration page.

5) Now open the audio player you’ll use for the recording. Get the audio file ready to play, but don’t hit play just yet.

6) Back on the API Demonstration page in Chrome, click on the microphone icon in the right center of the page.

7) Now in the audio player, hit play.
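
If you aren’t sure which monitor to pick in step 2, the same list can be read from a terminal. What follows is a minimal Python sketch, not part of the original tip. It assumes the `pactl` command-line tool that ships with PulseAudio is installed. It parses the output of `pactl list short sources` and keeps the entries whose names end in `.monitor`, which is how PulseAudio names the monitor of an output device. The function name `find_monitor_sources` is made up for illustration.

```python
import subprocess


def find_monitor_sources(pactl_output):
    """Return the names of monitor sources found in the text printed by
    `pactl list short sources`."""
    monitors = []
    for line in pactl_output.splitlines():
        # Each row is tab-separated: index, name, driver, sample spec, state.
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            monitors.append(fields[1])
    return monitors


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["pactl", "list", "short", "sources"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        # pactl is missing or PulseAudio is not running.
        print(f"Could not query PulseAudio: {exc}")
    else:
        for name in find_monitor_sources(result.stdout):
            print(name)
```

Each printed name should correspond to one of the entries pavucontrol shows when Show is set to Monitors.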

If everything went well, you should start seeing text appear on the Chrome page. If it isn’t working, re-check your settings. Music or other background noise can also make the speech difficult to detect.
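
One way to re-check the settings without clicking through pavucontrol is to ask PulseAudio what its defaults are. The sketch below is an illustration under two assumptions that this tip does not verify: that the box next to the padlock sets the fallback (default) device, and that Chrome records from the default source. Under those assumptions, the default source reported by `pactl info` should be the default sink’s name with `.monitor` appended. The helper names `parse_defaults` and `monitor_matches_output` are hypothetical.

```python
import os
import subprocess


def parse_defaults(pactl_info):
    """Extract the 'Default Sink' and 'Default Source' lines from the text
    printed by `pactl info`."""
    defaults = {}
    for line in pactl_info.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in ("Default Sink", "Default Source"):
            defaults[key.strip()] = value.strip()
    return defaults


def monitor_matches_output(pactl_info):
    """Return True when the default source is the monitor of the default sink."""
    defaults = parse_defaults(pactl_info)
    sink = defaults.get("Default Sink")
    source = defaults.get("Default Source")
    return sink is not None and source == sink + ".monitor"


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["pactl", "info"],
            capture_output=True, text=True, check=True,
            # Force untranslated labels so the parser can find them.
            env={**os.environ, "LC_ALL": "C"},
        )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not query PulseAudio: {exc}")
    else:
        if monitor_matches_output(result.stdout):
            print("Default source is the monitor of the default output.")
        else:
            print("Default source and output do not match; re-check steps 2 and 3.")
```

If the check fails even though transcription works, the assumptions above probably don’t hold on your system, and pavucontrol remains the place to look.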

Bonus fun: This also makes for a good game of Mad Libs if you play YouTube podcasts in a separate tab. Some of the results are quite funny!


Matt Hartley
Founder at Freedom Penguin
Matt is Freedom Penguin’s founder and talking head. He has over a decade of experience working with Linux desktops, and his operating system experience covers both Windows and Linux platforms. In addition to writing articles on Linux and open source technology for Datamation.com and OpenLogic.com/wazi, Matt also once served as a co-host for a popular Linux-centric podcast.

Matt has written about various software titles, such as Moodle, Joomla, WordPress, openCRX, Alfresco, Liferay and more. He also has additional Linux experience working with Debian-based distributions, openSUSE, CentOS, and Arch Linux.

5 Responses to “Transcribe Speech To Text With Linux And Google”

  • Lovely! Worked like a charm on my old iMac running openSUSE.

  • Going to test this out sometime in the next 24 hours. Wondering how this would work with teamspeak, and other apps, since I’m deaf, and speech-to-text is kinda something I’ve been wanting for a long time.

    • *Please* update us here and let us know how your testing goes. I had success with WAVs and MP3s. Expect some missed words, but generally speaking it’s pretty good at getting stuff right. The key is clear, understandable English. I tested this with some podcasts and because the spoken tracks were all over the place with interruptions, etc., it wasn’t as accurate as a lecture or a phone call. Hope this helps.

      • Well, it works for the most part. It does transcribe what’s being said on TeamSpeak, although I have to admit the guy talking on TS was a little fast and the API was skipping words making it look like ghetto speak lol.

        The only issue I see with this setup is the time-out on the microphone at the website. I literally have to click the microphone button to make it work again. Wonder how this works with VoIP solutions too. Might be a potential solution rather than using the CapTel phones I’ve been seeing lately. Many thanks for this how-to, matthartley!!!
