Transcribe Speech To Text With Linux And Google

Sometimes in life, you run into situations where turning a voice recording into a text document is necessary. Perhaps this is from an interview for a news publication or perhaps you need to transcribe a verbal lecture from school. On Windows and OS X, there are a number of software programs that can help with this. Yet for Linux users, the options feel a bit sparse by comparison.

Today’s tip addresses this issue. I’ll show you how to combine Google’s Web Speech API with PulseAudio, the Linux sound server.

Ready to get started? Great, here’s what you’re going to do:

1) Install pavucontrol (PulseAudio Volume Control). It’s available from most software repositories.

2) Open pavucontrol and click the Input Devices tab. At the bottom, set Show to Monitors. Select the monitor for the audio device you’ll be listening from by clicking the box next to the padlock on the right side. In my case, this was the USB speakers.

3) Now go to the Output Devices tab and make sure the matching output device is selected by clicking the box next to the padlock on the right side. Leave this app open for troubleshooting.

(Screenshot: PulseAudio Volume Control)

4) Install and open Chrome, then browse to Google’s Web Speech API Demonstration page.

5) Open the audio player you’ll use and load the audio file, but don’t hit play just yet.

6) Back on the API Demonstration page in Chrome, click on the microphone icon in the right center of the page.

7) Now in the audio player, hit play.

If everything went well, you should start seeing text appear on the Chrome page. If it isn’t working, re-check your settings. Music or other background noise can also make the voice audio difficult to detect.
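If you would rather re-check the monitor setup from a terminal than from the pavucontrol window, here is a minimal Python sketch. It assumes the pactl command-line tool that ships with PulseAudio is installed, and it relies on PulseAudio’s convention of naming monitor sources with a ".monitor" suffix. The function names and the sample device names are made up for illustration.

```python
import subprocess
from typing import List


def monitor_sources(pactl_output: str) -> List[str]:
    """Pick monitor source names out of `pactl list short sources` output.

    Each line holds whitespace-separated fields: index, name, driver, ...
    Monitor sources are assumed to end in ".monitor".
    """
    names = []
    for line in pactl_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            names.append(fields[1])
    return names


def list_monitor_sources() -> List[str]:
    """Run pactl (assumed to be on PATH) and return the monitor sources."""
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True,
        text=True,
        check=True,
    )
    return monitor_sources(result.stdout)


# Made-up sample output, so the parsing can be tried without PulseAudio.
SAMPLE = (
    "0\talsa_output.usb-speakers.analog-stereo.monitor\t"
    "module-alsa-card.c\ts16le 2ch 44100Hz\tSUSPENDED\n"
    "1\talsa_input.pci-0000_00_1b.0.analog-stereo\t"
    "module-alsa-card.c\ts16le 2ch 44100Hz\tSUSPENDED\n"
)
print(monitor_sources(SAMPLE))
```

On a machine running PulseAudio, calling list_monitor_sources() should return the real monitor names. If the list comes back empty, there is no monitor for Chrome to listen to, which would explain why no text is appearing.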

Bonus: This also makes for a fun game of Mad Libs if you play YouTube podcasts in a separate tab. Some of the results are quite funny!

Introducing Jed Reynolds

I’m honored that Matt Hartley asked me to contribute to Freedom Penguin. In upcoming articles, I’ll be sharing some stories about things I’ve learned during my journey as a professional programmer analyst. I program in multiple languages, including Java, Perl, and bash. I also write end-user documentation in PHP and do my own CSS.

I fix my own computers, and pride myself on being mechanically competent. I also fix my own bikes. (But I don’t fix people.) For example, back in college around 1994, one of my friends who was getting a job asked me for advice about getting a car. She probably asked the wrong person: I said that all the cars I’ve owned or been around needed work. Water pumps, starter motors, plug changes and so forth. She was quite anxious after I went quiet on the topic. She asked, “I can’t spend all my time fixing cars, I need to get to work. Will anything just get me to work?” I then told her, “Well, a Toyota will. They won’t break a lot.”

(Surprisingly, the Toyotas I have owned have done pretty well by me. I own a 2000 Sienna that keeps on ticking. It’s a real guzzler by modern van standards, but the thing could tow a boat.)

And that’s (almost) how I think of Linux. It’s pretty dependable, but you’ve gotta roll up your sleeves once in a while. My family all have Ubuntu laptops, and as a result, I don’t have to worry about adware. Also, stuff is easy to back up. The kids can game on these PCs, and they game just enough that they’re not bugging me for those fancy water-cooled rigs. Luckily for me, I can keep the family running happily on used or refurbished hardware for the indefinite future, much as I would view buying a used Toyota. I tend to think of myself as the family Linux mechanic.