Transcribe Speech To Text With Linux And Google

(Last Updated On: February 24, 2024)

Sometimes in life, you run into situations where turning a voice recording into a text document is necessary. Perhaps this is from an interview for a news publication or perhaps you need to transcribe a verbal lecture from school. On Windows and OS X, there are a number of software programs that can help with this. Yet for Linux users, the options feel a bit sparse by comparison.

Today’s tip addresses this issue. I’ll show you how to combine Google’s Web Speech API with PulseAudio, the Linux sound server. The idea is to feed the audio your computer is playing into Chrome as if it were microphone input.

Ready to get started? Great, here’s what you’re going to do:

1) Install pavucontrol (PulseAudio Control). It’s available from most software repositories.

2) Open pavucontrol and click the Input Devices tab. At the bottom, set Show to Monitors. Select the monitor for the audio device you’ll be listening from by clicking the box next to the padlock on the right side. In my case, this was the USB speakers.

3) Now go to the Output Devices tab and make sure the matching output device is selected by clicking the box next to the padlock on the right side. Leave this app open for troubleshooting.

4) Install and open Chrome, then browse to Google’s Web Speech API Demonstration page.

5) Open the audio player you’ll use and load the audio file, but don’t hit play just yet.

6) Back on the API Demonstration page in Chrome, click on the microphone icon in the right center of the page.

7) Now in the audio player, hit play.

If everything went well, you should start seeing text appear on the Chrome page. If it isn’t working, re-check your settings. Another possible cause is music or other background noise making the speech difficult to detect.
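
If you’d rather re-check the monitor selection from a terminal, here is a minimal Python sketch. It assumes the `pactl` command-line tool that ships with PulseAudio is installed, and that monitor sources follow PulseAudio’s usual naming of the output device name plus `.monitor`. The function names and the sample device names are made up for illustration; your own devices will be named differently.

```python
import subprocess
from typing import List


def read_pactl_sources() -> str:
    """Return the raw output of `pactl list short sources`.

    Assumes PulseAudio's pactl tool is installed and a sound server is running.
    """
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def monitor_sources(pactl_output: str) -> List[str]:
    """Pick out the source names that end in '.monitor'."""
    names = []
    for line in pactl_output.splitlines():
        # The short listing is tab-separated: index, name, driver, sample spec, state.
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            names.append(fields[1])
    return names


# Hypothetical sample output; real device names will differ.
SAMPLE = (
    "0\talsa_output.usb-speakers.analog-stereo.monitor\t"
    "module-alsa-card.c\ts16le 2ch 44100Hz\tRUNNING\n"
    "1\talsa_input.pci-mic.analog-stereo\t"
    "module-alsa-card.c\ts16le 2ch 44100Hz\tSUSPENDED\n"
)

print(monitor_sources(SAMPLE))
```

On a machine running PulseAudio, `monitor_sources(read_pactl_sources())` should list the same monitors that pavucontrol shows when Show is set to Monitors. If the device you’re playing through has no monitor in that list, that is a good place to start troubleshooting.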

Bonus: This also makes for a fun game of Mad Libs if you play YouTube podcasts in a separate tab. Some of the results are quite funny!

Matt Hartley

Freedom Penguin

Freedom Penguin’s founder & talking head. Matt has over a decade of experience working with Linux desktops, and his operating system experience spans both Windows and Linux platforms. In addition to writing articles on Linux and open source technology for Datamation.com and OpenLogic.com/wazi, Matt also once served as a co-host for a popular Linux-centric podcast.

Matt has written about various software titles, such as Moodle, Joomla, WordPress, openCRX, Alfresco, Liferay and more. He also has additional Linux experience working with Debian-based distributions, openSUSE, CentOS, and Arch Linux.


Guest

Eric Beyer

Lovely! Worked like a charm on my old iMac running openSUSE.

matthartley

Awesome man 🙂

Joe Brouhard

Going to test this out sometime in the next 24 hours. Wondering how this would work with TeamSpeak and other apps, since I’m deaf, and speech-to-text is kinda something I’ve been wanting for a long time.

*Please* update us here and let us know how your testing goes. I had success with WAVs and MP3s. Expect some missed words, but generally speaking it’s pretty good at getting stuff right. The key is clear, understandable English. I tested this with some podcasts, and because the spoken tracks were all over the place with interruptions, etc., it wasn’t as accurate as a lecture or a phone call. Hope this helps. 🙂

Well, it works for the most part. It does transcribe what’s being said on TeamSpeak, although I have to admit the guy talking on TS was a little fast and the API was skipping words making it look like ghetto speak lol. The only issue I see with this setup is the time-out on the microphone at the website. I literally have to click the microphone button to make it work again. Wonder how this works with VOIP solutions too. Might be a potential solution rather than using the CapTel phones I’ve been seeing lately. Many thanks for this…

Member

Laven Pillay

Thanks for the tip - and it works great, except, I’m guessing for “abuse reasons”, they seem to have limited how much audio the Demonstration Page converts (it only did about 30 seconds of a file for me).
