Is Speech Recognition Ready for Everyday Use?

Movies and television show people interacting with computers through speech. Who among us has not dreamed of talking to our computer and receiving answers or having tasks completed? Is speech recognition ready for everyday use?

I spent the last month researching speech control on the Apple and Windows operating systems. In reality, I have started this quest several times in past years because I am always trying to push my computer to the next level. I have always wanted my computer to either accept my voice commands or speak back to me. This will be an ongoing project, but I will summarize what I have learned so far.

Before I begin, I want to say that I am not reviewing any of the programs I refer to. I guess you could say I’m reviewing speech recognition as a whole.

The Search …

I searched the web and read review after review. Each review was basically a repeat of the one before it, and the big-name tech webzines all reached the same conclusion: speech recognition is fun to play with, but it’s not ready for serious work or power users.

I disagreed. I wanted more.

The two main programs people talk about are Apple’s built-in speech app and Nuance’s Dragon software, which is called Dragon Dictate on the Mac and Dragon Naturally Speaking in the Microsoft world.

Program Choices and Learning Curve

I started with the easier and more available of the two programs. I fired up the Mac’s built-in speech app and started my quest. Within a few short moments, I was opening applications and giving my computer commands. From the start, I noticed that the built-in speech app was decent but missed about 3 or 4 commands out of every 10. Before we get too far in this article, you need to realize the following: if you have never worked with speech recognition, there is a medium learning curve. It can be, and usually is, difficult to transition from using your hands to using your voice with your computer.

When you have spent all your time using a keyboard and mouse to interface with your computer, it feels awkward to use only speech while your hands sit still. You have to retrain yourself to interface with your computer using just your voice. This will not be a quick or easy task. Beyond the awkwardness in the beginning, there are some things you will have to take into account, such as:

1. How do you speak? Do you use slang? Do you have good pronunciation? Do you use proper English?
2. What does your environment sound like to the speech program? (background sounds – more on this later)
3. Are your computer’s specifications able to process the voice commands?
4. How well built is your microphone, and how good is its response?

There are other variables that could influence your results, but the ones above are the main ones everyone needs to be conscious of.

How well does it work?

The first few days using my voice to interface with my Mac showed me what all the reviews had said up to now.

Basically: “It’s fun to play with but not ready for the masses,” or “Speech recognition has a long way to go before becoming the keyboard and mouse killer,” along with many other reviewers’ comments telling you not to even attempt it because it’s a waste of time.

I can see how they arrived at this opinion if they tried it for four or five days and decided to write their article. I have been using speech with my computer for over a month now and I have a different opinion of the technology.

Before I go further – understand this limitation and work with it: You will not be using speech recognition to totally remove the mouse and keyboard.

The keyboard and mouse are still needed and are sometimes just faster. Remember, I said earlier that you will have to teach yourself how to interface using your voice. I have been doing this for over a month, and although it is getting easier, I still feel awkward when speaking to my Mac. I don’t know how long it will take to feel natural, but a month and a few days is not enough time.

Beginnings of a Summary

The Mac’s built-in speech app is a good start, but its recognition missed more often than I was willing to put up with. I have used both programs and compared them back and forth to find what will work for me, and I found the built-in app lacking in comparison to Dragon Dictate. Also, Dragon Dictate has several modes (command mode, dictation mode, spelling mode, numbers mode, and sleep mode) that make it easier to speak to your computer and have it understand what you want done.

You have to learn to “speak” to the software. There are keywords and phrases you must learn before you can use the software to its fullest. If you do not take the time to learn how to speak to the software, you will not continue to use it, and you will give up believing that speech recognition is not ready.

Think about all those Star Trek movies we have all seen over the years. “Computer. What is the distance to Romulus 7 at max warp speed?” and, of course, the computer would compute and say the answer. Oh man, this is what we all want. This would be the ultimate. Well, in my mind, it would be. This may be happening in a secret government installation right now but for the rest of us, we can only use what is available.

But, as on Star Trek, you have to learn how to speak to the software by using the keywords and phrases it understands.

Maybe I Can Make It Better

Speech recognition software comes with a stock set of built-in commands. These are the standard commands that activate a program or perform tasks such as “close window”. Once you learn these basics, you begin to think of other tasks you can have your computer do with speech commands.

One of my thoughts was that I wanted to adjust my computer’s sound volume. This is one of those commands that would be useful if I were listening to, say, iTunes and a song came on that I wanted to rock out to. From across the room, I could ask the computer to turn up the volume.

Volume control is not a command built into either the Mac speech app or Dragon Dictate. After some research on the web, I found that Apple’s Automator could do the trick if I built a workflow that specified my volume level. I had learned that the Mac has eight levels (0-7) of volume, from mute to max. I could also do half steps, but I decided to use whole numbers. I created eight workflows and named them “Set Sound to Zero” (muted) through “Set Sound to Seven” (max). Both the Mac speech app and Dragon Dictate allow you to run Automator workflows or AppleScripts, so my workflows for volume control work with either program. They work as expected, and the plus is that I can also use them with my calendar to run scheduled tasks.
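For readers who prefer a script to eight hand-built workflows, here is a minimal sketch of the same idea. It is not what I actually built. It assumes that AppleScript's `set volume` command accepts a number from 0 to 7 and that macOS's `osascript` tool is available to run it. The function names (`parse_level`, `volume_command`, `set_sound`) are hypothetical and only mirror the workflow names above.

```python
import subprocess

# Spoken number words for the eight whole-number levels (0 = muted, 7 = max).
LEVEL_WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven"]


def parse_level(phrase):
    """Return the level (0-7) named by the last word of a phrase
    such as 'Set Sound to Seven'."""
    words = phrase.strip().lower().split()
    if not words or words[-1] not in LEVEL_WORDS:
        raise ValueError(f"no volume level in phrase: {phrase!r}")
    return LEVEL_WORDS.index(words[-1])


def volume_command(level):
    """Build the osascript call for a level.
    Assumption: AppleScript's 'set volume' takes a 0-7 number."""
    if not 0 <= level <= 7:
        raise ValueError("level must be between 0 and 7")
    return ["osascript", "-e", f"set volume {level}"]


def set_sound(phrase, run=False):
    """Translate a phrase into a command list.
    The command is only executed when run=True (macOS only)."""
    command = volume_command(parse_level(phrase))
    if run:
        subprocess.run(command, check=True)
    return command
```

With the default `run=False`, `set_sound("Set Sound to Three")` only returns the command so you can inspect it. Passing `run=True` on a Mac would actually change the volume. A speech program that can launch scripts could then call this with the phrase it heard.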

Understanding how to use scripting or other external programs to create commands is a must if you want speech commands to work for you. Dragon Dictate identifies the programs you have active and lets you create new commands from the menu system of the program you want to control. This makes it easy and quick to create speech control commands for any program that you use.

Dictation Mode

Speech recognition for dictation has been around for a while. This is probably the easiest feature to use. Open your favorite text editor or word processor, talk into your microphone, and the speech-to-text program writes down every word you say. Today’s versions of these programs are quite efficient and accurate at turning your speech into text. Again, this all depends on how well you have learned the keywords required to use the program and how well you progress with talking to your computer.

As I mentioned earlier, your environment matters. One of my requirements for using speech recognition in my daily life was that it had to work with my normal background noise. I do have to make some adjustments due to the programs’ limitations, but these are to be expected.

When I trained the program to recognize my voice through my microphone, I kept my background sound, the television, at a comfortable volume that I could still hear. This is needed whether you use a headset or a desk mic. If you train in a completely quiet room, a quiet room is what the program will require to function properly. Your environment plays a role in how well the recognition will work.

The Future

Humans interface with each other every day using speech, and our growing dependency on computers will lead to further evolution in speech technology. Stay tuned for future updates… We’re just getting started. (A more in-depth article can be found at http://ssdd.jlgaines.com/2011/11/21/speech-recognition/)
