Talk to Type: Speech Recognition Software

by Barbara Twardowski on May 1, 2004 - 3:50pm

In my junior year of high school, I took a typing class. With the discipline of a pianist, I learned to position my fingers above the keyboard correctly. Back straight. Eyes on the document.

I spent hours training my brain to see a word and send a message to the finger that corresponded to each letter, while never looking at my hands.

I was slow. Painfully slow. But I learned to type.

In high school, I pounded on a turquoise manual typewriter from J.C. Penney. Before heading to college, I bought a portable, electric Smith-Corona.

A decade passed, and my master's thesis was composed on a word processor. Since the early 1990s, I've advanced to a computer.

But the effects of my Charcot-Marie-Tooth disease have made computer keyboards sometimes difficult to manage. The muscles in my hands have atrophied and I have two fingers I can't straighten out.

Now it's a new millennium, and my typing can be replaced with a speech-recognition software program.

Just talk to it

Speech-recognition software enables the user to dictate directly into a computer using a headset. As the user talks to the computer, words are transcribed onto the screen.

After more than a decade of typing the conventional way, I was ready to try a hands-free approach. The three major products on the market are IBM's ViaVoice, Commodio's QPointer and ScanSoft's Dragon NaturallySpeaking 7.

My goal was to turn my computer into a personal stenographer who types and edits perfect copy.

Each product requires "training" the computer to recognize the speaker's voice. Theoretically, this can be accomplished in less than 30 minutes.

Each program comes with a headset microphone, the kind telemarketers wear in infomercials. First, the microphone has to be properly positioned and tested. Once the computer can "hear" you, the next step is "training" it to recognize your voice.

Talking to a computer isn't the same as speaking to a person. People can filter out background noises while they converse, but the computer needs help.

The room you work in must have as little background noise as possible. (No more classical music while I work.) Then the computer has to adjust to the unique characteristics of a person's voice.

For this training process, you select from a list of short passages and read them aloud. The instructions say to read them in a "natural" voice. If the computer understands what you read, the words are highlighted on the screen and you proceed. If it doesn't, you're stuck repeating the sentence over and over. (Getting louder doesn't help.)

Reading in a "natural voice" isn't easy. With ViaVoice, I talked slowly and in a monotone. With QPointer, my slow, puffy enunciation was similar to a poor Marilyn Monroe impersonation.

A broadcaster's voice

Dragon NaturallySpeaking was the last product I trained. Its instructional materials were the most specific about how to speak:

"Listen to the way newscasters read the news. If you copy this style when you use Dragon NaturallySpeaking, the program should successfully recognize what you say."

Suddenly, picturing Dan Rather in my mind, I sat up straighter and spoke in a confident tone. I began to read the text like a news anchor, but it certainly didn't feel natural.

A large yellow arrow moved across the screen indicating which word I was on as I read the following text:

Try thinking about what you want to say before you start to speak. This will help you speak in longer, more natural phrases. Speak at your normal pace without slowing down. When another person is having trouble understanding you, speaking more slowly usually helps. It doesn't help, however, to speak at an unnatural pace when you're talking to a computer. This is because the program listens for predictable sound patterns when matching sounds to words. If you speak in syllables, Dragon NaturallySpeaking is likely to transcribe each syllable as a separate word.

When you read this training text, Dragon NaturallySpeaking adapts to the pitch and volume of your voice. For this reason, when you dictate, you should continue to speak at the pitch and volume you are speaking with right now. If you shout or whisper when you dictate, Dragon NaturallySpeaking won't understand you as well.

And last but not least, avoid saying extra little words you really don't want in your document, like um or you know. The computer has no way of knowing which words you say are important, so it simply transcribes everything you say.

An exercise that trained both the computer and me! In less than 15 minutes, I'd raced through the training portion of Dragon NaturallySpeaking.

The training process is critical. All of the manufacturers recommend you repeat the training session if the computer is making excessive errors.

[Photo: The author makes friends with Dragon NaturallySpeaking. Photo by Jim Vance]

The computer makes mistakes for a variety of reasons. While people can use common sense and context clues to understand a speaker, the computer can't distinguish between phrases that sound alike, such as "ice cream" and "I scream."

Speech-recognition programs don't understand the meanings of words.

Instead, they keep track of how frequently words occur by themselves and in the context of other words. This information helps the computer choose the most likely word or phrase from among several possibilities. If you mumble or slur your speech, the computer doesn't understand.
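To make that idea concrete, here is a toy Python sketch of frequency-based guessing: it counts how often consecutive word pairs appear in a small, made-up sample of text, then uses those counts to choose between "ice cream" and "I scream" in context. The sample sentences and scoring are invented purely for illustration; the commercial programs reviewed here rely on far larger and more sophisticated statistical models.

    # Toy illustration only: count how often consecutive word pairs appear
    # in a small, made-up sample of text, then use those counts to pick the
    # most likely phrase. Real speech-recognition products use much larger
    # and more sophisticated statistical models.
    from collections import Counter

    sample_sentences = [
        "we bought ice cream after dinner",
        "she ordered ice cream with chocolate sauce",
        "i scream whenever the phone rings",
        "the kids want ice cream for dessert",
    ]

    # Count each consecutive word pair (for example, "ice cream" or "i scream").
    pair_counts = Counter()
    for sentence in sample_sentences:
        words = sentence.split()
        for first, second in zip(words, words[1:]):
            pair_counts[(first, second)] += 1

    def score(candidate, previous_word):
        """Score a candidate phrase by how often its word pairs, including
        the pair formed with the preceding word, appeared in the sample."""
        words = [previous_word] + candidate.split()
        return sum(pair_counts[(a, b)] for a, b in zip(words, words[1:]))

    # The two candidates sound alike, so context has to break the tie.
    candidates = ["ice cream", "i scream"]
    previous_word = "bought"   # context: the speaker just said "we bought ..."
    best = max(candidates, key=lambda c: score(c, previous_word))
    print(best)                # prints "ice cream" because that pairing is more common here

Because the choice rests entirely on how often word patterns have been seen before, an unusual phrase or a slurred word gives the program very little to go on.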


ViaVoice

Once the computer is trained, the next step is actually dictating.

With ViaVoice, I opened the Speak Pad and began dictating what I hoped would be an e-mail to my friend Luan.

The computer typed "Lou a and." I shouldn't really expect it to know proper names.

I also discovered it isn't a big fan of contractions. When I said "I've," the screen read "I'm." When I said "play," it typed "fly." When I said "thanks," it typed "takes."

I struggled with how to put the document into an e-mail and finally contacted customer support for help. The answer was to cut and paste, either with the keyboard or by voice.

I spent several hours over a two-week period playing with ViaVoice. I'd consult the manual and retrain, but I found myself repeatedly using my hands to insert corrections. I was beginning to realize speech recognition doesn't happen instantly.

Commodio's QPointer

The second product I installed and trained was QPointer. Its Web site touts it as being "especially beneficial for disabled users with impaired fine motor skills."

The software allows the user to control the computer with voice commands, as well as to dictate documents. My interest in QPointer was only its ability to transcribe dictation.

I dictated using QPointer over and over. The sentences were filled with errors. So, I trained it again.

Four times I sat through training, but I was never satisfied with the end results. I could type better with two fingers.

Dragon NaturallySpeaking 7

By the time I installed Dragon NaturallySpeaking, I knew the routine.

I dictated directly into a word-processing file and was amazed at the accuracy. It transcribed my contractions. When I said "91," it typed the numeral.

The program has an autopunctuation command that capitalizes the first word of a sentence and places periods at the ends of sentences. When I dictated a 314-word document, it made only five mistakes, roughly 98 percent word accuracy.

Hands down, or rather hands-free, the best dictation product was Dragon NaturallySpeaking.


Expert advice

"Speech-recognition software requires motivation and a desire to learn new technology. People mistakenly think they will put on a headset and magic will happen," said Dan Gilman, the creator of AbilityHub.com, a Web site that informs consumers on alternative methods and adaptable equipment for accessing computers.

Gilman is certified by the Rehabilitation Engineering and Assistive Technology Society of North America as an assistive technology practitioner. He's lived with a disability since 1972, when he fractured his neck in a swimming accident.

To use speech-recognition software effectively, you need a powerful, fast computer. Gilman recommends a Pentium 4 CPU and the highest-quality USB headset or microphone available. As much memory as possible is important because most users run several programs simultaneously.

Gilman offers some tips for using the programs:

Give your voice frequent breaks and drink water, not caffeine. It's easy to strain your vocal cords when you talk to a computer for hours.

Before buying a product for a child, realize the child must be able to read and spell to use the programs. Gilman has worked with children as young as 8 and says a soft vocal pitch is difficult for the computer to recognize.

Correct the errors in your documents with the speech-recognition program, so the computer will improve its accuracy.

Plan to spend four to six hours becoming familiar with a program. The results are worth the effort.

Gilman says, "Speech-recognition software has become more user-friendly. It is empowering."
