Volume 24, Issue 2 (Feb 2001)

Look Who's Talking

By Jeffrey Rowe

To demonstrate the productivity advantages speech can offer, this two-part assembly was created from scratch using think3's thinkdesign with and without the aid of thinktalk, the program's new speech-enabled user interface. Creating the model with thinktalk reduced mouse clicks by 44 percent, eliminated all keyboard strokes, and was 23 percent faster than creating the model without thinktalk.

Without Speech:
Mouse Clicks: 117
Keyboard Strokes: 37
Total Creation Time: 207 seconds

With Speech:
Mouse Clicks: 65
Keyboard Strokes: 0
Total Creation Time: 160 seconds
Images courtesy think3.
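
The percentages quoted in the caption can be checked directly from the two sets of counts. The short Python sketch below is only an illustration: the dictionary keys and the `percent_reduction` helper are names made up for this example, and "23 percent faster" is read here as a 23 percent reduction in elapsed time, which is the interpretation the published numbers support.

```python
# Figures copied from the with/without-speech comparison above.
without_speech = {"mouse_clicks": 117, "keystrokes": 37, "seconds": 207}
with_speech = {"mouse_clicks": 65, "keystrokes": 0, "seconds": 160}


def percent_reduction(before, after):
    """Return the drop from `before` to `after` as a percentage of `before`."""
    return 100.0 * (before - after) / before


# One reduction figure per measurement (hypothetical helper names throughout).
reductions = {
    key: percent_reduction(without_speech[key], with_speech[key])
    for key in without_speech
}

for key, value in reductions.items():
    print(f"{key}: {value:.1f}% lower with speech")
```

Rounded to whole numbers, the results match the caption: roughly 44 percent fewer mouse clicks, all keyboard strokes eliminated, and about 23 percent less time.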

Speech Enablers
Alias|Wavefront
Microsoft Corp.
think3

Most CAD professionals, myself included, have never felt a close affinity for designing things using a keyboard, mouse, or command line. And although attempts have been made to create input devices, such as digitizing pads and 3D controller balls, that more closely mimic natural design tools, none of these devices has ever really taken hold in a big way.

Over the last few years, however, we've seen the emergence of technology that supports speech recognition as part of a computer program's user interface. And lately, CAD vendors have begun to examine the efficacy of incorporating such technology into their software, marking the first step toward a more natural way to interact with CAD applications.

With speech-recognition technology, CAD vendors can tailor their software so that a user can design and engineer a complex product simply by speaking design commands into a microphone. So far, only one leading CAD vendor, think3, is shipping a CAD system that incorporates speech recognition. But as available technologies continue to mature and as existing limitations are ironed out, other CAD software vendors say they will explore the possibility of building such capability into future versions of their products as well.

Speech-recognition technology has been around in one form or another for more than three decades. But only during the past few years have speech engines begun to support the use of speech as an input device.

Until recently, speech recognition was not useful for many tasks because available products suffered from two main limitations. First, despite the technology's relatively limited grammar and vocabulary capabilities, it required CPUs powerful enough to compute the complex statistical comparisons necessary for speech recognition. And second, for similar-sounding words, such as meal and kneel, or words that sound the same but are spelled differently, such as pear, pair, and pare, speakers had to repeat words several times for the systems to understand what they were saying. And even then, a 60-percent accuracy rate was considered good.

Other obstacles existed as well. For example, early systems were speaker-dependent, meaning one system could only be used by a single user. The systems required speakers to enunciate each word carefully. And speech recognition typically had to be conducted in soundproof rooms to eliminate ambient noise. All these limitations added up to a technology that wasn't worth the time or resources needed to make it work.

Today, many of these limitations have been overcome. Thanks to software and hardware improvements, speech-recognition systems can run on a 350MHz Pentium II or III CPU with 128MB of RAM. They are substantially more accurate than they used to be. Many are speaker-adaptive. And some systems can perform fairly well with more spontaneous (albeit still careful) speech.

Finally, applications are being developed for use in normal office conditions. Although the technology still doesn't work well in environments with high levels of ambient noise, such as shop floor or production environments, noise-filtering microphones and better software-based sound-comprehension (not just recognition) technologies are increasing accuracy. These improvements are ensuring that speech recognition is not just another fad, but a real, positive step toward better user interfaces between humans and machines.

Although several companies, most notably IBM, have been involved in the speech-recognition market over the years with varying degrees of success, Microsoft is currently the most significant to the CAD community because of its work with software vendors in implementing speech-enabling technologies.

Microsoft first became actively involved in this market in 1997, when it began a relationship with Lernout & Hauspie Speech Products (L&H) to develop speech technology. Earlier this year, Microsoft and L&H expanded their relationship to include the development of localized speech engines that permit more universal use of speech recognition. At press time, L&H was reportedly struggling to recover after filing for bankruptcy in January amid accusations of financial misdeeds. Meanwhile, Microsoft continues shipping several products with speech capabilities that it developed with L&H, including Encarta 99, Windows 2000, and Auto PC, in addition to Web-based speech controls for browsers.

Today, speech recognition in CAD products is being implemented via the Microsoft Speech Application Programming Interface (SAPI) 5.0 Software Development Kit, which consists of the Microsoft Speech Recognition (SR) and Microsoft Text-to-Speech (TTS) engines. According to Claudia Gaudagnini, software engineer at think3, Microsoft's SR engine offers several benefits to vendors thinking of incorporating speech capability into their products.

For instance, Gaudagnini says, training the engine to recognize a user's voice is a fast and easy process, requiring the user to recite a simple set of words into a microphone connected to the computer. Moreover, the SR engine is relatively fast and responsive, is customizable, and has relatively light hardware requirements because the speech-recognition algorithms have become more efficient. She also claims that the engine is reliable and precise, providing greater than 95 percent accuracy for most common design tasks, once it has been trained to recognize a user's voice.

Although a number of CAD companies are exploring the potential of speech-enabled applications, think3 is the only vendor currently offering speech recognition as part of its program's user interface: Its speech-enabled technology, called thinktalk, is integrated into the newest release of thinkdesign, version 6.0. The company decided to integrate a speech component in thinkdesign 6.0's interface primarily to increase the speed at which users can design with the product. Added benefits include savings in screen real estate and an increase in focus on the design tasks at hand.

Thinktalk offers additional advantages. For example, not only does the speech-enabled user interface require fewer keystrokes and mouse clicks, it also gives both experienced designers and novices a more natural and intuitive way to interact with the software.

In addition, speech can provide an easier method of learning and exercising the full functionality of a CAD program. It's relatively easy for the human brain to learn 200 new words representing CAD commands, including when and where these words should be used within the context of a CAD application. On the other hand, it's much more difficult to learn and retain the ability to select the correct icon from a palette of hundreds or to find the correct command from a hierarchical list of cascading menus. For command retrieval, speech can prove to be the easiest way to get users to learn and implement the full range of a CAD system's command set.

Other CAD vendors say they're looking at incorporating speech technology into their software, but they contend that certain limitations will first have to be overcome. For instance, Buzz Kross, vice president of the manufacturing market group at Autodesk, says a speech-enabled interface has a long way to go before it is as productive as Autodesk's current user interface. "Speech is not universally accepted enough yet as a part of the user interface, and users have not yet formed a preference," he says. "But when they do, Autodesk will pay more attention to implementing it."

Although Autodesk is exploring speech, Kross says the company is focusing more heavily on tactile user interface possibilities, such as those involving finger and thumb motion, because it believes these are more accurate than speech at this time. "We are also exploring force/pressure feedback mice for the tactile feel of physical objects being designed," he says.

Another company that is in the exploratory phase of speech-enabled technologies is Alias|Wavefront. "We have been tracking speech technology carefully and looking at places where speech might play a role in our products," says William Buxton, chief scientist at Alias|Wavefront. "For example, when we engineered Maya, by building its user interface on top of our MEL [Maya Embedded Language] scripting language, we consciously engineered it so that it would accommodate speech recognition as that technology evolved."

However, according to Buxton, one of the places where Alias|Wavefront believes speech has potential, and where it will most likely first appear in its products, does not involve speech recognition at all. Rather, it is in the form of what might be called multimedia voice mail.

"Current computer graphics software is pretty good at letting us create animations and models. But the area of greatest weakness has to do with tools to support what you do with these digital assets once they are created," Buxton says. In particular, few tools effectively support the process of design reviews and approval, he says. "What we envision are systems that enable you to point at and mark up parts of a design or animation while speaking, and have the speech and gestures recorded in sync so that they can be preserved for later playback."

This kind of multimedia (speech, pointing, and marking) meta-document attached to 3D computer graphics would provide better communication about designs among designers, managers, and customers, Buxton explains. "The speech recognition is performed by the recipient of the message, rather than by a computer. And, as nearly all the research has shown, the real benefits accrue when the speech is combined with manual gestures."

It's too early to tell whether speech recognition will ultimately be part of an interface that also supports parallel input of gesture recognition and different types of manual input. However, the technology appears to be on its way toward fulfilling the promise of a more usable CAD interface.

Jeffrey Rowe is an independent design, development, and technical writing consultant.