PEPNet-Northeast
formerly the Northeast Technical Assistance Center (NETAC)

Special Applications of Automatic Speech Recognition (ASR) with Deaf and Hard-of-Hearing People

by Ross Stuckless

What is ASR?

    Automatic speech recognition can be defined as the independent, computer-driven transcription of spoken language into readable text in real time (Stuckless, 1994). Interest in a device that would enable deaf people to read, in text form, what is being spoken while it is being spoken is not new. In 1883, the editor of the American Annals of the Deaf reported that while the mechanical device used to automatically transcribe human speech was unsuitable, "it is not unreasonable to hope that some instrument will yet be contrived that will record ordinary human utterance without annoyance or discomfort to the speaker" (Fay, 1883).

NTID's interest in operator-assisted systems

    In the absence of practical automatic speech recognition systems, NTID has been active in the development and application of operator-assisted speech recognition since 1981, when it acquired the first model of a stenographer-assisted speech-to-text system. The system was introduced in selected NTID/RIT classes in February 1982, and with large audiences of deaf adults at a convention of the A.G. Bell Association for the Deaf in Portland, Oregon, in June 1984.

    Research with several hundred deaf students who used the first two generations of this system provided information pertaining to accuracy and error types, error detection by deaf students and their hearing classmates, instructors' speaking rates and styles, and text display considerations (Stuckless, 1983). Stinson et al. (1988) reported on the perceptions of deaf students, differentiated by several demographic and communication characteristics, toward the real-time system relative to other support services such as interpreting. The information derived from all this work provides excellent benchmark criteria against which to assess potential applications of ASR. In the meantime, more technically advanced stenographic-assisted systems continue to be used with considerable numbers of deaf and severely hard-of-hearing students mainstreamed in regular classes across the country, particularly at the college level.

    Unfortunately, the cost and shortage of well-qualified stenographers are deterrents to the widespread use of stenographic-assisted systems. This is especially problematic in a setting such as NTID/RIT where deaf students are mainstreamed with hearing students in 400 or more different undergraduate and graduate courses each quarter.

    With the intent of maintaining the benefits of operator-assisted systems at lower cost and with a larger potential pool of operators, Michael Stinson and several of his colleagues in NTID's Department of Educational and Career Research have taken a slightly different approach. Instead of using stenographers and their steno machines to input speech symbols, people with typing skills input standardized abbreviations of words using a regular laptop keyboard and special software designed and adapted at NTID. These abbreviations appear as full words on the screen.
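    The abbreviation approach described above can be illustrated with a minimal sketch. The abbreviation table and the function name expand are hypothetical, chosen only for illustration; the actual dictionary and expansion rules of the NTID software are not given in this article.

```python
import re

# Hypothetical abbreviation table (an assumption for illustration only;
# the real NTID abbreviation dictionary is not described in the article).
ABBREVIATIONS = {
    "sp": "speech",
    "rcg": "recognition",
    "stu": "student",
    "tchr": "teacher",
}


def expand(typed, table=ABBREVIATIONS):
    """Replace each typed abbreviation with its full word.

    Words that are not in the table, and all punctuation and spacing,
    are passed through unchanged.
    """
    def lookup(match):
        token = match.group(0)
        full = table.get(token.lower())
        if full is None:
            return token
        # Keep an initial capital if the typist used one.
        return full.capitalize() if token[0].isupper() else full

    return re.sub(r"[A-Za-z]+", lookup, typed)


print(expand("The tchr explained sp rcg to each stu."))
```

    In this sketch the operator types four short codes and the reader sees four full words, which is the source of the keystroke savings. A real system would need a far larger table and rules for word endings.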

    While this system, known as C-Print, does not reach the transcription speeds required for the verbatim transcription of rapid speech, a trained typist can reach speeds of up to 100 words per minute. Many deaf students prefer to read the condensed text of an instructor, particularly when it reduces the transcription of false starts and other idiosyncrasies inherent in the unrehearsed speech of most speakers (Stinson & Stuckless, in press).

What's been happening in ASR?

    Aside from the scientists and technicians who are actually engaged in ASR research and development, most people who think about ASR at all underestimate its complexity. More than speech synthesis (automatic text-to-speech), ASR requires fast computers with large data capacity and memory, a necessary condition for complex recognition tasks. It also requires the involvement of speech scientists, linguists, computer scientists, mathematicians, and engineers. These in turn require support from the private and public sectors.

    The search is on for ASR systems that incorporate three features: large vocabularies, continuous speech capabilities, and speaker independence. Today, there are numerous systems that incorporate two of these features, but not all three, as illustrated by products developed and marketed by Dragon Systems and IBM.

    In 1994, Dragon Systems introduced a large vocabulary ASR system called DragonDictate for use on a Windows platform. IBM followed shortly after with a comparable product called VoiceType. Both products feature large vocabularies in excess of 20,000 words, with the provision for the user to customize his or her system by adding more. Both systems, however, require the user to pause briefly between words; that is, they offer discrete speech recognition rather than continuous speech recognition.

    Also, both systems were designed as single-speaker systems rather than speaker-independent systems, meaning that acceptable accuracy can be obtained only by training the system to recognize each user's particular speech characteristics.

    With an hour of training, and reading selected materials, speakers reached speeds of 50 words per minute and word accuracy as high as 97 percent (Stuckless, 1996). In examining both systems at NTID for speed and accuracy (in collaboration with Harry Levitt of the City University of New York), we found that the absence of both continuous speech and speaker-independent capabilities presented major limitations for people who are deaf, precluding applications such as their use to capture spontaneous speech (as in conversation) or more formal speech (as in a college lecture situation).
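    Figures such as "50 words per minute" and "97 percent word accuracy" can be computed in more than one way, and this article does not state the scoring procedure used in the NTID evaluations. The sketch below assumes one common convention: word accuracy is one minus the word error rate, where errors are the substitutions, deletions, and insertions found by a word-level edit distance against a reference transcript. The function names are illustrative.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance between two lists of words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from output (deletion)
                curr[j - 1] + 1,     # extra word in output (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, hypothesis_text):
    """Assumed definition: 1 - (word errors / number of reference words)."""
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / len(reference)


def words_per_minute(word_count, seconds):
    """Speaking or transcription rate in words per minute."""
    return word_count * 60.0 / seconds


spoken = "the quick brown fox jumps over the lazy dog today"
recognized = "the quick brown box jumps over the lazy dog today"
print(word_accuracy(spoken, recognized))
print(words_per_minute(50, 60))
```

    Under this definition, one substituted word in a ten-word passage gives 90 percent accuracy, so 97 percent corresponds to roughly three errors per hundred words spoken.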

    In Spring 1997, Dragon Systems announced the availability of the first edition of NaturallySpeaking, a system featuring continuous speech recognition. Informal use of this system has produced speeds of around 130 words per minute and word accuracy rates of approximately 95 percent. In Fall 1997, IBM announced its own continuous speech recognition system, called ViaVoice, which has not yet been examined. Both of these systems remain speaker dependent.

What's ahead?

    Encouraged by some innovative models and fresh sources of information, developments in ASR appear to be accelerating. There now arises the question of whether these developments will support communication as needed by people who are deaf or hard of hearing, or will induce the kinds of isolating effects that the telephone had on deaf and hard-of-hearing people for so many years.

    Prompted by the interest of a Rochester couple, Mr. and Mrs. F. W. Lovejoy, in exploring an adaptation of existing ASR software to permit the wife to communicate more readily with her late-deafened husband, a committee from the Rochester community came together in 1996 to plan a national meeting, the F.W. Lovejoy Symposium on Applications of ASR Systems with Deaf and Hard of Hearing People. This meeting was co-sponsored by RIT and the University of Rochester, and took place in April 1997.

A member of the senior research faculty in the Department of Educational and Career Research at NTID, Ross Stuckless began his career more than 40 years ago as an English teacher at the American School for the Deaf in Hartford, Connecticut, where his brother David was among his students. Ross is optimistic that future applications of automatic speech recognition will contribute substantially to the quality of life among deaf children and adults, and others who share their lives. For more information on ASR, contact Stuckless at ERSNVD@RIT.edu

References

    Fay, E.A. (1883). The glossograph. American Annals of the Deaf, 28, 67-69.

    Stinson, M., & Stuckless, R. (in press). Recent developments in speech-to-print transcription systems for deaf students. In A. Weisel (Ed.), Deaf education in the 1990's: International perspectives.

    Stinson, M., Stuckless, R., Henderson, J., & Miller, L. (1988). Perceptions of hearing-impaired college students toward real-time speech to print: RTGD and other educational support services. Volta Review, 90, 341-347.

    Stuckless, R. (1983). Real-time transliteration of speech into print for hearing impaired students in regular classes. American Annals of the Deaf, 128, 619-624.

    Stuckless, R. (1994). Developments in real-time speech-to-text communication for people with impaired hearing. In M. Ross (Ed.), Communication access for people with hearing loss (pp. 197-226). Baltimore, MD: York Press.

    Stuckless, R. (1996). Evaluation of the DragonDictate discrete word (speech) recognition system for its application with deaf adults (technical report). Rochester, NY: National Technical Institute for the Deaf.