What is speech recognition

Speech recognition (in many contexts also known as 'automatic speech recognition', computer speech recognition or, erroneously, as voice recognition) is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program.

Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), domotic appliance control and content-based spoken audio search (e.g., find a podcast where particular words were spoken).

Voice recognition or speaker recognition is a related process that attempts to identify the person speaking, as opposed to what is being said.

Speech recognition technology

In terms of technology, most technical textbooks nowadays emphasize the use of the hidden Markov model as the underlying technology. The dynamic programming approach, the neural network-based approach and the knowledge-based learning approach were studied intensively in the 1980s and 1990s.

Performance of speech recognition systems

The performance of a speech recognition system is usually specified in terms of accuracy and speed. Accuracy is measured with the word error rate, whereas speed is measured with the real time factor.

Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. Part of the confusion mainly comes from the mixed usage of the terms speech recognition and dictation.

Speaker-dependent dictation systems requiring a short period of training can capture continuous speech with a large vocabulary at normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy (getting one to two words out of one hundred wrong) if operated under optimal conditions. These optimal conditions usually mean that the test subjects have 1) speaker characteristics matching the training data, 2) proper speaker adaptation, and 3) a clean environment (e.g., office space). (This explains why some users, especially those with accents, might find that the recognition rate is perceptibly much lower than the expected 98% to 99%.)

Other, limited-vocabulary systems requiring no training can recognize a small number of words (for instance, the ten digits) from most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.

Both acoustic modeling and language modeling are important areas of study in modern statistical speech recognition. In this entry, we will focus on explaining the use of the hidden Markov model (HMM) because it is very widely used in many systems. (Language modeling has many other applications, such as smart keyboards and document classification; please refer to the corresponding entries.)
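
The two performance measures named under "Performance of speech recognition systems" can be illustrated with a minimal Python sketch. It assumes the common definitions: word error rate is the number of word substitutions, deletions and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words; real time factor is processing time divided by audio duration. The function names and the whitespace tokenization are illustrative choices, not taken from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()   # assumption: words are separated by whitespace
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (below 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    ref = "i would like to make a collect call"
    hyp = "i would like to make collect call"   # the word "a" was dropped
    print(word_error_rate(ref, hyp))            # 1 error over 8 reference words
    print(real_time_factor(30.0, 60.0))         # 30 s to process 60 s of audio
```

Under these definitions, the claimed 98% to 99% accuracy corresponds to a word error rate of roughly 0.01 to 0.02. Because insertions are counted, the word error rate can exceed 1.0 when the output contains many extra words.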