Speech recognition (in many contexts also known as automatic speech recognition or computer speech recognition, or erroneously as voice recognition) is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program.

Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), control of domotic appliances, and content-based spoken audio search (e.g., finding a podcast where particular words were spoken).

Voice recognition, or speaker recognition, is a related process that attempts to identify the person speaking, as opposed to what is being said.

Speech recognition technology

In terms of technology, most current technical textbooks emphasize the hidden Markov model (HMM) as the underlying technology. The dynamic programming approach, the neural network-based approach, and the knowledge-based learning approach were studied intensively in the 1980s and 1990s.

Performance of speech recognition systems

The performance of a speech recognition system is usually specified in terms of accuracy and speed. Accuracy is measured with the word error rate, whereas speed is measured with the real time factor.

Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. Part of the confusion comes from the mixed usage of the terms speech recognition and dictation.

Speaker-dependent dictation systems requiring a short period of training can capture continuous speech with a large vocabulary at a normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy (getting one to two words out of one hundred wrong) if operated under optimal conditions. These optimal conditions usually mean that 1) the test subjects' speaker characteristics match the training data, 2) proper speaker adaptation has been performed, and 3) the environment is clean (e.g., an office space). This explains why some users, especially those with an accent, might find that the recognition rate is perceptibly much lower than the expected 98% to 99%.

Other, limited-vocabulary systems requiring no training can recognize a small number of words (for instance, the ten digits) from most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.

Both acoustic modeling and language modeling are important areas of study in modern statistical speech recognition. This entry focuses on explaining the use of the hidden Markov model (HMM) because it is very widely used in many systems. (Language modeling has many other applications, such as smart keyboards and document classification; please refer to the corresponding entries.)
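The two performance measures named above, word error rate and real time factor, can be illustrated with a minimal Python sketch. It assumes the common definitions: word error rate as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript, and real time factor as processing time divided by audio duration. The function names `word_error_rate` and `real_time_factor` are hypothetical and chosen only for this illustration, and splitting on whitespace is a simplification of the text normalization a real evaluation would perform.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes whitespace tokenization and exact string matching of words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words (dynamic programming, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


reference = "i would like to make a collect call"
hypothesis = "i would like to take a call"
wer = word_error_rate(reference, hypothesis)  # one substitution, one deletion
rtf = real_time_factor(processing_seconds=30.0, audio_seconds=60.0)
```

Under these definitions, the claimed accuracy of 98% to 99% corresponds roughly to a word error rate of 1% to 2%. Treating accuracy as one minus the word error rate is itself a simplification, because insertions can push the word error rate above 100%.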