Speech recognition (in many contexts also known as automatic speech recognition or computer speech recognition, and erroneously as voice recognition) is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program.

Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), control of domotic appliances, and content-based spoken audio search (e.g., finding a podcast where particular words were spoken).

Voice recognition, or speaker recognition, is a related process that attempts to identify the person speaking, as opposed to what is being said.

Speech recognition technology

Most technical textbooks nowadays emphasize the hidden Markov model (HMM) as the underlying technology. The dynamic programming approach, the neural network-based approach and the knowledge-based learning approach were studied intensively in the 1980s and 1990s.

Performance of speech recognition systems

The performance of a speech recognition system is usually specified in terms of accuracy and speed. Accuracy is measured with the word error rate, whereas speed is measured with the real time factor.

Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. Part of the confusion about performance comes from the terms speech recognition and dictation being used interchangeably.

Speaker-dependent dictation systems that require a short period of training can capture continuous speech with a large vocabulary at a normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy (getting one to two words out of one hundred wrong) if operated under optimal conditions. These optimal conditions usually mean that the test subjects have 1) speaker characteristics that match the training data, 2) proper speaker adaptation, and 3) a clean environment (e.g., an office space). This explains why some users, especially those with accents, might find the recognition rate to be noticeably lower than the expected 98% to 99%.

Other, limited-vocabulary systems that require no training can recognize a small number of words (for instance, the ten digits) from most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.

Both acoustic modeling and language modeling are important areas of study in modern statistical speech recognition. This entry focuses on explaining the use of the hidden Markov model (HMM) because it is very widely used in many systems. (Language modeling has many other applications, such as smart keyboards and document classification; please refer to the corresponding entries.)
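
The two performance measures named above, word error rate and real time factor, can be illustrated with a short sketch. It assumes the commonly used definitions: word error rate is the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words; real time factor is processing time divided by audio duration. The function names are illustrative and are not taken from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word present in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    reference = "i would like to make a collect call"
    hypothesis = "i would like to make collect call"  # the word "a" was dropped
    print(word_error_rate(reference, hypothesis))
    print(real_time_factor(5.0, 10.0))
```

Under these definitions, the claimed accuracy of 98% to 99% corresponds to a word error rate of roughly 1% to 2%. Because insertions are counted as errors, the word error rate can exceed 100% when the output contains many extra words.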