Speech recognition (in many contexts also known as "automatic speech recognition" or "computer speech recognition", and erroneously as "voice recognition") is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program. Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), domotic appliance control, and content-based spoken audio search (e.g., finding a podcast where particular words were spoken).

Voice recognition, or speaker recognition, is a related process that attempts to identify the person speaking, as opposed to what is being said.

Speech recognition technology

In terms of technology, most technical textbooks now emphasize the hidden Markov model (HMM) as the underlying technology. The dynamic programming approach, the neural network-based approach, and the knowledge-based learning approach were studied intensively in the 1980s and 1990s.

Performance of speech recognition systems

The performance of a speech recognition system is usually specified in terms of accuracy and speed. Accuracy is measured with the word error rate, whereas speed is measured with the real-time factor.

Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. Part of the confusion comes from the terms "speech recognition" and "dictation" being used interchangeably.

Speaker-dependent dictation systems requiring a short period of training can capture continuous speech with a large vocabulary at a normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy (getting one to two words out of one hundred wrong) if operated under optimal conditions. These optimal conditions usually mean that the test subjects have 1) speaker characteristics that match the training data, 2) proper speaker adaptation, and 3) a clean environment (e.g., an office space). This explains why some users, especially those with accents, might find that the recognition rate they perceive is much lower than the expected 98% to 99%.

Other limited-vocabulary systems requiring no training can recognize a small number of words (for instance, the ten digits) from most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.

Both acoustic modeling and language modeling are important areas of study in modern statistical speech recognition. In this entry, we will focus on explaining the use of the hidden Markov model (HMM) because it is very widely used in many systems. (Language modeling has many other applications, such as smart keyboards and document classification; please refer to the corresponding entries.)
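
To make the two performance measures named above concrete, here is a minimal Python sketch. It assumes the common convention that word error rate is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, and that the real-time factor is processing time divided by audio duration. The function names `word_error_rate` and `real_time_factor` are illustrative and do not come from any particular toolkit. Under this convention, the claimed 98% to 99% accuracy corresponds roughly to a word error rate of 0.01 to 0.02.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# One word ("collect") is dropped out of eight reference words.
wer = word_error_rate(
    "i would like to make a collect call",
    "i would like to make a call",
)
print(wer)  # 0.125

# 30 seconds of processing for 60 seconds of audio.
print(real_time_factor(30.0, 60.0))  # 0.5
```

Because inserted words count as errors, a word error rate computed this way can exceed 1.0, so "accuracy = 1 - WER" is only an approximation that holds when errors are few.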