In the first week of May 2010, Google announced the worldwide release of its YouTube video transcription service. Although it was released in mid-2009, the beta version of YouTube video transcription was available only to a select few universities, news broadcasters and government agencies.

The history of speech recognition technology dates back to the late 1930s, when AT&T Bell Laboratories developed a primitive device that could recognize speech. Researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. But because computing technology was not yet good enough, the development of speech recognition moved at a snail's pace.

Fifty years down the line, the capabilities of many digital electronic devices had surpassed even the best and costliest technologies of the 1930s. This was made possible by breakthroughs in chip and semiconductor fabrication. The largest barriers to the speed and accuracy of speech recognition - computer speed and power - were no longer an issue.

With more computing power (measured in FLOPS) than the computer scientists of the 1930s could have imagined, programmers could now develop algorithms to code and decode a multitude of voice patterns. In practice, they could now build a database of thousands of different voice patterns, convert them into digital sine waves and analyze words based on the mathematics of voice pattern signals. Over time, as speech-to-text technologies became usable, many companies started offering voice recognition to their customers - Dragon Dictation, Microsoft (XP, Vista), Google Voice and other niche companies.

So now the question arises - how reliable are these technologies, particularly Google's YouTube transcription, and will they ever compete with, if not surpass, the accuracy of human transcription?

Those who like to view YouTube videos with captions turned on may have seen that the accuracy of the captions has increased severalfold over the past few months. The accuracy is going up day by day and is only going to improve as more people use the service. As Eric Schmidt, CEO of Google Inc., says - "Our Google YouTube transcriptions will improve over a period of time as more and more users use it, it's a self learning technology."

But there are still a few major flaws that can be foreseen despite it being a self-learning technology -

1. Accurate captioning is possible only when the speaker speaks very clearly and distinctly.
2. The environment has to be free from any sort of disturbance.
3. Errors creep in because of similar-sounding words such as "sky" and "high" - when they are spoken quickly, the system is not able to differentiate between the two.
4. Interjections - People often pause or make thinking sounds during speeches - these include uh's, hmms, ahhs etc. The recognition software makes an effort to transcribe these as well, at times giving hilarious results. (Search YouTube for "Hilarious Google voice transcription".)

And finally comes the biggest downside of them all -

5. Psychological satisfaction - After the captioning has been done by the Google robots, can the uploader be sure of the accuracy? It is quite obvious that the transcribed captions would need to be thoroughly checked for errors and proofread several times. This means going through the whole video several times, manually correcting the words and the grammar and punctuation, including commas, hyphens, quotes etc., and then uploading the corrected captions. A very time-consuming process.

So what is the ultimate solution to transcribing files if not voice-to-text recognition technology? The answer is simple: the way digital and analog files have been transcribed for the past 50 years - Humans.
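
For anyone who still wants to start from the machine captions, part of the clean-up described in points 4 and 5 can be eased with a small script. The following is only a minimal sketch in Python: it assumes the captions have already been exported as plain text, and the filler list and the function name strip_fillers are made up for this illustration, not part of any Google or YouTube tool. It removes stand-alone interjections and nothing more; a human still has to check the words and punctuation.

```python
import re

# Hypothetical filler list, chosen for illustration only;
# extend it for your own speakers and language.
FILLERS = ("uh", "uhh", "um", "umm", "hmm", "hmmm", "ah", "ahh", "er")

# Match a filler only as a whole word, plus an optional trailing
# comma and any whitespace that follows it.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*", flags=re.IGNORECASE
)


def strip_fillers(caption: str) -> str:
    """Return the caption text with stand-alone interjections removed."""
    cleaned = _FILLER_RE.sub("", caption)
    # Collapse any doubled spaces left behind by the removal.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_fillers("So, uh, the sky is, hmm, high today"))
    # prints: So, the sky is, high today
```

Because the pattern matches whole words only, ordinary words that merely begin with a filler (for example "ahead") are left alone. The script cannot tell "sky" from "high", so the proofreading pass by a person remains necessary.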