Google Voice and Video Transcription Versus Humans

In the first week of May 2010, Google announced the worldwide release of its YouTube video transcription service. Although it was released in mid-2009, the beta version of YouTube video transcription was available only to a select few universities, news broadcasters and government agencies.

The history of speech recognition technology dates back to the late 1930s, when AT&T Bell Laboratories developed a primitive device that could recognize speech. Researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. But because the computing technology was not good enough, the development of speech recognition moved at a snail's pace.

Fifty years down the line, the capabilities of many digital electronic devices had surpassed even the best and costliest technologies of the 1930s. This was made possible by breakthroughs in chip and semiconductor fabrication. The largest barriers to the speed and accuracy of speech recognition - computer speed and power - were no longer an issue.

With more computing power (measured in FLOPS) than the computer scientists of the 1930s could have imagined, programmers could now develop algorithms to code and decode a multitude of voice patterns. In practice, they could now build a database of thousands of different voice patterns, convert them into digital sine waves and analyze words based on the mathematics of voice pattern signals. Over time, as speech-to-text technologies became usable, many companies started offering voice recognition to their consumers: Dragon Dictation, Microsoft (XP, Vista), Google Voice and other niche companies.

So now the question arises: how reliable are these technologies, particularly Google YouTube transcription, and will they ever compete with, if not surpass, human transcription accuracy?

Those who like to view YouTube videos with captions turned on may have seen that the accuracy of the captions has increased severalfold over the past few months. The accuracy is going up day by day and is only going to improve as more people use the service. As Eric Schmidt, CEO of Google Inc., says: "Our Google YouTube transcriptions will improve over a period of time as more and more users use it, it's a self learning technology."

But there are still a few major flaws that can be foreseen despite it being a self-learning technology:

1. Accurate captioning is possible only when the speaker is speaking very clearly and distinctly.
2. The environment has to be free from any sort of disturbance.
3. Errors creep in because of similar-sounding words such as "sky" and "high". When they are spoken quickly, the system is not able to differentiate between the two.
4. Interjections - People often pause or make thinking sounds during speeches, including "uh", "hmm" and "ahh". The recognition software makes an effort to transcribe these as well, at times giving hilarious results. (Search YouTube for "Hilarious Google voice transcription".)

And finally comes the major downside of them all:

5. Psychological Satisfaction - After the captioning has been done by the Google robots, can the uploader be sure of the accuracy? It is quite obvious that the transcribed captions would need to be thoroughly checked for errors and proofread several times. This means going through the whole video several times, manually correcting the words, correcting the grammar and punctuation (commas, hyphens, quotes, etc.) and then uploading the corrected captions. A very time-consuming process.

So what is the ultimate solution to transcribing files, if not voice-to-text recognition technology? The answer is simple: the way digital and analog files have been transcribed for the past 50 years - humans.