Is it really possible to get high OCR accuracy from poor quality documents?

Optical Character Recognition (OCR) refers to the software technologies and processes that translate printed text into computer-searchable text.

Done correctly, OCR enables users to search for and retrieve individual words contained within a file or page. In addition, when a set of files is indexed, users can search for keywords across an entire document library and retrieve each matching page with precision. Searches that once could take several hours or days to complete can now be executed in seconds.

However, this technology has not worked well on older or poor quality documents that contain mixed fonts or combinations of text and graphics. Until now.

Due to several recent technology advances, it is now possible to obtain six-sigma level character accuracy (conventionally, about 3.4 errors per million characters) from these types of document collections.

Although the quality and condition of the paper documents are still key factors in a successful OCR conversion, dramatically improved results can be obtained by enhancing the quality of the scanned image prior to processing.

Noise removal features that clean up borders, speckles and skew are now common on the more advanced document scanners.

Furthermore, advanced color filter technologies may be used to reduce page background colors, in conjunction with multi-light image capture technologies that remove shadows cast by page creases, which could otherwise impact image quality or recognition accuracy.

Once document scanning and processing are complete, an OCR text layer can be added and hidden behind each image. An additional orientation filter can be used to ensure that the best image is presented to the OCR engines.

To achieve the highest conversion accuracy possible, the characters in the image can be processed using multi-engine OCR voting technologies that rank each character to determine the best text recognition fit. Then, once a word is generated, it is filtered through a proprietary lexicon to ensure the highest quality results.

Finally, this text can be processed using sophisticated layout retention technologies that preserve the text layout of the image, providing the best possible text representation for precise search and retrieval. After all, isn't that why they call it Optical Character Recognition?
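The multi-engine voting and lexicon steps described above can be illustrated with a small sketch. The text does not say how the engines' results are combined or what the proprietary lexicon contains, so everything here is an assumption: the sketch uses confidence-weighted voting over engine outputs that are already aligned character by character, and it stands in for the lexicon with a plain word list matched through Python's standard `difflib`. The names `vote_characters`, `lexicon_filter`, and the sample engine readings are hypothetical.

```python
from collections import defaultdict
from difflib import get_close_matches


def vote_characters(engine_outputs):
    """Choose one character per position from several engines' guesses.

    engine_outputs: one list per engine of (character, confidence) pairs.
    Assumption: all engines return the same number of characters, already
    aligned position by position. Real engines often disagree on length.
    """
    if not engine_outputs:
        return ""
    length = len(engine_outputs[0])
    if any(len(out) != length for out in engine_outputs):
        raise ValueError("this sketch requires aligned, equal-length outputs")

    chosen = []
    for position in range(length):
        scores = defaultdict(float)
        for out in engine_outputs:
            char, confidence = out[position]
            scores[char] += confidence  # confidence-weighted vote
        chosen.append(max(scores, key=scores.get))
    return "".join(chosen)


def lexicon_filter(word, lexicon, cutoff=0.8):
    """Snap a recognized word to its closest lexicon entry, if one is close.

    Assumption: the lexicon is a list of lowercase words. A word with no
    sufficiently similar entry is returned unchanged.
    """
    lowered = word.lower()
    if lowered in lexicon:
        return lowered
    matches = get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Hypothetical readings of the printed word "clerk" from three engines.
engine_a = list(zip("c1erk", [0.9, 0.4, 0.9, 0.8, 0.9]))
engine_b = list(zip("clerk", [0.9, 0.7, 0.8, 0.9, 0.9]))
engine_c = list(zip("cierk", [0.8, 0.5, 0.9, 0.9, 0.8]))

voted = vote_characters([engine_a, engine_b, engine_c])
print(voted)  # clerk
print(lexicon_filter("recogn1tion", ["recognition", "resolution"]))
```

In this example the three engines disagree only on the second character, and the reading with the highest summed confidence wins. The lexicon step then catches errors that survive voting, at the cost of occasionally "correcting" a legitimate word that is missing from the list, which is why the similarity cutoff is kept fairly high.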