Is it really possible to get high OCR accuracy from poor quality documents?

Optical Character Recognition (OCR) refers to the software technology and processes that translate printed text into computer-searchable text.

Done correctly, OCR enables users to search for and retrieve individual words contained within a file or page. In addition, when a set of files is indexed, users are able to search for keywords across an entire document library and retrieve each matching page precisely. OCR enables users to execute in seconds searches that once could take several hours or days to complete.

However, this technology has not worked well on older or poor quality documents that contained mixed fonts or combinations of text and graphics. Until now.

Due to several recent technology advances, it is now possible to obtain six-sigma level character accuracy from these types of document collections.

It is important to keep in mind that the quality and condition of the paper documents are still key factors in successful OCR conversion. Even so, dramatically improved results can be obtained by enhancing the quality of the scanned image prior to processing.

Removal of noise such as borders and speckles, along with skew correction, is now common on the more advanced document scanners.

Furthermore, advanced color filter technologies may be used to reduce any page background colors, in conjunction with multi-light image capture technologies to remove any shadows cast by page creases that could impact image quality or recognition accuracy.

Once document scanning and processing are complete, an OCR text layer can be added and hidden behind each image. An additional orientation filter can be used to ensure that the best image is presented to the OCR engines.

To achieve the highest conversion accuracy possible, the characters in the image can be processed using multi-engine OCR voting technologies that rank each character to determine the best text recognition fit. Then, once a word is generated, it is filtered through a proprietary lexicon to ensure the highest quality results.

Finally, this text can be processed using sophisticated layout retention technologies to reproduce the layout of the text in the image, providing the best possible text representation for precise search and retrieval. After all, isn't that why they call it Optical Character Recognition?
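
As an illustration of the multi-engine voting and lexicon filtering steps described above, the following Python sketch shows one simple way the idea could work. It is not the method used by any particular product. The function names (`vote_characters`, `lexicon_filter`, `recognize_word`) are hypothetical, a plain majority vote stands in for the character ranking, a short word list stands in for the proprietary lexicon, and fuzzy matching from the standard library's `difflib` is an assumed substitute for whatever lexicon lookup a real system uses. The sketch also assumes every engine returns a string of the same length for a given word; real engine outputs would need to be aligned first.

```python
from collections import Counter
from difflib import get_close_matches


def vote_characters(candidates):
    """Pick the most common character at each position across engine outputs.

    Assumption: all candidate strings have the same length. Real OCR
    engines may insert or drop characters, which requires alignment.
    """
    if not candidates:
        return ""
    length = len(candidates[0])
    if any(len(c) != length for c in candidates):
        raise ValueError("engine outputs must be the same length in this sketch")
    voted = []
    for position in zip(*candidates):
        # most_common orders ties by first appearance, so the earliest
        # engine in the list wins when the vote is tied.
        voted.append(Counter(position).most_common(1)[0][0])
    return "".join(voted)


def lexicon_filter(word, lexicon, cutoff=0.8):
    """Return the word if the lexicon contains it, otherwise the closest
    lexicon entry above the similarity cutoff, otherwise the word unchanged."""
    if word in lexicon:
        return word
    matches = get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def recognize_word(candidates, lexicon):
    """Vote on characters first, then filter the voted word through the lexicon."""
    return lexicon_filter(vote_characters(candidates), lexicon)


if __name__ == "__main__":
    lexicon = ["document", "scanner"]  # stand-in for a real lexicon
    # Three hypothetical engine readings of the same word image.
    print(recognize_word(["scanner", "scarmer", "5canner"], lexicon))
    print(recognize_word(["documemt", "docunent", "documemt"], lexicon))
```

In the first call the vote alone recovers the word, because each engine's mistake is outvoted by the other two. In the second call two engines make the same mistake, so the vote produces a misspelling and the lexicon step is what corrects it.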