Processing these scanned disclosure forms from North Carolina legislators was time-consuming with OmniPage Pro. Although the program recognized most of the text accurately, a software hang triggered by the 1,700-page file made the results more difficult to wrangle.
OmniPage took its time with each stage of the process, from opening the document to converting it and saving it in PDF format. OCR processing alone took about an hour and a half. But when saving the results as a PDF, the system stalled, and we had to force the program to close. This may be a result of running the program in a virtual machine on a Mac, which leaves it with fewer processing resources.
When we were finally able to wrestle a result from the program, our test showed its text recognition missed only one instance of our search terms. In this case, a poorly scanned table line meant the term, "BB&T," was mixed with bullet points and dashes.
Searches did return several false positives, mostly caused by smaller type that was often obscured by shading from the messy scan.