If you're a reporter who regularly deals with tricky documents that need text recognition or conversion into sortable spreadsheets, OmniPage is worth the money and the time it takes to master its quirks. But despite great performance on many of our toughest tests, it's less suited to time-strapped journalists looking for a quick solution.
The software presents users with a myriad of options for defining the type of document they want to convert, with a particular bent toward optical character recognition (OCR). Although this gives OmniPage the flexibility to handle tests no other program could, the added complexity takes time to master.
It's worth noting that OmniPage's performance on our largest test documents was glitchy at times. More than once, the software crashed or hung in the middle of processing. Nuance representatives said this could be a result of running our tests on a virtual machine on a Mac, which leaves fewer processor resources available.
There are ways around this, though. The software makes it easy to split a conversion into chunks, without reloading or splitting your PDF, simply by selecting the pages you want to process. That's probably a good idea when your document exceeds 500 pages. Another option is to save your work between steps, which lets you pick up where you left off. Crashes still cost time, but you won't have to start from scratch.
As a PDF tool, OmniPage matched or exceeded the performance of every other product we've tested, except on one document with particularly hard-to-parse table headings. It takes a few tries, however, to get the hang of what works best in different scenarios.
These high marks are a result of an option that allows users to define the document's formatting. In addition to a few pre-defined formats (like "single column, no table" and "spreadsheet"), users can also create templates and set rows and columns while viewing the document. That's particularly helpful when the document is uniform.
With some upfront work, reporters can use OmniPage's fine-grained control to all but eliminate cleanup in Excel.
Even after processing the document, OmniPage lets users browse the result in a split-pane text editor, then make changes and reprocess on the fly, even for individual pages or selected ranges of pages.
Another built-in capability, overlay matching, came in handy in another test, where OmniPage ignored a garbled embedded font and accurately translated the tabular entries instead. When users upload documents, OmniPage automatically compares the appearance of the text against the recognized font. If the two don't match, the software disregards the recognized font and runs its own OCR instead. No other product we've tested has been able to handle this test, which makes OmniPage ideal for documents with corrupted or unusual embedded fonts.
Given its focus on OCR, it's not surprising that OmniPage's text-recognition accuracy is among the highest we've tested. Our PDF with low-resolution text, however, did give the software trouble.
It was at or near flawless in many of our accuracy tests and seemed to have trouble only where the scanned material was of lower quality.
As with its spreadsheet conversions, OmniPage lets users edit the text it recognizes in a split pane, then save the corrections into the resulting searchable PDF. That's particularly helpful for on-the-fly fixes.
Its OCR proofreader, which pops up automatically during the recognition process, allows users to be a little more methodical about these corrections. Much like a spell checker, it skips to every point where it's not confident about the text it recognized. Users can then manually fix the error, ignore it if it's correct or select a change from a list of suggestions.
With a price marginally higher than those of its competitors (at least for the standard version), OmniPage is an extremely capable product for dealing with documents. Its learning curve makes it less suitable for one-off projects, but if you plan on handling tricky records routinely, OmniPage is the way to go.
NOTE: Although we worked with the professional version of OmniPage ($499.99), the much-cheaper standard version ($149.99) has the same capabilities we tested.
OmniPage 18.0 | April 23, 2009 | $149.99-$499.99
OmniPage did a poor job converting this list of Bernie Madoff's customers with the software's default settings. But because the document was so uniform, quickly defining and applying a template resulted in a spreadsheet that looked identical to the original PDF.
OmniPage had no trouble with these scanned memos from the Obama-Biden transition team, flawlessly recognizing the text despite varying scan quality.
It took an hour or so, but OmniPage is the only software we've tested to date that converted this 1,000-page PDF of border fence contributors into a spreadsheet, reproducing the original almost perfectly. Its OCR feature was even able to handle the unusual font that has consistently foiled other programs.
OmniPage's guesswork on the formatting of this PDF database report will only go so far toward converting the document into a sortable table. But a little manual help from the user before the conversion saves hours of data wrangling later and results in a perfect table.
OmniPage Pro whipped through these combatant tribunal transcripts, recognizing the text in all but a few areas with lower-quality scans.
Processing these scanned disclosure forms from North Carolina legislators with OmniPage is time-consuming. Although it recognized most of the text accurately, a software hang triggered by this 1,700-page file made the results more difficult to wrangle.
Although it does give users some ability to customize the spreadsheets it outputs, OmniPage's options aren't much help when it comes to making sense of this unlined PDF of Clinton administration political appointments. Results would take a long time to clean up without programming knowledge.
OmniPage completely ignores the relevant text in this low-resolution PDF index of Congress reports containing partial text. Even with ample options for recognizing text, the software manages to capture only page numbers and annotations, which are worthless in this context.