OmniPage

OmniPage tackles tougher documents, unlocking text and spreadsheets

Mastering the learning curve yields largely stellar performance.

Overall:

Excels at PDF conversion, OCR; slight learning curve

Documentation:

How-to, user's guides available in program; release notes dated

Usability:

Takes time, experimentation to learn best practices

Community:

Knowledgebase allows registered users to submit questions, track requests

Performance:

Glitchy with larger documents; robust enough to handle most challenging conversions

Product:

OmniPage

Company:

Nuance Communications Inc.

Cost:

$149.99-$499.99

If you're a reporter who regularly deals with tricky documents that need text recognition or conversion into sortable spreadsheets, OmniPage is worth the investment of money and time to conquer its quirks. But despite great performance on many of our toughest tests, it's less suited to time-strapped journalists looking for a quick solution.

The software presents users with a myriad of options for defining the type of document they're looking to convert, with a particular bent toward optical character recognition. Although this gives OmniPage the flexibility to handle tests no other program could, the added complexity takes time to master.

It's worth noting that OmniPage's performance on our largest test documents was glitchy at times. More than once, the software crashed or hung in the middle of processing. Nuance representatives said this could be a result of running our tests on a virtual machine on a Mac, which leaves fewer processor resources available.

There are a couple of ways around this, though. The software makes it easy to split conversion into chunks without reloading or splitting your PDF, just by selecting the pages you want to process -- probably a good idea when your document exceeds 500 pages. Another way is to save work between steps, which allows users to pick up where they left off. Crashes still mean lost time, but not starting from scratch.
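Planning those page ranges ahead of time is simple arithmetic. As a rough sketch in Python (this is not OmniPage's interface; the `page_batches` helper and the 500-page batch size are our own illustrative assumptions), here is how a 1,700-page file breaks into ranges you could select one at a time:

```python
def page_batches(total_pages, batch_size=500):
    """Yield inclusive, 1-based (first, last) page ranges that cover the document."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


# Hypothetical example: a 1,700-page scan processed in 500-page runs.
for first, last in page_batches(1700):
    print(f"pages {first}-{last}")
```

Smaller batches mean more manual steps, but a crash costs only the batch in progress.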

Custom options help tackle tables

As a PDF tool, OmniPage matched or exceeded the performance of every other product we've tested, save one document with particularly hard-to-parse table headings. It takes a few tries, however, to get the hang of what works best in different scenarios.

These high marks are a result of an option that allows users to define the document's formatting. In addition to a few pre-defined formats (like "single column, no table" and "spreadsheet"), users can also create templates and set rows and columns while viewing the document. That's particularly helpful when the document is uniform.
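To see why a template pays off on uniform pages, consider this minimal Python sketch. It is only an analogy, not how OmniPage stores or applies templates: the `split_by_columns` function, the character offsets and the sample rows are all hypothetical. Once the column boundaries are defined, every row with the same layout can be cut the same way.

```python
def split_by_columns(line, boundaries):
    """Cut one line of text at fixed character offsets and trim each cell."""
    # None as the final edge means "through the end of the line".
    edges = [0, *boundaries, None]
    return [line[start:end].strip() for start, end in zip(edges, edges[1:])]


# Hypothetical template: columns start at characters 0, 20 and 32.
template = [20, 32]

# Made-up rows padded to that layout (not taken from any test document).
rows = [
    "Smith, John".ljust(20) + "New York".ljust(12) + "NY",
    "Doe, Jane".ljust(20) + "Raleigh".ljust(12) + "NC",
]

for row in rows:
    print(split_by_columns(row, template))
```

The same boundaries fail as soon as a page shifts its columns, which is why the approach suits uniform documents best.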

With a little upfront work, reporters can use OmniPage's fine-grained control to all but eliminate clean-up in Excel, although this does take some time.

Even after processing the document, OmniPage allows users to browse the result in a split-pane text editor, then make changes and reprocess on the fly -- even by individual pages or selections of pages.

Another of its built-in capabilities -- overlay matching -- came in handy in a separate test, where it ignored the garbled, embedded font and instead accurately translated the tabular entries. When users load documents, OmniPage automatically checks the appearance of the text against the recognized font. If things don't match up, the software disregards the recognized fonts and runs its own OCR to fix it. No other product we've tested has been able to handle this test, making OmniPage ideal for dealing with documents with corrupted or weird embedded fonts.
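The fallback logic can be sketched roughly like this in Python. This is our simplified illustration, not Nuance's implementation: OmniPage compares how the text looks on the page, while this sketch merely compares two strings, and the `choose_text` function and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def choose_text(embedded_text, ocr_text, threshold=0.8):
    """Keep the embedded text unless it disagrees too much with a fresh OCR pass."""
    similarity = SequenceMatcher(None, embedded_text, ocr_text).ratio()
    if similarity >= threshold:
        return embedded_text, "embedded"
    return ocr_text, "ocr"


# Made-up strings: a garbled embedded font versus what the page image shows.
print(choose_text("#@%^&*", "Smith 2,500"))
print(choose_text("Smith 2,500", "Smith 2,500"))
```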

OCR performance solid

Given its focus on OCR, it's not surprising that OmniPage's accuracy in text recognition is one of the highest we've tested. Our PDF with low-resolution text, however, did give the software trouble.

It was flawless or nearly so in many of our accuracy tests, and had trouble only in areas where the scanned material was lower quality.

As with spreadsheet conversion, OmniPage allows users to edit the text it recognizes in a split pane, then save the corrections into the resulting searchable PDF. That's particularly helpful for on-the-fly fixes.

Its OCR proofreader, which pops up automatically during the recognition process, allows users to be a little more methodical about these corrections. Much like a spell checker, it skips to every point where it's not confident about the text it recognized. Users can then manually fix the error, ignore the flag if the text is correct or select a change from a list of suggestions.
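Conceptually, the proofreader behaves like a filter on recognition confidence. Here is a minimal Python sketch, assuming each recognized word comes with a confidence score between 0 and 1. The `words_to_review` function, the scores and the 0.85 cutoff are hypothetical, not values OmniPage exposes.

```python
def words_to_review(words, threshold=0.85):
    """Return (position, text) for every word recognized with low confidence."""
    return [
        (position, text)
        for position, (text, confidence) in enumerate(words)
        if confidence < threshold
    ]


# Made-up recognition results: (text, confidence) pairs.
recognized = [("Quarterly", 0.99), ("rep0rt", 0.41), ("filed", 0.97)]

for position, text in words_to_review(recognized):
    print(f"check word {position}: {text}")
```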

With a price point marginally higher than its competitors (at least for the standard version), OmniPage offers an extremely capable product for dealing with documents. Its learning curve makes it less suitable for one-off projects, but if you plan on handling tricky records routinely, OmniPage is the way to go.

NOTE: Although we worked with the professional version of OmniPage ($499.99), the much-cheaper standard version ($149.99) has the same capabilities we tested.

 
Version Tested:

18.0

Release Date:

April 23, 2009

OS Tested:

Microsoft Windows 7 x64

Open Sourced:

No

Demo Available:

No

Obsolete:

No

 

How OmniPage performed on our tests

Verdict:

Processing with template results in flawless spreadsheet

User-defined templates help OmniPage perfect results

OmniPage did a poor job converting this list of Bernie Madoff's customers with the software's default setting. But because the document was so uniform, quickly defining and applying a template resulted in a spreadsheet that looked identical to the original PDF.


Verdict:

Perfect text recognition; works fast

OmniPage error-free in recognizing scanned memo text

OmniPage had no trouble with these scanned memos from the Obama-Biden transition team, flawlessly recognizing the text despite a variety of different scan qualities.


Verdict:

Translates garbled text, converts PDF into near-flawless spreadsheet with minimal cleanup

OmniPage accurately converts massive, garbled PDF

It took an hour or so, but OmniPage is the only software we've tested to date that converted this 1,000-page PDF of border fence contributors into a spreadsheet -- reproducing the original almost perfectly. Its OCR feature was even able to handle the weird font that's consistently foiled other programs.


Verdict:

Time spent cleaning up program's guesses pays off with flawless table

With help, OmniPage creates organized table from database report

OmniPage's guesswork on the formatting of this PDF database report will only go so far toward converting the document into a sortable table. But a little manual help from the user before the conversion saves hours of data wrangling later and results in a perfect table.


Verdict:

Quick text recognition, only minor errors

OmniPage's text recognition of transcripts nearly flawless

OmniPage Pro whipped through these transcripts from combatant tribunals fast, recognizing the text in all but a few areas with lower quality scans.


Verdict:

Some misses and false positives; big file takes time, causes glitches

Software crashes bring down passable performance on scanned forms

Processing these scanned disclosure forms from North Carolina legislators is time-consuming with OmniPage, and although it recognized most of the text accurately, a software hang triggered by this 1,700-page file made the results more difficult to wrangle.


Verdict:

Only extensive manual labor and/or programming skills could clean up this mishmash of messy data

OmniPage not flexible enough to convert unlined table

Although it does give users some ability to customize the results of the spreadsheets it outputs, OmniPage's options aren't much help when it comes to making sense of this unlined PDF of Clinton administration political appointments. Results would take a long time to clean up without programming knowledge.


Verdict:

Fails to recognize text; returns only headers, footers

Low-resolution text stumps OmniPage's text recognition

OmniPage completely ignores the relevant text in this low-resolution PDF index of Congress reports containing partial text. Even with ample options for recognizing text, the software only manages to capture page numbers and annotations -- worthless in this context.
