Although its price tag is steep, Monarch Professional is the best solution we've found so far for converting documents into sortable spreadsheets, and the one most likely to save newsrooms countless hours of retyping and data cleaning.
But Monarch isn't a pick-up-and-play application. Unlike other converters we've tested, there's no automatic feature for processing documents. Users must set up a model for every PDF or text file that tells Monarch what the embedded data look like. This strategy allowed the software to handle even our most difficult tests in minutes (and in some cases seconds).
Among its most useful features is the ability to handle one-to-many relationships in tabular data -- for example, when headings are meant to apply to every row below.
With other converters, applying these headings correctly often requires a separate step. That's not the case with Monarch, which allows users to set up "append" fields that automatically fill in next to corresponding rows. The software had no trouble performing that task with output from an Access database report.
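To make the one-to-many idea concrete, here is a minimal Python sketch of the fill-down step that an "append" field spares you from doing by hand. It is not Monarch's implementation. The report text, the "Ward" heading pattern, the column names and the `append_headings` function are all hypothetical, made up for illustration.

```python
import re

# Hypothetical report text: each "Ward" heading applies to every
# detail row beneath it until the next heading appears.
REPORT = """\
Ward 1
1400 Example St NW    Missing smoke detector
1522 Sample Ave NW    Broken handrail
Ward 2
200 Placeholder Rd NE    Peeling paint
"""


def append_headings(text):
    """Copy the most recent heading onto each detail row below it."""
    rows = []
    current_heading = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("Ward "):
            # Assumption: headings are the only lines that start with "Ward ".
            current_heading = line
            continue
        # Assumption: detail columns are separated by two or more spaces.
        address, violation = re.split(r"\s{2,}", line, maxsplit=1)
        rows.append(
            {"ward": current_heading, "address": address, "violation": violation}
        )
    return rows


if __name__ == "__main__":
    for row in append_headings(REPORT):
        print(row)
```

Each output row carries its own copy of the heading, so the result can be sorted or filtered by ward like any other column.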
Monarch can also handle other tricky formatting quirks, like data fields of varying sizes.
For more straightforward documents with simple tables, setting up models capable of capturing the data you want takes only minutes. But navigating more complicated documents requires a familiarity with Monarch's advanced features, which will take some time.
Although there is a learning curve here, Monarch has ample documentation -- along with customer support and community forums -- to help you master the software quickly.
Monarch is also built for efficient guessing and testing. Before you ever export your information to a text file or Excel spreadsheet, the software shows you the results in a "data view" tab that automatically updates on the fly whenever you make a change to your model. A spellcheck-like "verify" feature even allows you to walk through errors in the data being captured and fix them automatically.
When you're ready for the export step, no software we've tested can match Monarch for speed: Massive spreadsheets more than 10,000 rows long took less than a minute to extract.
It did fail one of our tests -- a PDF with a damaged embedded text layer. Unlike other software we've tested, Monarch has no built-in optical character recognition, so handling such a problem is outside its capabilities.
Monarch's high cost may take it off the table for many newsrooms. But if wasting hours wrangling troublesome public records into workable formats is a regular part of your beat, it will likely pay for itself over time.
11.3.12.0
//July 2011
//$1,350
//No
//Yes
//No
It took less than 30 minutes to guide Monarch through the entire process of perfectly extracting information from this database report of housing violations in Washington, D.C.
Monarch's flexible system for defining the format of this report of Clinton administration appointments was the perfect solution to tackle the document's separated headers and unlined rows.
The consistent formatting of this listing of disgraced financier Bernie Madoff's customers made converting the document into a sortable spreadsheet an easy task for Monarch.
Monarch isn't equipped to handle damaged, corrupted or unrecognized fonts, so any conversion of this table of contributors to Arizona Gov. Jan Brewer's border fence project results in a well-organized but unreadable jumble of meaningless characters.