The Your Seat at the Table documents provide a treasure trove of information supplied to the Obama-Biden transition team by lobbyists, governmental organizations, citizens and other sources. All of these documents are available in PDF format, but not all of them appear as searchable text, which reporters would need if they were looking for specific information buried in reams of pages.
Download the following specific documents from the overall set and run them through OCR software to make the text searchable. All documents are typed and, aside from a few basic formatting challenges (italics, for example), should be relatively easy for most software to handle.
DESIRED OUTCOME: Samples from the output should match four pre-selected samples that have already been transcribed into text.
To test the accuracy of your output, first download the comparison sample file, where you'll find the locations of the selections in the documents above. Copy these selections from your results and paste them directly into a new text document. Then run "Compare Documents" (available in most word processors) using the sample file and your results file.
The OCR software's performance on this task should be judged based on how well the two documents match up.
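For readers who would rather script the comparison than use a word processor, the same check can be approximated with Python's standard difflib module. This is only a rough sketch, not the method used for the results below: the function names (normalize, match_ratio, word_diffs) and the demo strings are invented for illustration, and it assumes you have already pasted the reference selection and your OCR selection into plain-text strings.

```python
import difflib


def normalize(text: str) -> str:
    """Collapse all whitespace runs so line wrapping is not counted as an error."""
    return " ".join(text.split())


def match_ratio(reference: str, ocr_output: str) -> float:
    """Return a similarity score between 0.0 and 1.0, computed word by word."""
    ref_words = normalize(reference).split()
    ocr_words = normalize(ocr_output).split()
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    return matcher.ratio()


def word_diffs(reference: str, ocr_output: str) -> list[tuple[str, str]]:
    """List (expected, found) pairs for every stretch of words that differs."""
    ref_words = normalize(reference).split()
    ocr_words = normalize(ocr_output).split()
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return diffs


if __name__ == "__main__":
    # Made-up strings for demonstration; these are NOT taken from the memos.
    reference = "the quick brown fox"
    ocr_output = "the quick\nbrovvn fox"
    print(f"match ratio: {match_ratio(reference, ocr_output):.2f}")
    for expected, found in word_diffs(reference, ocr_output):
        print(f"expected {expected!r}, found {found!r}")
```

A ratio of 1.0 means the two selections match word for word once whitespace is normalized; anything lower is worth inspecting with word_diffs to see whether the misses are italics, punctuation or whole dropped passages.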
OmniPage had no trouble with these scanned memos from the Obama-Biden transition team, flawlessly recognizing the text despite a variety of different scan qualities.
ABBYY FineReader 11 easily converted scanned memos submitted to the Obama-Biden transition team to searchable PDFs, getting the text mostly right.
This collection of scanned-in memos from the Obama-Biden transition wasn't much of a match for Adobe Acrobat, which recognized the text with only a few minor errors. It even handled italics well.
Able2Extract takes these scanned-in memos and quickly and reliably makes them searchable.
DocumentCloud recognized the text in three of four memos from the Obama-Biden transition team we tested with few problems. But the service did have a harder time with italicized text.
Due to its file size limit, Google Drive was only able to recognize text in half the memos we tested from the Obama administration's Your Seat at the Table site.