DocumentCloud recognized the text with few problems in three of the four Obama-Biden transition team memos we tested. But the service had a harder time with italicized text.
In those three documents, it made only minor errors -- reading "scientific" as "scientitic" and "modernize" as "modemize" -- and the mistakes didn't significantly hinder the readability of the documents. In all three, errors amounted to 3 percent or less of the total words in the test sample.
The service does have an annoying habit of inserting extra paragraph marks, which incorrectly break up the text. Tweaking the fonts and sizes of the original and processed copies didn't solve the problem.
DocumentCloud's OCR struggled more with the fourth document, which included large swaths of italicized text. It made 14 minor errors in a 134-word passage, or about 10 percent of the words, though the mistakes ("infrastructure" became "iiumastructure" and "Pacific" became "Pac U'ic") didn't disrupt the readability of the document.
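
For readers who want to check the arithmetic behind these percentages, here is a minimal sketch in Python. The function name `word_error_rate` is our own, chosen for illustration, and it is not part of DocumentCloud. It assumes each misread word counts as one error against the total words in the sample.

```python
def word_error_rate(errors: int, total_words: int) -> float:
    """Return misread words as a percentage of the words in the sample."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * errors / total_words


# The italicized passage: 14 errors in 134 words.
italic_rate = word_error_rate(14, 134)
print(f"{italic_rate:.1f}%")  # prints "10.4%"
```

By this measure, the italicized passage had more than three times the error rate of the other three memos, which all came in at 3 percent or less.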