DocumentCloud spotted only about half of the references to organizations mentioned in these memos from the Obama-Biden transition team, although it performed better than our annotator on the ones it did catch.
For three of the six government agencies and organizations, it spotted all of the references in the test criteria and then some, picking up others that our own test creators had missed entirely. For example, it accurately found 64 references to "Ducks Unlimited," while the test required it to find only 43.
However, it missed the other three groups completely. Some, like the Global Privacy and Information Quality Working Group and its abbreviation, GPIQWG, are hidden in a muddle of acronyms. But it also missed every reference to "U.S. Department of Justice."
According to developer Jeremy Ashkenas, OpenCalais, the entity analysis service DocumentCloud relies on, uses a complex set of lists and rules to identify names and organizations, so it may simply miss some entities.
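To illustrate how a list-and-rule approach can miss an entity outright, here is a minimal Python sketch. It is not OpenCalais's actual method: the organization list, the single suffix rule and the sample sentence are all hypothetical. A name is found only if it appears verbatim on a list or fits a rule, so a variant phrasing or an unfamiliar acronym slips through.

```python
import re

# Hypothetical gazetteer (NOT OpenCalais's real list): exact names to look up.
# Note it stores the variant "Justice Department", not "U.S. Department of Justice".
ORG_LIST = ["Ducks Unlimited", "Justice Department"]

# Hypothetical rule: one or more capitalized words followed by a common
# organizational suffix, e.g. "Federal Trade Commission".
ORG_SUFFIX_RULE = re.compile(
    r"\b(?:[A-Z][a-z]+ )+(?:Agency|Bureau|Commission|Foundation)\b"
)


def find_organizations(text):
    """Return (start_offset, name) pairs found by list lookup or the suffix rule."""
    hits = set()
    # List lookup: only exact, verbatim matches count.
    for name in ORG_LIST:
        for match in re.finditer(re.escape(name), text):
            hits.add((match.start(), match.group()))
    # Rule-based matching: catches names that fit the pattern even if unlisted.
    for match in ORG_SUFFIX_RULE.finditer(text):
        hits.add((match.start(), match.group()))
    # Sort by position in the text.
    return sorted(hits)


if __name__ == "__main__":
    sample = ("Ducks Unlimited briefed the GPIQWG, the U.S. Department of Justice "
              "and the Federal Trade Commission.")
    for start, name in find_organizations(sample):
        print(start, name)
```

In this hypothetical sample, "U.S. Department of Justice" is missed because the list holds only a different phrasing, and "GPIQWG" is missed because no list entry or rule covers it, while "Federal Trade Commission" is caught by the rule despite not being listed.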
A second pass through the system produced identical results.