Tesseract is an open-source optical character recognition (OCR) engine currently maintained by Google. It has no graphical interface and is run from the command line. For more on Tesseract's OCR performance, see our review of DocumentCloud, which uses the engine.
From the site:
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google and is probably one of the most accurate open source OCR engines available. Combined with the Leptonica Image Processing Library it can read a wide variety of image formats and convert them to text in over 40 languages.
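To give a rough sense of what command-line use looks like, here is a minimal Python sketch that wraps the basic invocation `tesseract IMAGE OUTPUTBASE [-l LANG]`. It assumes the `tesseract` binary is installed and on the PATH. The helper names `build_tesseract_command` and `ocr_image` are hypothetical, chosen for illustration, and are not part of Tesseract itself.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_command(image_path: str, output_base: str,
                            lang: Optional[str] = None) -> List[str]:
    """Assemble the argument list for: tesseract IMAGE OUTPUTBASE [-l LANG]."""
    cmd = ["tesseract", image_path, output_base]
    if lang:
        # Language codes are three-letter names such as "eng" or "deu".
        cmd += ["-l", lang]
    return cmd


def ocr_image(image_path: str, output_base: str,
              lang: Optional[str] = None) -> str:
    """Run Tesseract on one image and return the recognized text.

    Raises RuntimeError if the tesseract binary is not found on PATH
    (an assumption of this sketch is that it is installed separately).
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang),
                   check=True)
    # Tesseract writes plain-text output to OUTPUTBASE + ".txt".
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

With a scanned page saved as `scan.png` (a placeholder file name), calling `ocr_image("scan.png", "scan", lang="eng")` would run the engine and read back the contents of `scan.txt`. Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.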
Oct. 21, 2011