Find a tool
Outwit Hub Pro is an intuitive scraper that excels at simpler scraping tasks, but its steep learning curve for complex datasets may prove too daunting for novice data journalists.
Pdftotext does a great job of speedily converting PDF documents into delimiter-friendly text files, making it a valuable addition to every journalist's toolbox. However, the spartan functionality of this command-line software means it's ill-suited for documents with multiple headers and complex layouts.
Although its price tag is steep, Monarch Professional is the best solution we've found so far for converting documents into sortable spreadsheets, and it could save newsrooms countless hours of retyping and data cleaning.
Although deskUNPDF is capable of a few simple tasks, using the program to convert documents into spreadsheets creates more problems than it solves.
Helium Scraper's easy-to-use interface, robust customization and excellent customer support make it a strong tool for even the most complex scraping tasks, with one caveat: its comparatively steep entry price and long list of features demand a modest investment.
For journalists on a mission to convert PDF files to data spreadsheets, Zamzar's free conversion application won't get the job done in many cases because of its tendency to split data over multiple sheets.
This spreadsheet-scraping Chrome extension can import basic spreadsheets, but little else. Unless you're a Google Docs die-hard, stick with Table2Clipboard, its more functional Firefox equivalent.
What Data Toolbar does, it does well. Give it multiple pages of consistent data and it will draw the records in cleanly and quickly and deliver easily imported, delimited files. It had no problem going through multiple pages of results and did not skip records. But it can't automate input of search terms, and it fumbles when record fields are inconsistent.
Although Scraper cleanly grabs information from a single Web page and produces a spreadsheet ready for export, the free Chrome extension isn't helpful for reporters looking to scrape more complicated databases that require any sort of automation.
This free Firefox extension does a nice job copying and pasting HTML tables into spreadsheets, but it wasn't designed to help with more difficult Web scraping tasks that require navigating through multiple pages or searchable forms.
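For readers curious what the simplest of these tasks involves under the hood, copying one HTML table into a spreadsheet-ready file, here is a minimal sketch in Python using only the standard library. It is not how any of the tools above are implemented. The names TableExtractor and table_to_csv and the SAMPLE table are made up for illustration, and the sketch assumes a single, non-nested table on one page, with no pagination or search forms.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> row found in an HTML document.

    Hypothetical helper for illustration; assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table rows found in `html` as comma-delimited text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample data, standing in for a table copied from a Web page.
SAMPLE = """
<table>
  <tr><th>Agency</th><th>Budget</th></tr>
  <tr><td>Parks</td><td>1,200</td></tr>
  <tr><td>Roads</td><td>3,400</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE))
```

The csv module quotes any cell that contains a comma, so a figure such as 1,200 stays in one column when the file is opened in a spreadsheet. Anything beyond a single static page, such as paging through results or filling in search forms, is where the more capable tools reviewed here come in.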