A review of Outwit Hub Pro

Outwit Hub Pro can scrape almost any database, but learning curve steepens with complex datasets

Outwit Hub Pro is an intuitive scraper that excels at simpler scraping tasks, but a steep learning curve for complex datasets may prove too daunting for novice data journalists.

A review of Pdftotext (Xpdf)

Pdftotext excels at extracting data from conventional tables, stumbles with more complex tasks

Pdftotext does a great job of speedily converting PDF documents into delimiter-friendly text files, making it a valuable addition to every journalist's toolbox. However, the spartan functionality of this command-line software means it's ill-suited for documents with multiple headers and complex layouts.

A review of Monarch Professional

For Monarch, high cost translates to unmatched spreadsheet conversion

Although its price tag is steep, Monarch Professional is the best solution we've found so far for converting documents into sortable spreadsheets and has the best potential to save newsrooms countless hours of retyping and data cleaning.

A review of deskUNPDF

DeskUNPDF stumbles with complex formatting

Although deskUNPDF is capable of a few simple tasks, using the program to convert documents into spreadsheets creates more problems than it solves.

A review of Helium Scraper

Helium Scraper a powerful extraction tool that rewards patience

Helium Scraper's easy-to-use interface, robust customization and excellent customer support make it a strong tool for even the most complex scraping tasks, despite one caveat: its comparatively steep entry price and long list of features demand a modest investment.

A review of Zamzar

Zamzar's data conversion function is subpar

For journalists on a mission to convert PDF files to data spreadsheets, Zamzar's free conversion application won't get the job done in many cases because of its tendency to split data over multiple sheets.

A review of Table Capture

Table Capture hindered by limited functionality, wonky UI

This spreadsheet-scraping Chrome extension can import basic spreadsheets, but little else. Unless you're a Google Docs die-hard, stick with Table2Clipboard, its more functional Firefox equivalent.

A review of Data Toolbar

Data Toolbar handles multiple pages; needs help on harder tasks

What Data Toolbar does, it does well. Give it multiple pages of consistent data and it will draw the records in cleanly and quickly and deliver easily imported, delimited files. It had no problem going through multiple pages of results and did not skip records. But it can't automate input of search terms, and it fumbles when record fields are inconsistent. 

A review of Scraper

Scraper's use limited to single, well-formatted Web pages

Although Scraper perfectly grabs information from a single Web page and produces a spreadsheet ready for export, the free Chrome extension isn't helpful for reporters looking to scrape more complicated databases that would require any sort of automation.

A review of Table2Clipboard

Table2Clipboard won't help with heavy lifting

This free Firefox extension does a nice job copying and pasting HTML tables into spreadsheets, but it wasn't designed to help with more difficult Web scraping tasks that require navigating through multiple pages or searchable forms.

PDF tools

Convert PDFs into spreadsheets that can be sorted and filtered, as well as into other formats

Scrapers

Gather data or documents from online sources and convert them into more usable formats

Project/document management

Allows multiple users to collaborate on organizing and editing documents and coordinating large projects

Entity extraction

Identifies and tags words and phrases in categories such as names, places and groups for easier analysis of documents

Speech recognition

Recognizes speech directly or within audio files and transcribes it to text

OCR

Optical character recognition software converts images into selectable, searchable text

Transcription tools

Help users transcribe audio and video more efficiently by combining advanced playback controls with text editors

Data cleaning

Reformats, reorganizes or corrects inconsistencies in raw data for easier analysis

Text analysis

Uses a variety of techniques, like entity extraction or machine learning, to examine text for patterns and trends

Network analysis

Plots and visualizes connections between entities like people, companies or political groups
