Helium Scraper's easy-to-use interface, robust customization and responsive customer support make it an excellent tool for even the most complex scraping tasks. But there is one caveat: its comparatively steep entry price and long list of features demand an investment of both money and time.
Helium allows novice data journalists to create scraping operations through an intuitive user interface broken down into three parts. "Kinds" allows you to point and click to select the data you want, "actions" allows you to add different automated actions to the task and "database" allows you to manage the extraction and eventual export of the data.
The simplicity of this interface belies Helium's power and versatility. Kinds can be updated on the fly to include tricky elements, and once calibrated, Helium flawlessly navigated thousands of records at a time in our tests. Custom "action trees" let us build fully automated jobs that carried Helium through our most difficult tests with relative ease.
After tinkering with the program for an hour or two, we were able to create a scraper that identified and scraped specific elements within hundreds of results pages. After a few more hours, we had a scraper that could navigate a search form, drill down several levels of information, scrape a select number of elements into a table, then repeat the entire process with a different, predetermined search term.
That said, Helium can't get around obstacles like search result limitations without some user intervention. But with some patience, journalists can still save time on jobs like these in most cases.
Despite an attractive interface, the significant amount of time journalists must invest in learning the program may be its biggest drawback. The program also offers further customization options for experienced data journalists -- custom extraction options and the ability to implement custom JavaScript operations, for example -- which we didn't address in our testing. We spent roughly eight hours with this program during testing, combing through robust documentation that includes both text and video tutorials. But we still resorted to trial and error, and in some cases we eventually reached out to tech support.
In that respect, you get what you pay for. In less than 12 hours, developer Juan Soldo had solved our problem, going so far as to provide us with a sample template. Still, even that turnaround may be too slow for journalists on a tight deadline. And with the basic package, that premium level of support expires after a month (although the support forums are still available).
Helium Scraper is everything you'd expect from a premium scraper tool. Be warned, though: it demands an investment of both time and money, albeit one we believe will be worth it in the long run.
2.3.6.2
2011
$149
No
Yes
No
Helium Scraper was able to download all 186 PDFs in this database of memos from the Obama-Biden transition team, despite the odd structure of the website.
Helium Scraper easily recognizes the table-formatted South Dakota lobbyist registry and is capable of automatically navigating and extracting all of the 8,000-plus entries.
Although Helium Scraper is technically capable of pulling the information from this form-based database of teachers in British Columbia, getting there required a custom scraping solution from one of the program's developers.
Helium Scraper is able to identify and extract all of the data elements from this finicky, multi-level database of physicians, but it can't bypass the site's search result limitations without some user intervention.
The Reporters' Lab welcomes relevant discussion from readers, but reserves the right to remove comments flagged as inappropriate or spam. The lab is not responsible for the content of user comments and cannot guarantee their accuracy.