[Editor's note: The Needlebase team announced its technology would be retired June 1, 2012, as team members work to integrate it with Google. Read more on the future of Needlebase and the state of Web scraping solutions here. -TD]
If you can afford the cost and the time to handle the learning curve, Needlebase can solve most common scraping problems in the newsroom, aside from gathering and downloading files.
This isn't the type of product you can just pick up and play. It takes a little time to figure out how to set up models for data, which are oriented around loosely connected tags.
Mediocre documentation does little to shorten that learning curve. Its video tutorial is helpful, but features an older version with a slightly different interface. Several text tutorials appear to be more up-to-date and helpful. There's not much of a user community to go to for help either -- the forum had only about 80 posts at the time of this review.
Once users master the basics, though, Needlebase becomes a powerful tool. Its real power comes from building a custom scraper with its visual interface. Just provide the URL of the page you want to scrape and let it load. You can then begin tagging.
The interface is mostly intuitive. It can handle form fields and links for pagination with simple clicks, and Needlebase almost always guesses correctly what users need.
It also handles detail pages well. Once you tell it which links to follow, the application gets a good idea of what to scrape.
Needlebase performed well on our scraping tasks, although it was unable to handle a site that required browser cookies. This is an unfortunate side effect of a hosted solution, and is hard to troubleshoot.
While the system isn't designed to gather files like PDFs or images, Needlebase will collect links to these files, making it easier to use another tool to download what you need.
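For readers wondering what "another tool" might look like, here is a minimal Python sketch, not anything Needlebase provides. It assumes you have already exported the collected links as a plain list of URLs; the function names `local_name` and `download_all` and the `downloads` folder are hypothetical choices made for illustration.

```python
import os
from urllib.parse import urlparse, unquote
from urllib.request import urlretrieve


def local_name(url, index):
    """Derive a local filename from a URL, falling back to a numbered name."""
    # Decode percent-escapes first, then keep only the last path segment.
    name = os.path.basename(unquote(urlparse(url).path))
    return name or f"file_{index}"


def download_all(urls, dest_dir="downloads"):
    """Download every URL in the list into dest_dir and return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        path = os.path.join(dest_dir, local_name(url, i))
        urlretrieve(url, path)  # performs the network request
        saved.append(path)
    return saved
```

Calling `download_all(list_of_links)` would fetch each file in turn. Note that two links ending in the same filename would overwrite each other in this sketch, so a real script would likely need to handle duplicates and failed requests.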
Despite its strengths, many news organizations may find Needlebase too expensive. Users are charged on a per-cell basis, so larger databases mean larger fees. The free version will let users collect 100,000 cells from 5,000 pages a month, but data must be made public. Costs peak at $999 a month.
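To get a rough sense of whether a project fits in the free tier, a back-of-the-envelope check like the following may help. It is only a sketch: it assumes a "cell" means one field in one row (so cells equal rows times columns), which is our reading rather than Needlebase's stated definition, and the function name `fits_free_tier` is hypothetical.

```python
# Free-tier monthly limits as quoted in this review.
FREE_CELLS = 100_000
FREE_PAGES = 5_000


def fits_free_tier(rows, columns, pages):
    """Return True if a scrape stays within the free tier's monthly limits.

    Assumes one cell per field per row, i.e. cells = rows * columns.
    """
    cells = rows * columns
    return cells <= FREE_CELLS and pages <= FREE_PAGES
```

Under that assumption, a database of 5,000 records with 20 fields each, scraped from 5,000 pages, would use exactly 100,000 cells and just fit; adding a 21st field would push it over.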
Free to $999/Month
With its ability to automate searches and handle multilevel databases, Needlebase can easily root through and capture this collection of teacher information.
Needlebase effortlessly pulls down structured data from the South Dakota lobbyist database within minutes.
Needlebase proves more than adequate in tagging and saving the metadata associated with these Obama transition team documents. But it wasn't able to download the PDFs and missed some information.
Needlebase was simply not able to perform any part of this test, failing to connect to the page at all. But since it's a hosted solution, it's hard to figure out why.