Document: British Columbia teacher database

Scrape database with form-based search

Difficulty:

Multiple levels of data may challenge some scrapers

Build a listing of all certified teachers in British Columbia by scraping the province's searchable online database.

There's no way to list all the records at once, so a scraper must be able to run a series of searches, entering each term through the search form rather than through the URL. This may be a challenge for scrapers that cannot submit terms via HTTP POST.
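As a rough illustration of what entering terms "through POST" involves, here is a minimal Python sketch that builds one form submission per search term. The URL, the field name `lastName` and the idea of searching by single letters are all assumptions made for the example; the real form's address and field names would have to be read from the page itself.

```python
import string
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: the real form URL and field name are not given here.
SEARCH_URL = "https://example.org/teacher-search"
SEARCH_FIELD = "lastName"


def build_search_requests(terms, url=SEARCH_URL, field=SEARCH_FIELD):
    """Return one POST request per search term, without sending anything."""
    requests = []
    for term in terms:
        # The term travels in the request body, not in the URL.
        body = urlencode({field: term}).encode("utf-8")
        # Supplying data= makes urllib treat the request as a POST.
        requests.append(Request(url, data=body))
    return requests


# Assumption: the form accepts a single letter as a surname prefix.
letter_requests = build_search_requests(string.ascii_lowercase)
```

Each request could then be sent with `urllib.request.urlopen`, ideally with a pause between calls so the site is not flooded with queries.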

Scrapers must then navigate multiple levels, triggering JavaScript to move from the main results pages to the detail pages, and return data from both.
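To show why JavaScript-triggered detail pages complicate matters, the sketch below pulls record identifiers out of links on a results page so that each detail page can be requested in a second pass. It assumes, purely for illustration, that each result links to its detail page through a call such as `showDetail('12345')` in an `href` or `onclick` attribute; the actual site may use a different mechanism.

```python
import re
from html.parser import HTMLParser

# Hypothetical link format: javascript:showDetail('12345')
DETAIL_CALL = re.compile(r"showDetail\('([^']+)'\)")


class DetailLinkParser(HTMLParser):
    """Collect record IDs from anchors that call the assumed showDetail()."""

    def __init__(self):
        super().__init__()
        self.detail_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name in ("href", "onclick") and value:
                match = DETAIL_CALL.search(value)
                if match:
                    self.detail_ids.append(match.group(1))
                    break  # count each anchor only once


def extract_detail_ids(html):
    """Return the detail-record IDs found in one results page."""
    parser = DetailLinkParser()
    parser.feed(html)
    return parser.detail_ids
```

A scraper following the master/detail pattern would run this on every results page, then fetch and parse one detail page per ID.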

DESIRED OUTCOME: Create a database of all teachers in British Columbia, including contact information and other details.

Test Results

Verdict:

Easily downloads content using search forms, master/detail pattern

Needlebase plows through search fields to perfectly copy database

With its ability to automate searches and handle multilevel databases, Needlebase can easily root through and capture this collection of teacher information.

Verdict:

Advanced functions required to tackle search form; data clean-up required

Helium combs search-based database, but minor clean-up needed

Although Helium Scraper is technically capable of pulling the information from this form-based database of teachers in British Columbia, getting there required a custom scraping solution from one of the program's developers.

Verdict:

Scraping difficult for novice data journalists; requires additional software

Outwit's scraping job hindered by limitations of teacher database website

Outwit Hub is capable of scraping this dynamically rendered database of B.C. teachers, but the website inexplicably stopped responding to Outwit's queries in the middle of the test.

Verdict:

Inability to automate search terms makes scraping clunky and slow, but serviceable; data capture incomplete

Data Toolbar hindered by form-based search, errors

Data Toolbar can't automate the entry of search terms, so it's a poor choice for scraping pages with form-based searches. In some cases, though, it can produce results if the user enters the searches manually.

Verdict:

Can't navigate through form; copies results of manual searches, but without links

Table2Clipboard no help in getting at form-based teacher data

Table2Clipboard was not created to query data on a form-based site, so the task of scraping this database of teachers was too complex for it. It could copy only some of the content we wanted, and only after we performed the searches manually.

Verdict:

Can't scrape search forms, multi-page databases

Table Capture no match for form-based teacher registry

While Table Capture can capture single HTML tables resulting from manual database searches, it was not designed to scrape multiple pages or search-form databases like this teacher registry from British Columbia.

Verdict:

Can't handle multiple pages, detail pages

Scraper fails to capture information from layered database

Scraper was able to capture some data from result pages after we manually entered a name search in this teacher database, but it doesn't have the ability to navigate through multiple pages, making it unsuitable for this task.

Testing
