Document: British Columbia physicians and surgeons database

Scrape database requiring browser cookies

Difficulty:

Lack of consistent URLs and multiple levels will throw off most scrapers

Scrape this searchable database of doctors and surgeons to create a listing of all physicians in British Columbia.

Since the URLs are generated based on a browser cookie, scrapers must be able to maintain this cookie to return information. Products also have to handle multiple levels of data, starting at the search results page and drilling down into the detail pages.
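The two requirements above, keeping the cookie and drilling from results pages into detail pages, can be sketched in a few lines of Python using only the standard library. This is a minimal illustration and not the method used by any of the tested products. The `marker` string used to recognize detail links, the function names, and any URL passed in are hypothetical, since the site's actual markup is not described here.

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


def make_cookie_opener():
    """Build an opener that stores cookies and resends them on every request."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags whose URL contains a marker string."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            # Resolve relative links against the results page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_detail_links(html, base_url, marker="detail"):
    """Return absolute URLs of the detail pages linked from a results page."""
    parser = LinkCollector(base_url, marker)
    parser.feed(html)
    return parser.links


def scrape(opener, search_url, marker="detail"):
    """Level 1: fetch the results page. Level 2: fetch each detail page.

    The same opener is used throughout, so the session cookie set by the
    first response is sent with every later request.
    """
    with opener.open(search_url) as resp:
        results_html = resp.read().decode("utf-8", errors="replace")
    pages = {}
    for link in extract_detail_links(results_html, search_url, marker):
        with opener.open(link) as resp:
            pages[link] = resp.read().decode("utf-8", errors="replace")
    return pages
```

Reusing one opener for every request is the key design choice: a scraper that opens each URL in a fresh session loses the cookie and, on a site like this one, gets nothing back.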

A single search returns no more than 200 results, so a scraper has to split the job into several narrower searches to collect every record.
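One common way around a results cap is to split a broad search into narrower ones until each returns fewer rows than the limit. The sketch below assumes the search form accepts a last-name prefix, which is not confirmed here. `count_results` is a hypothetical callable, supplied by the caller, that runs one search and returns how many rows came back.

```python
import string


def plan_queries(count_results, limit=200, alphabet=string.ascii_lowercase,
                 max_depth=4):
    """Return search prefixes whose result counts stay below `limit`.

    A search that returns `limit` rows may have been truncated, so its
    prefix is split into longer prefixes and each of those is tried.
    """
    plan = []
    pending = list(alphabet)
    while pending:
        prefix = pending.pop()
        n = count_results(prefix)
        if n == 0:
            continue  # nothing matches; skip this prefix
        if n < limit or len(prefix) >= max_depth:
            # Below the cap, or too deep to keep splitting. In the second
            # case the results may still be truncated.
            plan.append(prefix)
        else:
            pending.extend(prefix + ch for ch in alphabet)
    return sorted(plan)
```

Each prefix in the returned plan can then be submitted as its own search. Prefixes that hit `max_depth` while still at the cap need a different filter, such as city or specialty, if the site offers one.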

DESIRED OUTCOME: Generate a database or spreadsheet of all doctors and surgeons, including details for each.

 

Test Results

Verdict:

Captures data from multiple detail pages; requires manual guidance to overcome search limits

Search limitations pose problems for Helium Scraper

Helium Scraper is able to identify and extract all of the data elements from this finicky, multi-level database of physicians, but it can't bypass the site's search result limitations without some user intervention.


Verdict:

Needs major user assistance for form-based searches, limited record returns; failed to grab detail data

Website with limited search returns stymies Data Toolbar

Data Toolbar could not get past the search limitations of this database, a listing of physicians and surgeons in British Columbia that restricts results to 200 records at a time. It also failed to grab information from detail pages.


Verdict:

Success requires basic coding knowledge, additional software

With some outside help, Outwit tackles results-limited database

With a slight workaround and some manual guidance, Outwit can scrape this database of physicians and surgeons in British Columbia.


Verdict:

Wrong software for task

Table2Clipboard can't help with doctors database

Because Table2Clipboard only assists in copying and pasting tables from the Web to a spreadsheet, it's incapable of navigating through the BC physicians database.


Verdict:

Can't scrape without extensive manual work; can't grab details

Scraper unable to tackle database with complex structure

Scraper required a lot of hand-holding to capture even a single page of data from this directory of physicians in British Columbia. But even with manual help, its navigation limitations make it a bad choice for the task.


Verdict:

Fails to connect to site, returning no results

Needlebase won't connect to site, preventing scrape

Needlebase could not perform any part of this test because it failed to connect to the page at all. Since it's a hosted solution, the cause of the failure is hard to diagnose.


Verdict:

Wrong software for this task

Table Capture can't scrape BC doctors database

Because Table Capture was designed for single Web pages with HTML tables, it was stumped by this database of physicians in British Columbia.

