Python Web Scraping: Part 3

iOSTom
Python + Data Science + Web
4 min read · Jan 25, 2020

In my last two articles I talked about web scraping with Python: how to set up a Python development environment, Scrapy, BeautifulSoup and rotating proxies. Along my web scraping journey I have learned that it takes a lot to build an unblockable scraper, and that using a third-party tool might be useful. Below are some great web scraping tools that handle server responses gracefully and let your scraper do what it was programmed to do.

zenscrape — https://zenscrape.com/documentation

parsehub — https://parsehub.com/docs/ref/api/v2/

proxycrawl — https://proxycrawl.com/dashboard

scrapingBee — https://www.scrapingbee.com/documentation/

ApiFull — https://apifull.com/docs/

WrapAPI — https://wrapapi.com/

ScraperAPI — https://www.scraperapi.com/documentation

Zenscrape

Zenscrape offers monthly subscriptions ranging from a free plan to an XXXXL package. The free plan includes around 10,000 requests, the small package 75,000, and the largest package 30,000,000. Zenscrape most likely has a package that will meet your needs.

You can spend your requests through either the API or the GUI. Either way you are limited to 10 scrapers and one data extractor. To start, you need to create a scraper.

Click “Create Scraper”.

Name your bot and enter the URL you would like to scrape.

Scraper Name and URL to scrape

The results are then displayed.

Results page

All you have to do is click on the data that you would like to extract.

What I like about Zenscrape is that after the bot runs, it stores the results in the results folder. This is very convenient because the scraper and its data are in one central location.

Above is the .csv export of the data I scraped from https://tommarler.org
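Once a .csv export like this is downloaded, it can be loaded into Python with the standard library. This is only a minimal sketch: the function name `load_results` and the file path in the usage comment are hypothetical, and the column names depend entirely on what you selected in the scraper.

```python
import csv
from typing import Dict, Iterable, List


def load_results(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text into one dictionary per row, keyed by the header row."""
    return list(csv.DictReader(lines))


# Usage with a hypothetical export path:
# with open("results/run.csv", newline="", encoding="utf-8") as handle:
#     for row in load_results(handle):
#         print(row)
```

Each row comes back as a dictionary, so the extracted fields can be filtered or passed on to pandas without any manual parsing.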

I really like the GUI option. The downside is that you have to manually click on the data you want to extract and manually build your scrapers. If you have a large amount of data, Zenscrape’s API may be a better fit. The API is pretty simple: it provides one endpoint to check your account status and another to fetch content.

A simple Python request to check account status.

From the documentation

The response data
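A status check along those lines might look like the sketch below. The endpoint URL, the `apikey` header name and the JSON response body are assumptions to confirm against the Zenscrape documentation. The function names are my own, and the code uses only the standard library.

```python
import json
import urllib.request

# Assumed endpoint; confirm it against the Zenscrape documentation.
STATUS_URL = "https://app.zenscrape.com/api/v1/status"


def build_status_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the account status endpoint."""
    # Assumption: the key is passed in an "apikey" request header.
    return urllib.request.Request(STATUS_URL, headers={"apikey": api_key}, method="GET")


def check_status(api_key: str, timeout: float = 10.0) -> dict:
    """Send the status request and decode the body, assuming it is JSON."""
    request = build_status_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a real key and network access):
# print(check_status("YOUR-API-KEY"))
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers without spending a request.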

In the GUI I could not pass a URL that contained a query parameter; that does not seem to be an issue when using the API.
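When the target URL carries its own query string, it has to be percent-encoded before it is passed as a parameter, otherwise its `&` and `=` characters get mixed up with the API's own parameters. Here is a minimal sketch, assuming the fetch endpoint takes the target in a `url` parameter and the key in an `apikey` header (both are assumptions to check against the documentation; the function names are hypothetical).

```python
import urllib.parse
import urllib.request

# Assumed endpoint; confirm it against the Zenscrape documentation.
GET_URL = "https://app.zenscrape.com/api/v1/get"


def build_fetch_url(target_url: str) -> str:
    """Return the API URL with target_url percent-encoded as the "url" parameter."""
    return GET_URL + "?" + urllib.parse.urlencode({"url": target_url})


def fetch(target_url: str, api_key: str, timeout: float = 30.0) -> str:
    """Fetch the target page through the API and return the body as text."""
    request = urllib.request.Request(
        build_fetch_url(target_url),
        headers={"apikey": api_key},  # assumed header name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (needs a real key and network access):
# html = fetch("https://example.com/search?q=web scraping&page=2", "YOUR-API-KEY")
```

`urlencode` escapes the `?`, `&` and `=` inside the target URL, so the API receives it as a single value.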

Scraped specific data from LinkedIn using Google dorks.
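A dork of this kind is just a search query that combines a `site:` restriction with quoted phrases. The sketch below shows one way to assemble it and turn it into a Google search URL that could then be handed to a scraping API. The helper names and the search terms are hypothetical, not the exact query I ran.

```python
import urllib.parse
from typing import Iterable


def build_dork(site: str, phrases: Iterable[str]) -> str:
    """Combine a site: restriction with exact-match (quoted) phrases."""
    quoted = " ".join('"{}"'.format(phrase) for phrase in phrases)
    return "site:{} {}".format(site, quoted).strip()


def google_search_url(query: str) -> str:
    """Return a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})


# Hypothetical example terms:
dork = build_dork("linkedin.com/in", ["python developer", "Austin"])
search_url = google_search_url(dork)
```

Because `search_url` itself contains a query parameter, it needs to be encoded again when it is passed to the API as the page to fetch.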

In my opinion the API is the way to go: you can build some pretty powerful search queries and extract the data you need quickly. Zenscrape is powerful and useful, but I still have six more web scraping tools to cover. So if you are a recruiter trying to find candidates on LinkedIn, or you need to scrape email addresses, stay tuned.

If you missed my last articles, check them out below. If you liked this article, give it a couple of claps or leave a comment below.

iOSTom
Python + Data Science + Web

iOS Developer, Go, Java, C#, Blockchain enthusiast, Data junkie