Introduction To Scrapy

Hello! This blog will introduce you, the potential new user, to Scrapy.

Scrapy is an open source Python project for web crawling and web scraping. I have personally used it to scrape data in bulk and to create price and in-stock alerts for items I wish to buy. I have found Scrapy easy to use and generally a solid open source project to support. The one feature I always wanted to see was documentation in video form. I personally learn better from video, so I have created this blog and the accompanying videos to help others in the same boat.

Official Scrapy: https://github.com/scrapy/scrapy

Official Scrapy Documentation: https://docs.scrapy.org/en/latest/

Part 1 – Install and run first scrape:

If you followed the video above correctly, you should see two new files: quotes-1.html and quotes-2.html. Notice that both output files are raw HTML. In the next section, we will move on to data extraction.

Part 2 – Extracting data to JSON file:

From here, you could feed the JSON file into an alert system or any other program to suit your needs.
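For example, a hypothetical downstream script could load quotes.json with the standard library and drive whatever alert logic you need (the file path and the author filter here are purely illustrative):

```python
import json


def find_quotes_by(path, wanted_author):
    """Load a Scrapy JSON export and return the quote texts by one author."""
    with open(path, encoding="utf-8") as f:
        # `scrapy crawl quotes -O quotes.json` produces a single JSON array.
        items = json.load(f)
    return [item["text"] for item in items if item["author"] == wanted_author]
```

You might call this as `find_quotes_by("quotes.json", "Albert Einstein")` and send a notification whenever the result changes.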

But wait: what if I want to extract data across numerous pages and do not want to list every URL in start_urls?

Part 3 – How to extract data recursively:

This methodology is useful for extracting data from sites that span many pages, such as government websites with 50+ pages of results.

As you can see, Scrapy is an easy-to-use tool for web scraping and web data extraction. I hope you consider Scrapy for your next project. Please visit the official repository and documentation pages linked above for additional info!
