Hello! This blog will introduce you, the potential new user, to Scrapy.
Scrapy is an open source Python project for web crawling and web scraping. I have personally used this framework to scrape data en masse and to create price and in-stock alerts for items I wish to buy. I have found Scrapy easy to use and, in general, a solid open source project to support. The one feature I really wanted was documentation in video form. I personally learn better from video, and I have created this blog and the accompanying videos to help others in the same boat.
Official Scrapy: https://github.com/scrapy/scrapy
Official Scrapy Documentation: https://docs.scrapy.org/en/latest/
Part 1 – Install and run first scrape:
If you followed the above video correctly, you should see two new files: quotes-1.html and quotes-2.html. Notice that both output files are HTML. In the next section, we will move on to data extraction.
Part 2 – Extracting data to JSON file:
From here, you could feed the JSON file into an alert system or any other program to suit your needs.
But wait, what if I want to extract data over numerous pages and do not want to list every URL in start_urls?
Part 3 – How to extract data recursively:
This approach is useful for sites that spread their data across many pages, such as government websites with 50+ pages of records.
As you can see, Scrapy is an easy-to-use tool for web scraping and web data extraction. I hope you consider Scrapy for your next project. Please visit the official website and documentation linked above for additional info!