Beyond APIs: Web Scraping

A small exploration to break up the semester’s work

When I first discovered APIs on the web, my world felt expanded. As my need for different kinds of data increased, however, I started to feel limited by the APIs I could find. That is how my interest was piqued by web scraping. I was familiar with the concept, but I assumed it would be difficult to learn, so I put it off for later. I was wrong. With pre-existing Python frameworks, it was much more accessible than I had imagined, and it unlocked buckets of possibilities for acquiring new information from all across the web.

Options

Before even diving into web scraping with Python, there was the question of which framework to start with. Beautiful Soup was the biggest name, alongside Selenium and Scrapy. Selenium has a reputation for more setup and a steeper learning curve, while Scrapy has more built-in features than Beautiful Soup, such as making requests itself and parsing HTML more specifically. Ultimately, I opted to start with Scrapy, but I am interested in trying all of these options at some point!

Experiment

The project I coded up, based on a YouTube tutorial, scrapes images of otters from Reddit's otter subreddit. The pixels are then mapped to ASCII characters and printed to the console. Scrapy was ridiculously easy to use. After initializing the project from the command line, the setup was simply importing Scrapy into the file and specifying a start URL. The request and the filtering to get the source tags with specific alt text all happened in a single command. That was all there was to it.
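To give a feel for the pixel-to-ASCII step, here is a minimal sketch in plain Python. It is not the tutorial's code: the character ramp, the function names, and the tiny sample "image" are all assumptions made up for illustration, and it expects grayscale values from 0 to 255 that an image library would normally supply.

```python
# Characters ordered from dense to sparse. This ramp is an assumption,
# not the one used in the tutorial; any ordered string of characters works.
ASCII_RAMP = "@%#*+=-:. "


def pixel_to_char(brightness, ramp=ASCII_RAMP):
    """Map a grayscale value in 0..255 to one character from the ramp."""
    # Scale 0..255 onto the ramp's indices (0..len(ramp) - 1).
    index = brightness * len(ramp) // 256
    return ramp[index]


def image_to_ascii(pixels, ramp=ASCII_RAMP):
    """Turn rows of grayscale values into lines of text."""
    return "\n".join(
        "".join(pixel_to_char(value, ramp) for value in row)
        for row in pixels
    )


# A hypothetical 3x4 "image": dark on the left, bright on the right.
sample = [
    [0, 80, 160, 255],
    [0, 80, 160, 255],
    [0, 80, 160, 255],
]

print(image_to_ascii(sample))
```

Dark pixels get dense characters here, which probably reads best on a light background. On a dark console, reversing the ramp may give a clearer picture.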

Overall, I was pleased with the experience, but the results need tweaking. I got some repeat images, which wasn’t desirable. The ASCII art is a little unclear, but I have some direction for what I can change to improve it.
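One possible fix for the repeat images, assuming they come from the same image URL showing up more than once in the scraped results, is to filter the URLs before downloading anything. This is only a sketch: the function name and the example.com URLs are hypothetical, and it would not catch the same picture hosted at two different addresses.

```python
def unique_in_order(urls):
    """Drop repeated URLs while keeping the first occurrence of each."""
    seen = set()
    result = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Hypothetical scraped results with one repeat.
scraped = [
    "https://example.com/otter1.jpg",
    "https://example.com/otter2.jpg",
    "https://example.com/otter1.jpg",
]

print(unique_in_order(scraped))
```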

Can you see the otter?

Side Project Importance

I had convinced myself that I did not have time for even a small side project. I was right, but I still think I made the right call by spending time on this. Even though web scraping is not related to my current school projects, it was refreshing to try something new. I already have ideas for how I can use this technique in future projects I have planned, and I can’t wait to see what creative things others are doing with web scraping now that the fire is lit.
