Category Archives: Tech Trails

Leveraging Scrapy for Data Collection in My MMA Prediction Model

In my journey to build an MMA Prediction Model, I quickly realized that data is at the heart of any accurate predictive analysis. To achieve the kind of insights and predictive power I’m aiming for, I need detailed, structured data on each fighter, including stats like strikes, takedowns, and grappling control. That’s where Scrapy, a powerful web-scraping framework, comes into play. With Scrapy, I’m not just collecting data; I’m building the foundation that my prediction model will rely on. In this post, I’ll dive into why Scrapy was my go-to choice, how I’m using it, and some of the challenges I’ve encountered along the way.

Why Scrapy?

There are several web-scraping tools out there, so why did I choose Scrapy for my MMA project? Scrapy stood out because it’s designed to handle large-scale scraping projects with ease. Unlike simpler scraping tools that might be limited to grabbing data from a few pages, Scrapy allows me to build spiders—specialized scripts that can crawl through multiple pages and automatically extract data. This level of automation is crucial because I need to collect stats on hundreds of fighters and bouts, which would be too time-consuming to do manually. Scrapy’s support for pipelines also means I can process and clean data right as it’s collected, making it ready for my model without extra steps.

The Data I’m Collecting

For this MMA prediction model, my goal is to extract detailed data on fighters and fights from a website like UFCStats. Here’s what I’m aiming to collect:

  • Fighter Stats: Details like age, height, reach, stance, and fight record, which are valuable indicators for my model.
  • Fight Metrics: Information on strikes landed, takedowns, submission attempts, and control time, which provide context on the fighter’s performance style.
  • Bout Outcomes: Win/loss records, method of victory (KO, submission, etc.), and round of conclusion, which will serve as the target variable for training my model.
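To make the structure concrete, the fields above can be sketched as simple typed records — Scrapy (since version 2.2) accepts plain dataclasses as items. The field names here are illustrative, not the exact labels used on UFCStats:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fighter:
    # Basic attributes scraped from a fighter's profile page
    name: str
    age: Optional[int] = None
    height_cm: Optional[float] = None
    reach_cm: Optional[float] = None
    stance: Optional[str] = None
    wins: int = 0
    losses: int = 0

@dataclass
class Bout:
    # Per-fight metrics plus the outcome used as the training target
    fighter: str
    opponent: str
    strikes_landed: int = 0
    takedowns: int = 0
    control_time_sec: int = 0
    result: Optional[str] = None        # "win" / "loss"
    method: Optional[str] = None        # KO, submission, decision, etc.
    round_ended: Optional[int] = None
```

Keeping items as flat, typed records like this makes the later steps — JSON export and database loading — straightforward.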

All of this information will be stored in a structured format and transferred to a database, making it easy to query and analyze later for my machine learning algorithms.

How I’m Using Scrapy

To start, I created a spider in Scrapy that navigates through UFCStats’ pages, finds the relevant data, and scrapes it. Here’s how I’ve structured the process:

  1. Defining Spiders: My first spider crawls the list of fighters, gathering basic information and URLs for individual fighter pages. From there, the spider follows these URLs to collect more detailed metrics, like the number of strikes landed per minute or takedown accuracy.
  2. Saving Data in JSON: Scrapy makes it easy to save data in a JSON file, which acts as an intermediate storage. By saving data in JSON, I have a portable, easily accessible file format that I can inspect and validate before transferring it to the database.
  3. Transferring to Database: Once my data is saved in JSON, I use a Python script to load the JSON file and transfer its contents to a database. This extra step ensures that all data is clean and organized before entering the database. It also enables me to easily manage the database structure, creating tables for fighters, bouts, and metrics to ensure optimized storage and retrieval.
  4. Handling Dynamic Content: One challenge I faced was that some pages load data dynamically, which Scrapy can’t handle on its own. To solve this, I integrated Scrapy with Selenium, a browser automation tool, to render the pages and retrieve all the necessary data.
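Step 3 above — loading the JSON feed into a database — can be sketched with nothing but the standard library. The file name, table layout, and field names here are assumptions for illustration, not my exact schema:

```python
import json
import sqlite3

def load_fighters(json_path: str, db_path: str) -> int:
    """Load scraped fighter records from a Scrapy JSON feed into SQLite.

    Returns the number of rows inserted.
    """
    # Scrapy's JSON feed export writes a single JSON array of items
    with open(json_path) as f:
        fighters = json.load(f)

    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS fighters (
            name TEXT PRIMARY KEY,
            height_cm REAL,
            reach_cm REAL,
            stance TEXT,
            wins INTEGER,
            losses INTEGER
        )
    """)
    # Skip records missing a name so only clean rows enter the database
    rows = [
        (f["name"], f.get("height_cm"), f.get("reach_cm"),
         f.get("stance"), f.get("wins"), f.get("losses"))
        for f in fighters if f.get("name")
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO fighters VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    conn.close()
    return len(rows)
```

Validating and filtering in this script, before anything touches the database, is what keeps malformed or partial records from the scrape out of the tables the model will query.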

Challenges

While Scrapy is a powerful tool, I’ve encountered a few hurdles along the way. Dynamic content loading was an initial stumbling block, but using Selenium solved this issue. Another challenge was rate-limiting; to avoid overwhelming the server, I configured Scrapy to make requests at a controlled pace and added delays. These steps have not only kept my scraping within ethical boundaries but also ensured that my data collection is reliable and sustainable.
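The pacing described above lives in Scrapy's `settings.py`. A minimal sketch of a polite configuration looks like this — the setting names are Scrapy's own, but the specific values are illustrative:

```python
# settings.py – polite crawling configuration (values are illustrative)
DOWNLOAD_DELAY = 2                    # wait ~2s between requests to the same domain
CONCURRENT_REQUESTS_PER_DOMAIN = 1    # one request at a time per domain
AUTOTHROTTLE_ENABLED = True           # let Scrapy adapt the delay to server latency
AUTOTHROTTLE_START_DELAY = 2
AUTOTHROTTLE_MAX_DELAY = 10
ROBOTSTXT_OBEY = True                 # respect the site's crawling rules
```

AutoThrottle in particular is worth enabling: it backs off automatically when the server slows down, which is exactly the sustainable behavior described above.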

Scrapy isn’t just a data-collection tool; it’s an essential part of my project’s foundation. The data I’m collecting will feed directly into my rule-based and machine learning models. By having structured, comprehensive data on each fighter, my model will be able to learn from historical patterns and, ultimately, make more accurate predictions about future bouts. Working with Scrapy has been a rewarding experience. It’s not only helping me gather the necessary data for my MMA Prediction Model, but also teaching me valuable skills in web scraping and data handling.

Designing an MMA Prediction Dashboard with Figma: A Journey of Visual Punches and Data Jabs

Ever tried to make data punch harder than a knockout? That’s exactly what we’re aiming for in our current project – an MMA prediction model that’s not just about crunching numbers, but also about making those numbers look cool, clear, and useful. How do we make that happen? Well, let me introduce you to our secret tool: Figma.

So, here’s the thing: prediction models are only as good as the way they’re presented. I mean, what good is predicting fight outcomes if the insights are stuck in a tangled web of spreadsheets and confusing numbers? That’s where Figma comes in. Using Figma, I’m crafting a dashboard that’s visually appealing and intuitive enough for MMA fans to explore win probabilities, match histories, and key factors affecting fight outcomes – all without needing a PhD in data science. Imagine a user being able to look at a fighter’s win probability while getting visual cues about how past fights or certain fighting styles affect those chances. Clean and clear – just like we want it!

But Figma isn’t just about pretty screens. It’s like a training gym for my designs. I use its prototyping features to create interactive demos of the dashboard. Think of it as sparring – before the real fight (in our case, coding) begins. By testing different flows – like where users might click to compare fighters or how they might explore stats – I get a good sense of what works and what needs to go back to training. Plus, the whole team can leave comments, and together we figure out if a design idea deserves a title belt or an early tap-out.

One of my favorite parts of Figma has been the component library. It’s like building our toolkit for the octagon. Designing with Figma is turning our data-heavy project into something even a casual MMA fan can enjoy. That, I think, is what makes the journey worthwhile: turning data into an experience that delivers a knockout every time.

Journey into Software Engineering

Greetings! I’m Colin Cheng, currently navigating the fascinating world of software engineering at Oregon State University (OSU). Residing in Roseburg, Oregon, I balance my studies with a full-time job, exploring new technologies and methods that enhance my understanding and capabilities in software development.

Aside from being a student and professional, I am a family man devoted to my two pets, Annie and Qunnie. My days are a blend of coding, providing IT solutions, and enjoying leisure activities like gardening and exploring the great outdoors.

My journey with computers started in high school, fueled by an intense curiosity about how games and software are created. This interest evolved over the years, guiding me to pursue a degree in computer science. The transition from gaming to creating software solutions was seamless but filled with challenges and learning curves.

OSU and Beyond

During my time at Oregon State University, I’ve engaged in various projects that have challenged and expanded my understanding of software engineering. One particular project through the CS361 course – a meal planning website called Mealow – has been especially impactful. This project allowed me to explore user-centric design and development deeply. Mealow isn’t just a meal planner; it’s designed to foster healthy eating habits through user-friendly interfaces and personalized meal suggestions. Working on Mealow has provided practical experience in developing intuitive user interfaces that cater to the unique needs and preferences of users.

Current Job and Internship

Working in IT support has been instrumental in understanding the practical aspects of software and system issues. The real-time problem-solving and user interaction have prepared me well for software engineering’s dynamic nature, where user feedback is crucial.

Favorite Technologies

Lately, I’m deeply engaged with OpenGL for graphics programming, which enhances my ability to render detailed 2D and 3D graphics, crucial not just for gaming but also for creating simulations in various industries like architecture and virtual reality. In web development, I utilize modern tools such as HTML5, CSS, and JavaScript frameworks like React, which are essential for crafting responsive and user-friendly web applications. These technologies are pivotal in my projects, merging graphical precision with web functionality to push the boundaries of software development.

Favorite Projects in CS461

Prediction Model – Mixed Martial Arts – As a fan of data science, this project appeals to me. It involves developing a predictive model that analyzes fighter statistics and fight history to forecast match outcomes, providing insights that are not only valuable for fans but could also be used for training and coaching.

Cloud-Based Algorithmic Trading Strategies for Individual Investors – This project captivates me because it merges finance with technology, enabling individual investors to leverage powerful cloud computing resources to execute sophisticated trading strategies.

Leveraging AI for Improved Public Transit – This project focuses on utilizing AI to enhance the efficiency and reliability of public transit systems. It’s a prime example of how AI can be applied to solve real-world problems, potentially transforming urban mobility by optimizing routes and schedules to improve passenger experiences.

Web Security Research Project – Given the increasing threats to digital security, this project is both timely and essential. It involves researching methodologies to safeguard websites from cyber threats, which is crucial for protecting personal and corporate data online.

Crowd-Sourced Travel Planner – This project intrigues me because it combines technology with travel, using crowd-sourced data to create dynamic, personalized travel itineraries. It’s a fantastic blend of social interaction and algorithmic data processing.

This blog will serve as a platform to share my experiences, challenges, and triumphs with fellow students and anyone interested in the world of software development. Stay tuned for more updates, tech insights, and personal reflections on navigating the complex yet thrilling world of software engineering.