All posts by Colin Cheng

Scraping Data and Syncing Up in the MMA Prediction Model Project

The capstone journey has been exciting, and our project, the MMA (Mixed Martial Arts) Prediction Model, is steadily taking shape. We recently completed the first big milestone: scraping raw data from the UFC Stats website. Now, we’re gearing up for the next phase: preprocessing this raw data into a format suitable for analysis. But today, I want to reflect on the process so far, including the challenges, the tools we’re using, and how asynchronous communication through Discord has played a key role in our collaboration.

The Data Scraping Stage

Scraping data was both a challenging and rewarding task. We used Scrapy, a Python web-scraping framework, to extract fighter statistics, fight outcomes, and other key metrics. Setting up Scrapy on my iMac was surprisingly smooth, thanks to its compatibility with Unix-like systems. However, the process wasn’t without its hurdles. One challenge we faced was ensuring the scraper could handle dynamic elements on the UFC Stats site without breaking. After a bit of trial and error (and some helpful documentation), we fine-tuned the scraper to run efficiently and collect clean, structured data.

Preparing for Data Preprocessing

Now that the raw data is in hand, our next step is preprocessing. This stage will involve cleaning data, dealing with missing values, and transforming the dataset into a format our prediction algorithms can digest. It’s a critical step that will set the foundation for accurate predictions. I anticipate some interesting discussions with my teammates as we decide how to handle outliers and structure our data models.
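As a rough sketch of what that cleaning might look like, the snippet below normalizes a single scraped fighter record using only the Python standard library. The field names (`height`, `reach`, `slpm`), the `--` placeholder for missing values, and the `5' 11"` height format are assumptions for illustration; our actual scraped output may look different.

```python
import re
from typing import Optional

MISSING = {"", "--", "n/a"}  # assumed placeholders for missing values


def parse_height_inches(raw: str) -> Optional[int]:
    """Convert a height such as 5' 11" to total inches, or None if missing."""
    text = raw.strip()
    if text.lower() in MISSING:
        return None
    match = re.fullmatch(r"(\d+)'\s*(\d+)\"", text)
    if match is None:
        return None  # unrecognized format: treat as missing rather than guess
    feet, inches = match.groups()
    return int(feet) * 12 + int(inches)


def parse_float(raw: str) -> Optional[float]:
    """Convert a numeric string (optionally ending in " or %) to a float."""
    text = raw.strip().rstrip('"%')
    if text.lower() in MISSING:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def clean_fighter(record: dict) -> dict:
    """Return a cleaned copy of one raw fighter record."""
    return {
        "name": record.get("name", "").strip(),
        "height_in": parse_height_inches(record.get("height", "")),
        "reach_in": parse_float(record.get("reach", "")),
        "slpm": parse_float(record.get("slpm", "")),
    }


# Hypothetical raw record, shaped the way a scraper might emit it.
raw = {"name": " Example Fighter ", "height": "5' 11\"", "reach": "--", "slpm": "3.50"}
cleaned = clean_fighter(raw)
```

Keeping missing values as `None` instead of filling them in immediately leaves the decision about imputation (or dropping rows) open for the team discussion mentioned above.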

Asynchronous Communication via Discord

One of the biggest takeaways from this project so far is how effectively our team has used asynchronous communication on Discord. With everyone juggling their own schedules (work, school, and other commitments), having a central hub for updates, questions, and discussions has been a game-changer.

Here’s how Discord has worked for us:

  1. Channels for Organization: We created separate channels for different aspects of our project: general, resources, meeting notes.
  2. Sharing Code via GitHub: Discord complements our GitHub repository beautifully. Whenever someone makes changes or pushes updates, they drop a quick note in the Discord, ensuring everyone is on the same page.
  3. Async Flexibility: The asynchronous nature of Discord allows us to contribute on our own schedules. Whether it’s dropping a quick idea or reviewing someone else’s code, we don’t have to coordinate real-time meetings to make progress.

Working on this project has highlighted how technology not only powers our tools but also shapes our collaboration. While we have different skill sets, we’ve come together to create something that feels cohesive and purposeful. Looking ahead, I’m eager to see how our preprocessing stage pans out and how our prediction model begins to take shape. I imagine there will be plenty of debugging, brainstorming, and learning as we move forward, but with the momentum we’ve built and the communication tools we’re using, I’m confident in our team’s ability to deliver.

Thanks for reading, and stay tuned for more updates as we continue our MMA prediction journey!

Leveraging Scrapy for Data Collection in My MMA Prediction Model

In my journey to build an MMA Prediction Model, I quickly realized that data is at the heart of any accurate predictive analysis. To achieve the kind of insights and predictive power I’m aiming for, I need detailed, structured data on each fighter, including stats like strikes, takedowns, and grappling control. That’s where Scrapy, a powerful web-scraping framework, comes into play. With Scrapy, I’m not just collecting data; I’m building the foundation that my prediction model will rely on. In this post, I’ll dive into why Scrapy was my go-to choice, how I’m using it, and some of the challenges I’ve encountered along the way.

Why Scrapy?

There are several web-scraping tools out there, so why did I choose Scrapy for my MMA project? Scrapy stood out because it’s designed to handle large-scale scraping projects with ease. Unlike simpler scraping tools that might be limited to grabbing data from a few pages, Scrapy allows me to build spiders—specialized scripts that can crawl through multiple pages and automatically extract data. This level of automation is crucial because I need to collect stats on hundreds of fighters and bouts, which would be too time-consuming to do manually. Scrapy’s support for pipelines also means I can process and clean data right as it’s collected, making it ready for my model without extra steps.
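To make the pipeline idea concrete, here is a minimal sketch of an item pipeline. A Scrapy pipeline is a plain Python class with a `process_item(self, item, spider)` method; the class name and the field handling below are hypothetical and not taken from my actual project code.

```python
class StripWhitespacePipeline:
    """Hypothetical pipeline: trims string fields and maps '--' to None."""

    def process_item(self, item, spider):
        cleaned = {}
        for key, value in dict(item).items():
            if isinstance(value, str):
                value = value.strip()
                if value in ("", "--"):  # assumed placeholders for "no data"
                    value = None
            cleaned[key] = value
        return cleaned


# Scrapy calls process_item for every scraped item; calling it by hand
# (with spider=None) is enough to see the effect.
pipeline = StripWhitespacePipeline()
result = pipeline.process_item({"name": "  Example Fighter ", "stance": "--"}, spider=None)
```

In a real project the class would be switched on through the `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.StripWhitespacePipeline": 300}`, where the module path is a placeholder.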

The Data I’m Collecting

For this MMA prediction model, my goal is to extract detailed data on fighters and fights from a website like UFCStats. Here’s what I’m aiming to collect:

  • Fighter Stats: Details like age, height, reach, stance, and fight record, which are valuable indicators for my model.
  • Fight Metrics: Information on strikes landed, takedowns, submission attempts, and control time, which provide context on the fighter’s performance style.
  • Bout Outcomes: Win/loss records, method of victory (KO, submission, etc.), and round of conclusion, which will serve as the target variable for training my model.

All of this information will be stored in a structured format and transferred to a database, making it easy to query and analyze later for my machine learning algorithms.

How I’m Using Scrapy

To start, I created a spider in Scrapy that navigates through UFCStats’ pages, finds the relevant data, and scrapes it. Here’s how I’ve structured the process:

  1. Defining Spiders: My first spider crawls the list of fighters, gathering basic information and URLs for individual fighter pages. From there, the spider follows these URLs to collect more detailed metrics, like the number of strikes landed per minute or takedown accuracy.
  2. Saving Data in JSON: Scrapy makes it easy to save data in a JSON file, which acts as an intermediate storage. By saving data in JSON, I have a portable, easily accessible file format that I can inspect and validate before transferring it to the database.
  3. Transferring to Database: Once my data is saved in JSON, I use a Python script to load the JSON file and transfer its contents to a database. This extra step ensures that all data is clean and organized before entering the database. It also enables me to easily manage the database structure, creating tables for fighters, bouts, and metrics to ensure optimized storage and retrieval.
  4. Handling Dynamic Content: One challenge I faced was that some pages load data dynamically, which Scrapy can’t handle on its own. To solve this, I integrated Scrapy with Selenium, a browser automation tool, to render the pages and retrieve all the necessary data.
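Steps 2 and 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not my actual loader: I use SQLite here only because it ships with Python, and the table layout, column names, and sample records are all made up for the example.

```python
import json
import sqlite3


def load_fighters(json_text: str, conn: sqlite3.Connection) -> int:
    """Insert fighter records from a JSON array into the fighters table."""
    fighters = json.loads(json_text)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS fighters (
            name   TEXT PRIMARY KEY,
            stance TEXT,
            wins   INTEGER,
            losses INTEGER
        )
        """
    )
    # Named placeholders let sqlite3 handle quoting; INSERT OR REPLACE keeps
    # the load repeatable if the scraper is run again.
    conn.executemany(
        "INSERT OR REPLACE INTO fighters (name, stance, wins, losses) "
        "VALUES (:name, :stance, :wins, :losses)",
        fighters,
    )
    conn.commit()
    return len(fighters)


# Stand-in for the contents of the JSON file exported by Scrapy.
sample = json.dumps([
    {"name": "Example Fighter A", "stance": "Orthodox", "wins": 10, "losses": 2},
    {"name": "Example Fighter B", "stance": "Southpaw", "wins": 7, "losses": 3},
])

conn = sqlite3.connect(":memory:")  # in-memory database, so no files are left behind
inserted = load_fighters(sample, conn)
rows = conn.execute("SELECT name, wins FROM fighters ORDER BY name").fetchall()
```

With a real export, the JSON text would come from a file written by Scrapy's feed export (for instance with the `-O fighters.json` option of `scrapy crawl`, where the file name is a placeholder) rather than from an inline string.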

Challenges

While Scrapy is a powerful tool, I’ve encountered a few hurdles along the way. Dynamic content loading was an initial stumbling block, but using Selenium solved this issue. Another challenge was rate-limiting; to avoid overwhelming the server, I configured Scrapy to make requests at a controlled pace and added delays. These steps have not only kept my scraping within ethical boundaries but also ensured that my data collection is reliable and sustainable.
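For readers curious what "a controlled pace" looks like in practice, the fragment below shows the kind of entries that go in a Scrapy project's `settings.py`. The setting names are standard Scrapy settings, but the values are illustrative assumptions rather than the exact numbers I used.

```python
# settings.py (sketch): values are illustrative, not the project's actual settings

ROBOTSTXT_OBEY = True                 # respect the site's robots.txt
DOWNLOAD_DELAY = 2.0                  # seconds to wait between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True       # vary each wait between 0.5x and 1.5x of the delay
CONCURRENT_REQUESTS_PER_DOMAIN = 1    # one request at a time per domain

AUTOTHROTTLE_ENABLED = True           # adapt the delay to the server's response times
AUTOTHROTTLE_START_DELAY = 2.0        # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 30.0         # upper bound on the delay when the server is slow
```

A fixed delay alone is predictable but blunt; enabling AutoThrottle on top of it lets Scrapy slow down further when the server appears to be struggling.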

Scrapy isn’t just a data-collection tool; it’s an essential part of my project’s foundation. The data I’m collecting will feed directly into my rule-based and machine learning models. By having structured, comprehensive data on each fighter, my model will be able to learn from historical patterns and, ultimately, make more accurate predictions about future bouts. Working with Scrapy has been a rewarding experience. It’s not only helping me gather the necessary data for my MMA Prediction Model, but also teaching me valuable skills in web scraping and data handling.

Designing an MMA Prediction Dashboard with Figma: A Journey of Visual Punches and Data Jabs

Ever tried to make data punch harder than a knockout? That’s exactly what we’re aiming for in our current project: an MMA prediction model that’s not just about crunching numbers, but also about making those numbers look cool, clear, and useful. How do we make that happen? Well, let me introduce you to our secret tool: Figma.

So, here’s the thing: prediction models are only as good as the way they’re presented. I mean, what good is predicting fight outcomes if the insights are stuck in a tangled web of spreadsheets and confusing numbers? That’s where Figma comes in. Using Figma, I’m crafting a dashboard that’s visually appealing and intuitive enough for MMA fans to explore win probabilities, match histories, and key factors affecting fight outcomes, all without needing a PhD in data science. Imagine a user being able to look at a fighter’s win probability while getting visual cues about how past fights or certain fighting styles affect those chances. Clean and clear, just like we want it!

But Figma isn’t just about pretty screens. It’s like a training gym for my designs. We can use its prototyping features to create interactive demos for the dashboard. Think of it as sparring before the real fight (in our case, coding) begins. By testing different flows, like where users might click to compare fighters or how they might explore stats, I get a good sense of what works and what needs to go back to training. Plus, the whole team can leave comments, and together we figure out if a design idea deserves a title belt or an early tap-out.

One of my favorite parts of Figma has been the component library. It’s like building our toolkit for the octagon. Designing with Figma is turning our data-heavy project into something even a casual MMA fan can enjoy. That, I think, is what makes the journey worthwhile: turning data into an experience that delivers a knockout every time.

Life Hacks for Balancing Life and Tech

Balancing a rigorous course load and personal projects is a substantial challenge. Here are some hacks I’ve found useful:

Time Management

Prioritize tasks using tools like Trello or Asana. Break tasks into smaller, manageable goals. The Focus Timer app is particularly helpful for maintaining focus during study sessions or project work. It uses the Pomodoro technique to allocate specific times for focused work and breaks, ensuring you stay on track without burning out.

Stress Management

Regular exercise and meditation have been crucial in managing stress. Tools like Headspace provide guided sessions that help. Incorporating structured breaks with Focus Timer also helps in managing long study hours without increasing stress.

Handling Team Dynamics

Clear communication and regular check-ins are vital. Tools like Slack can facilitate seamless communication.

Overcoming Stagnation

When stuck, stepping away to take a walk or talking it out with peers can provide new perspectives. Sometimes, setting a short Focus Timer session to brainstorm or think differently about a problem can lead to breakthroughs.

Every step in this journey has been a learning opportunity. Whether it’s adapting project goals to meet user needs or exploring new technologies to enhance my skill set, the key has been to remain flexible and proactive. Looking ahead, I’m excited about the new challenges and innovations that await in the ever-evolving tech landscape.

Journey into Software Engineering

Greetings! I’m Colin Cheng, currently navigating the fascinating world of software engineering at Oregon State University (OSU). Residing in Roseburg, Oregon, I balance my studies with a full-time job, exploring new technologies and methods that enhance my understanding and capabilities in software development.

Aside from being a student and professional, I am a family man devoted to my two pets, Annie and Qunnie. My days are a blend of coding, providing IT solutions, and enjoying leisure activities like gardening and exploring the great outdoors.

My journey with computers started in high school, fueled by an intense curiosity about how video games are created. This interest evolved over the years, guiding me to pursue a degree in computer science. The transition from gaming to creating software solutions felt natural, though it was filled with challenges and learning curves.

OSU and Beyond

During my time at Oregon State University, I’ve engaged in various projects that have challenged and expanded my understanding of software engineering. One particular project through the CS361 course, a meal planning website called Mealow, has been especially impactful. This project allowed me to explore user-centric design and development deeply. Mealow isn’t just a meal planner; it’s designed to foster healthy eating habits through user-friendly interfaces and personalized meal suggestions. Working on Mealow has provided practical experience in developing intuitive user interfaces that cater to the unique needs and preferences of users.

Current Job and Internship

Working in IT support has been instrumental in understanding the practical aspects of software and system issues. The real-time problem-solving and user interaction have prepared me well for software engineering’s dynamic nature, where user feedback is crucial.

Favorite Technologies

Lately, I’m deeply engaged with OpenGL for graphics programming, which enhances my ability to render detailed 2D and 3D graphics, crucial not just for gaming but also for creating simulations in industries like architecture and virtual reality. In web development, I use modern tools such as HTML5, CSS, and JavaScript frameworks like React, which are essential for crafting responsive and user-friendly web applications. These technologies are pivotal in my projects, merging graphical precision with web functionality to push the boundaries of software development.

Favorite Projects in CS461

Prediction Model – Mixed Martial Arts – As a fan of data science, this project appeals to me. It involves developing a predictive model that analyzes fighter statistics and fight history to forecast match outcomes, providing insights that are not only valuable for fans but could also be used for training and coaching.

Cloud-Based Algorithmic Trading Strategies for Individual Investors – This project captivates me because it merges finance with technology, enabling individual investors to leverage powerful cloud computing resources to execute sophisticated trading strategies.

Leveraging AI for Improved Public Transit – This project focuses on utilizing AI to enhance the efficiency and reliability of public transit systems. It’s a prime example of how AI can be applied to solve real-world problems, potentially transforming urban mobility by optimizing routes and schedules to improve passenger experiences.

Web Security Research Project – Given the increasing threats to digital security, this project is both timely and essential. It involves researching methodologies to safeguard websites from cyber threats, which is crucial for protecting personal and corporate data online.

Crowd-Sourced Travel Planner – This project intrigues me because it combines technology with travel, using crowd-sourced data to create dynamic, personalized travel itineraries. It’s a fantastic blend of social interaction and algorithmic data processing.

This blog will serve as a platform to share my experiences, challenges, and triumphs with fellow students and anyone interested in the world of software development. Stay tuned for more updates, tech insights, and personal reflections on navigating the complex yet thrilling world of software engineering.