Monthly Archives: January 2025

Using SQLAlchemy to Migrate Cleaned MMA Data to PostgreSQL

Why SQLAlchemy?

SQLAlchemy provides an ORM (Object-Relational Mapping) that makes database interactions more intuitive. Instead of writing raw SQL queries, I can use Python objects to define, manipulate and query data. This has significantly improved my workflow when dealing with large datasets scraped from UFC Stats.

How I used SQLAlchemy

I structured our database migration process into several scripts, each handling a specific dataset:

  • Competitors: Fighter details such as height, weight, reach, stance, and fight record.
  • Events: Fight event metadata such as location and date.
  • Bouts: Individual fight details, including results.
  • Round Stats: Detailed statistics per round.

Each script reads from a cleaned JSON file and inserts the data into the appropriate table in PostgreSQL using SQLAlchemy.
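The per-dataset script pattern can be sketched roughly like this. This is a minimal, hypothetical version: the table is trimmed to a few invented columns, the JSON is inlined instead of read from a file, and an in-memory SQLite engine stands in for the real PostgreSQL connection so the sketch runs standalone.

```python
import json
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

# In-memory SQLite stands in for the real PostgreSQL connection in this sketch.
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()

# Hypothetical, trimmed-down "events" table; the real schema has more columns.
events = Table(
    "events", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
    Column("location", String),
    Column("date", String),
)
metadata.create_all(engine)

# Stand-in for loading a cleaned JSON file produced by the scraping step.
cleaned_events = json.loads(
    '[{"name": "UFC 300", "location": "Las Vegas, NV", "date": "2024-04-13"}]'
)

# Insert every record in one transaction; commits on success, rolls back on error.
with engine.begin() as conn:
    conn.execute(events.insert(), cleaned_events)
```

Passing the whole list of dicts to a single `execute()` call gives an executemany-style insert, which matters later when performance comes up.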

Establishing the Database Connection

To interact with PostgreSQL, I configured SQLAlchemy in the connection.py file as shown below:

from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import sessionmaker

# Connection string format: postgresql+psycopg2://user:password@host/database
DATABASE_URL = "postgresql+psycopg2://postgres:password@104.168.19.177/pmmma"
engine = create_engine(DATABASE_URL)

# Session factory for ORM-style transactions
Session = sessionmaker(bind=engine)
session = Session()

# Reflect the existing schema so tables can be used without redefining them
metadata = MetaData()
metadata.reflect(bind=engine)

events = metadata.tables["events"]
competitors = metadata.tables["competitors"]
bouts = metadata.tables["bouts"]
round_stats = metadata.tables["round_stats"]

This setup allows me to establish a connection and dynamically access tables.

Migrating Data with SQLAlchemy

To migrate our cleaned fight data into PostgreSQL, we used SQLAlchemy to handle database transactions efficiently. The process involved:

  1. Loading the cleaned data – The structured JSON files containing bout details were read and pre-processed.
  2. Resolving foreign keys – To maintain data integrity, we mapped event names and fighter names to their respective database IDs before inserting records.
  3. Batch inserting records – Using SQLAlchemy, we inserted multiple fight records into the bouts table in one transaction to improve performance.
  4. Handling errors and rollbacks – If an issue arose, SQLAlchemy allowed me to roll back changes without corrupting the database.

Initially, processing the entire dataset took nearly 35 minutes, which led me to explore optimizations like caching frequently accessed foreign keys and using bulk inserts.

Favorite and Least Favorite Parts

My favorite aspect of using SQLAlchemy is how it abstracts complex database operations into Python code, making data migration smooth and efficient. The ORM capabilities reduce the need to write raw SQL while maintaining flexibility.

My least favorite aspect is dealing with performance issues. Some of my scripts, particularly migrate_bouts.py, take up to 30 minutes to execute. This is due to large datasets and the need to resolve foreign keys dynamically before inserting records. I plan to optimize these scripts by using bulk inserts instead of looping through entries individually.

Challenges and Lessons Learned

At first, handling foreign key relationships between tables was tricky. I needed to fetch and cache foreign keys before inserting records to prevent unnecessary queries. Once implemented, this made a noticeable difference in performance.
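The caching idea looks roughly like this sketch. The competitors table and fighter names are invented, and an in-memory SQLite engine replaces the real connection, but the pattern is the same: one query builds a name-to-ID map up front, so resolving each record's foreign keys becomes a dictionary lookup instead of a per-row SELECT.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite with a toy competitors table stands in for the real database.
engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE competitors (id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(text("INSERT INTO competitors (name) VALUES ('Fighter A'), ('Fighter B')"))

with engine.begin() as conn:
    # One query up front builds the whole name -> id map...
    fighter_ids = {name: fid for name, fid in
                   conn.execute(text("SELECT name, id FROM competitors"))}

# ...so resolving a bout's foreign keys is now a dict lookup, not another SELECT.
bout = {"fighter_1": "Fighter A", "fighter_2": "Fighter B"}
resolved = (fighter_ids[bout["fighter_1"]], fighter_ids[bout["fighter_2"]])
```

For thousands of bouts this trades N round trips to the database for one, which is where most of the speedup came from.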

I now appreciate SQLAlchemy more after overcoming these challenges. Its commit/rollback system has been especially useful in debugging errors without corrupting the database.

If I could start over, I might consider using Alembic, an extension for SQLAlchemy that simplifies schema migrations. Right now, table creation and modification require manual changes. Alembic would allow for version-controlled schema updates without breaking our existing data.

SQLAlchemy has been an essential part of migrating our MMA data into PostgreSQL. While there were initial hurdles, it has streamlined our workflow and made interacting with the database more intuitive. As we refine our model, I’ll be looking at ways to improve performance and scalability.

Clean Code and Code Smells: Lessons Learned

Have you ever revisited an old project and been left completely lost by your own code? If so, you’ve experienced the problem of code that’s not quite as clean, readable, or maintainable as it could be. This week, I explored articles on clean code and code smells, and dove into a few chapters of Robert Martin’s Clean Code and Martin Fowler’s Refactoring. I’ll share one practice I’m excited to start doing more often, and one practice I intend to avoid.

Something I’m Doing More Often: Meaningful Variable and Function Names

One key takeaway from Robert Martin’s book, Clean Code: A Handbook of Agile Software Craftsmanship, is the importance of clear, descriptive names [1]. In my code from the summer 2024 CS340 Intro to Databases course, I noticed a few spots where variable names weren’t as self-explanatory as they could be:

Figure 1 CS340 Code Snippet. Source: https://github.com/cchengbb/cs340web/blob/main/public/js/add_dogAtEvent.js

While this is workable, renaming these variables to more descriptive terms can help future readers (and my future self) quickly understand the purpose:

// Improved snippet with more descriptive names
// …
let dogIDField = document.getElementById("input-dogID-ajax");
let eventIDField = document.getElementById("input-eventID-ajax");

let dogID = dogIDField.value;
let eventID = eventIDField.value;

Even though this is a small change, it makes the variables more meaningful at a glance. According to Martin’s emphasis on clarity, “meaningful variable names” is one of the simplest yet most powerful improvements you can make to your codebase (Martin 2008).

Something I’m Avoiding: Repetitive (Duplicated) Code

From Martin Fowler’s Refactoring (especially in the “Bad Smells in Code” chapter), the concept of “Duplicated Code” is highlighted as one of the biggest hazards [2]. During my project, I noticed I had nearly identical AJAX request setups across multiple files: add_dog.js, add_event.js, add_location.js, and so forth. Each file had the same pattern:

// Example from add_event.js
// Example from add_event.js
var xhttp = new XMLHttpRequest();
xhttp.open("POST", "/add-event-ajax", true);
xhttp.setRequestHeader("Content-type", "application/json");

// Example from add_location.js
var xhttp = new XMLHttpRequest();
xhttp.open("POST", "/add-location-ajax", true);
xhttp.setRequestHeader("Content-type", "application/json");

Figure 2 CS340 Code Snippet. Source: https://github.com/cchengbb/cs340web/tree/main/public/js

The repetition here is a classic “code smell.” If I need to change how I set headers or handle errors, I have to modify multiple files, risking inconsistencies. By creating a small utility function (for example, sendPostRequest) that handles the AJAX logic, I can call it from each file and reduce duplication:

// In a shared utilities.js file
function sendPostRequest(url, data, onSuccess, onError) {
  let xhttp = new XMLHttpRequest();
  xhttp.open("POST", url, true);
  xhttp.setRequestHeader("Content-type", "application/json");
  xhttp.onreadystatechange = () => {
    if (xhttp.readyState === 4) {
      if (xhttp.status === 200) {
        onSuccess(xhttp.response);
      } else {
        onError(xhttp.status);
      }
    }
  };
  xhttp.send(JSON.stringify(data));
}

Then in each form handler (e.g., add_event.js, add_location.js), I can simply do:

sendPostRequest(
  "/add-event-ajax",
  data,
  (response) => {
    addRowToTable(response);
    clearFormFields();
  },
  (status) => {
    alert(`Error. Status code: ${status}`);
  }
);

This not only keeps my code DRY (“Don’t Repeat Yourself”) but also makes maintenance and refactoring simpler.

By focusing on small, incremental improvements, like better naming conventions (from Clean Code) and removing duplication (from Refactoring), I can make my code significantly more readable and maintainable. Adopting these practices ensures that when I return to my project in the future, I’ll spend less time deciphering variable names or refactoring nearly identical functions scattered throughout my files.

References

[1] R. C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship, 1st ed.
Upper Saddle River, NJ, USA: Pearson, 2008. [Online]. Available: https://ptgmedia.pearsoncmg.com/images/9780132350884/samplepages/9780132350884.pdf

[2] M. Fowler, Refactoring: Improving the Design of Existing Code, 2nd ed.
Boston, MA, USA: Addison-Wesley Professional, 2018. [Online]. Available: https://dl.ebooksworld.ir/motoman/Refactoring.Improving.the.Design.of.Existing.Code.2nd.edition.www.EBooksWorld.ir.pdf