
Why SQLAlchemy?
SQLAlchemy provides an ORM (Object-Relational Mapping) that makes database interactions more intuitive. Instead of writing raw SQL queries, I can use Python objects to define, manipulate and query data. This has significantly improved my workflow when dealing with large datasets scraped from UFC Stats.
How I used SQLAlchemy
I structured our database migration process into several scripts, each handling a specific dataset:
- Competitors: Fighter details like height, weight, reach, stance, fight records.
- Events: Fight event metadata such as location and date.
- Bouts: Individual fight details, including results.
- Round Stats: Detailed statistics per round.
Each script reads from a cleaned JSON file and inserts the data into the appropriate table in PostgreSQL using SQLAlchemy.
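To make this concrete, here is a minimal sketch of what one of these scripts (the competitors one, for example) looks like. It relies on the session and table objects defined in connection.py, shown in the next section; the file name and column names here are placeholders rather than the exact schema:

import json

from connection import session, competitors  # objects defined in connection.py (next section)

# Load the cleaned dataset produced by the scraping/cleaning step.
with open("cleaned_competitors.json") as f:  # file name is a placeholder
    fighters = json.load(f)

# Insert each fighter record into the competitors table.
for fighter in fighters:
    session.execute(
        competitors.insert().values(
            name=fighter["name"],      # column names are placeholders
            height=fighter["height"],
            reach=fighter["reach"],
            stance=fighter["stance"],
        )
    )

session.commit()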
Establishing the Database Connection
To interact with PostgreSQL, I configured SQLAlchemy in the connection.py file as shown below:
from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import sessionmaker
DATABASE_URL = "postgresql+psycopg2://postgres:password@104.168.19.177/pmmma"
engine = create_engine(DATABASE_URL)
Session = sessionmaker(bind=engine)
session = Session()
metadata = MetaData()
metadata.reflect(bind=engine)
events = metadata.tables["events"]
competitors = metadata.tables["competitors"]
bouts = metadata.tables["bouts"]
round_stats = metadata.tables["round_stats"]
This setup allows me to establish a connection and dynamically access tables.
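For example, once the tables are reflected they can be queried directly, which is a quick way to sanity-check the connection (assuming SQLAlchemy 1.4+ for the select() style; the row-count query is just an illustration):

from sqlalchemy import select, func

# Sanity check: count the rows currently in the events table.
event_count = session.execute(select(func.count()).select_from(events)).scalar()
print(f"events table currently holds {event_count} rows")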
Migrating Data with SQLAlchemy
To migrate our cleaned fight data into PostgreSQL, we used SQLAlchemy to handle database transactions efficiently. The process involved the following steps (a simplified sketch follows the list):
- Loading the cleaned data – The structured JSON files containing bout details were read and pre-processed.
- Resolving foreign keys – To maintain data integrity, we mapped event names and fighter names to their respective database IDs before inserting records.
- Batch inserting records – Using SQLAlchemy, we inserted multiple fight records into the bouts table in one transaction to improve performance.
- Handling errors and rollbacks – If an issue arose, SQLAlchemy allowed me to roll back changes without corrupting the database.
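Put together, a simplified sketch of that flow looks like this (SQLAlchemy 1.4+ style; the JSON keys and column names such as event_name, fighter_1, and result are assumptions, since the exact schema isn't shown here):

import json

from sqlalchemy import select
from connection import session, events, competitors, bouts

with open("cleaned_bouts.json") as f:  # file name is a placeholder
    raw_bouts = json.load(f)

try:
    for bout in raw_bouts:
        # Resolve foreign keys: map event and fighter names to database IDs.
        event_id = session.execute(
            select(events.c.id).where(events.c.name == bout["event_name"])
        ).scalar_one()
        fighter_id = session.execute(
            select(competitors.c.id).where(competitors.c.name == bout["fighter_1"])
        ).scalar_one()

        session.execute(
            bouts.insert().values(
                event_id=event_id,
                fighter_id=fighter_id,
                result=bout["result"],
            )
        )
    # Commit once for the whole batch so it lands in a single transaction.
    session.commit()
except Exception:
    # If anything fails, roll back so the database is never left half-migrated.
    session.rollback()
    raise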
Initially, processing the entire dataset took nearly 35 minutes, which led me to explore optimizations like caching frequently accessed foreign keys and using bulk inserts.
Favorite and Least Favorite Parts
My favorite aspect of using SQLAlchemy is how it abstracts complex database operations into Python code, making data migration smooth and efficient. The ORM capabilities reduce the need to write raw SQL while maintaining flexibility.
My least favorite aspect is dealing with performance issues. Some of my scripts, particularly migrate_bouts.py, take up to 30 minutes to execute. This is due to large datasets and the need to resolve foreign keys dynamically before inserting records. I plan to optimize these scripts by using bulk inserts instead of looping through entries individually.
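With SQLAlchemy, one way to do that is an executemany-style insert: build the rows as a list of dictionaries (with foreign keys already resolved) and pass them to a single execute call instead of issuing one INSERT per record. A rough sketch, where resolved_bouts stands in for the pre-processed data:

# resolved_bouts is a hypothetical list of (event_id, fighter_id, result) tuples
# produced by the foreign-key resolution step.
rows = [
    {"event_id": e_id, "fighter_id": f_id, "result": result}
    for (e_id, f_id, result) in resolved_bouts
]

# Passing a list of dicts triggers an executemany, which is much faster
# than looping and inserting one row at a time.
session.execute(bouts.insert(), rows)
session.commit()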
Challenges and Lessons Learned
At first, handling foreign key relationships between tables was tricky. I needed to fetch and cache foreign keys before inserting records to prevent unnecessary queries. Once implemented, this made a noticeable difference in performance.
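The caching itself is straightforward: load each lookup table into a dictionary once, then resolve names in memory inside the insert loop (again assuming id and name columns):

from sqlalchemy import select
from connection import session, events, competitors

# One query per lookup table instead of one query per bout.
event_ids = {name: id_ for id_, name in session.execute(select(events.c.id, events.c.name))}
fighter_ids = {name: id_ for id_, name in session.execute(select(competitors.c.id, competitors.c.name))}

# Inside the insert loop, a lookup like event_ids[bout["event_name"]] replaces a per-record query.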
I now appreciate SQLAlchemy more after overcoming these challenges. Its commit/rollback system has been especially useful in debugging errors without corrupting the database.
If I could start over, I might consider using Alembic, an extension for SQLAlchemy that simplifies schema migrations. Right now, table creation and modification require manual changes. Alembic would allow for version-controlled schema updates without breaking our existing data.
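For context, under Alembic a schema change becomes a small version-controlled Python script instead of a manual ALTER, applied with alembic upgrade head. A hypothetical migration (the method column is invented purely for illustration) would look like this:

from alembic import op
import sqlalchemy as sa

# Hypothetical migration: add a "method" column to the bouts table.
def upgrade():
    op.add_column("bouts", sa.Column("method", sa.String(), nullable=True))

def downgrade():
    op.drop_column("bouts", "method")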
SQLAlchemy has been an essential part of migrating our MMA data into PostgreSQL. While there were initial hurdles, it has streamlined our workflow and made interacting with the database more intuitive. As we refine our model, I’ll be looking at ways to improve performance and scalability.