Databases: SQL vs NOSQL

Credit: James Wheeler, pexels.com

In the journey from coding personal projects and tutorials to full fledged professional applications, one of the biggest hurdles to overcome in my opinion is moving away from tiny datasets, “mock” databases or simple text files and instead interacting with real, actual databases. Not to say there aren’t skills and experience to be gained from smaller and more beginner projects, but there is a reason most of the websites and applications produced by major companies almost always involve a database. Databases allow data to be personalized and to persist, and allow all kinds of modern day tasks we take for granted are organized, queried and edited thanks to databases. And in the world of databases the biggest question from the get go is usually: SQL vs NOSQL (or relational vs non-relational/not only relational)?

With the introduction of computers into the business world in the later half of the 20th century, databases became a necessity for organizing and managing data. The first and largest solution to come about was the SQL database. According to https://www.ibm.com/cloud/blog/sql-vs-nosql:

“SQL, which stands for “Structured Query Language,” is the programming language that’s been widely used in managing data in relational database management systems (RDBMS) since the 1970s. In the early years, when storage was expensive, SQL databases focused on reducing data duplication.”


However in more recent years there has been the development of so-called NOSQL or Not Only Structured Query Language databases. SQL was necessarily developed in a very rigid format, putting data neatly into tables and requiring each piece of data in a table to adhere to the same rules. But inevitably there arose cases where people wanted to break out of that system, and create more dynamic databases where rules are looser and the formats can take on many different forms. NOSQL in comparison to SQL allows for a variety of Database types, such as Document, Graph, Key-Value Pair, and Column (as shown on https://builtin.com/data-science/sql-vs-nosql)

Credit: panumas nikhomkhai, pexels.com

So what’s the point of all this? Well in both my full time job and my current capstone project at Oregon State University, one of the first design choices that needed to be made was an SQL vs NOSQL database. So how does one go about making that choice? One of the traps that is easy to fall into when going through school is that you often get very busy completing assignments on deadlines, and following instructions and rubrics to satisfy the requirements of what you’re meant to do. But in my experience, in the workplace you don’t always have the luxury of someone telling you what to do or how to do it. Being able to design and execute well thought out plans on your own is a difficult skill and one of the most valuable things you can do in the eyes of many managers. So as I work with my team creating our project this term, it seems like a good idea to catalog the important decisions that are being made and how we made them.

Let’s start by taking a look at what we have to work with. The project I’m working on is a job tracker web application that let’s users keep track of and to some degree analyze the job applications, contacts, and skills they upload to the website. Seems straightforward at first glance, but adding in the ability to let each user keep track of their proficiency in each skill, sorting which skills appear most across all of their applications, and adding contacts to multiple applications quickly complicates how each piece of data must interact with each other. So, which type of database should we use?

It’s important to think about the fact that our group has varying technical experiences and that we have to complete our project in the span of a ten week class. We all took mostly the same classes at OSU and this as well as time constraints played a big role in our final decisions. But to break things down, one of the things I did was take a look at some of the information on https://www.talend.com/resources/sql-vs-nosql/#:~:text=SQL%20is%20the%20programming%20language,generally%20do%20not%20use%20SQL:

“To make informed decisions about which to use, practitioners should be aware of the differences between SQL, NoSQL, individual Database Management Systems (DBMS) and languages, as well as the situations each is best-suited for, and how the landscape is changing.”

It goes on to suggest the five main differences between SQL and NOSQL databases are Language, Scalability, Structure, Properties, and Support/Communities. Let’s take a look at each in the context of our project.

Language

Credit: Pixabay, pexels.com

The first in the list is language. SQL has several different distributions, but while there are a few syntax differences they all more or less provide the same core operations and ways of doing things. My group members all had experience in the database class with MySQL and certainly that would be an option for us.

However NOSQL has much more variety in types of language. From JSON style documents to graphs to key-value pairs, there are a lot more options here. All three of our group members took the Cloud Application Development class at OSU and all had used Datastore as a NOSQL database in that class, which again lent itself as a viable option. No major determination yet.

Scalability

Credit: Iulian Patrascu, pexels.com

Moving on to scalability, many online resources state SQL databases can be scaled vertically (meaning increasing RAM or CPU power), while NOSQL typically are more easily able to scale horizontally (through sharding or running the data on multiple servers). We already planned to deploy our application through GCloud which meant that both were in some way possible, but really comes down to pricing. Cloud in particular is by nature shared across servers so horizontal scaling would be an easy option with GCloud.

Structure

Credit: Pixabay, pexels.com

The third difference on the list is structure. As previously stated, SQL is all rigidly and necessarily defined as Table based data, and requires a good deal of planning and creating of schemas before developing. In a way the stakes are high because a badly designed database in SQL is more or less locked in and the only real way to fix it is to start over. Not ideal for a short timeline. On the other hand NOSQL is flexible by nature, and can be adapted or changed in some ways as development happens. New data can be added without following old rules necessarily. It does also run the risk of poorly defined designs, but in some ways allows for quicker response and mitigation of these issues. This is a point for NOSQL.

Properties

Credit: Alexander Grey, pexels.com

Fourth in the listed differences of the article is Properties, and this basically comes down to the principles adhered to by each database type. SQL again tends towards rigidly defined rules such as Atomicity of data (each transaction must fully succeed or fail) and Consistency, while NOSQL is again more loose. For a smaller database built on a short timeline between three relatively inexperienced developers, it again feels more flexible and lenient to choose the NOSQL option.

Support/Community

Credit: Dio Hasbi Saniskoro, pexels.com

Finally the last practical difference in the article is Support/Community. SQL has been around longer, is better documented and essentially has more of this. This sort of gives a point in favor of SQL but with a caveat; All three of the developers on my team had worked in the Cloud Application Development class using Datastore (NOSQL), and we all have coding references from that class, while none of us have used an SQL database in GCloud. In this way we probably have more familiarity and more help resources at our fingertips with the NOSQL option. It’s always important to take into consideration what you are personally more skilled in and if it’s worth learning something new for the sake of a project.

In the end my group felt the flexibility and familiarity of NOSQL with our technology stack was the better choice for our project. However we have already had a few instances of having to redesign our database due to the fact that the increased flexibility means increased room for error.

I think this portion of the project has shown me how much of what developers tend to focus on is pure code, and how many other elements like design and understanding project requirements are important and sometimes not focused on enough. Hopefully our solution will be successful and serve to better inform our projects in the future.

Leave a Reply

Your email address will not be published. Required fields are marked *