To SQL or Not to SQL

Continuing the topic of system design and scaling, when is it appropriate to use SQL or NoSQL?

For clarification, SQL databases (relational databases) are structured and have predefined schemas like phone books that store phone numbers and addresses. Relational databases store data in rows and columns. Some relational databases include MySQL, Oracle, Postgres, and MariaDB.

In contrast, noSQL databases (non-relational databases) are unstructured, distributed, and have a dynamic schema like file folders. Non-relational databases can store data in “Key-Value Stores” (Redis, Voldemort, Dynamo), “Document Databases” (CouchDB, MongoDB), “Wide-Column Databases” (Cassandra, HBase), and “Graph Databases” (Neo4J, InfiniteGraph).

Some noteworthy differences between SQL and NoSQL are the differences between the schema, scalability, and reliability.

For the schema, in SQL, each record has a fixed schema, which means that the columns must be decided before data entry, and each row must have data for each column. To change the schema, the entire database needs to be modified and taken offline. In contrast, in NoSQL, schemas are dynamic, which means that columns can be added at anytime, and each row does not need data for each column.

For scalability, in general, SQL databases are vertically scalable; even though horizontal scaling is possible, horizontally scaling a relational database across multiple servers is challenging and time-consuming. In contrast, NoSQL databases are horizontally scalable and more servers can be easily added to the non-relational database infrastructure to handle more traffic.

For reliability, the majority of SQL databases are ACID (Atomicity, Consistency, Isolation, Durability) compliant. In this case, relational databases are better when it comes to data reliability and guarantees of safe transactions. In contrast, most NoSQL databases tradeoff ACID compliance for more performance and scalability.

So what are the reasons to use a SQL database vs a NoSQL database? We would need to consider the tradeoffs as well as the features and requirements of the application to best choose one over the other.

Reasons for SQL:

Structured data
Strict schema
Relational data
Need for complex joins
Need for ACID compliance or transactions
Clear patterns for scaling
More established (developers, community, code, tools, etc.)
Lookups by index are very fast

Reasons for NoSQL:

Semi-structured data
Dynamic or flexible schema
Non-relational data
No need for complex joins
Store many TB (or PB) of data
Very data intensive workload
Very high throughput for IOPS
Making the most of cloud computing and storage
Rapid development

Some example sample data well-suited for NoSQL may include, rapid ingest of clickstream and log data, leaderboard or scoring data, temporary data (ex. shopping cart), frequently accessed tables, metadata/lookup tables.

AP vs NP

To SQL or Not to SQL

Leave a Reply Cancel reply