To SQL or Not to SQL


Continuing the topic of system design and scaling, when is it appropriate to use SQL or NoSQL?

For clarification, SQL databases (relational databases) are structured and have predefined schemas like phone books that store phone numbers and addresses. Relational databases store data in rows and columns. Some relational databases include MySQL, Oracle, Postgres, and MariaDB.

In contrast, noSQL databases (non-relational databases) are unstructured, distributed, and have a dynamic schema like file folders. Non-relational databases can store data in “Key-Value Stores” (Redis, Voldemort, Dynamo), “Document Databases” (CouchDB, MongoDB), “Wide-Column Databases” (Cassandra, HBase), and “Graph Databases” (Neo4J, InfiniteGraph).

Some noteworthy differences between SQL and NoSQL are the differences between the schema, scalability, and reliability.

For the schema, in SQL, each record has a fixed schema, which means that the columns must be decided before data entry, and each row must have data for each column. To change the schema, the entire database needs to be modified and taken offline. In contrast, in NoSQL, schemas are dynamic, which means that columns can be added at anytime, and each row does not need data for each column.

For scalability, in general, SQL databases are vertically scalable; even though horizontal scaling is possible, horizontally scaling a relational database across multiple servers is challenging and time-consuming. In contrast, NoSQL databases are horizontally scalable and more servers can be easily added to the non-relational database infrastructure to handle more traffic.

For reliability, the majority of SQL databases are ACID (Atomicity, Consistency, Isolation, Durability) compliant. In this case, relational databases are better when it comes to data reliability and guarantees of safe transactions. In contrast, most NoSQL databases tradeoff ACID compliance for more performance and scalability.

So what are the reasons to use a SQL database vs a NoSQL database? We would need to consider the tradeoffs as well as the features and requirements of the application to best choose one over the other.

Reasons for SQL:

  • Structured data
  • Strict schema
  • Relational data
  • Need for complex joins
  • Need for ACID compliance or transactions
  • Clear patterns for scaling
  • More established (developers, community, code, tools, etc.)
  • Lookups by index are very fast

Reasons for NoSQL:

  • Semi-structured data
  • Dynamic or flexible schema
  • Non-relational data
  • No need for complex joins
  • Store many TB (or PB) of data
  • Very data intensive workload
  • Very high throughput for IOPS
  • Making the most of cloud computing and storage
  • Rapid development

Some example sample data well-suited for NoSQL may include, rapid ingest of clickstream and log data, leaderboard or scoring data, temporary data (ex. shopping cart), frequently accessed tables, metadata/lookup tables.

Print Friendly, PDF & Email

Leave a Reply

Your email address will not be published. Required fields are marked *