I’d like to discuss related documents, subdocuments, and schemas in a database and our team’s approach to a particular problem. Maintaining relationships between tables in a database is a common issue and no solution is ever perfect. Ultimately most decisions are based on which compromises the development team is more willing to make based on the issue at hand.
We have a schema (let’s call it bike to keep it simple) that has a nested schema (let’s call that bike parts). A bike might have a brand, a cost, a color, a type, maybe even a name, and of course the bike parts. Those bike properties are single values but the bike parts property can be a multitude of values since the bike can have parts of many types themselves. Now, when we want to ship out a bike with a certain configuration we would only need to reference the bike schema so we know how to build that particular bike. Since a schema lines out what a bike should look like on paper, we can validate it easily when it is built. The problem comes when we realize this bike order exists on a separate schema (let’s call this one retailers with the nested schema, bike orders). So here we have retailers, each with a list of bike configurations in bike orders purchasable via wholesale based on the original bike schema. What happens when a retailer purchases a couple bikes with their own configurations but right before that order is fulfilled, we as the bike manufacturer decide to change the available bike parts for some of their bikes? We have two options:
- We can tell that retailer that since they already made the order, we will have to honor it with the configuration they decided on.
- We can tell them sorry, no can do. You are stuck with the new set of options.
Each option has it’s own set of problems to deal with but lets tackle the first one since it is far simpler. Since the bike part information is already there, we can build the bike based on the information presented in the order. But when we try to reference what item they purchased, we can’t reference the original bike schema for validation since the configuration no longer matches what was lined out previously. Technically it’s a different bike. When creating a new set of available bike parts, we can either make an entirely new bike with that new set of bike parts options or we can edit the original bike according to those new options. The first choice is impractical especially when the only thing changed is perhaps pedals with a new color. I would hardly call that a new bike. If we follow this paradigm, we’ll be flooded with a slew of vaguely similar bike schemas. So we’re left with the second choice. The simplest approach would be to attach some sort of id to the bike and verify only that id regardless of configuration. We can assume then that when the configurations in an order are different than what is lined out in the schema, the schema has changed at some point. Tracking those changes comes with another set of problems. Even worse, it will become increasingly difficult to verify whether an order is truly valid or not.
Option one at first glance seems to be inundated with potential faults but option two is far from perfect either. Let’s say what they originally chose is no longer available. Then, we have to do a couple things. First we figure out how to notify them of the changes. In other words, how do we propagate those changes to all bike orders under all retailers with that particular bike. This is another impractical but perhaps a necessary step. Then again, for all we know, they don’t check their emails regularly so we might set a time out for when to cancel the order. In a retail setting this is an appropriate course of action however, that isn’t always the case in other settings. As an alternative, we can have the order stay pending until a response is given and it will stay valid so long as the bike exists (not the set of bike parts). We can then validate to the new bike part options once they give a response. In the mean time, whatever configuration they have on hand still exists, is invalid, and takes up space in the database. That’s not great either and if we do decide to do away with the bike entirely, we must still propagate to all retailers with that particular bike in their bike orders. And so, we will only be delaying that very laborious process.
Clearly, no matter which path we decide on, we must make sacrifices in either efficiency or consistency. For the record, our project decided on the pending option since there isn’t a time constraint or deadline to running custom workflows. To continue with the analogy, those “bikes” could ship at anytime as needed only if and when a response is given regardless of validation. An order is created but a confirmation response is required for the order to be pushed through regardless of configuration. The expectation is that there will also be very few “retailers” so the propagation step will be relatively simple. In our situation, it is debatable on whether the final propagation step is even necessary for the same reason and is something we could discuss based on our client’s needs.
Leave a Reply