Monday, July 27, 2009

Why Do We Continue to Join?

Just recently my wife and I built a new walk-in closet. When we were finished, in preparing to use the new space, I thought about how my clothes used to be packed and realized there were a lot of inefficiencies. As an example, I used to keep my boxers in a separate drawer from my undershirts. I had been doing this since my childhood. It meant that every day that I got ready for work I would open one drawer to get a pair of boxers, then open another drawer to fetch my undershirt. Thinking about it a bit, it seemed to be such a waste of time to keep them in separate drawers. Why didn't I just put my boxers and undershirts in the same drawer?

So that, along with other improvements, is exactly what I did. Now getting dressed in the morning is a little bit easier for me.

This is what I want to talk about today. Revisiting old data schema arrangements and asking ourselves whether or not this is the best schema for our product?

Most of the database schemas today, backing traditional CRM applications are designed in such a way that data entities belonging to the same customer record are stored in different tables. To bring those entities together to form the entire customer record, these same CRM applications rely heavily on relational database joins. Individual data entities are stored in separate tables, but are retrieved via joins to make up the customer record. This would make a lot of sense if these records were normally retrieved in parts, but more and more this is not the case.

Instead, the business expects us to retrieve customer records in whole. Now there is talk of the 360 degree view of the customer. Although there are varying opinions of what this means, it is generally understood that it means that all information about the customer should be retrievable via the customers unique identification number. If you think this sounds like a hashtable key, value relationship I will have to agree with you.

So why do we continue to store data in this manner? I believe that one word gives us the answer, and that word is "tradition". We have an entire generation of programmers, relational database administrators, and technical managers that believe that "good" database design always incorporates normalized entities that are connected via foreign keys to other entities. They think this without giving much thought to the way the data they are storing will actually be retrieved.

Instead should we not revisit the notion that customer records need to be separated into different data entities? I would submit that once this examination is completed, it will become apparent that relational schemas make little or no sense for the operational stores of modern CRM applications.

Does this mean that there is no place for relational databases? Of course not! Relational databases are great for creating reports and performing ad-hoc data analytics. As they are designed today, using a key value store for reporting would be painful at best. But for the types of canned data requests that modern day call centers produce, a key value store is tough to beat.

Instead, what this does mean is that more consideration towards using key value stores in the enterprise is needed. There are several that are becoming popular. MemCache makes a lot of sense when caching joined information by key, as does EHCache. Using this approach, this joined information can be stored in a persistent relational database, but once the expensive join is completed, the now flattened information from that join can be stored in memory and retrieved via any number of keys. Amazon's S3 remains a viable option, as does the open source distributed key stores like Voldemort, Cassandra, and Couch DB. Finally, Oracle's Berkeley DB bears some looking into as well.

No comments:

Post a Comment