
On the importance of keeping data models fluid

Recent advances in data analytics have produced an insatiable appetite for data. Satellites, cell phones, and computers constantly churn out streams of information. These streams flow from their points of origin to target sites and, along the way, are molded into shapes conducive to analysis. The cleansed, canonically shaped data sets are then carefully examined and parceled out to make management decisions more precise and to gain an advantage over the ever-present competition. The challenge is to collect, process, and store vast quantities of data quickly enough that historical information can coexist peacefully with what is newly collected.

The demand to efficiently deliver reliable, analytics-ready data in whatever form best meets constantly changing business requirements puts stress on an organization's technological infrastructure. As a business evolves, its tactics and strategies change, producing a steady stream of business requirements that often conflict directly with those established only a few weeks earlier. This flow feeds straight into the IT layer, with the expectation that IT teams will deliver on very short notice. What complicates matters is that rapidly changing requirements can break existing data models and thrust IT teams into a downward spiral. Data models matter especially because they serve as the backbone of any application.

There is nothing wrong with changing requirements. Change is propelled by a business's innate need to evolve in order to retain its competitive advantage. But once existing data models stop satisfying business needs, the time required to answer even relatively simple business questions increases exponentially and IT teams' workloads multiply. Integration teams must either:

  1. Create new logic and write code that forces the new data into the existing, no-longer-conforming data models, or
  2. Upgrade the data models themselves and write integration code to move the current data, which no longer conforms, into the new models (both options are illustrated in the sketches below).
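
A minimal, hypothetical sketch of option 1, using only Python's standard sqlite3 and json modules, shows what "fitting new data into an old model" tends to look like in practice. All table names, column names, and record fields here are invented for illustration and are not drawn from the Corepula Method: a new subscription feed is forced into an existing orders table by serializing the attributes the old model cannot hold into a catch-all column.

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE orders (
               order_id   INTEGER PRIMARY KEY,
               customer   TEXT NOT NULL,
               amount     REAL NOT NULL,
               extra_json TEXT  -- catch-all column absorbing attributes the model never anticipated
           )"""
    )

    def load_subscription(record):
        # Fields the old model understands map directly onto its columns...
        known = (record["customer"], record["monthly_fee"])
        # ...while everything else (billing_period, auto_renew, ...) is serialized
        # into the overflow column so the schema itself never has to change.
        leftovers = {k: v for k, v in record.items() if k not in ("customer", "monthly_fee")}
        conn.execute(
            "INSERT INTO orders (customer, amount, extra_json) VALUES (?, ?, ?)",
            known + (json.dumps(leftovers),),
        )

    load_subscription({"customer": "ACME", "monthly_fee": 49.0,
                       "billing_period": "monthly", "auto_renew": True})
    conn.commit()

Each such workaround leaves the schema untouched, but every new feed that lands in the overflow column makes the model harder to query, index, and validate, which is exactly the spiral described below.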

Both approaches strain budgets, time, and resources. After a series of such iterative development cycles, data models undergo drastic changes and become unrecognizably disfigured. Performance suffers. Considerable effort goes into optimizing the existing database schemas with intricate indexing and partitioning schemes, yet all of this work becomes obsolete again after only a few more rounds of changes. Data modelers and integration architects end up building data structures on top of other data structures to keep existing databases on life support and stave off the unavoidable. Eventually, these patching projects reach a saturation point at which maintaining the original models costs more than starting from scratch and building something new. At that point, current work must be put on hold and all available modeling resources rededicated to redesigning the schema so that it stays aligned with current business needs. Once a new schema is produced, data integration teams must write integration code to pump data from the existing model into the new one.
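
That final step, pumping data from the existing model into the new one, can be sketched the same way. The snippet below is again hypothetical and self-contained (it recreates the overloaded orders table from the earlier sketch) and simply migrates the overflow attributes into a redesigned schema in which subscriptions are modeled explicitly; none of the names come from the Corepula Method or any other specific methodology.

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")

    # The old, overloaded model: subscription attributes hide in a catch-all JSON column.
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, extra_json TEXT)")
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)",
        ("ACME", 49.0, json.dumps({"billing_period": "monthly", "auto_renew": True})),
    )

    # The redesigned model: subscriptions become first-class citizens with their own table.
    conn.execute(
        """CREATE TABLE subscriptions (
               subscription_id INTEGER PRIMARY KEY,
               customer        TEXT NOT NULL,
               monthly_fee     REAL NOT NULL,
               billing_period  TEXT,
               auto_renew      INTEGER
           )"""
    )

    # Integration code that pumps data out of the existing model and into the new one.
    rows = conn.execute(
        "SELECT customer, amount, extra_json FROM orders WHERE extra_json IS NOT NULL"
    ).fetchall()
    for customer, amount, extra_json in rows:
        extra = json.loads(extra_json)
        conn.execute(
            "INSERT INTO subscriptions (customer, monthly_fee, billing_period, auto_renew) "
            "VALUES (?, ?, ?, ?)",
            (customer, amount, extra.get("billing_period"), int(bool(extra.get("auto_renew")))),
        )
    conn.commit()

Trivial at this scale, this is the step that consumes budgets and schedules once real data volumes, constraints, indexes, and downstream consumers are involved.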

Situations such as these teach data modelers to learn from past mistakes and to stay well versed in current information modeling methodologies. The design pattern behind one successful data modeling implementation cannot be carried blindly into the next project, because the underlying business requirements are almost always different. Solid knowledge of the various data modeling methodologies, coupled with a clear understanding of the business's desires and needs, leads to robust, fluid database schemas that avoid these common pitfalls.

For a more thorough introduction to the Corepula Method, please download the Corepula Method data modeling whitepaper (PDF).