data modeling whitepaper

Corepula Method Data Modeling Example

Using the 6NF Normal Form paradigm and DW 2.0 architectural guidelines as a foundation, the Corepula Method blends these elements with a set of original ideas to produce a methodology that clearly formulates a means of organizing and managing temporalized data. This method can be used to model such diverse business requirements as EDW, MDM, and Reference Data Management (RDM). In fact, it can be used to model any business requirement that deals with data temporization. Data change constantly in terms of both meaning and content, albeit with varying levels of velocity, and these changes must be seamlessly integrated into existing database schemas with little or no downtime. This is especially true in highly regulated financial environments where data must be audited on a regular basis. Erroneously or partially tracked changes can lead to painful fines, negative publicity, and a loss of credibility. MDM and RDM environments are other critical business domains where data often change and IT is expected to react quickly to keep systems running smoothly and with minimal downtime. Unfortunately, the redefinition of a key business concept can break existing MDM applications and render them unusable for lengthy periods of time. The Corepula Method helps to make underlying database schemas more dynamic and allows IT teams to gracefully cope with the never-ending changes to business requirements.

The Corepula Method recognizes that each entity has temporal and non-temporal attributes, and classifies them into their respective attribute groups.

data modeling whitepaper

During the definitional stage, domain experts classify the attributes of a given entity into temporal and non-temporal attribute groups in preparation for the subsequent modeling stage. This classification is necessary because the attributes in each group are modeled differently. Let’s define and examine the underlying modeling blocks. The Corepula Method strategically splits every database entity into four distinct building (or modeling) blocks:

  1. Core business entity
  2. Non-historized attribute entity
  3. Historized attribute entity
  4. Copula entity modeling construct whose main task is to store a given entity’s underlying foreign key attributes

An Overview of the Core Entity Type

A core entity helps modelers identify the key business concepts that need to be modeled. These core entities can be visualized as abstractions that correspond one-to-one with concrete items in need of modeling. These entities are structured simply, consisting of surrogate key/identifier and a date attribute. The date attribute helps identify the beginning of the time slice for when a particular record came to exist in the source system.

In the diagram below, a SECURITY CORE entity was created as an abstraction for the SECURITY business concept. A SINCE DATE is a temporal attribute that identifies the beginning of the time slice for when a particular source record came into existence. The SECURITY CORE KEY attribute helps uniquely identify each security record; it is system generated.

data modeling whitepaper

The reasoning behind such a simple and minimalistic core entity design is to keep such structures immune to change. A core entity can be viewed as an interface to a business concept (SECURITY in the above diagram). An interface is a contract that acknowledges that a particular business concept exists and hides the underlying data management complexities from the rest of the data model. As an interface, the core entity never changes and semantic alterations to the underlying business entity have minimal impact on the rest of the database schema. Core entities act as pillars of stability, shielding the rest of the model from the impacts of change. The only way entities can be linked is through core entities and their synthetically generated primary key attributes (SECURITY CORE KEY in our example). If a given entity undergoes a semantic change, that change is localized to a set of non-core entities and strategically managed. Note that every core entity has the same structure, consisting of the two above-mentioned attributes.

Below, we will discuss naming conventions in greater detail. Core entities inherit names from their corresponding business counterparts. The entities and CORE KEY attribute names of the corresponding core entities depend on the underlying names of the business concepts being modeled. In our example, the main business concept is SECURITY; we’ve leveraged this name in naming the core entity: SECURITY CORE.

The SINCE DATE attribute is consistently named. It is recommended that data modelers not change the name of this attribute. The resulting simple and predictable core entity structure helps in designing the ETL templates that will eventually be used in loading these same entities with data, once a physical model is deployed. ETL templates, based on database metadata, can be developed once and reused in multiple places throughout the model.

An Overview of the Non-Historized Attribute Entity Type

A non-historized attribute construct stores a set of non-historized attributes. Non-historized attributes are those whose value should never change (at least in the eyes of the business). A pool of non-historized attributes is created by stripping away all static attributes from the underlying source entity. The final list of non-historized attributes must be approved by the business. Each non-historized entity is directly linked to one and only one core entity type.

From the data management point of view, a source system’s natural keys must be added to the non-historized attribute pool. These natural key attributes play a crucial role in tying together source and Corepula Method-based schema records. In addition, they are leveraged by the resulting ETL process in the record-matching phase to help identify whether a given source record already exists. Attributes such as GENDER CODE, EYE COLOR, and BIRTH DATE are good candidates for being classified as non-historized. The list of non-historized attributes is typically industry- and business-specific.

An Overview of the Historized Attribute Entity Type

A historized attribute layer consists of attributes whose underlying values the business expects to change. The rate of change is insignificant. As long as the value of a given attribute has the potential to change, in all likelihood it will change. Such attributes must be identified and grouped together in a stand-alone “historized” pool. In the data modeling phase, historized attributes are modeled separately and confined to individual, 6NF-compliant entities. All historized attribute entities have a similar structure. This sameness facilitates the creation of ETL reusable templates

An Overview of the Copula Entity Type

We’ve discussed core entities, as well as how each structure acts as an interface and has a one-to-one correspondence with a concrete business concept. We’ve also defined and explored historized and non-historized attribute types. In this section, we will briefly discuss the last of the Corepula Method’s building blocks: an entity type called a copula. The Corepula Method strips away foreign key attributes from underlying business concepts and stores them separately in copula entities. Below is a list of rules that govern the modeling of copula entities:

  1. This entity type only stores foreign key links. The exception to this rule is surrogate foreign key attributes (see below).
  2. The Corepula Method assumes that a set of all copula’s foreign key attributes uniquely identifies a given copula record, including surrogates.
  3. Each foreign key attribute must have a value. NULL values are not allowed. Examples in subsequent sections will demonstrate how this can be accomplished when one of the source system’s foreign key attributes is nullable.
  4. By design, copula entities only store foreign key attributes and cannot have attributes of their own. There is one exception to this rule. When modeling entities whose primary (or alternate) source composite key consists of foreign and non-foreign key attributes, the non-foreign key attributes are treated as surrogate foreign keys and must be added to the resulting copula entity as regular foreign key attributes.
  5. Each copula entity must be linked to a copula-specific core (called a copula core entity). This makes each copula record historized. Copula-specific core entities are structured the same as other core entity types.
  6. All copula-specific attributes are linked to this copula-specific core entity. Quite often an intersection entity, in addition to storing foreign key links, has attributes of its own. These attributes are indirectly attached to the copula entity through a copula-specific core entity.
  7. Copula-specific attributes can either be historized or non-historized.

Tying Things Together: A Complete Example

An example will make things clearer and clarify the key concepts mentioned above. A sample source schema is shown below. Next, we demonstrate all of the necessary steps required to translate it into a corresponding Corepula Method-based schema.

As a first step, to convert any source schema into a Corepula Method-based schema design, we must map key business concepts onto the corresponding core entity types. In this example, we create two core entities (one for each business concept) and name them: CHILD ENTITY CORE and PARENT ENTITY CORE. These core entities encapsulate underlying business concepts and hide potential future semantic changes from the rest of the data model.

data modeling whitepaper

In the second step, with the help of a domain expert we begin classifying attributes into historized and non-historized attribute pools. This produces the following attribute classifications:

The CHILD ENTITY non-historized attribute group:

  1. CHILD ENTITY IDENTIFIER, which acts as a Primary Key

The CHILD ENTITY historized attribute group:

  1. HISTORIZED ATTRIBUTE A

The PARENT ENTITY non-historized attribute group:

  1. PARENT ENTITY IDENTIFIER, which acts as a Primary Key

The PARENT ENTITY historized attribute group:

  1. HISTORIZED ATTRIBUTE B

The completed scheme is illustrated in a data model below. The presence of the foreign key attribute (PARENT ENTITY IDENTIFIER) in the CHILD ENTITY source indicates that we must extract and store this relationship in a stand-alone copula entity (the PARENT CHILD ENTITY COPULA). As discussed above, a copula entity must be linked to a corresponding copula core entity (called the PARENT CHILD ENTITY COPULA CORE in this example). The PARENT CHILD ENTITY COPULA CORE is a copula-specific core entity that can easily accommodate the storage of copula-related attributes, either historized or non-historized. In data modeling, a one-to-many relationship often evolves into a many-to-many relationship. This many-to-many relationship may have attributes of its own. The Corepula Method anticipates such an evolution in entity relationships and has a pre-built mechanism for coping with such modeling situations.

data modeling whitepaper

There is another reason why the copula specific core structure is required. The primary key for a copula entity is provided by a corresponding copula core entity as part of identifying relationship. For each unique combination of foreign key attributes in the copula entity during a specific time slice, a corresponding copula core entity generates a unique identifier (the COPULA CORE IDENTIFIER). This unique identifier is always dated in the corresponding copula core entity (using the SINCE DATE attribute) and identifies the beginning of the time slice during which a given combination of foreign key attributes existed in the source system. As a result, this mechanism always temporalizes foreign key relationships.

A unique PARENT CHILD ENTITY COPULA CORE.PARENT CHILD ENTITY COPULA CORE IDENTIFIER value will be generated for each combination of CHILD ENTITY CORE IDENTIFIER and PARENT ENTITY CORE IDENTIFIER attributes in the PARENT CHILD ENTITY COPULA entity. The PARENT CHILD ENTITY COPULA CORE.SINCE DATE attribute specifies the beginning of the time slice during which a particular combination of foreign key attributes existed in the source system.

The resulting Corepula Method-based schema has a well-defined and predictable structure.

Each historized entity type has the following set of attributes:

  1. A CORE IDENTIFIER, which is an identifying link from the underlying core entity,
  2. A SINCE DATE attribute, and
  3. An actual attribute that business wants to historize.

All non-historized entities have:

  1. A CORE IDENTIFIER, which is an identifying link from the underlying core entity, and
  2. A list of non-historized and static (according to business) attributes.

Similarly, all core entity types have a similar structure:

  1. A synthetically generated core identifier (CORE KEY), and
  2. A SINCE DATE attribute.

The copula entity consists of:

  1. A COPULA CORE IDENTIFIER, which is an identifier from the underlying copula-specific core entity,
  2. A list of core identifiers from the underlying parent and child “core” entities (these are foreign key links), and
  3. OPTIONAL surrogate foreign key attribute(s).

This structural sameness and consistency has numerous benefits. It drastically shortens the ETL development time and encourages the use (and re-use) of ETL templates. ETL templates can be developed once and re-used in other parts of the project. Indexing and partitioning scheme rules can be formalized, removing guesswork when creating indexing and partitioning schemes in the physical database layer.

For a more thorough introduction into Corepula Method, please download Corepula Method whitepaper (PDF)