How to create a Central Facts Model

2023-04-17

In this section we explain the steps you need to follow to create a central facts model.

At any moment, you can perform a model check to verify if there are errors in the model.

Example - Part 1

The logical validation model below is used as an example. It represents the entities and relationships of the TPCH benchmark.

LVM TPCH
LVM TPCH

Before you begin

Before you begin to convert the LVM constructs (entities, attributes and references) to CFM constructs (hubs, links, satellites and references), you need to figure out the independent and dependent entities from the LVM.

An independent entity does not rely on another entity for identification. A dependent entity is one that relies on one or more entities for identification. In other words, the identification of an independent entity does not depend on a foreign key, while a dependent entity depends on a foreign key for uniqueness.

{example} In our example, Order, Customer and Supplier are independent entities. Part Supplier and Order Line Item are examples of dependent entities.
The Order Line Item depends on the Order entity. This dependency is represented by a reference from Order Line Item to Order and the foreign key attribute Order Nbr as part of the primary key of Order Line Item. In this case, the dependent entity Order Line Item is the child entity of the reference, while the independent entity Order is the parent entity of the reference.

A dependent entity can be further classified as:

  • Associative (relationship) entity: when there are many to many relationships between two or more entities in the LVM, then an associative entity is created. The existence of an associative entity depends on more than one entity, which is represented by a reference from the associative entity to each entity on which it depends for its existence (the associative entity is the child, and the referenced entity is the parent of the reference). As a result, the primary key of the associative entity is a composition of the primary keys of each one of its parent entities.

{example} In our example, Part Supplier is an Associative entity created to represent the many to many relationship between Part and Supplier (a part can have multiple suppliers and a supplier can supply multiple parts). The attributes Available Quantity, Supply Cost and Comment describe this relationship between parts and suppliers.

  • Regular dependent entity: to ensure the uniqueness of the dependent entity, the primary key of the dependent entity is composed by its partial key and the primary key of the entity on which it depends for its existence.
    When an entity has one or more attributes associated to a valid time (“valid from” and “valid to”), then we have a special case of regular dependent entity, called bi-temporal entity. The primary key of the bi-temporal entity is composed by the primary key of the parent entity and the “valid from” attribute. The attribute associated to the valid time is also an attribute of the bi-temporal entity.
    It is possible that an entity depends on more than one entity. In this case, the primary key of the regular dependent entity is composed by the partial key of the dependent entity together with the primary key of all parent entities on which the entity depends for its existence.

{example} Order Line Item has the attribute Line Nbr, whose values are unique in the context of a given order. However, different orders can have one or more line numbers with the same value. In this case, Line Nbr is a partial key because it does not uniquely identify order line items from different orders.
A reference from Order Line Item to Order ensures the uniqueness of each order line item. The primary key of Order Line Item is composed by its partial key (Line Nbr) and the primary key of its parent entity Order (Order Nbr).

{example} In our example, Part Retail Price Temporal is a Bi-temporal entity that depends on Part. The Part Retail Price was created because the value of Retail Price is valid for a specific period.

Step 1. Adding an empty model

To start building a CFM, first you should create an empty central facts model from a template, then the model will have all the necessary default settings.

Step 2. Converting independent entities

After identifying the independent and dependent entities of the LVM, you can start the process of converting the LVM to a CFM. The first step consists of converting independent entities.

For each independent entity:

  1. Create a hub.
  2. For each attribute that compose the primary key of the independent entity, add a business key column to the new hub.
  3. Add a standard satellite for each subset of descriptive attributes that share the same timeline and are not part of a foreign key relationship. Even when the independent entity has no descriptive attributes, you need to add a standard satellite. Every hub should have at least one standard satellite to keep track of its existence.
  4. Add an immutable satellite for the attributes that will never change.
  5. Add descriptive attributes to the satellites.

For the given LVM, the following hubs and satellites are created:

Hubs and Satellites TPCH
Hubs and Satellites TPCH

Step 3. Converting dependent entities

For each associative entity and for each regular dependent entity that depends on more than one entity:

  1. Create a link.
  2. If the entity has a partial key, then for each attribute that composes the partial key, add a link key column to the new link.
  3. Add a reference from the new link to each one of the entities referenced by the dependent entity. For each referenced entity, one foreign key is added to the link.
  4. Add a standard satellite for each subset of descriptive attributes that share the same timeline and are not part of a foreign key relationship. Even when the dependent entity has no descriptive attributes, you need to add a standard satellite. Every link should have at least one standard satellite to keep track of its existence.
  5. Add an immutable satellite for the attributes that will never change.
  6. Add descriptive attributes to the satellites.

{example} In our example, there are three dependent entities: Part Supplier, Order Line Item and Part Retail Price. Part Supplier gave origin to the link Part Supplier and the satellite Part Supplier Properties.

Part Supplier TPCH
Part Supplier TPCH

For each regular dependent entity that depends on just one entity and it is bi-temporal, you add a bi-temporal satellite.

  1. Add a bi-temporal satellite to the hub (or link) created for the entity referenced by the dependent entity, i.e., the entity on which it depends on its existence.
  2. Add the remainder attributes to the bi-temporal satellite.

{example} The bi-temporal entity Retail Price Temporal originates one bi-temporal satellite to keep track of the history of the attribute Retail Price.

Bi-temporal entity TPCH
Bi-temporal entity TPCH

For each regular dependent entity that depends on just one entity, but it is not bi-temporal, you have the option to create a link (see the steps above) or add a multi-active satellite.

  1. Add a standard satellite to the hub (or link) created for the entity referenced by the dependent entity, i.e., the entity on which it depends on its existence.
  2. For the partial key of the dependent entity, add a versioning key column to the new satellite as a versioning key column.
  3. Add the remainder attributes to the satellite.

{example} There are two options two convert Order Line Item. The first one is to create a link dependent on Order, add the partial key of Order Line Item as a link key column and add a satellite to Order Line Item with its descriptive attributes.

Order Line Item as Link entity TPCH
Order Line Item as Link entity TPCH

{example} The second option is to add a multi-active satellite to Order with all attributes from Order Line Item and add the partial key Line Nbr as part of the primary key of the new satellite. The attribute Line Nbr is the versioning attribute of the multi-active satellite.

Order Line Item as Context entity TPCH
Order Line Item as Context entity TPCH

Example - Part 2

To show the use of the integration patterns, suppose a data source with information about customer's complaints. The LVM and the CFM for this data source are shown below.

LVM Complaints
LVM Complaints


CFM Complaints
CFM Complaints

Step 4. Aligning with the integration patterns

When you use integration patterns (key roots) to relate entities from different data sources, you need to add the relationships between the key roots and the corresponding hubs (or links).

If the key root belongs to a different model, for example a Concept Integration Model, then:

  1. Add a shortcut for this key root in the central facts model.
  2. For each entity (hub or link) in the central facts model, related to this key root, add a subtype reference from the entity to the shortcut added for the key root.

{example} Suppose that an integration pattern (KeyRoot-Customer) is created, in a Concept Integration Model, to integrate data about customers. A shortcut for this key root and a subtype reference from Customer to the KeyRoot-Customer are added to the central facts model.

KeyRoot Customer as shortcut TPCH
KeyRoot Customer as shortcut TPCH

If you opt for adding the integration pattern directly in the central facts model, then:

  1. Create a key root.
  2. For each entity (hub or link) in the central facts model, related to this key root, add a subtype reference from the entity to the new the key root.

{example} Now, suppose that the key root for customers does not need to be shared and there is no Concept Integration Model. In this case, a key root for customers and a subtype reference from Customer to the KeyRoot-Customer are added directly to the central facts model.

KeyRoot Customer
KeyRoot Customer

Step 5. Converting foreign key relationships

For each nonkey attribute that refers to a key attribute in a foreign key relationship identified in the corresponding LVM:

  • Option 1: Add a reference from one of the satellites of the hub (or link) that includes the nonkey attribute to the hub (or link) that holds the key attribute; or

  • Option 2: First, add a foreign key link, then add a reference from the new foreign key link to the hub (or link) that holds the nonkey attribute, and add a reference from the new foreign link to the hub (or link) that holds the key attribute. You choose this option if you want to track the changes of this relationship over time.

{example} In our example, the entity Order has foreign key relationships with Customer and Clerk. To represent this in the CFM, a reference was added from the satellite Order Properties to the hubs Customer and Clerk, respectively. The entity Order Line Item has a foreign key relationship with Part Supplier, which originates a reference from the satellite Order Line Item to the link Part Supplier.

This is the CFM obtained after converting the given LVM:

CFM TPCH
CFM TPCH

Step 6. Creating mappings

Mappings in a central facts model are mainly used to specify how to populate entities in a CFM from entities in a LVM. Besides these mappings, there are two special types of mappings in a central facts model:

  • Master mappings: is a special type of mapping used to identify existence of entities. A mapping is a master mapping if the business key of the anchor is equal to the primary key of the corresponding source entity (an entity in a LVM) that is mapped to the anchor. An anchor must have exactly one master mapping, i.e. the business key attributes of an anchor can be mapped to only one entity in a logical validation model.

  • Integrity mappings: you use integrity mappings to deal with referential integrity constraints between entities in a central facts model. Suppose that there is a foreign key in entity B that refers to the primary key of entity A and these two entities are not always populated at the same time. This means that, in a given moment, entity B may have a value for the foreign key that does not yet exists as a primary key in entity A. If no actions are taken, the loading process will fail due to the referential integrity constraint.
    To avoid this fail, you can create an integrity mapping. These mappings make sure that all foreign key values of entity B are also populated as primary key values of entity A. Doing this, the referential integrity constraint is respected even when the corresponding instance of entity A does not yet exist. This instance "will exist" only when entity A also provides the new value. In this case, the data in the central fact layer will be eventually consistent.

{example} In our example, there is a referential integrity constraint between the entities Supplier and Nation. Suppose that Nation is delivered at a different time than Supplier. In this case, you can create an integrity mapping to prevent the fail of a delivery when a supplier is delivered with a value for the attribute nation that does not exist yet as a primary key in the entity Nation.

To create a mapping between entities or attributes from the logical validation model (source model) to the central facts model (target model), first you need to select the source and target models. After this, you can create the mappings.

For each new hub, add the mappings between the corresponding independent entity in the LVM and the new hub in the CFM:

  1. Map each attribute that compose the primary key of the independent entity to the corresponding business key attributes added to the new hub.
  2. Map the attributes that compose the primary key of the independent entity to the primary key (ID) of each one of the satellites added to the hub.
  3. If necessary, map the descriptive attributes from the independent entity, which are not part of a foreign key relationship, to the corresponding satellites (standard or immutable) created for the hub.
  4. If necessary, map the foreign key descriptive attributesfrom the independent entity, to the corresponding foreign key attributes of the satellites created for the hub.

{example} In our example, the independent entity Order in the LVM is mapped to the hub Order and the satellite Order Properties. The primary key Nbr from the entity Order is mapped to the business key Nbr of the hub Order and to the primary key (ID) of the Order Properties satellite. The descriptive attributes of the entity Order are mapped to the corresponding attributes of the satellite Order Properties. The foreign key attributes Clerk Nbr and Customer Nbr are mapped to the corresponding foreign key attributes Clerk ID and Customer ID, respectively.

Order mappings TPCH
Order mappings TPCH
Order mappings TPCH
Order mappings TPCH

For each new link, add mappings between the corresponding dependent entity in the LVM and the new link in the CFM:

  1. Map the foreign key attributes that compose the primary key of the dependent entity to the corresponding foreign key attributes added to the link.
  2. Map the attributes that compose the partial key of the dependent entity to the corresponding link key columns added to the new link.
  3. Map the attributes that compose the primary key of the dependent entity to the primary key (ID) of each one of the satellites added to the link. The order of the attributes from the primary key should be respected when creating a mapping to the primary key of the link. You can check the order of the mappings and, if necessary, you can change the order of the mappings.
  4. Map the descriptive attributes from the dependent entity, which are not part of a foreign key relationship, to the corresponding satellites (standard or immutable) created for the link.

{example} In our example, the dependent entity Part Supplier in the LVM is mapped to the link Part Supplier and to the satellite Part Supplier Properties. The foreign key attributes Part Nbr and Supplier Nbr are mapped to the corresponding attributes Part ID and Supplier ID of the link Part Supplier and to the primary key (ID) of the Part Supplier Properties satellite. The descriptive attributes of the entity Part Supplier are mapped to the corresponding attributes of the satellite Part Supplier Properties.

Part Supplier mappings TPCH
Part Supplier mappings TPCH
Part Supplier Properties mappings TPCH
Part Supplier Properties mappings TPCH

For each new multi-active satellite, add the mappings between the corresponding dependent entity in the LVM and the new multi-active satellite in the CFM:

  1. Map the partial key of the dependent entity to the corresponding versioning key columns added to the satellite.
  2. Map the foreign key attributes that reference the entity on which the entity depends on its existence to the primary key (ID) of the satellite.
  3. Map all the descriptive attributes of the dependent entity, which are not part of a foreign key relationship, to the new satellite.

{example} In our example, the dependent entity Order Line Item in the LVM is mapped to the multi-active satellite Order Line Item Properties. The foreign key attribute Order Nbr is mapped to the primary key (ID) of the multi-active satellite. The partial key Line Nbr is mapped to the corresponding attribute Line Nbr of the multi-active satellite. The descriptive attributes of the entity Order Line Item are mapped to the corresponding attributes of the multi-active satellite Order Line Item Properties.

Part Supplier Properties mappings TPCH
Part Supplier Properties mappings TPCH

For each new bi-temporal satellite, add the mappings between the corresponding dependent entity in the LVM and the new bi-temporal satellite in the CFM:

  1. Map the valid time attributes (“valid from” and “valid to”) to their corresponding attributes in the new satellite.
  2. Map the foreign key that references the entity on which the dependent entity depends on its existence to the primary key (ID) of the satellite.
  3. Map all the descriptive attributes of the dependent entity, which are not part of a foreign key relationship, to the new satellite.

{example} In our example, the dependent entity Part Retail Price in the LVM is mapped to the bi-temporal satellite Part Retail Price Temporal. The attributes Valid From and Valid To are mapped to Valid Start Datetime and Valid End Datetime attributes, respectively. The foreign key attribute Order Nbr is mapped to the primary key (ID) of the bi-temporal satellite. The attribute Retail Price is mapped to the corresponding attribute Retail Price of the bi-temporal satellite.

Part Retail Price mappings TPCH
Part Retail Price mappings TPCH

For each foreign key link, add the mappings between the corresponding entities in the LVM and the new foreign key link:

  1. Map the attributes that compose the primary key of the entity that holds the key attribute to the primary key (ID) of the foreign key link.
  2. Map the attributes that compose the primary key of the entity that holds the non key attribute to the corresponding foreign key attributes added to the foreign key link.

Example - Part 3

To show the use of subtypes and generalization, suppose that in the TPC-H data source a Customer is either a Person or a Company. This means that persons and companies share their identification and are described by the same attributes.

When a customer is a person, besides the common attributes that describe both persons and companies, then the customer has an extra attribute Date of birth.When a customer is a company, besides the common attributes that describe persons and companies, then the customer has two additional attributes Contact person name and Contact person title.

In this case, Customer is considered a supertype and Person and Company are subtypes.

Supertype LVM TPCH
Supertype LVM TPCH

Step 7. Aligning with a logical validation model

Once you have the hubs and links, you may align the central facts model with the logical validation model in the following ways.

Step 7.1 Adding specializations

You can add supertype/subtype relationships to a CFM to better reflect choices made at the Logical Validation Layer.

For each supertype/subtype relationship between two entities in the logical validation model, add a subtype reference from the hub (or link) created for the supertype entity to the hub (or link) created for the subtype entity. A hub can have only "hub subtypes", while a link can have only "link subtypes".

{example} In our example, to reflect the fact that Person and Company are subtypes of Customer, two new hubs are created (Person and Company) and a subtype reference from each new hub to the hub Customer is added to the central facts model.

Supertype CFM TPCH
Supertype CFM TPCH

Step 7.2 Adding generalizations

You can add a hub to represent a generalization of other existing hubs, and add a context on the generalization level with the common attributes shared among the related hubs. Remember that hubs related through the generalization do not need to share the same business key.

To create a generalization:

  1. Add a generalization hub.
  2. Add a hyponym reference from each hub (or link), for which you create the generalization, to the new generalization.

{example} In our example, customers place an order and clerks handle the order. However, a clerk can also be a customer and place an order. In this case, Customer and Clerk can be generalized as a Person. To reflect this, a generalization (Person) is added to the central facts model and and an hyponym reference from Customer (and Clerk) to Person is also added.

Generalization CFM TPCH
Generalization CFM TPCH

Step 7.3 Adding alternate key satellites

You can add an alternate key satellite to a hub (or link) to place an additional uniqueness constraint on its attributes (either being descriptive or reference attributes). The uniqueness constraint only checks if the combination of the current values of all context attributes is unique for each key in its anchor, not if the combination is unique across the entire transaction timeline. This option tracks alternate key(s) separately and implements a uniqueness constraint for the alternate key(s).

{info} If you have multiple alternate keys that are unique individually but not combined, you have to create one alternate key context per alternate key.


{example} In our example, suppose that the entity Part has an alternate key composed by the attributes Name and Manufacturer Nbr. An alternate key satellite can be added to the hub Part to ensure the uniqueness of the combination of these two attributes.

Step 8. Adding business rule helpers

You can add business rule helpers to the central facts model to perform calculations or derivations that use data outside of the current delivery or historical versions of the data. Existing entities from the central facts model are the input for the helper and its results can be materialized in an existing hub or satellite.

{example} In our example, an order may have multiple order line items, which describe the items of a specific order. The business rule helper Order Total is used to compute values related to the whole set of order line items of a specific orde and the satellite Order Totals stores the computed values.

Order Totals Helper TPCH
Order Totals Helper TPCH

Step 9. Creating context-based entities

You create a context-based entity in the central facts layer to reflect the choices made for context-based entities in the logical validation layer. In the case of an untrusted delivery, i-refactory will use the context-based entity parameters for data processing in the central facts model.

{example} In our example, there is a context relationship between Part and Part Supplier: Part is the parent context of the entity Part Supplier. Suppose that all suppliers for just one single part are delivered, i.e. Part Supplier is delivered as a full set and Part is delivered as delta set. Since there is only focus on the specific part that you specify in the delivery, all data of the other parts remains unchanged. In this case, in order to allow implicit delete detection, the existing set of Part Suppliers must be filtered based on the context of its parent context. To allow this, the option 'Use as context filter' of the reference between Part and Part Supplier should be checked.

Constraint violation actions are applicable to certain constraint categories. Not all combinations of constraint categories and violation actions are allowed.

An attribute must have a value, whatever that value may be. It must not be NULL.

A data type of an attribute defines what value an attribute can hold. The data type specifies what type of mathematical, relational, or logical operations can be applied to it without causing an error.

An attribute datatype constraint is the most basic constraint type. It checks for the datatypes we support and have implemented.

For example, we check for string, length of string, integer, date, etc. In the following figure you can see the supported data types by PowerDesigner.

Image is omitted: Supported data types

Constraints can be violated and there are some actions that can be performed when a violation occurs. The possible actions are: EMPTY COLUMN, NO ACTION and SKIP ROW.

An attribute value constraint is an expression that is evaluated. The person who writes the expression is responsible for the correctness of it. The expression should be formulated in a positive way and lead to a Boolean answer. If the expression validates to True, than the value is correct.

Examples

  • The values in attribute X has to be bigger than 10: X > 10
  • The email address has to be in a certain pattern: email address LIKE '%_@_%.__%'

A Concept Integration Model is also a central facts model on which you place integration patterns. It is not required to create a concept integration model, but it can be very useful.

Every constraint is assigned to a constraint classification.

The main purposes of the Generic Data Access Layer (GDAL) are to provide logical perspectives for data consumption and to manage CRUD actions.

A generic data access model is a virtual data model that acts as an interface bridge between consumer applications and the central fact storage.

Every attribute is assigned to an attribute classification.

An entity record constraint checks whether an attribute meets the requirements set by another attribute belonging to the same entity.

The main purpose of the Logical Validation Layer (LVL) is to transform the data received from external data sources to fit into the logical data model structure. It is also responsible for validating deliveries. The Logical Validation Layer is also known as the Historical Staging In (HSTGIN) Layer.

The logical validation model is the representation of a single external data source in a logical format. It represent how data delivered by a specific tenant should be transformed, temporalized and validated in the {popup}logical validation layer. The logical validation model is also known as Historical Staging model (HSTGIN).

Multi-active attributes are attributes that contain a business key to provide multiple context records at the same time. For example: a customer has multiple types of phone numbers. “Home”, “Work” and “Mobile”. In that case we add a dependent entity on customer with key “Phone Nbr Type”. This is to prepare for the CFPL multi-active key on customer.

The main purpose of the Technical Staging Layer (TSL) is to create a common starting point for further data processing. It receives data delivered from external data sources and temporally stores them in a database. The input data should be in a tabular format (rows and columns).

Bi-temporal attribute is an attribute that changes over time: they follow a valid timeline. For example, a Part may have a price valid for December and a price valid for January.

Every entity is assigned to an entity classification and to a parent entity classification. The possible values for entity classification are: ALTERNATE KEY CONTEXT, ATTRIBUTE CONTEXT, GENERALIZATION,HELPER, REFERENCE CONTEXT, STABLE, STABLE DEPENDENT and STABLE INDEPENDENT

Entity Set Constraint An entity set constraint can be used to perform a check concerning values of two or more attributes that belong to different entities or to perform a check concerning the value of an attribute with respect to a set of values.

A Set Constraint Helper is a helper in the logical validation model. It is the implementation of a set constraint. The helper returns the records of an entity for a given set constraint, where the instances of this entity do not meet the definition of this set constraint.

The business requirements describe how data should be delivered for the data consumers (end users or applications) in terms of concepts, relationships between concepts and constraints to validate the data. These requirements can be described in a logical data model, for example.

A Business Rule Helper is a helper in the central facts model. It is a set-based calculation of derived facts. You need to use a Business Rule Helper if you want to make a calculation and want to keep a transaction history of the results of this calculation. You use the existing entities from the central facts model as input. The results of the helper must be materialized in 'regular' fact entities, such as Anchors and Contexts, to make them accessible in the Generic Data Access Layer.

Closed Open means that the timeline is valid from (vanaf in Dutch) the supplied valid start date until - but not including - (tot in Dutch) the supplied valid end date. In practice, this means that the start date of a valid time record is equal to the end date of the previous valid time record.

You need to create context-based entities when a set of data may be delivered within the boundaries of a parent context. A context-based entity applies when:

  • At least 2 entities are delivered.
  • A context relationship exists between these 2 entities. One entity is the parent context of the other entity.
  • The parent context entity is delivered as a delta and the child entity is delivered as a full set.

You need to create context-based entities when a set of data may be delivered within the boundaries of a parent context. A context-based entity applies when:

  • At least 2 entities are delivered.
  • A context relationship exists between these 2 entities. One entity is the parent context of the other entity.
  • The parent context entity is delivered as a delta and the child entity is delivered as a full set.

The Management Model contains the PowerDesigner objects for the Unified Anchor Modelling (UAM). When a UAM object is created, a so-called PowerDesigner replica of the corresponding Management Model object is created. This means that certain properties such as metadata columns and column stereotypes are configured in the Management Model and cannot be changed. The replication settings specify which elements of an object can be changed after creating a replica from the template object. It is possible to override the replication settings of an UAM object and change a specific property.

The temporal atomic type describes the datatype of the temporal attributes|

The main purposes of the Central Facts Layer (CFL) is to store data historically. It can also integrate data from different sources. The Central Facts Layer is also known as Central Facts Persistency Layer (CFPL)

The central facts persistence implementation model is the representation of facts in an anchorized data model with the ability to integrate multiple logical models.

In the context of i-refactory, data transformation refers to operations involved in turning raw data readily useful and closer to the business requirements.

Integration patterns are used to integrate entities from different data models. If two or more entities from different data models share the same business key, you can use the Integration Pattern named Key Root. It is a good practice to capture integration patterns in a separate model, named Concept Integration Model.

An attribute is mandatory when its value can not be empty (NULL).

A Physical Data Model (PDM) represents how data will be implemented in a specific database.

{note} The i-refactory uses four PDMs: technical staging model, logical validation model, central facts model and generic access model. Each one of these models is implemented as an additional database, which is used to store data from external and internal data sources.

Reverse engineering is the process of reconstructing a physical and/or Entity Relationship (ER) model from an existing data source. The purpose of reverse engineering is to avoid manual work as much as possible.

Architecture layer

The core of the i-refactory architecture has four layers: TSTGIN, LVL, CFL and GDAL. There are also two auxiliary layers: UCLVL and EXT.

If an entity has one or more attributes that changes over time and you want to keep track of when a attribute is valid at a certain transaction time, then you have a special case of a regular dependent entity, called bi-temporal entity. The bi-temporal entity stores historical data with two timelines. The primary key of the bi-temporal entity is composed by the primary key of the parent entity and the valid start date attribute. The attribute that changes over the valid time is called a bi-temporal attribute.

If an entity has one or more attributes that changes over time and you want to keep track of when a attribute is valid at a certain transaction time, then you have a special case of a regular dependent entity, called bi-temporal entity. The bi-temporal entity stores historical data with two timelines. The primary key of the bi-temporal entity is composed by the primary key of the parent entity and the valid start date attribute. The attribute that changes over the valid time is called a bi-temporal attribute.

A delivery agreement is a contract between a Tenant and a Logical Implementation Model or Generic Data Access model. An agreement has a duration. The delivery agreement set the architecture layer (interface) where the data should be ingested as well as the default settings to be applied to the deliveries.

A dependency mapping is a mapping between a helper (or BR helper) and a source entity used in the query of the helper. The helper and the source entity must belong to the same model.

  • Default dependency is set on entity level (source entity to helper entity)
  • To allow lineage on attribute level, via the Mapping editor, you could manually add the dependency on attribute level.

An Independent Entity is an entity that implements an Anchor for a Business Key that ‘stands alone’ e.g. that does not contain a reference to another Entity.

An Independent Entity is an entity that implements an Anchor for a Business Key that ‘stands alone’ e.g. that does not contain a reference to another Entity.

A Logical Data Model (LDM) matches the language, structure and quality of the business, regardless of the physical data implementation. The Logical Data Model reflects the business requirements.

A delivery may be considered as "untrusted" if deletes of data in the Logical Validation Layer have taken place and the processing of new deliveries cannot 100% rely (trust) on having enough statistics and data available to detect logical deletes, to determine the exact delta and to execute set based validations.

A delivery may be considered as "untrusted" if deletes of data in the Logical Validation Layer have taken place and the processing of new deliveries cannot 100% rely (trust) on having enough statistics and data available to detect logical deletes, to determine the exact delta and to execute set based validations.

A Dependent Entity is an entity that implements an Anchor for a Business Key that ‘depends’ in its existence on another Entity. A Dependent Entity contains Business Key fields of which at least one is a foreign key (FK).

A Dependent Entity is an entity that implements an Anchor for a Business Key that ‘depends’ in its existence on another Entity. A Dependent Entity contains Business Key fields of which at least one is a foreign key (FK).

The transaction time in i-refactory is different from what is commonly understood by transaction time. Transaction time is usually seen as the moment when a fact was stored in the database. In the i-refactory, the transaction time is the time, as dictated by the source system, not by the i-refactory database.

The Attribute type links the attribute to one of the existing interfaces.

Computed columns are columns whose content is computed from values in other columns in the table.

Functional date A functional date or time is a point in time and is defined by a user. An example is an order date or date of birth.

The technical model (also known as Technical Staging In model: TSTGIN) is a representation of how exactly one delivery from a specific data source will be processed in the technical staging layer.

Generalization is the process of extracting shared characteristics from two or more classes (hyponyms), and combining them into a generalized superclass (hypernym). For example: an 'employee' and a 'customer' are both 'persons'.

The Mapping Editor provides a graphical interface for creating and viewing mappings between models. It provides a global view of all the mappings related to the entities of a given model, allowing you to quickly identify those which are mapped and not mapped.

When a certain fact can change over time and you need to capture when that fact is valid in the real world, you can add a valid start date and a valid end date to the entity.

A valid time tells us in which period a record is valid. While a functional date represents just one point in time, the valid time has a begin and an end date, for example:

  • For Order item 123, a Retail price of 10.00 was valid from 2019-01-01 to 2019-06-01.
  • For Order item 123, a Retail price of 12.00 was valid from 2019-06-01 to 2020-01-01.

Alternate key is an attribute or a group of attributes whose values uniquely identify every record in an entity, but which is not the primary key

Candidate key

A candidate key consists of one or more attributes and meets the following requirements:

  • Unique: The value of the key defines uniquely one instance of a concepts. There are no double values.
  • Non-volatile: (Almost) doesn't change.
  • Minimal: Contains only the elements needed.

There are two kinds of candidate keys:

  • primary key
  • alternative key

Normalization is the process of decomposing tables in a database in order to reduce data redundancy and improve data integrity.

A strongly typed model is a model in which each all attributes have a predefined data type, for example: integers, doubles, date.

Surrogate Key A surrogate key is a system generated unique identifier that does not have any contextual or business meaning.

Business Key

A business key is an unique identifier that has business meaning and exists in the real world outside of the database. It consists of a column or a set of columns that already exists in a table. A business key is also known as a natural key

A Key Root Hub is an integration concept that must be used when the exact same business concept or independent business key occurs in different models. The Hubs for this independent business key in the different UAM models are all subtypes of the Keyroot Hub.

A relationship shows how two entities are related to one another. For example, a customer can place an order, and a order can have a customer.

Every Attribute has an atomic type (data type) which is linked to the attribute type of that attribute.

The cardinality shows how many instances of an entity can take place in a relationship.

The cardinality shows how many instances of an entity can take place in a relationship.

An enumeration consists of the list of values that a given attribute should adhere to.

{example} An order can have different statuses, such as shipped,packing,created anddone. Other statuses are not allowed.

Foreign Key

A foreign key is an attribute or a set of attributes that refers to the primary key of another entity. The original entity containing the primary key is called the 'parent' entity and the entity containing the foreign key is called the 'child' entity.

A natural key is an unique identifier that has business meaning and exists in the real world outside of the database. It consists of an column or a set of columns that already exists in a table. A natural key is also known as a business key

The primary key is an assigned key that consists of a minimal set of attributes to uniquely specify an instance of a record. The attribute or a combination of attributes should meet the following characteristics:

  • Unique: The attribute values of the key uniquely identify one instance of a concept. There are no duplicate instances.
  • Non-volatile: The key does not change.
  • Mandatory: All values are filled; there are no NULL values.

It is good practice to choose a primary key that also meet the following characteristic:

  • Safe: Doesn't contain private or sensitive information, such as a social security number.

Constraints are related to the other elements depending of the type of the constraint. Certain constraints are associated to attributes, entities, helper entities, unique keys or relationships between entities.

An attribute may be assigned to one or more entities (ex: acm_exists_ind) and an entity may have several attributes

Each layer may have one or more interfaces. The amount of interfaces depend on how many tenants and delivery agreements have been configured.

Namespace is what in the terminology of SQL Server is called database schema.|

A Delivery is a container that holds the specification of what is actually pushed to the i-refactory platform. This specification consists of a list of entities.

A Delivery is a container that holds the specification of what is actually pushed to the i-refactory platform. This specification consists of a list of entities.

Key Root A Key Root is a central repository for Business Keys. A Key Root ensures that similar records out of different data sources are identified by both the same Business Key as the Surrogated Key.

Context

A Context is a temporal table with a transaction start and end date. The Context tracks all changes of the context attributes related to a business key in the transaction time. This means that every change of an attribute value in a source system leads to a new record in the Context. The old record is end dated with the load date and the new record is start dated with the load date.

Hyponym is a term that denotes a subcategory of a more general class. For example: 'cat' and 'dog' are a hyponyms of 'animal'.

A mapping establishes relationships between concepts of separate data models. It creates a link between entities and attributes from a source model to related entities and attributes in the target model. A source model should precede the target model in the i-refactory architecture.

oasi_bk is an abbreviation for One Attribute Set Interface (OASI) with business keys. A normal view in the generic data access layer (GDAL) consists of the surrogate key, foreign key and attributes. The oasi_bk-view in the GDAL is a view where the business key(s) are also shown.

A subtype is a subgroup of an entity. You can create a subtype if a group of instances share some attributes and relationships that only exist for that group. For example, entity Customer can have a subtype Company and a subtype Person. They share the common attribute customer number, and can have some attributes of their own. Such as birth date for a Person. The entity Customer is called a supertype.

A subtype:

  • inherits all attributes of the supertype
  • inherits all relationships of the supertype
  • usually has one or more own attributes
  • can have subtypes of its own

Anchor: Independent Entity

An Independent Entity is an entity that implements an Anchor for a Business Key that ‘stands alone’ e.g. that does not contain a reference to another Entity.

Anchor: Dependent Entity

A Dependent Entity is an entity that implements an Anchor for a Business Key that ‘depends’ in its existence on another Entity.

A domain will help you to identify the types of information in your model. It defines the set of values for which a column is valid. A domain can specify a data type, length, precision, mandatoriness, check parameters, and business rules. It can be applied to multiple columns, which makes it easier to standardize data characteristics for columns in different tables.

Each interface may have one or more entities and one entity belongs to only one interface. An entity belongs to an i-refactory data model.

Each interface may have one or more entities and one entity belongs to only one interface. An entity belongs to an i-refactory data model.

A helper entity creates a derived entity and can be used when you need to transform, filter, or calculate data. The purpose of a helper differs per model:

  • Technical model: a helper is used to transform data.
  • Logical validation model: a helper is an implementation of a set constraint (Set Constraint Helper).
  • Central facts model: a helper is used for a set-based calculation of derived facts (Business Rule Helper).

HSTGIN is the abbreviation of Historical STaging IN. It is an older term to indicate the Logical Validation Model or Logical Validation Layer.

A schema is a set of database objects, such as tables, views, triggers, stored procedures, etc. In some databases a schema is called a namespace. A schema always belongs to one database. However, a database may have one or multiple schema's. A database administrator (DBA) can set different user permissions for each schema.

Each database represents tables internally as <schema_name>.<table_name>, for example tpc_h.customer. A schema helps to distinguish between tables belonging to different data sources. For example, two tables in two schema's can share the same name: tpc_h.customer and complaints.customer.

A Tenant is a delivering party for a dataset or datarecord as agreed in the Delivery Agreement.

TSTGIN is the abbreviation of Technical STaging IN. It is an older term to indicate the Technical Model or Technical Staging Layer.

An index organizes data in a way that improves the speed of data retrieval from a database. To maintain the index data structure, there is a cost of additional writes and storage space.

An index organizes data in a way that improves the speed of data retrieval from a database. To maintain the index data structure, there is a cost of additional writes and storage space.

The acronym CRUD stands for create, read, update, and delete. These are the four basic functions of persistent storage.

OLAP is a acronym for Online Analytical Processing. OLAP is category of software tools which provide analysis of data for business decisions. It uses complex queries to analyze aggregated historical data from OLTP systems.The primary objective is data analysis and not data processing.

OLTP is a acronym for Online transaction processing. OLTP captures, stores, and processes data from transactions in real time. Its primary objective is data processing and not data analysis.

A hub or independent entity is an entity that implements an Anchor for a business key that ‘stands alone’ e.g. that does not contain a reference to another entity. An independent entity contains business key fields, that show up as alternate key (AK), and the primary key (PK) is its surrogate key (ID).

A key is a combination of one or more attributes of an entity that uniquely defines one instance of that entity.