How to create a logical validation model

2023-04-25

In this section we explain the steps you need to follow to create a logical validation model (LVM) from scratch. The main assumption is that you understand the business data request (the Need) and that you have performed a source analysis (the Supply). In another section we explain how to generate a logical validation model from an existing LDM (Logical Data Model).

At any moment, you can perform a model check to verify if there are errors in the model.

Example

The logical data model and the technical staging model below are used as examples to illustrate the logical validation model concepts and to explain the steps for creating a logical validation model. They represent entities and relationships of the TPCH benchmark.

The TPCH logical data model:

Logical data model - TPCH


The TPCH technical staging model:

Technical staging model - TPCH

Before you begin

Before you begin to create a logical validation model for a specific technical staging model, you should analyze the technical model (the Supply) and how it relates to the business requirements (the Need). Based on this knowledge, you identify the entities, the references between entities, and the constraints of the logical validation model.

Identifying entities

When the technical model already meets the business requirements, the logical validation model will be a copy of the technical model. For each entity in the technical model, you add a corresponding entity in the logical validation model.

When there is no one-to-one correspondence between the entities in the technical model and the concepts from the business requirements, you need to identify the concepts from the business requirements whose instances can be obtained from the entities of that specific technical model. For each identified concept, you add an entity in the logical validation model. It is possible that one entity in the technical model corresponds to multiple concepts. In the logical validation model, this entity is normalized to reflect the business requirements.

It is possible that the naming of entities and attributes in the technical staging model already matches the naming of the business concepts, but it is also possible that the technical staging model uses different naming conventions. In the latter case, you need to adjust the names.

{example} In the TPCH logical data model, brand and manufacturer are seen as concepts, but in the TPCH technical staging model, brand and manufacturer are attributes of the entity Part. To reflect the business requirements, in the corresponding logical validation model, the entity Part is normalized and two new entities are created: Brand and Manufacturer. When data flows from the Technical Layer to the Logical Validation Layer, the entities Part, Brand and Manufacturer are populated with data coming from the entity Part in the technical staging model.

In a similar way, Clerk is seen as a concept in the logical data model, but in the technical model it is represented as an attribute of the entity Orders. To reflect the business requirements, an entity is created for the concept Clerk in the TPCH logical validation model.

Identifying references

References between entities in a logical validation model can be identified from the references between entities in the technical staging model, when available, but also from the relationships between entities specified in the business requirements.

One-to-one (1:1) and one-to-many (1:N) relationships between two entities are represented as references in the logical validation model. For many-to-many (N:M) relationships, an associative entity is created in the LVM, with a reference between the associative entity and each of the entities in the relationship.

{example} In our example, consider that there is a one-to-many (1:N) relationship between Region and Nation, which means that a region may have multiple nations, and one nation belongs to only one region. This relationship is represented in the LVM as a reference between the entity Nation and the entity Region.

The relationship between Part and Supplier is a many-to-many (N:M) relationship, which means that a part may have multiple suppliers and a supplier may supply multiple parts. This relationship in the LVM is represented with an associative entity, called Part Supplier, and one reference between Part Supplier and Part, and another reference between Part Supplier and Supplier.
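
To make this concrete, the associative entity could be sketched in DDL as follows (a minimal sketch; the table and column names are assumptions, not taken from the model):

```sql
-- Sketch of the associative entity for the N:M relationship between
-- Part and Supplier (assumed names).
CREATE TABLE part_supplier (
    part_nbr     INT NOT NULL REFERENCES part (part_nbr),
    supplier_nbr INT NOT NULL REFERENCES supplier (supplier_nbr),
    PRIMARY KEY (part_nbr, supplier_nbr)  -- one record per (part, supplier) pair
);
```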

In a reference between two entities, one of them is the parent and the other is the child entity. For each reference, you can specify its cardinality, which can have one of the following values:

  • 0..* : A parent can have zero or more children.
  • 0..1 : A parent can have zero or one child.
  • 1..* : A parent must have one or more children.
  • 1..1 : A parent must have exactly one child.

Besides the cardinality, additional properties can be specified to indicate whether the participation of the parent entity is mandatory. When the parent is mandatory, the child entity is a dependent entity: it relies on the parent entity for identification. In this case, the primary key of the parent entity is included in the child entity as a foreign key and becomes part of the primary key of the child entity. When the parent is not mandatory, the child entity is an independent entity.
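
In DDL terms, the difference between a dependent and an independent child can be sketched as follows (hypothetical names; an illustration only, not code generated by i-refactory):

```sql
-- Mandatory parent: the child is dependent, and the parent key becomes
-- part of the child's primary key.
CREATE TABLE order_line_item (
    order_nbr INT NOT NULL REFERENCES orders (order_nbr),
    line_nbr  INT NOT NULL,
    PRIMARY KEY (order_nbr, line_nbr)
);

-- Optional parent: the child is independent, and the parent key is just a
-- nullable foreign key outside the primary key.
CREATE TABLE nation (
    nation_nbr INT NOT NULL PRIMARY KEY,
    region_nbr INT NULL REFERENCES region (region_nbr)
);
```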

{example} Consider the reference between the entities Nation and Region (Region is the parent entity and Nation is the child entity). This reference has cardinality 0..*, which specifies that an instance of the entity Region can be related to zero or more instances of Nation. The participation of the parent entity (Region) is not mandatory, i.e., it is possible to have instances of Nation without a Region.

The picture below illustrates the possible cardinality values for the reference between region and nation. In all cases, the participation of the parent entity is not mandatory.

Different cardinality values for the reference between nation and region

Identifying constraints

Constraints are used to ensure the accuracy and the reliability of the source data. Constraints are identified based on the business requirements and can be classified as Built-in and Advanced. Built-in constraints are inherent to the data model and can be checked automatically, i.e. there is no need to perform additional steps to validate the inbound data delivery. Advanced constraints are added to perform validations at attribute, entity record, or entity set level.

The table below shows the different types of constraints, which can be used in a logical validation model to prevent non-compliant data from being further processed.

| Name | Type | How it works |
| --- | --- | --- |
| Datatype | Built-in | A datatype specifies the possible values for an attribute. When an attribute has different datatypes in the technical staging model and the logical validation model, i-refactory checks whether the data can be converted from the source datatype (the datatype in the technical model) to the target datatype (the datatype in the logical validation model). When this is not possible, the violating values are returned. |
| Mandatory value | Built-in | When an attribute is set as mandatory, the attribute must have a value. i-refactory checks whether the attribute has a value. |
| Key | Built-in | When an attribute is set as a primary key, i-refactory checks whether the primary key has a value and whether this value is unique. When an attribute is set as an alternate key and a value is populated, i-refactory checks whether the alternate key value is unique. |
| Relationship | Built-in | When there is a relationship between two entities (parent and child) in the logical validation model, i-refactory checks: if the parent relationship exists, do the child entities have a parent entity? If the child relationship exists, does the parent entity have a child entity? |
| Enumeration | Advanced | Enumerations specify a list of values that an attribute may have. An enumeration can be applied to one specific attribute or to a domain. |
| Attribute value | Advanced | Attribute value constraints check whether the value of an attribute meets predefined conditions and can be assigned to one or more attributes. These constraints are formulated as a Should Comply To boolean expression. i-refactory uses this constraint as part of the validation process and gathers statistics on non-compliant attribute values. This can lead to a refused data delivery or help to identify data quality mismatches. |
| Entity-record | Advanced | Entity record constraints check whether an attribute meets the requirements set by another attribute. They use the values of two different attributes from the same record to verify them against each other. |
| Entity-set | Advanced | Entity set constraints perform checks concerning the values of two or more attributes that belong to different entities, or the value of an attribute with respect to a set of values (for example, when incoming data needs to be checked against existing data). |

{example} In our example, constraints on datatypes, primary keys and relationships are explicit in the diagram that represents the logical data model. Other validations are implemented as entity-set and entity-record constraints. An entity-set constraint is created to validate the total order price of a given order: the value of the total order price should be equal to the value obtained when computing the total price from the several order line items that compose the order. This validation is implemented as an entity-set constraint because values from different entities are compared. An entity-record constraint is created to make sure that the shipment date (ship date) of an order line item is earlier than or equal to its receipt date.

Step 1. Adding an empty model

To start building a logical validation model, first create an empty logical validation model from a template; the model will then have all the necessary default settings. At any moment, you can perform a model check to verify if there are errors in the model.

Step 2. Creating entities

For each identified entity:

  1. Create an entity.
  2. Consider creating data domains to further unify your data.
  3. Add one or more attributes to the new entity.
    • Adjust settings for valid time attributes: If the logical validation model uses valid times, these columns require special attention in order to be correctly treated in the runtime process as a bi-temporal entity.
  4. Add a primary key to the new entity.
  5. If necessary, create an index.

{tip} It is good practice to use domain types instead of a datatype. This is especially convenient when multiple attributes need to adhere to the same datatype and values. If the datatype or enumeration changes, you only have to adjust one domain instead of several attributes.


{info} If you use dates and times, it is important to know the differences between functional date, valid time, and transaction time.


{example} In our example, the set of entities in the logical validation model is the same set of entities from the logical data model. However, when comparing the logical validation model with the technical staging model, you see three extra entities in the logical validation model: Clerk, Brand and Manufacturer. These entities are not explicitly represented in the technical staging model, but their information is hidden in the entities Orders and Part. There are also some differences in the names of entities (Orders, LineItem, PartSupp) and attributes, because the naming in the logical validation model should follow the naming convention from the business requirements. The datatypes of the attributes in the logical validation model also follow the logical data model.

Logical validation Model entities

Step 3. Adding references

For each identified reference:

  1. Add a reference from the child entity to the parent entity of the relationship.
  2. Indicate the cardinality of the reference, i.e. the minimum and maximum number of instances in a child entity permitted for each corresponding instance in the parent entity.
  3. Indicate if the parent is mandatory, i.e. each foreign key value in the child entity must have a corresponding key value in the parent entity (the child entity depends on the parent entity).
  4. If necessary, adjust other properties of the reference.

{example} In our example, the references from the logical validation model are the same references from the logical data model.

Logical validation Model with references

Step 3.1. Change reference settings to create a context-based entity

This step is necessary when you have context-based partial deliveries and you need to change the reference settings to create a context-based entity. You need to create context-based entities when a set of data of a child entity may be delivered within the boundaries of a parent context. A context-based entity applies when:

  • At least 2 entities are delivered.
  • A context relationship exists between these 2 entities. One entity is the parent context of the other entity.
  • The parent context entity is delivered as a delta and the child entity is delivered as a full set.

{example} It may happen that all the Part Suppliers for just one single Part number are delivered. Since the delivery focuses only on the specified part, all data of the other parts remains unchanged. To allow implicit delete detection and set-based validations, the existing set of part suppliers must be filtered based on its parent context. Suppose that Part is delivered as a delta delivery and only parts 1, 2 and 200 are delivered, while Part Supplier is delivered as a full load. This is what will happen:

For the Part numbers 1, 2 and 200, a full compare of the Part Supplier records is done between what is delivered and what is in CFPL.

  • If a new key of Part Supplier is added, new records are added to Part Supplier.
  • If a key of Part Supplier is delivered but attributes are altered, the record is updated.
  • If a key of Part Supplier is not delivered, the record in CFPL is flagged as deleted (sketched below).
  • If you add a not yet existing Part number, it will be added as a record in Part.
  • The constraints are also checked only for this context delivery, i.e. within the boundaries of the context data.
  • If you forget to set the context filter to True and you make a partial delivery for only Part numbers 1, 2 and 200, then all other Part numbers will be deleted.
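
The context-filtered delete detection could be sketched in T-SQL as follows (a minimal sketch; the schema names, column names and the acm_exists_ind metadata column are assumptions for illustration):

```sql
-- Flag Part Supplier records as logically deleted, but only within the
-- delivered parent context (Part numbers 1, 2 and 200).
UPDATE ps
SET    ps.acm_exists_ind = 0   -- assumed "logically deleted" metadata flag
FROM   cfpl.part_supplier AS ps
WHERE  ps.part_nbr IN (1, 2, 200)   -- the parent context boundary
  AND NOT EXISTS (
        SELECT 1
        FROM   delivery.part_supplier AS d
        WHERE  d.part_nbr     = ps.part_nbr
          AND  d.supplier_nbr = ps.supplier_nbr
      );
```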

Reference "supplier supplies many parts" with use context as filter option on
Reference "supplier supplies many parts" with use context as filter option on

Another term used for this is context-based enddating, since it enddates records that are not included in the delivery.

{warning} Do not forget to set the context filter in the central facts model as well. It is not automatically added.

Step 4. Creating subtypes and generalizations

When you identify subtype/supertype patterns and generalization patterns in the business requirements, you can use a supertype/subtype relationship in the LVM to capture them. You have two options to create a supertype/subtype relationship in a LVM:

  • Create separate tables for each subtype (or specialization)
  • Create a rollup table that contains all the attributes of the supertype (or generalization) and the attributes of the subtypes (or specializations)

{warning} Be aware that you need to implement keys in the generalized entity, although a strict generalized entity doesn't need to have keys.

Option 1: Create separate tables for each subtype

To generate subtypes as separate tables:

  1. If necessary, add an entity for each subtype as well as an entity for the supertype.
  2. Add a reference between each subtype and the supertype. The child entities (subtypes) have a zero-to-one relationship to the parent entity (supertype).

{example} Three separate entities (Customer, Person and Company) are used to represent the supertype/subtype relationship.

Separate tables for each subtype

In the logical validation model it is possible to define a subtype-supertype pattern as Complete and/or Mutually Exclusive.

  • Complete pattern: all instances in the supertype can be found in the related subtypes. For example, a provider offers TV, internet, and phone plans; there are no other types of plans. A customer can choose to get a TV and internet plan, without a phone plan.
  • Mutually exclusive pattern: one instance in the supertype can be found in none or just one Subtype. For example, a person can either be male or female.
  • Mutually exclusive and complete pattern: one instance in the supertype must be found in just one subtype. For example, a customer must either be a (natural) person or a company, there are no other types of customer available.

These patterns are not automatically implemented in the logical validation model or generated as metadata by i-refactory. To account for these patterns, you need to add set constraints to the logical validation model.
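
For instance, a mutually exclusive pattern could be guarded by a set constraint formulated as Identified as Incorrect, along the lines of this sketch (the entity and column names are assumptions):

```sql
-- Returns customers that are both a Person and a Company, i.e. records
-- violating the mutually exclusive subtype pattern.
SELECT c.customer_nbr
FROM   customer AS c
WHERE  EXISTS (SELECT 1 FROM person  AS p WHERE p.customer_nbr = c.customer_nbr)
  AND  EXISTS (SELECT 1 FROM company AS m WHERE m.customer_nbr = c.customer_nbr);
```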

{warning} Depending on the type of subtype-supertype pattern (for example, mutually exclusive), the cardinality of the relationship between a subtype and supertype might not be correct. You need to adjust it accordingly.

Option 2: Create a rollup table

To create a rollup table containing all the attributes of the subtypes and the supertype:

  1. Create an entity for the supertype.
  2. Add to the supertype all the attributes that are common to the subtypes, as well as the attributes specific to each subtype.

The attributes of the subtypes are rolled up into one table.

{example} Instead of having separate entities, you have only one entity (Customer), with the common attributes of Person and Company, as well as the specific attributes of each subtype.

Rollup table

Step 5. Adding advanced constraints

Advanced constraints are user-defined constraints, which can specify:

  • constraints on values of an attribute or domain (enumeration or attribute value constraint)
  • constraints on at least two attributes within one record (entity record constraint)
  • constraints on at least one attribute within one or more entities (entity set constraint)

After creating these constraints, you need to attach them to the relevant attribute, domain or entity.

Step 5.1. Attribute value constraint

To add an attribute value constraint:

  1. Create the attribute value constraint.
  2. Assign the attribute value constraint to the relevant attribute(s).

{tip} If you want to use regular expressions (regex) to validate values, you need to add a .NET function that provides access to regular expressions. Before using a regular expression, first validate it through regex101.

Step 5.2. Enumeration constraint

To add an enumeration constraint:

  1. Choose the attribute or domain whose value should be validated.
  2. Create the enumeration constraint.

{example} In our example, the number of a region must be between 0 and 4. To validate this, you create an enumeration constraint for the attribute Nbr of the entity Region.
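
Conceptually, this enumeration behaves like the following check (an illustration only; in practice the enumeration is defined in the model, not written as SQL):

```sql
-- What the enumeration on Region.Nbr effectively enforces (assumed names).
ALTER TABLE region
    ADD CONSTRAINT ck_region_nbr CHECK (nbr BETWEEN 0 AND 4);
```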

Step 5.3. Entity-record constraint

Since entity record constraints concern more than one attribute, they are assigned to an entity instead of an attribute. The constraint is formulated as a Should Comply To boolean expression.

An entity record constraint explicitly defines the columns it uses (case sensitive); columns are identified by their code.

To add an entity record constraint:

  1. Create the entity record constraint.
  2. Assign the entity record constraint to the relevant entity.

{example} In our example, there is a constraint that specifies that, for every record of Order Line Item, the value of Valid From must be smaller than the value of Valid To.
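
A sketch of the violating-records check behind this constraint (entity and column names assumed for illustration):

```sql
-- Records of Order Line Item that do NOT comply with the Should Comply To
-- expression "Valid From" < "Valid To".
SELECT *
FROM   order_line_item
WHERE  NOT (valid_from < valid_to);
```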

Step 5.4. Entity-set constraint

To add an entity set constraint:

  1. Create a helper that contains a query which returns the records where the constraint is violated.
  2. Create an entity set constraint and assign the helper to this constraint.
  3. Assign the entity set constraint to the relevant entity.

{info} An entity set constraint uses a set constraint helper, which needs to contain a query formulated as Identified as Incorrect. This means that it returns the set of records that are faulty, i.e., all records that do not comply with the constraint are returned. The choice to use Identified as Incorrect is based on performance. By selecting incorrect records directly, you avoid additional processing steps, such as comparing the correct records with the total set to determine the incorrect records.


{example} In our example, the value of total price of a given order should be equal to the total price of the order when calculated from its order line items. To validate this, there is an entity set constraint that returns all records from the entity Order that violate this requirement.
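
A minimal sketch of the helper query, formulated as Identified as Incorrect (the entity names, column names and price formula are assumptions based on the TPCH example):

```sql
-- Returns the orders whose stored total price differs from the total price
-- computed from their order line items.
SELECT o.order_nbr
FROM   orders AS o
JOIN   order_line_item AS li
       ON li.order_nbr = o.order_nbr
GROUP  BY o.order_nbr, o.total_price
HAVING o.total_price <> SUM(li.extended_price * (1 - li.discount) * (1 + li.tax));
```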

Step 1: Create a helper
Step 2: Create an entity set constraint
Step 3: Assign entity set constraint to entity

Step 6. Creating mappings

When you create a mapping between the technical model and the logical validation model, it is important to remember the following mapping rules:

  1. Generally, there is a one-to-one mapping between attributes from the technical model and the logical validation model.
  2. An attribute in the logical validation model must be mapped to one and only one attribute in the technical model. You cannot concatenate multiple attributes from the technical model into one attribute in the logical validation model.
  3. However, an attribute in the technical model can be mapped to multiple attributes in the logical validation model.
  4. An entity in the technical model can be mapped to multiple entities in the logical validation model.

To create a mapping between entities or attributes from the technical staging model (source model) to the logical validation model (target model), first you need to select the source and target models. After this, you can create the mappings.

For each new entity in the LVM:

  1. Add a mapping between the corresponding entity or helper in the TSM and the new entity in the LVM.
  2. If necessary, add mappings between the attributes of the corresponding entity or helper in the TSM and the attributes of the new entity in the LVM.

Data type conversion: the mapped columns in a technical model and in a logical validation model do not necessarily have the same datatype. The datatype in a logical validation model can be stricter than the datatype in a technical model. When an attribute in a technical model is mapped to an attribute in a logical validation model, i-refactory explicitly converts the data to the datatype of the logical validation model.

{example} Suppose that in the technical model a date in ISO date format is provided as a string field: 8900-12-31. In the logical validation model, this field is modeled as a datetime2(7). At runtime, the loading process will execute a TRY_CAST on the string field as datetime2(7): 8900-12-31 00:00:00.0000000.
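
The conversion behavior can be reproduced directly in T-SQL (a sketch of what happens at runtime, not the actual generated loading code):

```sql
-- Successful conversion: the string fits the stricter target datatype.
SELECT TRY_CAST('8900-12-31' AS datetime2(7));   -- 8900-12-31 00:00:00.0000000

-- Failed conversion: TRY_CAST returns NULL instead of raising an error,
-- so the value can be reported as a datatype constraint violation.
SELECT TRY_CAST('not-a-date' AS datetime2(7));   -- NULL
```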

Mapping of Primary key: when primary key columns from the source entity in the technical model are mapped to the corresponding target entity in the logical validation model, the uniqueness is checked when data is loaded in the Logical Validation Layer.

{warning} When there is no mapping between a primary key in the logical validation model and a primary key in the technical model, then a Distinct function needs to be used to generate uniqueness in the Logical Validation Layer.

Normalization: when the target model (LVM) is more normalized than the source model (TSM), i-refactory automatically normalizes the data if the entity is mapped to its normalized entities.

{example} The TPCH technical model has an entity ORDER that includes information about orders but also information about clerks, such as O_ORDERKEY, O_ORDERSTATUS, O_CLERK and O_CLERKKEY. The clerk information is hidden in the entity, and there will be duplicate clerk keys and clerk names in Order records. In the logical validation model, the data should be normalized and two entities should exist: one entity for Order and one entity for Clerk. The mapping can be made from ORDER.O_CLERKKEY in the technical model to Clerk.Nbr in the logical validation model. Any key that is mapped in this scenario is automatically validated by calculating the distinct set.
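
In loading terms, populating Clerk from the denormalized ORDER entity amounts to something like this sketch (the source schema name is an assumption; the column names are taken from the example):

```sql
-- The DISTINCT collapses the duplicate clerk keys and names that occur
-- across Order records into one record per clerk.
SELECT DISTINCT
       O_CLERKKEY AS Nbr,
       O_CLERK    AS Name
FROM   tstgin.[ORDER];
```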

Order

Order and Clerk

Entity 'Clerk' populated from the normalized technical staging entity

{info} During a delivery, the DISTINCT is applied while loading the data into the normalized entity. If the normalization ends up in key conflicts, you need to add a helper to the technical model to resolve the conflicts. This reflects an improved target model.

Step 7. Preparing for delta-loads

Additional mapping activities in the logical validation model are required to support delta-loads. Since a delta-load must indicate if a record is inserted (I), updated (U), or deleted (D), this information from the technical model must be transferred to the logical validation model.

For each entity for which a delta-load is expected:

  1. Apply the Staging Template to reveal the hidden metadata attributes.
  2. Map the delta indicator from the technical model to the logical validation model.

{warning} You need to make sure that the values in the ACM source transaction indicator correspond to I, U, or D. The runtime processing through i-refactory uses this value as a given.
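
As an illustration, a mapping could normalize a source-specific mutation code to the expected indicator values (hypothetical names; the mutation_code column, its values, and the schema name are assumptions):

```sql
-- Normalize an assumed source mutation code to the accepted values I, U, D.
SELECT o.order_nbr,
       o.order_status,
       CASE o.mutation_code
            WHEN 'INS' THEN 'I'   -- inserted record
            WHEN 'UPD' THEN 'U'   -- updated record
            WHEN 'DEL' THEN 'D'   -- deleted record
       END AS acm_source_transaction_indicator
FROM   tstgin.[ORDER] AS o;
```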

Constraint violation actions are applicable to certain constraint categories. Not all combinations of constraint categories and violation actions are allowed.

An attribute must have a value, whatever that value may be. It must not be NULL.

A data type of an attribute defines what value an attribute can hold. The data type specifies what type of mathematical, relational, or logical operations can be applied to it without causing an error.

An attribute datatype constraint is the most basic constraint type. It checks for the datatypes we support and have implemented.

For example, we check for string, length of string, integer, date, etc. In the following figure you can see the data types supported by PowerDesigner.

Image is omitted: Supported data types

Constraints can be violated and there are some actions that can be performed when a violation occurs. The possible actions are: EMPTY COLUMN, NO ACTION and SKIP ROW.

An attribute value constraint is an expression that is evaluated. The person who writes the expression is responsible for its correctness. The expression should be formulated in a positive way and lead to a boolean answer. If the expression evaluates to True, the value is correct.

Examples

  • The values in attribute X have to be bigger than 10: X > 10
  • The email address has to match a certain pattern: email address LIKE '%_@_%.__%'

A Concept Integration Model is also a central facts model on which you place integration patterns. It is not required to create a concept integration model, but it can be very useful.

Every constraint is assigned to a constraint classification.

The main purposes of the Generic Data Access Layer (GDAL) are to provide logical perspectives for data consumption and to manage CRUD actions.

A generic data access model is a virtual data model that acts as an interface bridge between consumer applications and the central fact storage.

Every attribute is assigned to an attribute classification.

An entity record constraint checks whether an attribute meets the requirements set by another attribute belonging to the same entity.

The main purpose of the Logical Validation Layer (LVL) is to transform the data received from external data sources to fit into the logical data model structure. It is also responsible for validating deliveries. The Logical Validation Layer is also known as the Historical Staging In (HSTGIN) Layer.

The logical validation model is the representation of a single external data source in a logical format. It represents how data delivered by a specific tenant should be transformed, temporalized and validated in the logical validation layer. The logical validation model is also known as the Historical Staging model (HSTGIN).

Multi-active attributes are attributes that contain a business key to provide multiple context records at the same time. For example, a customer has multiple types of phone numbers: “Home”, “Work” and “Mobile”. In that case we add a dependent entity on Customer with key “Phone Nbr Type”. This prepares for the CFPL multi-active key on Customer.

The main purpose of the Technical Staging Layer (TSL) is to create a common starting point for further data processing. It receives data delivered from external data sources and temporarily stores it in a database. The input data should be in a tabular format (rows and columns).

A bi-temporal attribute is an attribute that changes over time: it follows a valid timeline. For example, a Part may have a price valid for December and a price valid for January.

Every entity is assigned to an entity classification and to a parent entity classification. The possible values for entity classification are: ALTERNATE KEY CONTEXT, ATTRIBUTE CONTEXT, GENERALIZATION, HELPER, REFERENCE CONTEXT, STABLE, STABLE DEPENDENT and STABLE INDEPENDENT.

Entity set constraint

An entity set constraint can be used to perform a check concerning values of two or more attributes that belong to different entities, or to perform a check concerning the value of an attribute with respect to a set of values.

A Set Constraint Helper is a helper in the logical validation model. It is the implementation of a set constraint. The helper returns the records of an entity for a given set constraint, where the instances of this entity do not meet the definition of this set constraint.

The business requirements describe how data should be delivered to the data consumers (end users or applications) in terms of concepts, relationships between concepts, and constraints to validate the data. These requirements can be described in a logical data model, for example.

A Business Rule Helper is a helper in the central facts model. It is a set-based calculation of derived facts. You need to use a Business Rule Helper if you want to make a calculation and want to keep a transaction history of the results of this calculation. You use the existing entities from the central facts model as input. The results of the helper must be materialized in 'regular' fact entities, such as Anchors and Contexts, to make them accessible in the Generic Data Access Layer.

Closed Open means that the timeline is valid from (vanaf in Dutch) the supplied valid start date until - but not including - (tot in Dutch) the supplied valid end date. In practice, this means that the start date of a valid time record is equal to the end date of the previous valid time record.

You need to create context-based entities when a set of data may be delivered within the boundaries of a parent context. A context-based entity applies when:

  • At least 2 entities are delivered.
  • A context relationship exists between these 2 entities. One entity is the parent context of the other entity.
  • The parent context entity is delivered as a delta and the child entity is delivered as a full set.

The Management Model contains the PowerDesigner objects for the Unified Anchor Modelling (UAM). When a UAM object is created, a so-called PowerDesigner replica of the corresponding Management Model object is created. This means that certain properties, such as metadata columns and column stereotypes, are configured in the Management Model and cannot be changed. The replication settings specify which elements of an object can be changed after creating a replica from the template object. It is possible to override the replication settings of a UAM object and change a specific property.

The temporal atomic type describes the datatype of the temporal attributes.

The main purpose of the Central Facts Layer (CFL) is to store data historically. It can also integrate data from different sources. The Central Facts Layer is also known as the Central Facts Persistency Layer (CFPL).

The central facts persistence implementation model is the representation of facts in an anchorized data model with the ability to integrate multiple logical models.

In the context of i-refactory, data transformation refers to the operations involved in turning raw data into something readily useful and closer to the business requirements.

Integration patterns are used to integrate entities from different data models. If two or more entities from different data models share the same business key, you can use the Integration Pattern named Key Root. It is a good practice to capture integration patterns in a separate model, named Concept Integration Model.

An attribute is mandatory when its value cannot be empty (NULL).

A Physical Data Model (PDM) represents how data will be implemented in a specific database.

{note} The i-refactory uses four PDMs: technical staging model, logical validation model, central facts model and generic access model. Each one of these models is implemented as an additional database, which is used to store data from external and internal data sources.

Reverse engineering is the process of reconstructing a physical and/or Entity Relationship (ER) model from an existing data source. The purpose of reverse engineering is to avoid manual work as much as possible.

Architecture layer

The core of the i-refactory architecture has four layers: TSTGIN, LVL, CFL and GDAL. There are also two auxiliary layers: UCLVL and EXT.

If an entity has one or more attributes that change over time and you want to keep track of when an attribute is valid at a certain transaction time, you have a special case of a regular dependent entity, called a bi-temporal entity. The bi-temporal entity stores historical data with two timelines. The primary key of the bi-temporal entity is composed of the primary key of the parent entity and the valid start date attribute. The attribute that changes over the valid time is called a bi-temporal attribute.

A delivery agreement is a contract between a Tenant and a Logical Implementation Model or a Generic Data Access model. An agreement has a duration. The delivery agreement sets the architecture layer (interface) where the data should be ingested, as well as the default settings to be applied to the deliveries.

A dependency mapping is a mapping between a helper (or BR helper) and a source entity used in the query of the helper. The helper and the source entity must belong to the same model.

  • Default dependency is set on entity level (source entity to helper entity)
  • To allow lineage on attribute level, via the Mapping editor, you could manually add the dependency on attribute level.

An Independent Entity is an entity that implements an Anchor for a Business Key that ‘stands alone’ e.g. that does not contain a reference to another Entity.

A Logical Data Model (LDM) matches the language, structure and quality of the business, regardless of the physical data implementation. The Logical Data Model reflects the business requirements.

A delivery may be considered as "untrusted" if deletes of data in the Logical Validation Layer have taken place and the processing of new deliveries cannot 100% rely (trust) on having enough statistics and data available to detect logical deletes, to determine the exact delta and to execute set based validations.

A Dependent Entity is an entity that implements an Anchor for a Business Key that ‘depends’ in its existence on another Entity. A Dependent Entity contains Business Key fields of which at least one is a foreign key (FK).

The transaction time in i-refactory is different from what is commonly understood by transaction time. Transaction time is usually seen as the moment when a fact was stored in the database. In the i-refactory, the transaction time is the time, as dictated by the source system, not by the i-refactory database.

The Attribute type links the attribute to one of the existing interfaces.

Computed columns are columns whose content is computed from values in other columns in the table.

Functional date

A functional date or time is a point in time and is defined by a user. An example is an order date or date of birth.

The technical model (also known as Technical Staging In model: TSTGIN) is a representation of how exactly one delivery from a specific data source will be processed in the technical staging layer.

Generalization is the process of extracting shared characteristics from two or more classes (hyponyms), and combining them into a generalized superclass (hypernym). For example: an 'employee' and a 'customer' are both 'persons'.

The Mapping Editor provides a graphical interface for creating and viewing mappings between models. It provides a global view of all the mappings related to the entities of a given model, allowing you to quickly identify those which are mapped and not mapped.

When a certain fact can change over time and you need to capture when that fact is valid in the real world, you can add a valid start date and a valid end date to the entity.

A valid time tells us in which period a record is valid. While a functional date represents just one point in time, the valid time has a begin and an end date, for example:

  • For Order item 123, a Retail price of 10.00 was valid from 2019-01-01 to 2019-06-01.
  • For Order item 123, a Retail price of 12.00 was valid from 2019-06-01 to 2020-01-01.

An alternate key is an attribute or a group of attributes whose values uniquely identify every record in an entity, but which is not the primary key.

Candidate key

A candidate key consists of one or more attributes and meets the following requirements:

  • Unique: The value of the key uniquely defines one instance of a concept. There are no duplicate values.
  • Non-volatile: (Almost) doesn't change.
  • Minimal: Contains only the elements needed.

There are two kinds of candidate keys:

  • primary key
  • alternate key

Normalization is the process of decomposing tables in a database in order to reduce data redundancy and improve data integrity.

A strongly typed model is a model in which all attributes have a predefined data type, for example: integers, doubles, dates.

Surrogate key

A surrogate key is a system-generated unique identifier that does not have any contextual or business meaning.

Business Key

A business key is a unique identifier that has business meaning and exists in the real world outside of the database. It consists of a column or a set of columns that already exists in a table. A business key is also known as a natural key.

A Key Root Hub is an integration concept that must be used when the exact same business concept or independent business key occurs in different models. The Hubs for this independent business key in the different UAM models are all subtypes of the Key Root Hub.

A relationship shows how two entities are related to one another. For example, a customer can place an order, and an order can have a customer.

Every Attribute has an atomic type (data type) which is linked to the attribute type of that attribute.

The cardinality shows how many instances of an entity can take place in a relationship.

An enumeration consists of the list of values that a given attribute should adhere to.

{example} An order can have different statuses, such as shipped, packing, created and done. Other statuses are not allowed.

Foreign Key

A foreign key is an attribute or a set of attributes that refers to the primary key of another entity. The original entity containing the primary key is called the 'parent' entity and the entity containing the foreign key is called the 'child' entity.

A natural key is a unique identifier that has business meaning and exists in the real world outside of the database. It consists of a column or a set of columns that already exists in a table. A natural key is also known as a business key.

The primary key is an assigned key that consists of a minimal set of attributes to uniquely specify an instance of a record. The attribute or a combination of attributes should meet the following characteristics:

  • Unique: The attribute values of the key uniquely identify one instance of a concept. There are no duplicate instances.
  • Non-volatile: The key does not change.
  • Mandatory: All values are filled; there are no NULL values.

It is good practice to choose a primary key that also meets the following characteristic:

  • Safe: Doesn't contain private or sensitive information, such as a social security number.

Constraints are related to other elements depending on the type of the constraint. Certain constraints are associated with attributes, entities, helper entities, unique keys or relationships between entities.

An attribute may be assigned to one or more entities (for example: acm_exists_ind), and an entity may have several attributes.

Each layer may have one or more interfaces. The number of interfaces depends on how many tenants and delivery agreements have been configured.

Namespace is what the terminology of SQL Server calls a database schema.

A Delivery is a container that holds the specification of what is actually pushed to the i-refactory platform. This specification consists of a list of entities.

Key Root

A Key Root is a central repository for Business Keys. A Key Root ensures that similar records from different data sources are identified by both the same Business Key and the same Surrogate Key.

Context

A Context is a temporal table with a transaction start and end date. The Context tracks all changes of the context attributes related to a business key in the transaction time. This means that every change of an attribute value in a source system leads to a new record in the Context. The old record is end dated with the load date and the new record is start dated with the load date.

Hyponym is a term that denotes a subcategory of a more general class. For example: 'cat' and 'dog' are hyponyms of 'animal'.

A mapping establishes relationships between concepts of separate data models. It creates a link between entities and attributes from a source model to related entities and attributes in the target model. A source model should precede the target model in the i-refactory architecture.

oasi_bk is an abbreviation for One Attribute Set Interface (OASI) with business keys. A normal view in the generic data access layer (GDAL) consists of the surrogate key, foreign key and attributes. The oasi_bk-view in the GDAL is a view where the business key(s) are also shown.

A subtype is a subgroup of an entity. You can create a subtype if a group of instances shares some attributes and relationships that only exist for that group. For example, the entity Customer can have a subtype Company and a subtype Person. They share the common attribute customer number, and each can have some attributes of its own, such as birth date for a Person. The entity Customer is called a supertype.

A subtype:

  • inherits all attributes of the supertype
  • inherits all relationships of the supertype
  • usually has one or more own attributes
  • can have subtypes of its own

A domain will help you to identify the types of information in your model. It defines the set of values for which a column is valid. A domain can specify a data type, length, precision, mandatoriness, check parameters, and business rules. It can be applied to multiple columns, which makes it easier to standardize data characteristics for columns in different tables.

Each interface may have one or more entities and one entity belongs to only one interface. An entity belongs to an i-refactory data model.

A helper entity creates a derived entity and can be used when you need to transform, filter, or calculate data. The purpose of a helper differs per model:

  • Technical model: a helper is used to transform data.
  • Logical validation model: a helper is an implementation of a set constraint (Set Constraint Helper).
  • Central facts model: a helper is used for a set-based calculation of derived facts (Business Rule Helper).

HSTGIN is the abbreviation of Historical STaging IN. It is an older term to indicate the Logical Validation Model or Logical Validation Layer.

A schema is a set of database objects, such as tables, views, triggers, stored procedures, etc. In some databases a schema is called a namespace. A schema always belongs to one database. However, a database may have one or multiple schemas. A database administrator (DBA) can set different user permissions for each schema.

Each database represents tables internally as <schema_name>.<table_name>, for example tpc_h.customer. A schema helps to distinguish between tables belonging to different data sources. For example, two tables in two schemas can share the same name: tpc_h.customer and complaints.customer.

A Tenant is a delivering party for a dataset or datarecord as agreed in the Delivery Agreement.

TSTGIN is the abbreviation of Technical STaging IN. It is an older term to indicate the Technical Model or Technical Staging Layer.

An index organizes data in a way that improves the speed of data retrieval from a database. To maintain the index data structure, there is a cost of additional writes and storage space.

The acronym CRUD stands for create, read, update, and delete. These are the four basic functions of persistent storage.

OLAP is an acronym for Online Analytical Processing. OLAP is a category of software tools that provide analysis of data for business decisions. It uses complex queries to analyze aggregated historical data from OLTP systems. The primary objective is data analysis and not data processing.

OLTP is an acronym for Online Transaction Processing. OLTP captures, stores, and processes data from transactions in real time. Its primary objective is data processing and not data analysis.

A hub or independent entity is an entity that implements an Anchor for a business key that ‘stands alone’, e.g. that does not contain a reference to another entity. An independent entity contains business key fields, which show up as an alternate key (AK), while the primary key (PK) is its surrogate key (ID).

A key is a combination of one or more attributes of an entity that uniquely defines one instance of that entity.