Create a delivery

2023-10-17

In this section we describe how to create a delivery. A delivery is required to interact with the facts stored in the i-refactory.

Deliveries always point to one specific Layer. The following types of deliveries are supported:

  1. Delivery to the Logical Validation Layer (LVL), defining incoming batch deliveries from data-suppliers
  2. Delivery to the Generic Data Access Layer (GDAL), defining Create-Read-Update-Delete (CRUD) interaction
  3. Delivery from the File Export Layer (FILE_EXPORT), defining outgoing batch deliveries to data-consumers


There are two ways to create a delivery. In most cases the delivery is created via the API. For development purposes, it is also possible to create and process a delivery manually:

  1. Click here to jump to the API section.
  2. Click here (mostly for development purposes) to read more about manually creating a delivery.

Before you Begin

How to create a delivery

Deliveries trigger the activities to process incoming or outgoing data-exchange. In this section we explain how to configure each type of delivery.

  1. Click on the Menu > Supply > Delivery.

  2. Select the Layer for which you want to create a Delivery and then press New.

    Create new delivery

  3. In the form that appears you are required to assign a Delivery Agreement. The delivery will inherit the relevant metadata of the Delivery Agreement, Layer, Interface and Entities.

  4. Optionally you can enter your own reference number in Message Number or other reference details in Message Text. Depending on the Layer for which you create the Delivery, you can fill in further details. The next paragraphs explain how to configure each type of delivery in the form that appears.

  5. Don't forget to save the delivery once you have completed the configuration settings.

    Saved delivery

Configure settings for delivery to the Logical Validation Layer

Track and Trace

  • Choose the Received Date. Submit the date on which the delivery was actually received.

    {warning} The Received Date you enter must always be later than the Received Date of previous deliveries.

  • Optionally choose a Sent Date. Submit the date on which the delivery was actually sent.
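The Received Date rule from the warning above can be sketched as a simple check. This is an illustrative sketch, not the actual i-refactory validation:

```python
from datetime import datetime

def check_received_date(new_received: datetime, previous_received: list[datetime]) -> None:
    """Reject a delivery whose Received Date is not later than all previous ones."""
    if previous_received and new_received <= max(previous_received):
        raise ValueError(
            f"Received Date {new_received} must be later than the latest "
            f"previous Received Date {max(previous_received)}"
        )

# A later date passes; an earlier or equal date raises.
history = [datetime(2023, 1, 1), datetime(2023, 2, 1)]
check_received_date(datetime(2023, 3, 1), history)       # OK
try:
    check_received_date(datetime(2023, 1, 15), history)  # too early
except ValueError as exc:
    print(exc)
```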

Configure the Validation process

  • Choose what level of Constraint Validation needs to be executed. If you want to cancel the selection, click on No selection.

    • None: executes only the minimum technical prerequisites. These checks are fully based on database processing and are not part of the validation process. If you select this option and there are data errors, this may result in a database error or another technical error that breaks the processing of the delivery.
    • All: all constraints are checked during the validation process.
    • Default: all delivery agreement constraints in the default group are checked during the validation process.
    • Mandatory: constraints that are inherent (mandatory) to the structure of a data model are checked during the validation process. The mandatory constraints to check are:

      • attribute datatype
      • attribute mandatory
      • primary key
      • alternate key

        {tip} A good practice is to choose the validation level Mandatory as a minimum, since this enforces a minimum set of validations when loading data into the Central Facts Layer in a controlled manner.

  • Violated Constraint Max Sample Size: the maximum number of violated constraint records that are logged. By default, this is 1000. A lower sample size can improve the throughput of the validation process and lowers the storage cost in case of a high volume of constraint violations. The sample size applies per constraint: for each constraint, at most the sample size number of records is registered, including the record on which the constraint was violated.

  • For deliveries on the Logical Validation Layer you can override the includeExistingDataForSkippedRowsInSetValidation value inherited from the Delivery Agreement. includeExistingDataForSkippedRowsInSetValidationConclusion shows the computed value for this Delivery.
    This setting determines whether to include rows that were rejected for basic constraints in previous deliveries. By default, the i-refactory tries to fetch the known data of these rejected rows from previous deliveries. This data is then used to perform set validations on rows in the current delivery that were not rejected. The table below shows how includeExistingDataForSkippedRowsInSetValidationConclusion is computed for a Delivery.

    Delivery value | Delivery Agreement value | Delivery Conclusion
    DEFAULT        | YES                      | YES
    DEFAULT        | NO                       | NO
    YES            | NO                       | YES
    NO             | YES                      | NO
  • Context Related Entities Contain New Rows (for context-based deliveries): select this option if you are sure the context-related entities contain only new records. If this option is selected, i-refactory handles the delivery as a full delivery for every entity for which the context filter is set to True.

    {info} Selecting Context Related Entities Contain New Rows results in a quicker writing process, because no difference is computed between a full and delta delivery set. The delivery won't delete any data; it only saves and updates.
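The inheritance table for includeExistingDataForSkippedRowsInSetValidation reduces to a simple rule. As a sketch:

```python
def set_validation_conclusion(delivery_value: str, agreement_value: str) -> str:
    """Compute includeExistingDataForSkippedRowsInSetValidationConclusion.

    The Delivery either inherits the Delivery Agreement value (DEFAULT)
    or overrides it with an explicit YES/NO.
    """
    if delivery_value == "DEFAULT":
        return agreement_value
    return delivery_value

# The four rows of the inheritance table:
assert set_validation_conclusion("DEFAULT", "YES") == "YES"
assert set_validation_conclusion("DEFAULT", "NO") == "NO"
assert set_validation_conclusion("YES", "NO") == "YES"
assert set_validation_conclusion("NO", "YES") == "NO"
```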

Configuration to process deletes

The i-refactory has built-in logic to identify deleted records (records that no longer exist).

For entities that are provided as full, all records that already exist but are no longer available are considered 'relevant for deletion'. For entities delivered as delta, the deletion indicator provided in the record will indicate 'relevant for deletion'.

The delete type indicator affects how deletions are handled.

  • Set how Deletes in the dataset will be treated. Click here for more information about removing records and the differences between logical deletes and physical deletes.

    • Logical: the deleted records will be end-dated and flagged as deleted in the CFL. Logically deleted records are shown in the 'hist' views in the GDAL but not in the 'current' view. By default, this option is selected. More information can be found here.
    • Physical: the context information of the deleted records is physically removed from the Logical Validation Layer. In the Central Fact Layer, the Context data of the records will be physically deleted and the relevant key in the Anchor will be marked as physically removed. More information can be found here.
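The difference between the two delete types can be sketched as follows. The record layout is hypothetical, chosen only to illustrate end-dating versus removal:

```python
from datetime import datetime

def apply_delete(records: dict[int, dict], key: int,
                 delete_type: str, now: datetime) -> None:
    """Logical: end-date and flag the record; Physical: remove it."""
    if delete_type == "LOGICAL":
        records[key]["endDt"] = now
        records[key]["isDeleted"] = True   # still visible in 'hist' views
    elif delete_type == "PHYSICAL":
        del records[key]                   # context data physically removed
    else:
        raise ValueError(f"unknown delete type: {delete_type}")

facts = {1: {"price": 10, "endDt": None, "isDeleted": False},
         2: {"price": 12, "endDt": None, "isDeleted": False}}
apply_delete(facts, 1, "LOGICAL", datetime(2023, 10, 17))
apply_delete(facts, 2, "PHYSICAL", datetime(2023, 10, 17))
assert facts[1]["isDeleted"] and 2 not in facts
```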

Delivered entities

In the 'Delivered entities' section, you can select which entities will be delivered and submit their snapshot date.

The following delivery loading strategies are available:

  • If you choose to include all entities, you opt for a Complete Delivery.
  • If you choose to exclude some entities, you opt for a Partial Delivery.
  • If you choose to mark an entity as complete you opt for a Full Delivery of the related Entity.

  • If you choose to mark an entity as not complete, you opt for a Delta Delivery of the related Entity.

  • If you opt for a Context Based Delivery, you need to classify a foreign key relation between entities in your data model as use as context filter and select the driving entity as Delta and the dependent entity as Full.
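The loading strategies above can be sketched as a simple classification. The entity names are hypothetical:

```python
def delivery_strategy(entities: dict[str, bool], included: set[str]) -> dict:
    """entities maps entity code -> isComplete; included lists delivered entities.

    All entities included -> Complete Delivery, otherwise Partial.
    Per entity: marked complete -> Full, otherwise Delta.
    """
    return {
        "delivery": "COMPLETE" if included == set(entities) else "PARTIAL",
        "perEntity": {e: ("FULL" if entities[e] else "DELTA") for e in included},
    }

model = {"order": False, "order_line_item": True, "customer": True}
result = delivery_strategy(model, included={"order", "order_line_item"})
assert result["delivery"] == "PARTIAL"                  # customer is excluded
assert result["perEntity"] == {"order": "DELTA", "order_line_item": "FULL"}
```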

Delivered entities

To customize the options:

  • By default, all Entities are included in the delivery. You can unselect the checkbox to exclude entities, for example, for partial deliveries.
  • By default, all Entities are marked as Complete. Unselect the checkbox to mark the entity as a delta delivery.
  • Optionally set a Snapshot datetime per entity. The Snapshot datetime will be used as the transaction datetime when registering facts in the Central Facts Layer. If you do not enter a specific Snapshot datetime, the transaction time will be set to the system datetime of the i-refactory server.

    {info} When to use Snapshot Datetime:

    If you want strict control over the transaction timelines that manage the validity window of the provided records, you have the option to override the default behaviour of using the server datetime as the timestamp that opens new records and enddate deleted or updated records.

    One use case is to align the target system managed by i-refactory with the data validity timelines of the source system, or even of a specific source system entity. Another is to load historical data warehouse data into the system managed by i-refactory while keeping the already existing timelines available.

    In all cases the snapshot datetime for a delivered entity must be after the last provided snapshot datetime.


    Example

    Suppose there is already a delivery at Time 1 and a delivery at Time 3. You want to add a delivery at transaction Time 2. You cannot submit Time 2 as Received Date because the Received Date needs to be later than Time 3 (this delivery is already in the database). However, you can enter Time 2 as Snapshot Datetime and the transaction datetime in the database will correspond to Time 2.

Configure settings for delivery for the Generic Data Access Layer (CRUD)

  • Set Physical Deletes allowed when you want to allow delete actions to result in physical removal as part of the CRUD transaction.

After saving this delivery, the i-refactory engine will directly activate the required processes after which it will be possible to insert, update, and delete the facts for the entities in the given Generic Data Access model. The Generic Data Access delivery will be active until stopped.

{info} If a delivery to the Logical Validation Layer (LVL) for the same Central Facts Layer is running, the GDAL Delivery will be blocked and only released when the LVL delivery has reached an end-state.

Configure settings for delivery from the File Export Layer (FILE_EXPORT)

  • Enter the details of the request into Export Specification as a JSON. Use Display Schema to learn about all the options you have.

    The following shows an example of a JSON-based Export Specification:

    {
        "snapshotDatetime": "2023-09-26 10:43:14.0430000",
        "localFileSystem": {
            "rootPath": "/Users/tester/tmp/"
        },
        "filterParameters": [
            {
                "columnCode": "dataProducerName",
                "columnValue": "RAB"
            },
            {
                "columnCode": "deliveryId",
                "columnValue": "1000"
            }
        ],
        "customMetadata": {
            "obligation": 1096531987,
            "modelCode": "TPCH",
            "modelVersion": "1.0.4",
        }
    }

    At a minimum, you provide:

    • snapshotDateTime
    • rootPath of the LocalFileSystem or of the adslFileSystem
    • in the case of the adslFileSystem, you also need to provide the accountName and containerName

    Optionally you can add:
    • filterParameters as an array of columnCode and columnValue combinations
    • customMetadata where you are free to enter all kind of reference data as a JSON

    {info} filterParameters are case sensitive. By providing only columnCodes (and not the combination of entityCode/columnCode) the system automatically uses matching columns from all entities within the File Export interface as a filter.
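The matching behaviour of bare columnCodes can be sketched as follows. The entity and column layout is hypothetical:

```python
# Hypothetical entity/column layout for a File Export interface.
entities = {
    "order":       ["dataProducerName", "deliveryId", "orderDate"],
    "order_lines": ["dataProducerName", "deliveryId", "quantity"],
}

def entities_filtered_by(column_code: str) -> list[str]:
    """A bare columnCode (no entityCode) matches, case sensitively,
    every entity in the interface that has that column."""
    return [name for name, cols in entities.items() if column_code in cols]

assert entities_filtered_by("deliveryId") == ["order", "order_lines"]
assert entities_filtered_by("DELIVERYID") == []   # case sensitive: no match
```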

As part of the export, the rootPath is extended with the following path: /export-<export delivery id>/<schema code>/<entity code>/. The exported files are named <entity code>.parquet. By default all files are exported as parquet (more information can be found here).
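The resulting file location can be sketched as:

```python
def export_file_path(root_path: str, delivery_id: int,
                     schema_code: str, entity_code: str) -> str:
    """Build the location of one exported parquet file:
    <rootPath>/export-<delivery id>/<schema code>/<entity code>/<entity code>.parquet"""
    return (f"{root_path.rstrip('/')}/export-{delivery_id}/"
            f"{schema_code}/{entity_code}/{entity_code}.parquet")

assert export_file_path("/Users/tester/tmp/", 42, "tpch", "order") == \
    "/Users/tester/tmp/export-42/tpch/order/order.parquet"
```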

For each delivery, a manifest file will be placed in /export-<export delivery id>/ and is named exportDelivery.json. The manifest file includes the following:

  • deliveryId of the export

  • the receive date of the export-delivery API call

  • the used FILE EXPORT interface

  • the Export Specification as defined in the delivery request

  • a list of details for each entity exported:

    namespace_id | namespace_code | entity_id | entity_code | rowcount_method | rowcount | message_nbr | message_text
    47           | tpch           | 19        | order       | Approximate     | 2000     |             |
    47           | tpch           | 22        | order_lines | Approximate     | 8005     |             |
  • Messages are used to communicate warnings when attribute values are restricted in length. For example, when undefined varchar(max) datatypes are used.

Manual processing of a delivery to the Logical Validation Layer

For testing purposes you can process a delivery by manually switching the state transitions. This simulates API processing of external deliveries.

  1. Click on menu > Supply > Delivery.
  2. Select the delivery you want to manually process. If there is no delivery available, you first need to create a delivery.
  3. Expand the details of a saved Delivery by clicking on the 'arrow down' icon.
    Saved delivery
  4. Select the tab Technical Staging Entity Updates.
    Technical Staging Entity Updates
  5. Select Toggle Batch Edit.
    Toggle Batch Edit
  6. Select all Entities by marking the checkbox in the header row.
  7. Select a status. You can choose between:
    • IDLE: default status. i-refactory engine is idle.
    • PROCESSING: simulates loading data in the Technical Staging In Layer (TSL).
    • SUCCEEDED: simulates loading data successfully in the Technical Staging In Layer (TSL). You need to set the status to PROCESSING first, before you can switch to SUCCEEDED.
    • FAILED: simulates failure in loading data in the Technical Staging In Layer (TSL). You need to set the status to PROCESSING first, before you can switch to FAILED.
  8. Select the status PROCESSING and press Apply to all selected rows.
    Apply to all selected rows
  9. Press Save all details to save the changes or Cancel all edits if you want to revert.
  10. To simulate a succeeded delivery:
    • Select the status SUCCEEDED and press Apply to all selected rows.
    • Press Save all details to save the changes or Cancel all edits if you want to revert.
      Save all details
    • If you go to menu > Supply > Monitor, you can see the delivery process in the monitor.
    • If you go to menu > Supply > Delivery Statistics, you can analyze the delivery statistics.
  11. To simulate a failed delivery:
    • Select the status FAILED and press Apply to all selected rows.
    • Press Save all details to save the changes or Cancel all edits if you want to revert.
      Failed delivery
    • If you go to menu > Supply > Monitor, you can see the failed delivery in the monitor.
      Failed delivery
    • If you want to solve the failed delivery, you can choose between three options in the Monitor.
      • Retry: you can retry processing the delivery. If the status is FAILED in the TSL, you can start processing again (Go to Step 8).
      • Reject: the end-state is met directly and the delivery is no longer shown as an active delivery.
      • Cancel: all idle and scheduled processes will be cancelled. Processes in the processing stage will be continued and finished.
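The manual state transitions described in the steps above can be sketched as a small state machine. The FAILED to PROCESSING edge corresponds to the Retry option; this is an illustrative sketch of the documented rules, not the engine's internal model:

```python
# Allowed manual state transitions for a Technical Staging Entity Update.
TRANSITIONS = {
    "IDLE":       {"PROCESSING"},              # default status
    "PROCESSING": {"SUCCEEDED", "FAILED"},     # must pass through PROCESSING
    "SUCCEEDED":  set(),                       # end state
    "FAILED":     {"PROCESSING"},              # Retry restarts processing
}

def advance(current: str, target: str) -> str:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current} to {target}")
    return target

state = advance("IDLE", "PROCESSING")
state = advance(state, "SUCCEEDED")
assert state == "SUCCEEDED"
```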

{warning} Cancelling deliveries could lead to an unwanted intermediate state of your data if the cancel happens after the Technical Staging Layer (TSL). A cancel is not a rollback to a previous state. If you cancel the current Delivery, the processed and successful entity updates in this delivery are stored in the Central Facts Layer, while the idle and scheduled tasks are stopped.


{info} i-refactory assumes that you have (manually) filled the associated tables in the TSL, as this is not part of the i-refactory application.

Use of APIs to manage deliveries

You can find the API documentation on: <hostname>:3001. For example: https://localhost:3001/

acmDatadel.delivery

Example json-configuration for creating a delivery to the Logical Validation Layer where only orders and orderlines are expected to be processed using the CSV-loader option:

{
    "architectureLayerCode": "HSTGIN",
    "interfaceCode": "TPC_H_HSTGIN",
    "tenantCode": "TPCH",
    "receivedDt": "2022-01-02T00:00:01",
    "sentDt": "2022-01-01T12:00:00",
    "snapshotDt": "2022-01-01T00:00:00",
    "isComplete": true,
    "constraintsToValidate": "ALL",
    "violatedConstraintMaxSampleSize": 50,
    "logicalValidationDeliveryDeleteType": "LOGICAL",
    "deliveredEntity": [
        {
            "namespaceCode": "tpc_h",
            "entityCode": "order",
            "isComplete": false
        },
        {
            "namespaceCode": "tpc_h",
            "entityCode": "order_line_item",
            "isComplete": true
        }
    ],
    "csvLoaderSpecification": {
        "uncPath": "//Mac/home/i-refact-demo-tpch/deliveredData/orderSystem/delivery-report-load",
        "codePage": "raw",
        "fieldTerminator": "|",
        "firstRowIsHeader": false,
        "fileExtension": "tbl"
    }
}
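Before posting a request like the one above, a client could sanity-check the payload. The required field set below is an assumption based on the examples in this section, not the authoritative API schema:

```python
def validate_lvl_delivery(payload: dict) -> None:
    """Check a few fields of an LVL delivery request before posting it.
    The required set is an assumption drawn from the documented examples."""
    required = {"architectureLayerCode", "interfaceCode", "tenantCode", "receivedDt"}
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for entity in payload.get("deliveredEntity", []):
        assert {"namespaceCode", "entityCode"} <= entity.keys()

validate_lvl_delivery({
    "architectureLayerCode": "HSTGIN",
    "interfaceCode": "TPC_H_HSTGIN",
    "tenantCode": "TPCH",
    "receivedDt": "2022-01-02T00:00:01",
    "deliveredEntity": [{"namespaceCode": "tpc_h", "entityCode": "order",
                         "isComplete": False}],
})
```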

A delivery from the File Export Layer can be created using this example request:

{
    "architectureLayerCode": "FILE_EXPORT",
    "interfaceCode": "TPC-H GDAL",
    "tenantCode": "TPCH",
    "exportSpecification":
    {
        "snapshotDatetime": "2023-09-26 10:43:14.0430000",
        "localFileSystem": {
        "rootPath": "/Users/tester/tmp/"
        },
        "filterParameters": [
            {
                "columnCode": "dataProducerName",
                "columnValue": "RABO"
            },
            {
                "columnCode": "deliveryId",
                "columnValue": "1000"
            }
        ],
        "customMetadata": {
            "obligation": 1096531987,
            "modelCode": "TPCH",
            "modelVersion": "1.0.4"
        }
    }
}

Constraint violation actions are applicable to certain constraint categories. Not all combinations of constraint categories and violation actions are allowed.

An attribute must have a value, whatever that value may be. It must not be NULL.

A data type of an attribute defines what value an attribute can hold. The data type specifies what type of mathematical, relational, or logical operations can be applied to it without causing an error.

An attribute datatype constraint is the most basic constraint type. It checks for the datatypes we support and have implemented.

For example, we check for string, length of string, integer, date, etc. In the following figure you can see the supported data types by PowerDesigner.

Image is omitted: Supported data types

Constraints can be violated and there are some actions that can be performed when a violation occurs. The possible actions are: EMPTY COLUMN, NO ACTION and SKIP ROW.

An attribute value constraint is an expression that is evaluated. The person who writes the expression is responsible for its correctness. The expression should be formulated in a positive way and lead to a Boolean answer. If the expression evaluates to True, then the value is correct.

Examples

  • The values in attribute X have to be greater than 10: X > 10
  • The email address has to be in a certain pattern: email address LIKE '%_@_%.__%'

A Concept Integration Model is also a central facts model on which you place integration patterns. It is not required to create a concept integration model, but it can be very useful.

Every constraint is assigned to a constraint classification.

The main purposes of the Generic Data Access Layer (GDAL) are to provide logical perspectives for data consumption and to manage CRUD actions.

A generic data access model is a virtual data model that acts as an interface bridge between consumer applications and the central fact storage.

Every attribute is assigned to an attribute classification.

An entity record constraint checks whether an attribute meets the requirements set by another attribute belonging to the same entity.

The main purpose of the Logical Validation Layer (LVL) is to transform the data received from external data sources to fit into the logical data model structure. It is also responsible for validating deliveries. The Logical Validation Layer is also known as the Historical Staging In (HSTGIN) Layer.

The logical validation model is the representation of a single external data source in a logical format. It represents how data delivered by a specific tenant should be transformed, temporalized and validated in the logical validation layer. The logical validation model is also known as the Historical Staging model (HSTGIN).

Multi-active attributes are attributes that contain a business key to provide multiple context records at the same time. For example: a customer has multiple types of phone numbers. “Home”, “Work” and “Mobile”. In that case we add a dependent entity on customer with key “Phone Nbr Type”. This is to prepare for the CFPL multi-active key on customer.

The main purpose of the Technical Staging Layer (TSL) is to create a common starting point for further data processing. It receives data delivered from external data sources and temporally stores them in a database. The input data should be in a tabular format (rows and columns).

Bi-temporal attribute is an attribute that changes over time: they follow a valid timeline. For example, a Part may have a price valid for December and a price valid for January.

Every entity is assigned to an entity classification and to a parent entity classification. The possible values for entity classification are: ALTERNATE KEY CONTEXT, ATTRIBUTE CONTEXT, GENERALIZATION, HELPER, REFERENCE CONTEXT, STABLE, STABLE DEPENDENT and STABLE INDEPENDENT.

Entity Set Constraint An entity set constraint can be used to perform a check concerning values of two or more attributes that belong to different entities or to perform a check concerning the value of an attribute with respect to a set of values.

A Set Constraint Helper is a helper in the logical validation model. It is the implementation of a set constraint. The helper returns the records of an entity for a given set constraint, where the instances of this entity do not meet the definition of this set constraint.

The business requirements describe how data should be delivered for the data consumers (end users or applications) in terms of concepts, relationships between concepts and constraints to validate the data. These requirements can be described in a logical data model, for example.

A Business Rule Helper is a helper in the central facts model. It is a set-based calculation of derived facts. You need to use a Business Rule Helper if you want to make a calculation and want to keep a transaction history of the results of this calculation. You use the existing entities from the central facts model as input. The results of the helper must be materialized in 'regular' fact entities, such as Anchors and Contexts, to make them accessible in the Generic Data Access Layer.

Closed Open means that the timeline is valid from (vanaf in Dutch) the supplied valid start date until - but not including - (tot in Dutch) the supplied valid end date. In practice, this means that the start date of a valid time record is equal to the end date of the previous valid time record.
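The closed-open convention can be sketched as a contiguity check over [start, end) intervals:

```python
def is_closed_open_chain(intervals: list[tuple[int, int]]) -> bool:
    """In a closed-open timeline, each [start, end) record must start
    exactly where the previous record ended."""
    return all(prev[1] == cur[0] for prev, cur in zip(intervals, intervals[1:]))

# December price then January price, as day numbers: [1, 32) then [32, 60).
assert is_closed_open_chain([(1, 32), (32, 60)])
assert not is_closed_open_chain([(1, 30), (32, 60)])   # gap: not contiguous
```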

You need to create context-based entities when a set of data may be delivered within the boundaries of a parent context. A context-based entity applies when:

  • At least 2 entities are delivered.
  • A context relationship exists between these 2 entities. One entity is the parent context of the other entity.
  • The parent context entity is delivered as a delta and the child entity is delivered as a full set.


The Management Model contains the PowerDesigner objects for the Unified Anchor Modelling (UAM). When a UAM object is created, a so-called PowerDesigner replica of the corresponding Management Model object is created. This means that certain properties such as metadata columns and column stereotypes are configured in the Management Model and cannot be changed. The replication settings specify which elements of an object can be changed after creating a replica from the template object. It is possible to override the replication settings of an UAM object and change a specific property.

The temporal atomic type describes the datatype of the temporal attributes.

The main purpose of the Central Facts Layer (CFL) is to store data historically. It can also integrate data from different sources. The Central Facts Layer is also known as the Central Facts Persistency Layer (CFPL).

The central facts persistence implementation model is the representation of facts in an anchorized data model with the ability to integrate multiple logical models.

In the context of i-refactory, data transformation refers to the operations involved in turning raw data into a form that is readily useful and closer to the business requirements.

Integration patterns are used to integrate entities from different data models. If two or more entities from different data models share the same business key, you can use the Integration Pattern named Key Root. It is a good practice to capture integration patterns in a separate model, named Concept Integration Model.

An attribute is mandatory when its value can not be empty (NULL).

A Physical Data Model (PDM) represents how data will be implemented in a specific database.

{note} The i-refactory uses four PDMs: technical staging model, logical validation model, central facts model and generic access model. Each one of these models is implemented as an additional database, which is used to store data from external and internal data sources.

Reverse engineering is the process of reconstructing a physical and/or Entity Relationship (ER) model from an existing data source. The purpose of reverse engineering is to avoid manual work as much as possible.

Architecture layer

The core of the i-refactory architecture has four layers: TSTGIN, LVL, CFL and GDAL. There are also two auxiliary layers: UCLVL and EXT.

If an entity has one or more attributes that change over time and you want to keep track of when an attribute is valid at a certain transaction time, then you have a special case of a regular dependent entity, called a bi-temporal entity. The bi-temporal entity stores historical data with two timelines. The primary key of the bi-temporal entity is composed of the primary key of the parent entity and the valid start date attribute. The attribute that changes over the valid time is called a bi-temporal attribute.


A delivery agreement is a contract between a Tenant and a Logical Implementation Model or Generic Data Access model. An agreement has a duration. The delivery agreement sets the architecture layer (interface) where the data should be ingested, as well as the default settings to be applied to the deliveries.

A dependency mapping is a mapping between a helper (or BR helper) and a source entity used in the query of the helper. The helper and the source entity must belong to the same model.

  • Default dependency is set on entity level (source entity to helper entity)
  • To allow lineage on attribute level, via the Mapping editor, you could manually add the dependency on attribute level.

An Independent Entity is an entity that implements an Anchor for a Business Key that ‘stands alone’, i.e. it does not contain a reference to another Entity.


A Logical Data Model (LDM) matches the language, structure and quality of the business, regardless of the physical data implementation. The Logical Data Model reflects the business requirements.

A delivery may be considered as "untrusted" if deletes of data in the Logical Validation Layer have taken place and the processing of new deliveries cannot 100% rely (trust) on having enough statistics and data available to detect logical deletes, to determine the exact delta and to execute set based validations.


A Dependent Entity is an entity that implements an Anchor for a Business Key that ‘depends’ in its existence on another Entity. A Dependent Entity contains Business Key fields of which at least one is a foreign key (FK).


The transaction time in i-refactory is different from what is commonly understood by transaction time. Transaction time is usually seen as the moment when a fact was stored in the database. In the i-refactory, the transaction time is the time, as dictated by the source system, not by the i-refactory database.

The Attribute type links the attribute to one of the existing interfaces.

Computed columns are columns whose content is computed from values in other columns in the table.

Functional date A functional date or time is a point in time and is defined by a user. An example is an order date or date of birth.

The technical model (also known as Technical Staging In model: TSTGIN) is a representation of how exactly one delivery from a specific data source will be processed in the technical staging layer.

Generalization is the process of extracting shared characteristics from two or more classes (hyponyms), and combining them into a generalized superclass (hypernym). For example: an 'employee' and a 'customer' are both 'persons'.

The Mapping Editor provides a graphical interface for creating and viewing mappings between models. It provides a global view of all the mappings related to the entities of a given model, allowing you to quickly identify those which are mapped and not mapped.

When a certain fact can change over time and you need to capture when that fact is valid in the real world, you can add a valid start date and a valid end date to the entity.

A valid time tells us in which period a record is valid. While a functional date represents just one point in time, the valid time has a begin and an end date, for example:

  • For Order item 123, a Retail price of 10.00 was valid from 2019-01-01 to 2019-06-01.
  • For Order item 123, a Retail price of 12.00 was valid from 2019-06-01 to 2020-01-01.
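The valid-time lookup for the example above can be sketched with closed-open records, where the end date belongs to the next record:

```python
from datetime import date

# Valid-time records for order item 123: (valid_from, valid_to, price).
prices = [
    (date(2019, 1, 1), date(2019, 6, 1), 10.00),
    (date(2019, 6, 1), date(2020, 1, 1), 12.00),
]

def price_at(d: date) -> float:
    """Closed-open lookup: a record is valid from its start date up to,
    but not including, its end date."""
    for start, end, price in prices:
        if start <= d < end:
            return price
    raise LookupError(f"no valid price on {d}")

assert price_at(date(2019, 5, 31)) == 10.00
assert price_at(date(2019, 6, 1)) == 12.00   # boundary belongs to the new record
```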

Alternate key is an attribute or a group of attributes whose values uniquely identify every record in an entity, but which is not the primary key.

Candidate key

A candidate key consists of one or more attributes and meets the following requirements:

  • Unique: The value of the key uniquely defines one instance of a concept. There are no duplicate values.
  • Non-volatile: (Almost) doesn't change.
  • Minimal: Contains only the elements needed.

There are two kinds of candidate keys:

  • primary key
  • alternate key

Normalization is the process of decomposing tables in a database in order to reduce data redundancy and improve data integrity.

A strongly typed model is a model in which every attribute has a predefined data type, for example: integer, double, or date.

Surrogate Key

A surrogate key is a system generated unique identifier that does not have any contextual or business meaning.

Business Key

A business key is a unique identifier that has business meaning and exists in the real world outside of the database. It consists of a column or a set of columns that already exists in a table. A business key is also known as a natural key.

A Key Root Hub is an integration concept that must be used when the exact same business concept or independent business key occurs in different models. The Hubs for this independent business key in the different UAM models are all subtypes of the Keyroot Hub.

A relationship shows how two entities are related to one another. For example, a customer can place an order, and an order can have a customer.

Every Attribute has an atomic type (data type) which is linked to the attribute type of that attribute.

The cardinality shows how many instances of an entity can take place in a relationship.

An enumeration consists of the list of values that a given attribute should adhere to.

For example: an order can have different statuses, such as shipped, packing, created, and done. Other statuses are not allowed.
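One common way to enforce such an enumeration in a relational database is a CHECK constraint. This SQLite-based sketch (illustrative table and column names) rejects any status outside the allowed list:

```python
import sqlite3

# The enumeration is enforced declaratively with a CHECK constraint.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE customer_order (
        order_id INTEGER PRIMARY KEY,
        status   TEXT NOT NULL
                 CHECK (status IN ('created', 'packing', 'shipped', 'done'))
    )
""")
con.execute("INSERT INTO customer_order VALUES (1, 'created')")  # allowed

try:
    con.execute("INSERT INTO customer_order VALUES (2, 'lost')")  # not in the list
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```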

Foreign Key

A foreign key is an attribute or a set of attributes that refers to the primary key of another entity. The original entity containing the primary key is called the 'parent' entity and the entity containing the foreign key is called the 'child' entity.
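The parent/child relationship can be sketched with SQLite (illustrative table names); a child row must reference an existing parent:

```python
import sqlite3

# SQLite needs foreign key enforcement switched on per connection.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);   -- parent
    CREATE TABLE customer_order (                              -- child
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
""")
con.execute("INSERT INTO customer VALUES (1)")
con.execute("INSERT INTO customer_order VALUES (100, 1)")  # parent exists: OK

try:
    con.execute("INSERT INTO customer_order VALUES (101, 99)")  # no such parent
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```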

A natural key is a unique identifier that has business meaning and exists in the real world outside of the database. It consists of a column or a set of columns that already exists in a table. A natural key is also known as a business key.

The primary key is an assigned key that consists of a minimal set of attributes to uniquely specify an instance of a record. The attribute or a combination of attributes should meet the following characteristics:

  • Unique: The attribute values of the key uniquely identify one instance of a concept. There are no duplicate instances.
  • Non-volatile: The key does not change.
  • Mandatory: All values are filled; there are no NULL values.

It is good practice to choose a primary key that also meets the following characteristic:

  • Safe: Doesn't contain private or sensitive information, such as a social security number.
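A typical pattern combining these key types is a surrogate primary key alongside the business key declared as an alternate (unique) key. A SQLite-based sketch with illustrative names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,   -- surrogate primary key
        customer_code TEXT NOT NULL UNIQUE   -- business key as alternate key
    )
""")
con.execute("INSERT INTO customer (customer_code) VALUES ('CUST-1')")

try:
    # A duplicate business key violates the alternate key's uniqueness.
    con.execute("INSERT INTO customer (customer_code) VALUES ('CUST-1')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```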

Constraints are related to other elements depending on the type of constraint. Certain constraints are associated with attributes, entities, helper entities, unique keys, or relationships between entities.

An attribute may be assigned to one or more entities (e.g. acm_exists_ind) and an entity may have several attributes.

Each layer may have one or more interfaces. The number of interfaces depends on how many tenants and delivery agreements have been configured.

Namespace is what SQL Server terminology calls a database schema.

A Delivery is a container that holds the specification of what is actually pushed to the i-refactory platform. This specification consists of a list of entities.

Key Root

A Key Root is a central repository for Business Keys. A Key Root ensures that similar records out of different data sources are identified by both the same Business Key and the same Surrogate Key.

Context

A Context is a temporal table with a transaction start and end date. The Context tracks all changes of the context attributes related to a business key in the transaction time. This means that every change of an attribute value in a source system leads to a new record in the Context. The old record is end dated with the load date and the new record is start dated with the load date.
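The end-date/start-date mechanics described above can be sketched in Python. The field names and the open-end marker are illustrative, not the i-refactory implementation:

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # placeholder meaning "still current"

context = [
    # business key, attribute value, transaction start and end dates
    {"bk": "CUST-1", "city": "Utrecht", "start": date(2023, 1, 1), "end": OPEN_END},
]

def apply_change(records, bk, city, load_date):
    """End-date the current record for bk and start a new one at load_date."""
    for rec in records:
        if rec["bk"] == bk and rec["end"] == OPEN_END:
            rec["end"] = load_date          # old record is end dated
    records.append({"bk": bk, "city": city,
                    "start": load_date, "end": OPEN_END})  # new record starts

apply_change(context, "CUST-1", "Amsterdam", date(2023, 5, 1))
print(len(context))       # 2: the full change history is kept
print(context[0]["end"])  # 2023-05-01: old record closed at the load date
```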

Hyponym is a term that denotes a subcategory of a more general class. For example: 'cat' and 'dog' are hyponyms of 'animal'.

A mapping establishes relationships between concepts of separate data models. It creates a link between entities and attributes from a source model to related entities and attributes in the target model. A source model should precede the target model in the i-refactory architecture.

oasi_bk is an abbreviation for One Attribute Set Interface (OASI) with business keys. A normal view in the generic data access layer (GDAL) consists of the surrogate key, foreign key and attributes. The oasi_bk-view in the GDAL is a view where the business key(s) are also shown.

A subtype is a subgroup of an entity. You can create a subtype if a group of instances share some attributes and relationships that only exist for that group. For example, entity Customer can have a subtype Company and a subtype Person. They share the common attribute customer number, and can have some attributes of their own, such as birth date for a Person. The entity Customer is called a supertype.

A subtype:

  • inherits all attributes of the supertype
  • inherits all relationships of the supertype
  • usually has one or more own attributes
  • can have subtypes of its own

Anchor: Independent Entity

An Independent Entity is an entity that implements an Anchor for a Business Key that 'stands alone', i.e. one that does not contain a reference to another Entity.

Anchor: Dependent Entity

A Dependent Entity is an entity that implements an Anchor for a Business Key that depends for its existence on another Entity.

A domain will help you to identify the types of information in your model. It defines the set of values for which a column is valid. A domain can specify a data type, length, precision, mandatoriness, check parameters, and business rules. It can be applied to multiple columns, which makes it easier to standardize data characteristics for columns in different tables.

Each interface may have one or more entities and one entity belongs to only one interface. An entity belongs to an i-refactory data model.

A helper entity creates a derived entity and can be used when you need to transform, filter, or calculate data. The purpose of a helper differs per model:

  • Technical model: a helper is used to transform data.
  • Logical validation model: a helper is an implementation of a set constraint (Set Constraint Helper).
  • Central facts model: a helper is used for a set-based calculation of derived facts (Business Rule Helper).

HSTGIN is the abbreviation of Historical STaging IN. It is an older term to indicate the Logical Validation Model or Logical Validation Layer.

A schema is a set of database objects, such as tables, views, triggers, stored procedures, etc. In some databases a schema is called a namespace. A schema always belongs to one database. However, a database may have one or multiple schemas. A database administrator (DBA) can set different user permissions for each schema.

Each database represents tables internally as <schema_name>.<table_name>, for example tpc_h.customer. A schema helps to distinguish between tables belonging to different data sources. For example, two tables in two schemas can share the same name: tpc_h.customer and complaints.customer.

A Tenant is a delivering party for a dataset or datarecord as agreed in the Delivery Agreement.

TSTGIN is the abbreviation of Technical STaging IN. It is an older term to indicate the Technical Model or Technical Staging Layer.

An index organizes data in a way that improves the speed of data retrieval from a database. To maintain the index data structure, there is a cost of additional writes and storage space.
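The retrieval speed-up can be made visible with SQLite's query plan (illustrative table and index names): without the index the query scans the whole table; with it, the engine searches the index instead:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT)")

# Without an index, the only option is a full table scan.
plan_before = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customer WHERE name = 'Acme'"
).fetchone()[3]
print(plan_before)  # ... SCAN customer ...

# The index costs extra writes and storage, but turns the scan into a search.
con.execute("CREATE INDEX ix_customer_name ON customer(name)")
plan_after = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customer WHERE name = 'Acme'"
).fetchone()[3]
print(plan_after)  # ... SEARCH customer USING INDEX ix_customer_name ...
```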

The acronym CRUD stands for create, read, update, and delete. These are the four basic functions of persistent storage.

OLAP is an acronym for Online Analytical Processing. OLAP is a category of software tools that provide analysis of data for business decisions. It uses complex queries to analyze aggregated historical data from OLTP systems. Its primary objective is data analysis, not data processing.

OLTP is an acronym for Online Transaction Processing. OLTP captures, stores, and processes data from transactions in real time. Its primary objective is data processing, not data analysis.

A hub or independent entity is an entity that implements an Anchor for a business key that 'stands alone', i.e. one that does not contain a reference to another entity. An independent entity contains business key fields that show up as an alternate key (AK), while the primary key (PK) is its surrogate key (ID).

A key is a combination of one or more attributes of an entity that uniquely defines one instance of that entity.