What's New in Release 3.0.0

2023-07-19

With the release of i-refactory 3.0.0 we've made a variety of stability, performance, functional and usability improvements.

New features summary

• Physical deletes: On a logical model you now have the option to either logically or physically delete "removed" records. On a generic data access interface you have the option to "allow" physical deletes.
• Cleanup violated constraint log: You can now delete the log of violated records and violated constraints for a delivery.
• Trusted/untrusted setting: A logical validation interface or entity can be marked as trusted or untrusted, which enforces the isComplete settings of a delivery.
• Task runs: We've introduced a new concept in our metadata model: TaskRun.

Physical deletes

When creating a delivery on a logical validation model we now support the option to treat "removed records" as either logical deletes or physical deletes. For deliveries on a generic data access model we now support the option to allow for physical deletes.

{warning} Physical deletes are executed in a single transaction and may have an impact on system resources.

With the default behaviour of logical deletes we never physically remove data. With the physical delete option enabled we do physically remove data from the fact store. We only delete the contextual data, however: business keys registered in the anchors of the fact store are not physically deleted but are marked as such. With this feature you now have the ability to delete sensitive data if required to do so. We also keep a full audit trail, which gives you the ability to prove that a physical delete actually took place at a given point in time, for a given business key, for a specific delivery of a third party.

{info} A delivery with physical delete enabled must be completed and persisted in the fact store before you can create a new delivery on a logical data model. This is in contrast to a logical delete scenario, where a new delivery can be created as soon as the data in the logical validation layer is accepted.

Cleanup violated constraint log

For a completed delivery you can now request to physically delete all registered logging in violatedRecord and violatedConstraint. The log of violation counts per constraint is not removed.

You need to have the role of "DataManager" to be able to clean the log. The property delivery.logCleanupInfo will register the status of the cleanup process.

Trusted and Untrusted setting

In a logical validation model we store the difference between subsequent deliveries. Our logical validation layer acts as an intermediate change log buffer, allowing for parallel loading of deliveries. This buffer also enables us to optimize the data load to the central fact store: even when entities are delivered as a complete set, we are able to process the delivered set of records as a delta load to the central fact store, significantly improving the load performance to the central facts layer.

We are also capable of reasoning about whether the data stored in a logical validation entity can still be trusted. If we no longer trust the data, we simply apply the isComplete setting of the delivered entity when loading to the central fact store.

However, in some situations, more specifically in the case of deliveries with context based enddating, we are no longer capable of correctly determining the trustworthiness of the registered data of a logical entity in the logical validation model. This is only the case if someone deleted or modified records.

So if someone (a DBA) decides to remove data from a logical validation entity, and it is known that due to this action we cannot trust promoting a full delivery on a logical entity to a delta delivery when loading to the central facts layer, you can now mark an interface or entity in a logical validation model as untrusted. If the computed conclusion on entity level is false, we will never promote a full load to a delta load but will honour the full load setting, at the cost of decreased performance.
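
A logical validation interface can be marked as untrusted through the Rest API (see PATCH Changed in the Rest API section below). As a minimal sketch, the request body for a PATCH on /acmDatadef/interface would set the new property:

{
    "logicalValidationInterfaceTrusted": false
}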

Task runs

We've introduced a new concept in our metadata model: TaskRun. In our previous release we weren't capable of managing state for anything other than an update of a target entity. With our new approach we can now execute many kinds of tasks and keep track of their state, composition and dependencies. An example of a task not related to a target entity that we needed to execute in our previous release is the cleanup of intermediate storage used during the processing of a delivery. If this cleanup failed for some reason, we needed to stop the i-refactory server because we couldn't register the failure state of these tasks.

Application Changes

Performance enhancements

  1. Retrieving the delivery statistics in the web app became slow as the number of completed deliveries increased. We have added an additional index and tuned the query statement. Retrieving delivery statistics should now return results within acceptable boundaries (less than a few seconds).

  2. The performance of validating the basic constraints (primary key, datatype, mandatory, enumeration, attribute value, entity record) has been improved. We typically see an improvement of a factor of 2 to 4.

  3. We now store the intermediate results of context based filters more often. This reduces the complexity of long join paths when context based enddating should be applied, at the cost of more intermediate storage (the results are not stored in temp space). We do, however, expect an overall improvement in throughput as a result of this measure.

  4. In a number of cases we now first calculate a difference set, store it as an intermediate result and use it in subsequent steps when processing the entity updates for a delivery. It is hard to predict in which cases this will result in higher throughput and less resource usage. We do, however, expect a more predictable throughput due to the reduced complexity of the execution plans the database engine needs to evaluate and generate.

  5. We've improved the function that converts a database query result to a JSON document in our Rest API.

  6. The import of metadata is significantly faster, especially for completely new data models.

  7. From this release, we only store the derived constraint metadata for a logical validation model (attribute datatype, attribute mandatory, ...). This reduces the amount of unnecessary metadata in the database.
  8. The web application monitor and delivery statistics pages retrieve and render data much more efficiently.
  9. We changed the indexes on the tables for basic and set constraint validation. For the basic constraint validation tables the index is now a non-unique clustered index. For the set constraint validation tables the index is now a unique clustered index. This should reduce the sort overhead involved in joining.
  10. From now on we use set-based inserts in our application server for the maintenance of metadata. This reduces the number of network round trips, resulting in higher throughput and lower resource consumption.

OAuth2 server

  1. You now have the ability to create your own private/public key pair and configure the location of these key pairs when using our openID server.
  2. You now have the ability to specify an expiry time per client besides the default expiry time for all clients.
  3. The implementation is from now on based on OAuth2. In the next major release the OpenId Connect related implementation will be removed.

Bi-Temporality in a logical validation model

We have changed how we determine whether a record for a bitemporal entity is a removed record or a changed record. In previous releases a bitemporal record in the logical validation model was considered removed if you no longer delivered a record on the valid time start (or when specifically delivered as such in a delta delivery). From now on we mark records as appended when a delivered timeline overlaps with an existing timeline. This behaviour was already implemented in the central facts layer. Because we now support physical deletes, in which we need to recursively delete overlapping timelines, we needed to align the handling of bitemporal context in a logical validation model with the handling of bitemporal context in the central facts layer.

Filtering rows

If a delivery is created where context based enddating applies, we filter rows delivered in a child entity which do not match the parent context entity. For example, given a datamodel with entities PART, SUPPLIER and PART_SUPPLIER, where PART_SUPPLIER is a dependent entity on PART and SUPPLIER:

  • The delivery contains a delta load on PART and a full load on PART_SUPPLIER (a partial delivery).
  • The delta set for PART contains a single record where PART.NBR = 1.
  • The full set on PART_SUPPLIER contains a single record where PART.NBR = 2 and SUPPLIER.NBR = 1.

We will filter the row in PART_SUPPLIER because it doesn't match the delivered set on PART (no match between PART_SUPPLIER.NBR and PART.NBR).

This filter is required because such rows would, or could, result in loading errors on the fact store. And if a delivered record cannot be stored in the fact store, we shouldn't store it in the logical validation model either.

Reference lookup in central facts context entities

Previously, regardless of whether a reference column in a central facts entity was mapped to from a logical validation model, we always executed a join lookup for each reference column.

From this release we will only generate a lookup to a parent entity if a reference column is mapped to.

Default value expressions in central facts models

If a default value expression is specified on an attribute of a central facts datamodel, and no default value expression is specified on the attribute in the logical validation model which is the source of the attribute mapping, we no longer apply the default value expression of the central facts datamodel attribute; we simply use the value as specified in the logical validation model.

Constraint violations on bitemporal data

If a constraint violation is detected on a record in a bitemporal table and this row should be skipped we will skip all other delivered bitemporal records belonging to the same business key.

The reason for doing this is that we can no longer trust the consistency of the delivered timelines if one of them fails.

The implicitly skipped rows are not registered as constraint violations: not in the constraint violation count, nor as a violatedRecord, nor as a violatedConstraint.

Creating an actual view for set based constraint checks

For set based constraint validation we currently append the delivered set of rows with the existing actual rows already stored in the logical validation layer. We append rows if:

  1. Rows are skipped because of a constraint violation with skipRow = true.
  2. The data provided is specified as a delta load for the specific logical entity.

In our previous release, for bitemporal context, we simply added rows with an equality lookup on the primary key.

From this release on we append rows for bitemporal data with an overlaps query.

Non surrogated key columns in a dependent entity

For dependent entities, prior to this release it was mandatory to register the non-reference primary key columns after the reference related primary key columns. From now on you can register these primary key columns at the desired position in the anchor. The code generator will resolve the key lookup properly.

Deadlocks in SQL Server 2019 CRUD transactions

Testing our software on SQL Server 2019 resulted in deadlock issues during CRUD transactions. These are caused by SQL Server merge joins, where more rows than strictly necessary are locked, in a different order than in another concurrent request.

We've added two hints to these queries: inner loop join and force order. This fixed the deadlock issue.

Dependency management

From this release on we check for inconsistencies between several application dependencies.

  1. If a meta model export file is passed as input to our import API, we check whether the metadata export file is supported by the currently running application version. For each application version we now keep track of which metadata versions are supported. If the versions do not match we raise an error.
  2. If the DDL version registered in the database does not match the required DDL version of the application server, we raise an error indicating that the application server you would like to start is outdated.
  3. If for some reason the registered metadata for a logical or generic data access model is not up to date with the supported metadata versions of the application server, you cannot create new deliveries. Trying to do so will raise an error.

Web App Changes

Access to menus and actions

Depending on the roles of the authenticated and authorised user, menu items and actions are enabled or disabled accordingly.

Delivery statistics

  1. Anyone with at least a DataViewer role now has access to the delivery statistics page. However, in case of constraint errors, viewing the sample of records on which a constraint was violated is restricted to users with the DataManager role. We consider viewing record information a possibility to view sensitive data; for that reason only users with this role are allowed to view the details of a source record.
  2. On the delivery statistics page you now have the ability to delete all registered violatedRecords and violatedConstraints with the "erase" button on the left hand side of each delivery. The count of constraint violations, however, is not deleted.
  3. Zooming and panning the graph is now more consistent.
  4. The graph with elapsed time and rows processed now takes into account the logarithmic scale and 0-values (which cannot be represented on a logarithmic scale).

Tabpages on delivery and interface

  1. We have changed the layout of creating deliveries. For a logical model and a generic data access model there are now two separate tab pages.
  2. Settings on interfaces and entities are now accessible through the Interface menu. For each interface model you can view a list of entities and set specific properties per subtype.

Reference lookup

The look and feel of reference lookups is improved.

Rest API

GET Changed

  1. /acmDatadef/interface and /acmDatadef/deliverableInterface

    Removed properties:

    • ddlExportVersion
    • metadataExportVersion
    • dbmsImplementationVersion

    New properties:

    • metadataVersion: is automatically set in the PowerDesigner UAM modelling components and contains the semantic version of the metadata export interface.
    • modellerVersion: is automatically set in the PowerDesigner UAM modelling components and contains the semantic version of the PowerDesigner UAM Modeller release.
    • dependsOnRuntimeVersion: is automatically set in the PowerDesigner UAM modelling components and contains the semantic version of the i-refactory runtime engine dependency.
    • logicalValidationInterfaceTrusted.
  2. /acmDatadel/delivery and /acmDatadel/activeDelivery

    New properties:

    • logicalValidationDeliveryDeleteType
    • genericDataAccessDeliveryDeleteAllowed
    • logCleanupInfo
  3. /acmDatadef/baseEntity

    New properties:

    • logicalValidationEntityTrusted
    • logicalValidationEntityTrustedConclusion
  4. /acmDatalog/violatedRecord

    You need to have the DataManager role to execute this request.

PATCH Changed

  1. /acmDatadef/interface

    You can set the property logicalValidationInterfaceTrusted on a logical validation model.

  2. /acmDatacon/database

    Ability to change the value of description.

POST Changed

  1. /acmDatadel/delivery

    New properties:

    • logicalValidationDeliveryDeleteType: indicates if we should treat "removed records" logically or physically. Default value is "LOGICAL".
    • genericDataAccessDeliveryDeleteAllowed: indicates if it is allowed to physically delete records with the CRUD interface on the database. Default is False.
    • logCleanupInfo: indicates whether the process of cleaning the log is active or has been executed.
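
    As a minimal sketch, a request body enabling physical deletes might include the two new settings shown below. All other delivery properties are omitted here, and "PHYSICAL" is assumed to be the non-default counterpart of the documented default "LOGICAL":

    {
        "logicalValidationDeliveryDeleteType": "PHYSICAL",
        "genericDataAccessDeliveryDeleteAllowed": false
    }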

GET Added

  1. /acmDatalog/taskRun

    Get data for a taskRun. A taskRun is the execution of a task: a load to a technical staging table, a set of constraint validations, a load to a logical validation table, dropping temporary tables, ...

  2. /acmDatalog/taskRunDependency

    Get data for a taskRunDependency. A taskRunDependency registers a taskRun's predecessors and successors and as such represents a dependency graph.

POST Added

  1. /acmDatadel/logicalStagingDelivery(...)/cleanConstraintViolations

    Deletes all logging in violatedRecords and violatedConstraints for the given delivery.

PATCH Added

  1. /acmDatadef/baseEntity

    Ability to change the value of logicalValidationEntityTrusted.

  2. /acmDatalog/taskRun

    Set properties of a taskRun: the statusCode, rowAfter, messageNbr, messageText.
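
    As a minimal sketch, a PATCH body for a taskRun could look as follows (the property names are those listed above; the values are hypothetical):

    {
        "statusCode": "FAILED",
        "rowAfter": 0,
        "messageNbr": 50000,
        "messageText": "Cleanup of intermediate storage failed"
    }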

Deprecated Rest API's

The following Rest API calls are deprecated. We strongly encourage you to use the new endpoints instead.

  1. PATCH: /acmDatalog/entityUpdate/ is deprecated. Instead use /acmDatalog/taskRun.
  2. GET: /acmDatalog/activeLogicalStagingDeliveryEntityUpdateFromExternalSource is deprecated. Instead use /acmDatalog/activeLogicalStagingDeliveryTaskRunExecutedByUser.

NodeJS

The i-refactory runtime engine runs on NodeJS LTS versions 10 and 12.

Config file changes

driverConnectionProperties configuration

As a result of upgrading to the latest version of the driver responsible for connecting to the SQL Server database, a small change to the configuration of the connection properties is required.

The configuration regarding driverConnectionProperties:

{
    "driverConnectionProperties": {
        "server": "",
        "userName": "",
        "password": ""
    }
}

Should be changed to:

{
    "driverConnectionProperties": {
        "server": "",
        "authentication" : {
            "userName": "",
            "password": ""
        }
    }
}

httpRestApi configuration

Our Rest API requires a valid OAuth2 token in each request. For security reasons this token must be signed. Our Rest API server needs to know the location of the public key and the signature algorithm used to sign the token. You can use your own OAuth2 compliant server to grant access tokens (for example Windows Active Directory). To reflect these changes we slightly changed the configuration of our Rest API.

The configuration regarding httpRestApi:

{
    "httpRestApi": {
        "enabled": true,
        "openIdPublicKey": "crypto/openId.pem",
        "https": {
            "port": 3000,
            "host": "localhost",
            "key": "crypto/ssl.key",
            "cert": "crypto/ssl.crt"
        }
    }
}

Should be changed to:

{
    "httpRestApi": {
        "enabled": true,
        "https": {
            "port": 3000,
            "host": "localhost",
            "key": "crypto/ssl.key",
            "cert": "crypto/ssl.crt"
        },
        "accessToken": {
            "publicKey": "crypto/key_public.pem",
            "signatureAlgorithm": "RS256"
        }
    }
}

{info} If you choose to use the i-refactory OAuth2 server, the generated key pair must be an RSA key pair and signatureAlgorithm must be set to RS256: our authorization server always signs tokens with the RS256 algorithm.
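
For reference, and as standard JWT behaviour rather than anything i-refactory specific, the decoded header of an access token signed with RS256 looks like this:

{
    "alg": "RS256",
    "typ": "JWT"
}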

uiServer configuration

To eliminate confusion about which kind of authorization server is used for our Web Application, we've decided to change the property name openId to authorizationServer. And we've removed the openIdResourceUri property (a reference to the URI for requesting an openId token), because we do not need an openId token but only a valid OAuth2 token.

The configuration regarding uiServer:

{
    "uiServer": {
        "enabled": true,
        "https": {
            "port": 3002,
            "host": "localhost",
            "key": "crypto/ssl.key",
            "cert": "crypto/ssl.crt"
        },
        "apiUrl": "https://localhost:3000",
        "openId": {
            "clientId": "i-refactory-ui",
            "authorizationEndPointUri": "https://localhost:3003/authorize",
            "tokenEndPointUri": "https://localhost:3003/token",
            "openIdResourceUri": "https://localhost:3003/openid"
        }
    }
}

Should be changed to:

{
    "uiServer": {
        "enabled": true,
        "https": {
            "port": 3002,
            "host": "localhost",
            "key": "crypto/ssl.key",
            "cert": "crypto/ssl.crt"
        },
        "apiUrl": "https://localhost:3000",
        "authorizationServer": {
            "clientId": "i-refactory-ui",
            "authorizationEndPointUri": "https://localhost:3003/authorize",
            "tokenEndPointUri": "https://localhost:3003/token"
        }
    }
}

openIdServer configuration

To eliminate confusion about the intent of the i-refactory authorization server, we've decided to change the property name openIdServer to authorizationServer.

The properties privateKey and publicKey no longer have a default value and should be specified explicitly. This prevents security breaches where, for example, in releases prior to 3.0.0 we created a private/public key pair and you accidentally used this key pair without even knowing.

The last change is related to registering client applications. A system client should be able to request an accessToken without an explicit login process, by providing its client credentials directly. In OAuth2 this grant type is called client_credentials. For system clients you should therefore add the client_credentials value to the grants array.

The configuration regarding openIdServer:

{
    "openIdServer":
    {
        "enabled": true,
        "https":
        {
            "port": 3003,
            "host": "localhost",
            "key": "crypto/ssl.key",
            "cert": "crypto/ssl.crt"
        },
        "clients": [
            {
                "clientId": "i-refactory-ui",
                "clientSecret": null,
                "redirectUri": "https://localhost:3002",
                "grants": [ "authorization_code", "refresh_token" ]
            },
            {
                "clientId": "exampleNonWebClient",
                "clientSecret": "12345",
                "grants": [ "password", "refresh_token" ]
            }
        ],
        "users": [
            {
                "id": "info@i-refact.com",
                "username": "administrator",
                "password": "abcd123",
                "email": "info@i-refact.com",
                "roles": [ "DataViewer", "SystemManager", "DataManager", "DataOperator", "Developer" ]
            },
            {
                "id": "exampleUser@i-refact.com",
                "username": "exampleUser",
                "password": "12345",
                "email": "exampleUser@i-refact.com",
                "roles": [ "DataViewer", "SystemManager", "DataManager", "DataOperator", "Developer" ]
            }
        ]
    }
}

Should be changed to:

{
    "authorizationServer": {
        "enabled": true,
        "https": {
            "host": "localhost",
            "port": 3003,
            "key": "crypto/ssl.key",
            "cert": "crypto/ssl.crt"
        },
        "privateKey": "crypto/key_private.pem",
        "publicKey": "crypto/key_public.pem",
        "clients": [
            {
                "clientId": "i-refactory-ui",
                "clientSecret": null,
                "redirectUri": "https://localhost:3002",
                "grants": [
                    "authorization_code",
                    "refresh_token"
                ]
            },
            {
                "clientId": "exampleNonWebClient",
                "clientSecret": "12345",
                "grants": [
                    "password",
                    "client_credentials",
                    "refresh_token"
                ],
                "roles": [
                    "DataViewer",
                    "SystemManager",
                    "DataManager",
                    "DataOperator",
                    "Developer"
                ]
            }
        ],
        "users": [
            {
                "id": "info@i-refact.com",
                "username": "administrator",
                "password": "abcd123",
                "email": "info@i-refact.com",
                "roles": [
                    "DataViewer",
                    "SystemManager",
                    "DataManager",
                    "DataOperator",
                    "Developer"
                ]
            },
            {
                "id": "exampleUser@i-refact.com",
                "username": "exampleUser",
                "password": "12345",
                "email": "exampleUser@i-refact.com",
                "roles": [
                    "DataViewer",
                    "SystemManager",
                    "DataManager",
                    "DataOperator",
                    "Developer"
                ]
            }
        ]
    }
}

Upgrade

Upgrade Notes

Bug Fixes

  • [IREFACTORY-1584] Cancelling a delivery while another one was active on the same logical validation model resulted in a server crash.
  • [IREFACTORY-1552] Deleted metadata records were not removed from the cache in the NodeJS server.
  • [IREFACTORY-1479] Too many rows were added to the table for set based constraint validation from the logical validation model in case context based enddating should be applied.
  • [IREFACTORY-1022] Batch edit mode in the web UI should be disabled for a created delivery (you cannot edit an existing delivery).
  • [IREFACTORY-1427] A Tedious request was not always released.
  • [IREFACTORY-1509] Filtering on threshold value in constraint settings in the web app didn't work.
  • [IREFACTORY-1533] Loading delivery statistics was sometimes very slow.
  • [IREFACTORY-1540] Error when updating the database description in the web app.
  • [IREFACTORY-1705] The minimum number of connections to create in the connection pool was not honoured correctly.
  • [IREFACTORY-1732] The scheduler ran into an "already scheduled" error in a complex SIS datamodel.
  • [IREFACTORY-1645] The row count of delivered records in the delivery statistics was not always correct.
  • [IREFACTORY-1644] EntityId instead of EntityUpdateId was shown on the monitor page of the web app.
  • [IREFACTORY-1748] Creating a new "database" record in the web app resulted in an error.
  • 219 Bi-temporal entity: the views for a bi-temporal entity could not be generated when the names of the attributes for the validity time line were not the same in the central facts layer and in the generic data access layer.

Constraint violation actions are applicable to certain constraint categories. Not all combinations of constraint categories and violation actions are allowed.

An attribute must have a value, whatever that value may be. It must not be NULL.

A data type of an attribute defines what value an attribute can hold. The data type specifies what type of mathematical, relational, or logical operations can be applied to it without causing an error.

An attribute datatype constraint is the most basic constraint type. It checks for the datatypes we support and have implemented.

For example, we check for string, length of string, integer, date, etc. In the following figure you can see the data types supported by PowerDesigner.

Image is omitted: Supported data types

Constraints can be violated and there are some actions that can be performed when a violation occurs. The possible actions are: EMPTY COLUMN, NO ACTION and SKIP ROW.

An attribute value constraint is an expression that is evaluated. The person who writes the expression is responsible for its correctness. The expression should be formulated in a positive way and lead to a Boolean answer. If the expression evaluates to True, then the value is correct.

Examples

  • The values in attribute X have to be bigger than 10: X > 10
  • The email address has to be in a certain pattern: email address LIKE '%_@_%.__%'

A Concept Integration Model is also a central facts model on which you place integration patterns. It is not required to create a concept integration model, but it can be very useful.

Every constraint is assigned to a constraint classification.

The main purposes of the Generic Data Access Layer (GDAL) are to provide logical perspectives for data consumption and to manage CRUD actions.

A generic data access model is a virtual data model that acts as an interface bridge between consumer applications and the central fact storage.

Every attribute is assigned to an attribute classification.

An entity record constraint checks whether an attribute meets the requirements set by another attribute belonging to the same entity.

The main purpose of the Logical Validation Layer (LVL) is to transform the data received from external data sources to fit into the logical data model structure. It is also responsible for validating deliveries. The Logical Validation Layer is also known as the Historical Staging In (HSTGIN) Layer.

The logical validation model is the representation of a single external data source in a logical format. It represents how data delivered by a specific tenant should be transformed, temporalized and validated in the logical validation layer. The logical validation model is also known as the Historical Staging model (HSTGIN).

Multi-active attributes are attributes that contain a business key to provide multiple context records at the same time. For example: a customer has multiple types of phone numbers. “Home”, “Work” and “Mobile”. In that case we add a dependent entity on customer with key “Phone Nbr Type”. This is to prepare for the CFPL multi-active key on customer.

The main purpose of the Technical Staging Layer (TSL) is to create a common starting point for further data processing. It receives data delivered from external data sources and temporarily stores them in a database. The input data should be in a tabular format (rows and columns).

A bi-temporal attribute is an attribute that changes over time: it follows a valid timeline. For example, a Part may have a price valid for December and a price valid for January.

Every entity is assigned to an entity classification and to a parent entity classification. The possible values for entity classification are: ALTERNATE KEY CONTEXT, ATTRIBUTE CONTEXT, GENERALIZATION, HELPER, REFERENCE CONTEXT, STABLE, STABLE DEPENDENT and STABLE INDEPENDENT.

Entity Set Constraint: An entity set constraint can be used to perform a check concerning values of two or more attributes that belong to different entities, or to perform a check concerning the value of an attribute with respect to a set of values.

A Set Constraint Helper is a helper in the logical validation model. It is the implementation of a set constraint. The helper returns the records of an entity for a given set constraint, where the instances of this entity do not meet the definition of this set constraint.

The business requirements describe how data should be delivered for the data consumers (end users or applications) in terms of concepts, relationships between concepts and constraints to validate the data. These requirements can be described in a logical data model, for example.

A Business Rule Helper is a helper in the central facts model. It is a set-based calculation of derived facts. You need to use a Business Rule Helper if you want to make a calculation and want to keep a transaction history of the results of this calculation. You use the existing entities from the central facts model as input. The results of the helper must be materialized in 'regular' fact entities, such as Anchors and Contexts, to make them accessible in the Generic Data Access Layer.

Closed Open means that the timeline is valid from (vanaf in Dutch) the supplied valid start date until - but not including - (tot in Dutch) the supplied valid end date. In practice, this means that the start date of a valid time record is equal to the end date of the previous valid time record.

You need to create context-based entities when a set of data may be delivered within the boundaries of a parent context. A context-based entity applies when:

  • At least 2 entities are delivered.
  • A context relationship exists between these 2 entities. One entity is the parent context of the other entity.
  • The parent context entity is delivered as a delta and the child entity is delivered as a full set.

The Management Model contains the PowerDesigner objects for the Unified Anchor Modelling (UAM). When a UAM object is created, a so-called PowerDesigner replica of the corresponding Management Model object is created. This means that certain properties such as metadata columns and column stereotypes are configured in the Management Model and cannot be changed. The replication settings specify which elements of an object can be changed after creating a replica from the template object. It is possible to override the replication settings of an UAM object and change a specific property.

The temporal atomic type describes the datatype of the temporal attributes.

The main purpose of the Central Facts Layer (CFL) is to store data historically. It can also integrate data from different sources. The Central Facts Layer is also known as the Central Facts Persistency Layer (CFPL).

The central facts persistence implementation model is the representation of facts in an anchorized data model with the ability to integrate multiple logical models.

In the context of i-refactory, data transformation refers to the operations involved in turning raw data into data that is readily useful and closer to the business requirements.

Integration patterns are used to integrate entities from different data models. If two or more entities from different data models share the same business key, you can use the Integration Pattern named Key Root. It is a good practice to capture integration patterns in a separate model, named Concept Integration Model.

An attribute is mandatory when its value cannot be empty (NULL).

A Physical Data Model (PDM) represents how data will be implemented in a specific database.

{note} The i-refactory uses four PDMs: technical staging model, logical validation model, central facts model and generic access model. Each one of these models is implemented as an additional database, which is used to store data from external and internal data sources.

Reverse engineering is the process of reconstructing a physical and/or Entity Relationship (ER) model from an existing data source. The purpose of reverse engineering is to avoid manual work as much as possible.

Architecture layer

The core of the i-refactory architecture has four layers: TSTGIN, LVL, CFL and GDAL. There are also two auxiliary layers: UCLVL and EXT.

If an entity has one or more attributes that change over time and you want to keep track of when an attribute is valid at a certain transaction time, you have a special case of a regular dependent entity, called a bi-temporal entity. The bi-temporal entity stores historical data with two timelines. The primary key of the bi-temporal entity is composed of the primary key of the parent entity and the valid start date attribute. The attribute that changes over the valid time is called a bi-temporal attribute.

A delivery agreement is a contract between a Tenant and a Logical Implementation Model or a Generic Data Access model. An agreement has a duration. The delivery agreement sets the architecture layer (interface) where the data should be ingested as well as the default settings to be applied to the deliveries.

A dependency mapping is a mapping between a helper (or BR helper) and a source entity used in the query of the helper. The helper and the source entity must belong to the same model.

  • Default dependency is set on entity level (source entity to helper entity)
  • To allow lineage on attribute level, via the Mapping editor, you could manually add the dependency on attribute level.

An Independent Entity is an entity that implements an Anchor for a Business Key that ‘stands alone’, i.e. that does not contain a reference to another Entity.

A Logical Data Model (LDM) matches the language, structure and quality of the business, regardless of the physical data implementation. The Logical Data Model reflects the business requirements.

A delivery may be considered as "untrusted" if deletes of data in the Logical Validation Layer have taken place and the processing of new deliveries cannot 100% rely (trust) on having enough statistics and data available to detect logical deletes, to determine the exact delta and to execute set based validations.

A Dependent Entity is an entity that implements an Anchor for a Business Key that ‘depends’ in its existence on another Entity. A Dependent Entity contains Business Key fields of which at least one is a foreign key (FK).

The transaction time in i-refactory is different from what is commonly understood by transaction time. Transaction time is usually seen as the moment when a fact was stored in the database. In the i-refactory, the transaction time is the time, as dictated by the source system, not by the i-refactory database.

The Attribute type links the attribute to one of the existing interfaces.

Computed columns are columns whose content is computed from values in other columns in the table.

Functional date: A functional date or time is a point in time and is defined by a user. An example is an order date or a date of birth.

The technical model (also known as Technical Staging In model: TSTGIN) is a representation of how exactly one delivery from a specific data source will be processed in the technical staging layer.

Generalization is the process of extracting shared characteristics from two or more classes (hyponyms), and combining them into a generalized superclass (hypernym). For example: an 'employee' and a 'customer' are both 'persons'.

The Mapping Editor provides a graphical interface for creating and viewing mappings between models. It provides a global view of all the mappings related to the entities of a given model, allowing you to quickly identify those which are mapped and not mapped.

When a certain fact can change over time and you need to capture when that fact is valid in the real world, you can add a valid start date and a valid end date to the entity.

A valid time tells us in which period a record is valid. While a functional date represents just one point in time, the valid time has a begin and an end date, for example:

  • For Order item 123, a Retail price of 10.00 was valid from 2019-01-01 to 2019-06-01.
  • For Order item 123, a Retail price of 12.00 was valid from 2019-06-01 to 2020-01-01.

Alternate key is an attribute or a group of attributes whose values uniquely identify every record in an entity, but which is not the primary key.

Candidate key

A candidate key consists of one or more attributes and meets the following requirements:

  • Unique: The value of the key uniquely defines one instance of a concept. There are no duplicate values.
  • Non-volatile: (Almost) doesn't change.
  • Minimal: Contains only the elements needed.

There are two kinds of candidate keys:

  • primary key
  • alternate key

Normalization is the process of decomposing tables in a database in order to reduce data redundancy and improve data integrity.

A strongly typed model is a model in which all attributes have a predefined data type, for example: integers, doubles, dates.

Surrogate Key: A surrogate key is a system generated unique identifier that does not have any contextual or business meaning.

Business Key

A business key is a unique identifier that has business meaning and exists in the real world outside of the database. It consists of a column or a set of columns that already exists in a table. A business key is also known as a natural key.

A Key Root Hub is an integration concept that must be used when the exact same business concept or independent business key occurs in different models. The Hubs for this independent business key in the different UAM models are all subtypes of the Key Root Hub.

A relationship shows how two entities are related to one another. For example, a customer can place an order, and an order can have a customer.

Every Attribute has an atomic type (data type) which is linked to the attribute type of that attribute.

The cardinality shows how many instances of an entity can take place in a relationship.

An enumeration consists of the list of values that a given attribute should adhere to.

{example} An order can have different statuses, such as shipped, packing, created and done. Other statuses are not allowed.

Foreign Key

A foreign key is an attribute or a set of attributes that refers to the primary key of another entity. The original entity containing the primary key is called the 'parent' entity and the entity containing the foreign key is called the 'child' entity.

A natural key is a unique identifier that has business meaning and exists in the real world outside of the database. It consists of a column or a set of columns that already exists in a table. A natural key is also known as a business key.

The primary key is an assigned key that consists of a minimal set of attributes to uniquely specify an instance of a record. The attribute or a combination of attributes should meet the following characteristics:

  • Unique: The attribute values of the key uniquely identify one instance of a concept. There are no duplicate instances.
  • Non-volatile: The key does not change.
  • Mandatory: All values are filled; there are no NULL values.

It is good practice to choose a primary key that also meets the following characteristic:

  • Safe: Doesn't contain private or sensitive information, such as a social security number.

Constraints are related to other elements depending on the type of the constraint. Certain constraints are associated with attributes, entities, helper entities, unique keys or relationships between entities.

An attribute may be assigned to one or more entities (e.g. acm_exists_ind) and an entity may have several attributes.

Each layer may have one or more interfaces. The number of interfaces depends on how many tenants and delivery agreements have been configured.

Namespace is what in SQL Server terminology is called a database schema.

A Delivery is a container that holds the specification of what is actually pushed to the i-refactory platform. This specification consists of a list of entities.

Key Root: A Key Root is a central repository for Business Keys. A Key Root ensures that similar records out of different data sources are identified by the same Business Key as well as the same Surrogate Key.

Context

A Context is a temporal table with a transaction start and end date. The Context tracks all changes of the context attributes related to a business key in the transaction time. This means that every change of an attribute value in a source system leads to a new record in the Context. The old record is end dated with the load date and the new record is start dated with the load date.

Hyponym is a term that denotes a subcategory of a more general class. For example: 'cat' and 'dog' are hyponyms of 'animal'.

A mapping establishes relationships between concepts of separate data models. It creates a link between entities and attributes from a source model to related entities and attributes in the target model. A source model should precede the target model in the i-refactory architecture.

oasi_bk is an abbreviation for One Attribute Set Interface (OASI) with business keys. A normal view in the generic data access layer (GDAL) consists of the surrogate key, foreign key and attributes. The oasi_bk-view in the GDAL is a view where the business key(s) are also shown.

A subtype is a subgroup of an entity. You can create a subtype if a group of instances share some attributes and relationships that only exist for that group. For example, entity Customer can have a subtype Company and a subtype Person. They share the common attribute customer number, and can have some attributes of their own. Such as birth date for a Person. The entity Customer is called a supertype.

A subtype:

  • inherits all attributes of the supertype
  • inherits all relationships of the supertype
  • usually has one or more own attributes
  • can have subtypes of its own

Anchor: Independent Entity

An Independent Entity is an entity that implements an Anchor for a Business Key that ‘stands alone’, i.e. that does not contain a reference to another Entity.

Anchor: Dependent Entity

A Dependent Entity is an entity that implements an Anchor for a Business Key that ‘depends’ in its existence on another Entity.

A domain will help you to identify the types of information in your model. It defines the set of values for which a column is valid. A domain can specify a data type, length, precision, mandatoriness, check parameters, and business rules. It can be applied to multiple columns, which makes it easier to standardize data characteristics for columns in different tables.

Each interface may have one or more entities and one entity belongs to only one interface. An entity belongs to an i-refactory data model.

A helper entity creates a derived entity and can be used when you need to transform, filter, or calculate data. The purpose of a helper differs per model:

  • Technical model: a helper is used to transform data.
  • Logical validation model: a helper is an implementation of a set constraint (Set Constraint Helper).
  • Central facts model: a helper is used for a set-based calculation of derived facts (Business Rule Helper).

HSTGIN is the abbreviation of Historical STaging IN. It is an older term to indicate the Logical Validation Model or Logical Validation Layer.

A schema is a set of database objects, such as tables, views, triggers, stored procedures, etc. In some databases a schema is called a namespace. A schema always belongs to one database. However, a database may have one or multiple schemas. A database administrator (DBA) can set different user permissions for each schema.

Each database represents tables internally as <schema_name>.<table_name>, for example tpc_h.customer. A schema helps to distinguish between tables belonging to different data sources. For example, two tables in two schemas can share the same name: tpc_h.customer and complaints.customer.

A Tenant is a delivering party for a dataset or datarecord as agreed in the Delivery Agreement.

TSTGIN is the abbreviation of Technical STaging IN. It is an older term to indicate the Technical Model or Technical Staging Layer.

An index organizes data in a way that improves the speed of data retrieval from a database. To maintain the index data structure, there is a cost of additional writes and storage space.

The acronym CRUD stands for create, read, update, and delete. These are the four basic functions of persistent storage.

OLAP is an acronym for Online Analytical Processing. OLAP is a category of software tools which provide analysis of data for business decisions. It uses complex queries to analyze aggregated historical data from OLTP systems. The primary objective is data analysis and not data processing.

OLTP is an acronym for Online Transaction Processing. OLTP captures, stores, and processes data from transactions in real time. Its primary objective is data processing and not data analysis.

A hub or independent entity is an entity that implements an Anchor for a business key that ‘stands alone’, i.e. that does not contain a reference to another entity. An independent entity contains business key fields, which show up as an alternate key (AK), and the primary key (PK) is its surrogate key (ID).

A key is a combination of one or more attributes of an entity that uniquely defines one instance of that entity.