In our Design Time process the Logical Data Model drives the set-up of most physical data models. Each physical model is a representation of the data that resides in one of the technical layers of our Runtime environment. Understanding this linkage is a key element of our solution.
This layer is a separate database that represents the physical data delivery of the related tenant. The main purpose of this layer is to create a common starting point for further processing. It is expected that this layer holds data in columns and rows.
This layer contains data from the last delivery and must be truncated at the start of each new delivery. The status of the delivery is managed through the i-refactory APIs. We therefore assume an external process orchestrates deliveries by means of our APIs.
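As a minimal illustration of what this layer holds, the sketch below shows a hypothetical staging table and the truncation performed at the start of a new delivery. All table and column names are illustrative only; the actual staging tables are derived from the logical data model.

```sql
-- Hypothetical Technical Staging In table; names are illustrative only.
-- Delivered data lands as plain columns and rows, without validation.
CREATE TABLE TSI_CUSTOMER (
    CUSTOMER_ID   VARCHAR(50)  NULL,
    CUSTOMER_NAME VARCHAR(200) NULL,
    BIRTH_DATE    VARCHAR(50)  NULL  -- kept as delivered; type checks happen later in the chain
);

-- The layer only holds the last delivery, so it is emptied before a new one arrives.
TRUNCATE TABLE TSI_CUSTOMER;
```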
You can use this layer to:
{info} This is all to prepare the step from the Technical Staging In layer towards the Logical Validation Layer (LVL). Preparing in this layer avoids problems in the LVL. In order to retain full lineage, all data fixes and enrichments must be captured in computed columns or derived entities.
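As a hedged sketch of what capturing a data fix in a computed column could look like (SQL Server syntax assumed; the table, columns and the fix itself are hypothetical): the delivered value stays untouched, while the corrected value is derived from it, so lineage is preserved.

```sql
-- Hypothetical example: a data fix captured as a computed column instead of an
-- in-place update, so the value as delivered remains available for lineage.
CREATE TABLE TSI_CUSTOMER_ADDRESS (
    CUSTOMER_ID  VARCHAR(50) NULL,
    COUNTRY_CODE VARCHAR(10) NULL,                 -- value exactly as delivered
    COUNTRY_CODE_FIXED AS (
        CASE WHEN COUNTRY_CODE = 'UK' THEN 'GB'    -- illustrative fix
             ELSE COUNTRY_CODE END
    )
);
```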
The Logical Validation Layer (LVL) is the second layer in the chain and is initiated after the data has been fully transmitted to the Technical Staging In layer. The LVL is a separate database that serves the following functions:
A delivery gets “accepted with findings” when constraint violations are present in the data without exceeding any constraint threshold. A constraint threshold determines how often a constraint may be violated before the delivery receives the “rejected” flag. Each constraint has a separately configurable threshold that defines the maximum number of allowed violations before a delivery is rejected, and an action that describes what to do when a violation occurs. If the delivery is “accepted” or “accepted with findings”, the data is prepared for handover to the next layer.
The actions that can be selected vary per constraint type. The possible actions are ‘skip row’, ‘empty column’ and ‘no action’. For instance, a primary key constraint does not offer the options ‘empty column’ or ‘no action’, since a database cannot accept an invalid or ‘NULL’ value as a primary key.
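The sketch below illustrates the threshold evaluation described above. The CONSTRAINT_CONFIG and CONSTRAINT_VIOLATION tables are hypothetical stand-ins; the actual registration takes place in the i-refactory metadata.

```sql
-- Hedged sketch: per constraint, compare the number of violations in the current
-- delivery against the configured threshold. Table and column names are illustrative.
SELECT c.CONSTRAINT_NAME,
       c.THRESHOLD,
       c.VIOLATION_ACTION,                            -- 'skip row', 'empty column' or 'no action'
       COUNT(v.CONSTRAINT_NAME) AS VIOLATION_COUNT,
       CASE
           WHEN COUNT(v.CONSTRAINT_NAME) = 0            THEN 'accepted'
           WHEN COUNT(v.CONSTRAINT_NAME) <= c.THRESHOLD THEN 'accepted with findings'
           ELSE 'rejected'
       END AS CONSTRAINT_VERDICT
FROM CONSTRAINT_CONFIG c
LEFT JOIN CONSTRAINT_VIOLATION v
       ON v.CONSTRAINT_NAME = c.CONSTRAINT_NAME
GROUP BY c.CONSTRAINT_NAME, c.THRESHOLD, c.VIOLATION_ACTION;
```

The delivery as a whole is rejected as soon as any single constraint ends up with the ‘rejected’ verdict.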
The data validation process determines the quality of the data by evaluating the constraints that are captured in the model. The constraints are grouped by constraint type and are evaluated in two stages. At the end of each stage an intermediate evaluation is performed to determine whether any threshold has been exceeded.
The first stage covers the validation types that can be performed without requiring data from outside the delivered dataset itself. The validations that are performed during this stage are:
The output of the first stage of validations is initially stored in a temporary table. This temporary table is used to register the observed violations in the ACME database and to determine whether the delivery should be rejected for exceeding a threshold. Only when none of the thresholds are exceeded is the data approved and transferred to the corresponding ‘UCLVL_BC’ tables. These tables have an additional column ‘ACM_SKIP_ROW’ that indicates for which records the ‘skip row’ action is performed. The ‘no action’ and ‘empty column’ actions are column specific and their output has already been processed to its final state.
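A hedged sketch of how the ‘skip row’ action could surface in such a table: only the ‘ACM_SKIP_ROW’ column is taken from the description above; the table, columns and the violated rule are illustrative.

```sql
-- Flag the rows that violated a constraint configured with the 'skip row' action
-- (illustrative rule: a mandatory customer name is missing).
UPDATE UCLVL_BC_CUSTOMER
   SET ACM_SKIP_ROW = 1
 WHERE CUSTOMER_NAME IS NULL;

-- Downstream processing only picks up the rows that were not skipped.
SELECT CUSTOMER_ID, CUSTOMER_NAME
  FROM UCLVL_BC_CUSTOMER
 WHERE ACM_SKIP_ROW = 0;
```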
The second stage contains the constraints that require additional data from outside the provided dataset. Missing data that is required to perform the validation is retrieved from the main dataset and appended to the validation dataset. For example, a delivery with data that references a table that was only populated during the initial delivery can only be validated once that data is appended to the validation dataset. After all missing data points are appended to the validation dataset, the constraints are validated. The constraints that are validated during this stage are:
The result set of the second stage of validations is initially stored in a temporary table and used to register violations and to determine whether the delivery should be rejected for exceeding a threshold. Only when none of the thresholds are exceeded is the data transferred to the corresponding ‘UCLVL_SC’ tables. These tables contain the final product of the two validation stages.
In case a delivery is accepted with or without findings, the final step of updating the LVL dataset and transferring the data to the next layer (CFPL) is initiated. This is done by comparing the dataset stored in the UCLVL_SC tables with the current dataset in the LVL, to construct a delta script that only contains the changes. This script is then used to update the LVL itself and is placed in the queue for the next layer (CFPL). This allows the LVL and the CFPL to work concurrently and independently of each other. The LVL clears the data from the UCLVL_BC and UCLVL_SC tables and the whole process repeats for the next delivery.
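A hedged sketch of the comparison behind such a delta script, assuming hypothetical ‘UCLVL_SC_CUSTOMER’ and ‘LVL_CUSTOMER’ tables; the real delta script is generated and this only shows the basic idea of keeping the changes.

```sql
-- Rows that are new or changed compared to the current LVL content.
SELECT CUSTOMER_ID, CUSTOMER_NAME
  FROM UCLVL_SC_CUSTOMER
EXCEPT
SELECT CUSTOMER_ID, CUSTOMER_NAME
  FROM LVL_CUSTOMER;

-- Rows currently in the LVL that no longer appear in the validated dataset.
SELECT CUSTOMER_ID, CUSTOMER_NAME
  FROM LVL_CUSTOMER
EXCEPT
SELECT CUSTOMER_ID, CUSTOMER_NAME
  FROM UCLVL_SC_CUSTOMER;
```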
{warning} Deliveries with key violations, data-type conversion violations or validation issues above the threshold will not be processed further and require remediation.
{editorial} KJD Link to missing file: ../data-logistics/metadata.md
For deliveries with active validation, validation results are stored in our metadata.
This layer is a separate database that serves the following functions:
The CFPL database is automatically populated by the i-refactory load processor, which generates, from the metadata, the SQL code that loads the data from the Logical Validation Layer into the Central Facts Layer.
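Purely as an illustration of the kind of statement such generated code could contain (all object names are hypothetical; the actual generated SQL is driven by the metadata and the Central Facts model):

```sql
-- Hypothetical generated load step: append the changed rows from the LVL delta
-- to a Central Facts Layer table, together with load metadata.
INSERT INTO CFL_CUSTOMER (CUSTOMER_ID, CUSTOMER_NAME, LOAD_DTS)
SELECT d.CUSTOMER_ID,
       d.CUSTOMER_NAME,
       CURRENT_TIMESTAMP
  FROM LVL_DELTA_CUSTOMER d;
```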
This is the access layer that makes the data persisted in the CFL accessible for data consumption from various perspectives:
This layer is virtual by default and supports consuming data while the layer is being refreshed. It also manages CRUD updates to the CFL by means of generated CRUD triggers within the current perspective database views.
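As a hedged sketch of this mechanism (SQL Server syntax assumed, all object names hypothetical, and the trigger body only one possible way to write a change back to the CFL), a current-perspective view with a generated trigger could look like this:

```sql
-- Hypothetical current-perspective view on a CFL table.
CREATE VIEW perspective_current.CUSTOMER AS
SELECT CUSTOMER_ID, CUSTOMER_NAME
  FROM CFL_CUSTOMER
 WHERE IS_CURRENT = 1;
GO

-- Hypothetical generated CRUD trigger: an update on the view is translated into
-- a history-preserving change on the underlying CFL table.
CREATE TRIGGER perspective_current.TRG_CUSTOMER_UPDATE
ON perspective_current.CUSTOMER
INSTEAD OF UPDATE
AS
BEGIN
    -- Close the current fact rows that are being changed ...
    UPDATE c
       SET IS_CURRENT = 0
      FROM CFL_CUSTOMER c
      JOIN inserted i ON i.CUSTOMER_ID = c.CUSTOMER_ID
     WHERE c.IS_CURRENT = 1;

    -- ... and insert the new state as the current fact.
    INSERT INTO CFL_CUSTOMER (CUSTOMER_ID, CUSTOMER_NAME, IS_CURRENT)
    SELECT i.CUSTOMER_ID, i.CUSTOMER_NAME, 1
      FROM inserted i;
END;
```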
This architecture layer groups together all tasks related to file-based exchange: