In this section we explain what a technical staging model is and why you need it. You will also find an overview of the steps to follow to create one.
The technical staging model specifies how a delivery from a specific data source will be processed in the Technical Staging Layer (TSL). The TSL creates a common starting point for further processing of the data, and a technical staging model acts as a standard technical interface for a single external data source.
A technical staging model is composed of entities and attributes, which are created according to the structure in which the data is delivered. In the current release, data must be delivered in a tabular format (rows and columns); nested data and other complex data structures are not accepted. The file type is not important: it can be a text file or a spreadsheet, as long as the data is tabular.
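As an illustration, below is a minimal sketch of a staging entity for a tabular customer delivery, assuming a SQL Server database; the `acme_tsl` schema and all table and column names are hypothetical.

```sql
-- Hypothetical TSL staging entity mirroring a delivered customer file.
-- Columns are kept loosely typed so the raw delivery always loads;
-- casting to stricter types can happen in a later transformation.
CREATE TABLE acme_tsl.customer (
    customer_id   VARCHAR(50)  NULL,
    customer_name VARCHAR(200) NULL,
    date_of_birth VARCHAR(50)  NULL
);
```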
In a technical staging model, you can specify how raw data should be transformed to better fit the business requirements. These transformations could also be executed in the Logical Validation Layer, but doing them in the technical layer facilitates the creation of mappings between the logical validation model and the technical staging model.
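For example, a hedged sketch of such a transformation, reusing the hypothetical staging entity above: a view that trims whitespace and casts the raw date column to a typed `date`.

```sql
-- Hypothetical TSL transformation: standardize raw text values before
-- they are mapped to the logical validation model.
CREATE VIEW acme_tsl.customer_transformed AS
SELECT
    LTRIM(RTRIM(customer_id))        AS customer_id,
    LTRIM(RTRIM(customer_name))      AS customer_name,
    TRY_CONVERT(DATE, date_of_birth) AS date_of_birth  -- NULL when unparsable
FROM acme_tsl.customer;
```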
It is important to understand that the customer is responsible for creating (and executing) the ETL process that loads the data into the TSL. The i-refactory has no knowledge of the actual source data (CSV, XML, JSON, another database, etc.) or of how this data is loaded into the TSL; it must only be informed of the status of this "external ETL process". The process must start with a delivery request to the i-refactory, and as soon as the data load into the TSL is completed, it must inform the i-refactory of the status of the data load.
{info} Before i-refactory starts to process a data delivery, the following steps should be taken:
- The source data is pushed to the i-refactory by an external party. If necessary, the external party's data must first be transformed into a tabular format.
- Tables in the technical staging layer of that specific data supplier hold the latest delivery data. For each new delivery, the corresponding tables must first be emptied as part of your loading actions.
- The loading process must be done by the customers themselves or with help from an i-refact consultant. Interaction with the i-refactory is supported through APIs; a minimal loading sketch follows this list.
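The sketch below shows the emptying and loading steps against the hypothetical `acme_tsl.customer` entity from earlier, assuming a SQL Server TSL database and a CSV delivery. The file path is illustrative, and the delivery request and status notification through the i-refactory APIs are not shown.

```sql
-- Empty the staging table before loading the new delivery.
TRUNCATE TABLE acme_tsl.customer;

-- Load the delivery. BULK INSERT from a CSV file is just one option:
-- the i-refactory only needs to be informed when the load has finished.
BULK INSERT acme_tsl.customer
FROM 'C:\deliveries\customer_20240101.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
```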
As an external party is responsible for loading data into the i-refactory, it is possible that the delivered data is of poor quality. Poor data quality may affect the quality of the data produced by the i-refactory and may lead to errors during data processing. You can use filters or data transformations in the TSL to deal with data quality issues.
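For instance, a hedged sketch of such a filter on the hypothetical staging entity: a view that excludes rows whose date of birth cannot be parsed as a valid date.

```sql
-- Hypothetical TSL quality filter: pass through only rows with a
-- parsable date_of_birth, keeping bad rows out of further processing.
CREATE VIEW acme_tsl.customer_filtered AS
SELECT customer_id, customer_name, date_of_birth
FROM acme_tsl.customer
WHERE TRY_CONVERT(DATE, date_of_birth) IS NOT NULL;
```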
{example} When a certain fact can change over time and you need to capture when that fact is valid in the real world, you can add a valid start date and a valid end date to the entity. This allows the creation of a valid timeline. The valid dates should be provided in ISO 8601 date format (YYYY-MM-DD) and preferably in a `date` or `datetime2(7)` datatype. The valid timeline should be provided as a closed-open interval; otherwise, overlapping timelines will lead to an error in the Central Facts Layer. If date values are provided without following these instructions, you need to filter or fix the data in the TSL to avoid errors during the further stages of data processing.
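A hedged sketch of a check for these timeline rules, assuming a hypothetical `acme_tsl.customer_timeline` staging entity with `valid_start_date` and `valid_end_date` columns:

```sql
-- Find timeline rows that violate the closed-open convention:
-- empty or inverted intervals, or overlaps within the same customer.
SELECT t.customer_id, t.valid_start_date, t.valid_end_date
FROM acme_tsl.customer_timeline AS t
WHERE t.valid_end_date <= t.valid_start_date  -- empty or inverted interval
   OR EXISTS (
        SELECT 1
        FROM acme_tsl.customer_timeline AS o  -- another interval of the same
        WHERE o.customer_id = t.customer_id   -- customer that overlaps with t
          AND o.valid_start_date <> t.valid_start_date
          AND o.valid_start_date < t.valid_end_date
          AND t.valid_start_date < o.valid_end_date
      );
```

Any rows returned by such a query should be filtered out or repaired in the TSL before further processing.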
There are a few steps that you can follow to create a technical staging model. It is important to keep in mind that the structure (entities) of a technical staging model depends on the structure (tables) of the data as it will be delivered by the external party.
After you've created and checked the technical staging model, the data can be processed further in the Logical Validation Layer (LVL). The main purpose of the LVL is to transform the data received from external data sources to fit the logical data model structure. It is also responsible for validating deliveries. The Logical Validation Layer is also known as the Historical Staging In (HSTGIN) Layer.
A schema is a set of database objects, such as tables, views, triggers, and stored procedures. In some databases a schema is called a namespace. A schema always belongs to one database; a database, however, may have one or multiple schemas. A database administrator (DBA) can set different user permissions for each schema.

Each database represents tables internally as `<schema_name>.<table_name>`, for example `tpc_h.customer`. A schema helps to distinguish between tables belonging to different data sources. For example, two tables in two schemas can share the same name: `tpc_h.customer` and `complaints.customer`.
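A brief SQL Server sketch of this; the table definitions are illustrative, and `GO` is the batch separator used by SQL Server client tools.

```sql
-- Two schemas in the same database, each holding a table named customer.
CREATE SCHEMA tpc_h;
GO
CREATE SCHEMA complaints;
GO

CREATE TABLE tpc_h.customer      (customer_id INT, customer_name VARCHAR(200));
CREATE TABLE complaints.customer (complaint_id INT, customer_id INT, complaint_text VARCHAR(1000));

-- The schema name disambiguates which customer table is meant.
SELECT COUNT(*) FROM tpc_h.customer;
SELECT COUNT(*) FROM complaints.customer;
```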
A Physical Data Model (PDM) represents how data will be implemented in a specific database.
{note} The i-refactory uses four PDMs: the technical staging model, logical validation model, central facts model, and generic access model. Each of these models is implemented as an additional database, which is used to store data from external and internal data sources.