In this section we introduce our solution to manage data logistics and explain how it works. You will also find an overview of the main components of the i-refactory.
In short, i-refactory is a solution to manage data logistics based on metadata configurations. Instead of manually creating software and data orchestration flows, you can focus entirely on gathering the relevant metadata and configuration data. Generating software and dataflows is a core capability of i-refactory.
The i-refactory solution design has a strong analogy with a distribution centre for physical goods. As a data purchaser, you typically create a contract with one or more data suppliers. You can create detailed specifications of the technical format of the data supplied, the logical format, and how to integrate data across different data suppliers. For an in-depth explanation of these specifications, see our Data Definition Metadata Guide.
Just as the physical world of logistics has purchase orders, you need to create a data delivery order each time a data supplier should supply raw data. In our metadata store this is a delivery. A delivery contains the actual specification of the data that will be delivered, like a packing slip. For example, only a subset of the agreed-upon raw data may be delivered (the rest will follow in another delivery the next day).
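As an illustration, creating a delivery might look like the following call to the i-refactory Rest API. This is only a sketch: the endpoint, field names, and values are hypothetical and do not reflect the actual interface.

```python
import requests

# Hypothetical sketch: register a delivery (the "packing slip") for a supplier.
# The endpoint and field names are illustrative, not the actual i-refactory API.
delivery = {
    "supplier": "acme-crm",                      # which data supplier delivers
    "interface": "customer-orders",              # the agreed-upon raw data interface
    "entities": ["ORDER", "ORDER_LINE"],         # only a subset is delivered this time
    "snapshotDatetime": "2024-06-01T00:00:00Z",  # the state of the data being delivered
}

response = requests.post(
    "https://irefactory.example.com/api/deliveries",
    json=delivery,
    headers={"Authorization": "Bearer <access-token>"},
)
response.raise_for_status()
print(response.json())  # e.g. the identifier of the newly created delivery
```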
When the actual goods are delivered, a typical process of inspecting, validating, unpacking, cataloging, and storing the delivered goods in the warehouse is executed. In our approach we call this data logistics. Similar to a physical logistics process, the i-refactory application automatically takes care of generating the required code, orchestrating the data flow (a directed acyclic graph of processing tasks), and executing it on the SQL Server database. The laborious and complex manual work of writing code, creating a proper execution flow, and keeping track of state is completely automated, resulting in an efficient and effective flow for handling newly received raw data.
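To give a feel for what orchestrating a directed acyclic graph of processing tasks involves, here is a minimal sketch of a DAG executor in Python. It is not i-refactory code; it only illustrates the core idea of running each task once all of its predecessors have completed.

```python
from graphlib import TopologicalSorter

# Minimal sketch of DAG orchestration (not i-refactory code).
# Each task maps to the set of tasks that must complete before it may run.
dag = {
    "load_staging": set(),
    "validate_orders": {"load_staging"},
    "validate_customers": {"load_staging"},
    "store_facts": {"validate_orders", "validate_customers"},
}

def run(task: str) -> None:
    # In the real application this would execute generated SQL on SQL Server
    # and record the task's state in the metadata store.
    print(f"running {task}")

sorter = TopologicalSorter(dag)
sorter.prepare()
while sorter.is_active():
    for task in sorter.get_ready():  # tasks whose predecessors all completed;
        run(task)                    # independent tasks could run in parallel
        sorter.done(task)
```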
When the processing of a delivery is successfully completed, we consider the raw data as persisted. The raw data finds its way into a semantically rich, quality-assured, immutable temporal data store. The fact values are stored together with a comprehensive set of attached metadata, such that the semantics of the facts are known, as well as other relevant context such as the owner, creator, applied constraint rules, applied transformations, etcetera.
To conclude this introduction: our metadata-based approach leads to transparency in your data, with communicable and agreed-upon specifications, in an efficient and effective way.
A high-level overview of the main components of i-refactory is given in the figure below.
The i-refactory modeller is our solution to create data models. We have extended SAP PowerDesigner with additional functions to enable you to design a dataflow capable of transforming and validating raw data into an integrated, high-quality, and semantically rich fact store. You can find more details about this component in the Overview i-refactory Modeller.
The i-refactory server is the core component of i-refactory. It is responsible for metadata maintenance, data logistics orchestration, and code generation. You can find more details about this component in the Overview i-refactory Server.
Another way to connect to our i-refactory server is with our i-refactory web application. The web application is a browser application with which you can execute most of the Rest API functions of our i-refactory server in an intuitive way. You can find more details about this component in the Overview i-refactory Web App.
i-refactory uses SQL Server to store the data created, received and maintained by i-refactory. Four logical domains are used: Metadata repository, Raw Data, Facts and Access.
The i-refactory solution keeps track of a substantial amount of metadata, which is stored in SQL Server and can be easily accessed through database views and our Rest API. The metadata is versioned, which means you have the ability to see how the metadata changes over time.
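As a sketch, inspecting the versioned metadata through a database view could look like this; the connection string, view name, and column names are assumptions for illustration only.

```python
import pyodbc

# Hypothetical sketch: read versioned metadata through a database view.
# The view and column names are illustrative, not the actual repository schema.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;"
    "DATABASE=metadata;Trusted_Connection=yes"
)
cursor = conn.cursor()
cursor.execute("""
    SELECT entity_name, version, valid_from, valid_to
    FROM metadata.entity_version            -- hypothetical view name
    WHERE entity_name = ?
    ORDER BY valid_from
""", "CUSTOMER")
for row in cursor.fetchall():
    print(row.entity_name, row.version, row.valid_from, row.valid_to)
```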
Each time a delivery is created, the supplier is responsible for providing the required raw data. We ingest this raw data into the raw data domain, from where we apply the required validations and transformations.
The data stored in the raw data domain is volatile. It exists only for the purpose of transforming and validating.
When the raw data complies with the specifications, we store it in the immutable fact store. This store is optimized for this specific purpose.
The fact store is a persistent store and should not be tampered with. It is controlled and maintained by the i-refactory server. The fact store is implemented according to the corresponding Central Facts Model and is described using the Unified Anchor Model.
{info} We do have the capability of physically removing data from the fact store, but this is beyond the scope of this introductory document.
We believe that data storage (i.e. the fact store) should be separated from data access. Therefore, we provide access to the facts in the fact store via the access logical domain.
In the SQL Server database, this is implemented with views, stored procedures, and functions. Data consumers are therefore granted access to the facts for data consumption through this domain.
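To illustrate this separation, the sketch below shows what such an access object could look like: a view in the access domain over a fact table, with consumers granted rights on the view only. All schema, view, role, and column names are hypothetical.

```python
import pyodbc

# Hypothetical sketch: expose facts through a view in the access domain and
# grant a consumer role rights on the view only. All names are illustrative.
conn = pyodbc.connect("DSN=irefactory")  # assumes a configured ODBC data source
cursor = conn.cursor()
cursor.execute("""
    CREATE VIEW access.current_customer AS
    SELECT customer_id, name, email
    FROM facts.customer
    WHERE valid_to IS NULL   -- only the currently valid facts
""")
cursor.execute("GRANT SELECT ON access.current_customer TO data_consumer")
conn.commit()
```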
Our i-refactory Rest API server is secured with OAuth2, an open standard for authorization. The primary role of the OAuth2 server is to authenticate and authorize access. To obtain a valid OAuth2 token, you typically send an HTTPS request to an OAuth2-compliant service, which returns an encrypted OAuth2-compliant access token.
This access token must be provided in each request to the i-refactory server. The i-refactory server validates the token for each request and checks whether the required roles for the request have been granted. If the token is valid, the i-refactory server executes the request.
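A minimal sketch of this flow, assuming an OAuth2 service that supports the client credentials grant; the token endpoint, credentials, and API URL are placeholders:

```python
import requests

# Hypothetical sketch of the OAuth2 client credentials flow.
# Token endpoint, credentials, and API URL are placeholders.
token_response = requests.post(
    "https://auth.example.com/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "my-client-id",
        "client_secret": "my-client-secret",
    },
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]

# Provide the token in every request; the i-refactory server validates it
# and checks whether the required roles have been granted.
api_response = requests.get(
    "https://irefactory.example.com/api/deliveries",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {access_token}"},
)
api_response.raise_for_status()
print(api_response.json())
```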
It is expected that a compliant OAuth2 service is already available, for example through Microsoft Active Directory.
If you don't have an OAuth2 service, we can provide you with a compliant, lightweight OAuth2 server.
The core of i-refactory is formed by four different architecture layers in the data logistics process. Each layer is designed to solve one or more concerns and has an associated Physical Data Model (PDM).
Layer | Concern | Model |
---|---|---|
Technical Staging Layer (TSL) | Common starting point for data processing | Technical staging model (TSTGIN) |
Logical Validation Layer (LVL) | Adherence to the constraints of the logical data model; enables parallel processing | Logical validation model (LVM / HSTGIN) |
Central Facts Layer (CFL) | Integration of different data sources; persists facts (stores historical data) | Central facts model (CFM) |
Generic Data Access Layer (GDAL) | Provides logical perspectives for data consumption; manages CRUD updates | Generic data access model (GDAL model) |
Click here for more information about the layers
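To make the layers more concrete, the sketch below shows the general shape of a generated step that moves a delivery from the Technical Staging Layer to the Logical Validation Layer, routing rows that violate a logical constraint to an error table. All schema, table, and column names are hypothetical.

```python
import pyodbc

# Hypothetical sketch of one generated data logistics step (TSL -> LVL).
# All schema, table, and column names are illustrative.
conn = pyodbc.connect("DSN=irefactory")  # assumes a configured ODBC data source
cursor = conn.cursor()

# Rows violating the logical model's constraints are routed to an error table...
cursor.execute("""
    INSERT INTO lvm.order_error (delivery_id, order_id, error)
    SELECT delivery_id, order_id, 'order_total must be non-negative'
    FROM tstgin.[order]
    WHERE order_total < 0
""")

# ...and only compliant rows continue to the Logical Validation Layer.
cursor.execute("""
    INSERT INTO lvm.[order] (delivery_id, order_id, customer_id, order_total)
    SELECT delivery_id, order_id, customer_id, order_total
    FROM tstgin.[order]
    WHERE order_total >= 0
""")
conn.commit()
```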
Because of the sheer diversity in how raw data can be delivered or should be extracted, we have chosen a customer-driven solution approach. Some customers have already adopted solutions for extracting data from their source systems, and some need assistance. We will look specifically into what is required to deliver the raw data to the i-refactory solution, and we are capable of assisting with and implementing the proper solution.
Our fact store philosophy is based on "write once, read many times". A fact remains a fact regardless of the structure and/or format in which it should be consumed.
Out of the box, we have created a number of virtual data interfaces in our access store that enable you to easily consume the stored facts.
Examples of these interfaces are:

- Current time view: a virtual view built on top of all the registered facts, consisting of the facts that are valid according to the current system time.
- History view: a virtual view that presents a complete timeline of all the changes registered for each logical entity (for example: an order, a customer, a product, ...).
- Point in time view: a virtual view that shows the facts as they were at a given point in time. For example: what was the state of the facts for Order number 1 two weeks ago?

i-refactory also offers an interface to create, update, or delete facts in a transactional way. We call this the CRUD view. Typically, this feature is used to easily create and maintain reference data, or to correct facts that were provided by an external supplier.
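As a sketch, consuming these interfaces is plain SQL against the access domain. For example, asking what the facts for Order number 1 looked like two weeks ago via the point in time view could be expressed as follows; the view and column names are hypothetical.

```python
import pyodbc
from datetime import datetime, timedelta

# Hypothetical sketch: query the point in time view in the access domain.
# View and column names are illustrative.
conn = pyodbc.connect("DSN=irefactory_access")  # assumes a configured ODBC data source
cursor = conn.cursor()

two_weeks_ago = datetime.now() - timedelta(weeks=2)
cursor.execute("""
    SELECT order_id, status, order_total
    FROM access.order_point_in_time
    WHERE order_id = ? AND ? BETWEEN valid_from AND valid_to
""", 1, two_weeks_ago)
print(cursor.fetchone())
```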
{info} Our access domain is virtual, but facts could be materialized to other formats if this is required to improve the user experience of consuming facts.
However, data can be consumed in many formats, with many different tools, each with their specific quirks. For that reason, we follow the same approach as for raw data delivery: we check with our customers what is needed to optimally consume the registered facts and implement the appropriate solutions.
A Physical Data Model (PDM) represents how data will be implemented in a specific database.
{note} The i-refactory uses four PDMs: the technical staging model, the logical validation model, the central facts model, and the generic data access model. Each of these models is implemented as a separate database, which is used to store data from external and internal data sources.