Follow the generic instructions for upgrading the metadata repository as described in Upgrading the i-refactory repository model in SQLServer.
This upgrade script will:
Create the following database column:

- acm_uam_datadel.s_delivery_properties.csv_loader_specification

Modify the following columns:

- acm_uam_datalog.s_entity_update_status.message_text to nvarchar(max)
- acm_uam_datalog.s_interface_update_status.message_text to nvarchar(max)
- acm_uam_datalog.s_task_run_status.message_text to nvarchar(max)

Create the following index:
CREATE NONCLUSTERED INDEX [ix01] ON [acm_uam_datalog].[s_task_run_properties] ([l_delivery_id])
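For reference, the schema changes amount to statements along the following lines. This is an illustrative sketch only (the datatype and nullability of the new column are assumptions); run the provided upgrade script rather than these statements:

```sql
-- Illustrative sketch of the schema changes applied by the upgrade script.
-- Datatype and nullability of the new column are assumptions.
ALTER TABLE [acm_uam_datadel].[s_delivery_properties]
    ADD [csv_loader_specification] nvarchar(max) NULL;

ALTER TABLE [acm_uam_datalog].[s_entity_update_status]
    ALTER COLUMN [message_text] nvarchar(max) NULL;
ALTER TABLE [acm_uam_datalog].[s_interface_update_status]
    ALTER COLUMN [message_text] nvarchar(max) NULL;
ALTER TABLE [acm_uam_datalog].[s_task_run_status]
    ALTER COLUMN [message_text] nvarchar(max) NULL;
```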
The upgrade is executed within a transaction. An error will result in a rollback. Contact i-Refact support in case of unexpected failures.
During the startup of the i-refactory server the following change will be applied automatically:

- A unique index named irefactory_generated_ak_idx is created on each applicable context table if such an index does not exist.

Creating these additional indexes upon restart of the i-refactory engine may take quite some time. You could opt for creating these additional indexes yourself, prior to restarting the i-refactory server.
The following SQL script generates a create index statement for each applicable entity. You can apply filters for which model or namespace you want to generate these index statements. Copy/paste the content of the column create_index_statement to a new window and simply run the statements, or wrap them in a stored procedure; a sketch of such a loop follows the example below.
with context_entities as
(
select e.id
, i.code as interface_code
, i.database_code
, e.namespace_code
, e.code
, e.key_id_primary
from acm_datadef_oasi_bk.entity e
join acm_datadef_oasi_bk.interface i
on i.id = e.interface_id
join acm_datadef.[unique_key] k
on k.id = e.key_id_primary
where e.architecture_layer_code='CFPL'
and e.classification_code not in ('STABLE INDEPENDENT', 'STABLE DEPENDENT')
and exists
(
select 1
from acm_datadef.unique_key k
join acm_datadef.key_attribute ka
on ka.key_id = k.id
join acm_datadef_oasi_bk.attribute a
on a.id = ka.attribute_id
and a.classification_code='TRANSACTION TIMELINE START'
where k.id = e.key_id_primary
)
)
, index_columns as
(
select e.id
, string_agg( '[' + a.code + ']', ',') WITHIN GROUP (ORDER BY coalesce(ka.ordinal_nbr,999)) as index_columns
from context_entities e
join acm_datadef_oasi_bk.attribute a
on a.entity_id=e.id
left
join acm_datadef.key_attribute ka
on ka.attribute_id= a.id
and ka.key_id = e.key_id_primary
where a.classification_code != 'TRANSACTION TIMELINE START'
and (ka.id is not null or a.classification_code='TRANSACTION TIMELINE END')
group by e.id
)
select e.interface_code
, e.database_code
, e.namespace_code
, e.code as entity_code
, 'IF NOT EXISTS
(
SELECT *
FROM [' + e.database_code + '].[sys].[indexes]
WHERE name = ''irefactory_generated_ak_idx'' AND object_id = OBJECT_ID(''[' + e.database_code + '].[' + e.namespace_code + '].[' + e.code + ']'')
) AND EXISTS
(
SELECT 1 WHERE OBJECT_ID(''[' + e.database_code + '].[' + e.namespace_code + '].[' + e.code + ']'') IS NOT NULL
)
BEGIN
create unique nonclustered index irefactory_generated_ak_idx on [' + e.database_code + '].[' + e.namespace_code + '].[' + e.code + '](' + ic.index_columns +');
END;
' as create_index_statement
from context_entities e
join index_columns ic
on ic.id = e.id
where e.interface_code = 'TPC_H_CFPL'
;
This is an example for a single entity for which an additional index should be created if it does not yet exist:
IF NOT EXISTS
(
SELECT *
FROM [IR_CFPL].[sys].[indexes]
WHERE name = 'irefactory_generated_ak_idx' AND object_id = OBJECT_ID('[IR_CFPL].[tpc_h].[h_brand_properties]')
)
AND EXISTS
(
SELECT 1 WHERE OBJECT_ID('[IR_CFPL].[tpc_h].[h_brand_properties]') IS NOT NULL
)
BEGIN
create unique nonclustered index irefactory_generated_ak_idx on [IR_CFPL].[tpc_h].[h_brand_properties]([id],[acm_end_dt]);
END;
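Rather than copy/pasting, you could also execute the generated statements in a loop. Below is a minimal sketch, assuming the output of the generation query has first been captured in a temp table named #generated_statements (a hypothetical name):

```sql
-- Minimal sketch: execute every generated statement.
-- Assumes the generation query shown above was used to fill a temp table,
-- e.g. via SELECT ... INTO #generated_statements.
DECLARE @stmt nvarchar(max);

DECLARE stmt_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT create_index_statement FROM #generated_statements;

OPEN stmt_cursor;
FETCH NEXT FROM stmt_cursor INTO @stmt;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Each generated statement guards itself with IF NOT EXISTS,
    -- so re-running this loop is harmless.
    EXEC sp_executesql @stmt;
    FETCH NEXT FROM stmt_cursor INTO @stmt;
END;

CLOSE stmt_cursor;
DEALLOCATE stmt_cursor;
```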
When creating a delivery you can now specify that CSV files should be loaded automatically into the technical staging tables. You do this by setting values in the property csvLoaderSpecification. When this property is set, the i-refactory engine will automatically load the supplied CSV files from the location specified in uncPath into the appropriate technical staging tables for this specific delivery.
The property csvLoaderSpecification has the following schema:
csvLoaderSpecificationSchema = {
$id: '/acmDatadel/delivery/csvLoaderSpecification',
type: 'object',
properties: {
uncPath: { type: 'string' },
rowTerminator: { type: 'string', default: '0x0A' },
fieldTerminator: { type: 'string', default: ',' },
codePage: { type: 'string', default: '65001', nullable: false },
firstRowIsHeader: { type: 'boolean', default: true },
datafileType: { enum: ['char', 'widechar'], default: 'char' },
fileExtension: { type: 'string', default: 'csv' },
},
additionalProperties: false,
required: ['uncPath'],
}
For a detailed description of these properties please check the OpenAPI schema specification. If you set a value (the only required value is uncPath) the i-refactory engine will automatically load the data; otherwise you need to load the data into the appropriate tables with your own tooling.
The i-refactory engine will immediately start loading the relevant files in parallel when auto-load is applicable. If a required file cannot be read, or if a file cannot be loaded with BULK INSERT, the task will be set to FAILED (which you can monitor in the web app). The i-refactory engine does not move, remove or otherwise write to the specified uncPath.
You have the ability to REJECT the delivery while it is still in the loading phase. In this case already running technical load tasks will be completed, but other technical load tasks will not be scheduled or processed. The delivery will be set to a REJECTING state and eventually to REJECTED.
This is an example of a delivery with csvLoaderSpecification properties set:
{
"architectureLayerCode": "HSTGIN",
"interfaceCode": "TPC_H_HSTGIN",
"tenantCode": "TPCH",
"receivedDt": "2018-01-01T00:00:01",
"sentDt": "2018-01-01T00:00:00",
"snapshotDt": "2018-01-01T00:00:00",
"isComplete": true,
"constraintsToValidate": "ALL",
"violatedConstraintMaxSampleSize": 50,
"logicalValidationDeliveryDeleteType": "LOGICAL",
"deliveredEntity": [],
"csvLoaderSpecification": {
"uncPath": "//Mac/home/i-refact-demo-tpch/deliveredData/orderSystem/delivery-report-load",
"codePage": "raw",
"fieldTerminator": "|",
"firstRowIsHeader": false,
"fileExtension": "tbl"
}
}
After successful acceptance of the delivery the monitor shows the generated load tasks to the technical staging tables. In the example monitor you can see that some of them have succeeded, some are processing and some are scheduled. The number of parallel load tasks depends on the concurrency settings.
You can look into the load task and check the generated SQLServer code and csvLoaderSpecification settings:
/**
* i-refactory version : 3.7.0
* Generated for entityUpdate(auditId) : 16
* SourceArchitetureLayer : EXT
* SourceInterface : TPC_H_TSTGIN
* SourceEntity : IR_TSTGIN.tpc_h.LINEITEM
* SourceEntityClassification : STABLE DEPENDENT
* TargetArchitetureLayer : TSTGIN
* TargetInterface : TPC_H_TSTGIN
* TargetEntity : IR_TSTGIN.tpc_h.LINEITEM
* TargetEntityClassification : STABLE DEPENDENT
* CsvLoaderSpecification.uncPath : /var/opt/home/i-refact-demo-tpch/deliveredData/orderSystem/delivery-report-load
* CsvLoaderSpecification.rowTerminator : 0x0A
* CsvLoaderSpecification.fieldTerminator : |
* CsvLoaderSpecification.codePage : raw
* CsvLoaderSpecification.firstRowIsHeader : false
* CsvLoaderSpecification.datafileType : char
* CsvLoaderSpecification.fileExtension : tbl
**/
TRUNCATE TABLE "IR_TSTGIN"."tpc_h"."LINEITEM";
BULK INSERT "IR_TSTGIN"."tpc_h"."LINEITEM"
FROM '/var/opt/home/i-refact-demo-tpch/deliveredData/orderSystem/delivery-report-load/LINEITEM.tbl'
WITH (TABLOCK, DATAFILETYPE='char', FORMAT='CSV', FIRSTROW=1, FIELDTERMINATOR='|', ROWTERMINATOR='0x0A', CODEPAGE='raw')
;
Making use of the autoload option has some restrictions:

- If you specify csvLoaderSpecification on SQLServer version 2017 or 2016, a functional error is returned.

You only need to specify uncPath; the other properties will be initialized with their default values. If your CSV files are encoded in another codepage, or have a different field or row separator, you need to set the appropriate values when creating the delivery.
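For instance, a specification that only sets uncPath (the path shown is illustrative) is effectively expanded with the defaults from the schema above:

```json
{
  "csvLoaderSpecification": {
    "uncPath": "//fileserver/share/delivery",
    "rowTerminator": "0x0A",
    "fieldTerminator": ",",
    "codePage": "65001",
    "firstRowIsHeader": true,
    "datafileType": "char",
    "fileExtension": "csv"
  }
}
```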
If your files are UTF-16 encoded they should have a byte order mark. You also need to set datafileType to widechar and change the setting of rowTerminator to 0x0A00 or another appropriate rowTerminator value.
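A csvLoaderSpecification for such UTF-16 files could then look like this (the uncPath is again illustrative):

```json
{
  "csvLoaderSpecification": {
    "uncPath": "//fileserver/share/delivery",
    "datafileType": "widechar",
    "rowTerminator": "0x0A00"
  }
}
```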
Although the SQLServer documentation specifies that BULK INSERT complies with the RFC 4180 standard, there is an issue when UTF-16 files are provided: the last line of a file needs to end with a rowTerminator (end of file alone does not work). If the last line does not have a rowTerminator, this last line will not be loaded.
Loading CSV files on SQLServer running on Linux has additional limitations:

- The codePage needs to be set to raw.

The Tedious driver which we use to connect to SQLServer only supports a limited set of connection types. If files need to be accessed on, for instance, a file share, there are limitations in how these files can be accessed. To allow for delegated access to another server a connection based on Kerberos is required. Unfortunately the Tedious driver does not support Kerberos based authentication. We are currently investigating the ability to connect through ODBC with Kerberos authentication support. For bulk loading data we will then use the ODBC based connection while still relying on the Tedious driver for the rest of the application.
The i-refactory engine stores data over time. For the GDAL current views (CRUD) and the GDAL business key views, which read the data in the fact store, we need to apply a filter to only retrieve the latest version which is not logically removed.
Previously, the generated query in the view made use of a self join to determine which versioned record is the actual (or latest) version. This query is quite expensive due to the join.
In this release we changed the query such that it uses the value of acm_end_dt to get the latest version.
Queries (especially when retrieving large sets) now on average consume 75% less CPU, the amount of logical reads is reduced by 90% and the elapsed time is reduced by 78%.
The read performance improvements do come at a small cost of additional storage and write throughput: an additional index is automatically created on each context table which tracks changes over time. This index is created upon importing a fact based model in case it does not yet exist.
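Conceptually the change amounts to replacing a self join with a plain filter on acm_end_dt. The following sketch is illustrative only; the table ctx.h_example, its key column id and the start-date column acm_start_dt are hypothetical names, and the open end-date sentinel value is an assumption:

```sql
-- Before (sketch): a self join to find the latest version per key.
SELECT cur.*
FROM ctx.h_example cur
JOIN (
    SELECT id, MAX(acm_start_dt) AS max_start_dt
    FROM ctx.h_example
    GROUP BY id
) latest
  ON  latest.id = cur.id
  AND latest.max_start_dt = cur.acm_start_dt;

-- After (sketch): the latest version is the one with an open acm_end_dt,
-- so a plain filter suffices (the logical-removal filter is omitted here).
SELECT cur.*
FROM ctx.h_example cur
WHERE cur.acm_end_dt = '9999-12-31'; -- assumed "open end date" sentinel
```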
Issue | Summary |
---|---|
datatype truncation | Applying where conditions on datetime columns in the Rest API resulted in truncation to scale 1 when a more precise scale value was supplied (e.g. milliseconds). |
flush transaction errors | In case SQLServer reported a primary key violation error, it also reports the record that caused the violation. When this reported record contained multibyte characters the i-refactory engine was not capable of storing the error message, resulting in a flush transaction error. The error message property in the database was of datatype varchar and is now changed to nvarchar where applicable. |
Issue | Summary |
---|---|
Auxiliary key in Generic Data Access Model | Attributes for independent auxiliary relations were added to the auxiliary key when a dependent auxiliary reference was added afterwards. This issue is now fixed. |
Name to code conversion | The name to code conversion was applied to key columns in a dependent anchor which were not a reference to another anchor. This resulted in unwanted overrides of the code convention of a column. |
Export issues | Export to a temporary directory did not work as expected regarding the timestamp added and used in scripts. |