Discover BI: ETL

Showing posts with label ETL. Show all posts

Sunday, November 28, 2021

Guidewire Datahub Architecture Explained

Guidewire Datahub is a P&C enterprise data warehouse solution built on top of the three key Guidewire Insurance Suite systems namely Guidewire Policycenter, Billingcenter and Claimcenter. It is a ETL framework that has been built using SAP BODS ETL tool and a java framework named DIAL for SCD Type 1 and Type 2 implementation.

Focused on Property and Insurance clients, Datahub provides an operational datastore which provides a single version of truth and enabling clients to execute cross analytical queries across systems. The system also provides you capability to source data from external data systems. The layers of the architecture enables data quality and auditing to make sure that the data you have in target is clean and audited.

Guidewire Datahub Basic Architecture

The first layer in the architecture are the sources, which can be either one of the three sources from PC, BC or CC, it can even be an external source or a legacy system. In this layer the data from the sources are loaded into a ftp source and from the ftp source to the relational database where Guidewire Datahub resides.

The second layer is the Guidewire Datahub layer which is a relational dbms and can be in either Oracle Database or a SQL Server database. In this layer data is captured at an entity level in both Type 1 and Type 2. All the data cleansing, quality checks, balancing an auditing happens at this layer. This layer is loaded and managed using the SAP BODS or SAP Data Services tool.

The next layer is where you built out your star schema diagrams based on your reporting needs, this layer will have your facts and dimensions and will be used by IBM Congnos Analytics to cater to your reporting needs.

This forms the basic architecture of the Guidewire Datahub suite. Please do let me know if you would like me to cover more topics on the Guidewire Datahub suite.

Monday, July 27, 2020

Difference between AS-WAS and AS-IS reporting in a Data Warehouse

Every DWH/ETL developer should be well versed about this concept, I was a couple of days back explaining the same to a person new in our team. That's when it struck me, why not a post on the same, so here goes. I will try to explain the same, giving an example.

Imagine a customer making monthly payments to a bank loan each month from January 2020 till May 2020, where at the end of the month of March, he changed his Address, he moved to a new county/district. This information will be captured in Customer dimension in EDW as below,

So you have your first customer record, which expires on 31st March, and a new record,which is currently active from 1st April 2020 till high end date.

Now imagine we have a transaction table as below with the 5 payment transactions as discussed before from January 2020 till May 2020, so customer has made 3 payments at his first address and the rest 2 at his new address.

Let us discuss now the AS-IS and AS-WAS concepts with this dataset, most EDW/Reporting systems use "AS-WAS" concept for grouping, as per this, suppose you are grouping the payment amount based on DISTRICT column in "AS-WAS", the result set would look as below,

So as you can infer from the result set above, in AS-WAS grouping, the snapshot as of Transaction date gets reported, i.e when T1, T2 and T3 payments were made, the customer was at Trivandrum, and later when T4 and T5 payments were made, he moved to Cochin.

However in "AS-IS" reporting, all payments are reported against the current active record of the customer, hence the full amount will be reported against Cochin in this case, see result set below to understand,

In AS-IS reporting, the measures reported against dimension attributes as of current date.

Sunday, March 17, 2019

Datastage - ExecuteCommandActivty - ParamValue/LimitValue not Appropriate [Solved]

Being new to Datastage, I had myself pulling my hair out with this issue.

Scenario: We have a file with a date in linux server which would act as the Extract Date for one of our ETL sources.

So say we have this file, extractdate.txt in a specific folder in Linux Server and would contain a datetimestamp value like '01-JAN-2019 03:47:30 AM'.

We would then read from this flat file using the 'Execute Command' activity and pass it on to a parameter (ExtractDate) in the next sequential job activity load in a Sequencer Job. Something like below,

Datastage Sequencer Job

Issue: The problem with the job now was that I was not able to successfully set the parameter in JobActivity with the output from the Execute Command Activity Step.

Code in ExecuteCommand Activity was a simple cat command as below,

cat /location/extractdate.txt

The output of the same, I am trying to write into a parameter in the Job_Activity object property as below,

While I try to execute with the above setup, I am facing the error below,

Error calling DSSetParam(prmCNIRTP), code=-4 [ParamValue/Limitvalue is not appropriate]

Solution: The solution is to rap the value expression in Job Activity as below instead of how I have given in the screenshot above,

EReplace(Execute_Command_0.$CommandOutput,@FM,"")

This solved it for me. The above statement removes any special characters in the flat file. Special characters are highly likely when you read from flat files.

References:

https://tutel.me/c/programming/questions/15089213/paramvaluelimitvalue+is+not+appropriate+error+while+using+trim+function+in+datastage

Wednesday, January 16, 2019

Script to terminate BODS Job on condition

Scenario: We have a metadata table for extract runs, before running an extract job, we need to check whether the previous run was successful. This can be done in Data Services using scripting.

Solution:

Use the following script

if ( sql('DS_Datastore_Name','SELECT EXTRACT_STATUS FROM EXTRACT_RUN WHERE EXTRACT_NAME ='Extract 1') != 'Completed' )
begin
raise_exception_ext( 'Job terminated due to error in previous run. Correct the previous load and re-run again',1);
end

In the above code, we write a script where in we use an if condition and inside the if condition with the help of a SQL function, we write code to check if the previous run was completed or not. If not completed, it goes inside the if condition where we have written the below code,

raise_exception_ext( 'Job terminated due to error in previous run. Correct the previous load and re-run again',1);

This code will make sure that a exception is raised and there by failing the job.

Wednesday, May 30, 2018

SAP Data Services SQL Transform with variable - Invalid Pseudocolumn error [Solved]

Scenario: I had a scenario where in I needed to achieve a lead, lag column logic in BODS and had to do with a SQL Transform. Here I have a filter on date, i.e filter CreateDateTime column less than extract date. So I have a variable $PV_ExtractDate where in I pass my extract date value.

which would make my query look something as below,

SELECT COL_A, COL_B FROM SALES WHERE CreateDateTime < $PV_ExtractDate

The above syntax in SQL Transform was creating an error for me as the value was not being passed to the variable.

Error: SQL submitted to ODBC data source <SAMPLE> resulted in error <[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid pseudocolumn "$PV_ExtractDate".>

Cause: BODS doesn't recognize $PV_ExtractDate as a variable value in SQL Transform, this can be corrected by using the following syntax,

SELECT COL_A, COL_B FROM SALES WHERE CreateDateTime < {$PV_ExtractDate}

Note: It is advised not to go with a SQL Transform, unless no workaround is feasible in SAP BODS. This is because you are writing code that is specific to a particular database, hence if the source system is changed to a new type of database an issue might occur.

Sunday, March 18, 2018

gen_row_num_by_group function in SAP BODS not creating sequence properly [SOLVED]

Issue: A week back, I had written a post on how we used the gen_row_num_by_group function create a sequence of numbers based on a particular column in BODS. I ran into an issue sometime later, where the sequences were not getting properly generated.

Cause: This is because the input source was not properly sorted. When you use this particular function based on a column, you should note that the input to the query transform should be properly sorted first. This is very important, if you miss this then the data will not be properly sorted and hence create wrong sequence numbers.

Resolution: The way to resolve this issue is to have your input data sorted. For example, if you are creating a sequence based on ClassId column as below,

gen_row_num_by_group(ClassId)

You should in this case make sure that the data coming to the query transform be grouped on this particular column, i.e ClassId in this example. This makes sure the sequence gets generated accordingly.

Please do write to us, if you found this useful.

Thursday, March 15, 2018

Found erroneous expression <> in SAP BODS Designer [SOLVED]

Issue: SAP BODS shows no error on validation, but comes up with this 'Found erroneous expression <> ' error at times.

Found erroneous expression error in SAP Data Services Designer

Cause: This is because your workflow, or dataflow might not have proper calls to variables/parameters that are being used. The variables may not be properly referenced. The value of the parameter probably is not being passed from the parent object(job,workflow) to the child object(workflow, dataflow).

Resolution: Check for parameters used and whether they are correctly being referenced from the child object. Solved the problem for me.

References

http://www.forumtopics.com/busobj/viewtopic.php?p=1036501

Tuesday, February 27, 2018

Dynamically Generate a Target Flat File Format from Dataflow in SAP BODS Designer

Scenario: To generate a flat file format from inside a data-flow rather than typing each column, datatype and size.

Solution: Being new to SAP BODS designer, I found it really hard to type out each column in a flat file format, along with the datatype and it's corresponding size. It can become a tiresome process.

The easiest solution is to go to one of your query transforms and on the schema out pane(right pane), right click on the parent of your 'Schema out' as in screenshot below,

SAP BODS - Create File Format Generation Option ( Enlarge )

Now from the list that opened up, select 'Create File Format'. This will open up a flat file format window for you with same column names, datatypes and lengths as in the schema out part of your query.

You can then save the same, edit the properties and use in your BODS dataflow for data loading.

Friday, February 23, 2018

Generate a number sequence grouped on a column/key in SAP BODS

Scenario: Create a sequence number grouped or based on another key/column value from the source result set.

eg: consider the following data-set

Now in the target dataset/table, we require a new column which would sequentially assign a number starting from 1 for each student under a particular class. The result set then would look something like below,

Now we have this new column (StudentSeqNbr) which sequentially numbers each student under a particular ClassId. The aim is to create such a sequential column using SAP BODS.

Solution: SAP BODS has an inbuilt function which will let you create such a sequence and can be mapped inside a Query transform.

Gen Rownum function in SAP BODS( Click to enlarge )

Using this function and the below syntax, a sequence can be created for the scenario above.

gen_row_num_by_group(ClassId)

Syntax:

gen_row_num_by_group(ColumnName)

UPDATE: The input should be sorted for this function to work properly to create a sequence.