SCD2 in Informatica

SCD2  means Slowly Changing Dimension Type 2, It will store current and total historical data.This Type 2 method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. With Type 2, we have unlimited history preservation as a new record is inserted each time a change is made.
Implement SCD2 through informatica
Here we take an example.
Create source table CUST with below query.

Source Data   
  • The Type 2 method tracks historical data by creating multiple records for a given natural key (CUST_ID) in the dimensional tables with separate surrogate keys (PM_PRIMARYKEY).For this create table with below query.

  • Here active_dt and inactive_dt used to indentify history data.
  • PM_PRIMARYKEY is surrogate key which is used to identify each and every record uniquely in target.
  • Before implementing we need to identify the attributes needs to be consider for history maintain.
  • In this example we will consider if any change in ADDRESS or CITY OR STATE.So if any change in ADDRESS or CITY or STATE then we need to insert new record and inactivate old record.
Creation of mapping:
  • Import Source table CUST and Target table CUST_D.
  • Create mapping with the name of m_scd_type_2.
  • Drag the source in to the mapping designer.
  • Create lookup table CUST_D.
  • Create one input port in lkp_CUST_D table with name in_CUST_D with data type double. And add condition CUST_ID=in_CUST_ID.
  • Connect CUST_ID from source to in_CUST_ID lookup table.
  • Create expression transformation and drag CUST_ID, CUST_NM, ADDRESS, CITY, STATE from Source qualifier to expression. In the same why drag CUST_NM, ADDRESS, CITY, STATE from lkp_CUST_D table to expression. Change attributes names in expression to identify source and lookup attributes as shown in diagram. 

  • In lookup transformation apply filter to retrieve only active records. This you can do it in lookup sql override.
Now create two output ports in expression transformation.
  • Out_DUMMY_DATE-- > which is used to populate ACTIVE_DT, INACTIVE_DT, INSERT_DT, UPDATE_DT attributes in target. Assign SYSDATE for this attribute. Out_FALG-- > which is used to flag record for insert new/insert update or update/Inactivate record. 
  • To flag a record we need to check three conditions Source record present in target or not.For this if PM_PRIMARYKEY which is coming from source is null that means the record with CUST_ID coming from source not present in target. This type of records needs to be inserted intarget directily. Are there any changes in attributes from source and Target?
  • For this if PM_PRIMARYKEY is not null means the CUST_ID coming from source present in target. Now we need to validate is there any changes in CUST_NM, ADDRESS, CITY, and STATE between source and target data. If any change in data then need to insert this record in target at the same time need to inactivate record present in target. If record coming from source present in target and no changes between source and target attributes then filter those records.
  • Write this condition in out_FLAG expression editor.
  • From the above condition if any record flagged as 1 that means it new record which is coming from source and this is not available in target.
  • If any record flagged as 2 that means it’s already exists in target and there is a difference in attributes from source and target.
  • If any record flagged as 3 that means this record present in source and target, there is no difference between attributes.
  • Now create router transformation and drag the following attributes from expression to router transformation.
  • Lkp_PM_PRIMARYKEY, src_CUST_ID, src_CUST_NM, src_ADDRESS, src_CITY, src_STATE, out_DUMMY_DT and out_FLAG. 
  • Create two groups in router one for insert and another one for update. In insert group will pass both new and changed records for insert.
  • Update group only to pass the record which we need to inactivate records. 
  • Now connect INSERT group from router to Target. Connect the flowing attributes from INSERT group to Target src_CUST_ID, src_CUST_NM, src_ADDRESS, src_CITY, src_STATE,out_DUMMAY_DT to respective fields in target. Connect out_DUMMY_DT field from router to INSERT_DT, UPDATE_DT and ACTIVE_DT attributes.
  • Create sequence transformation and connect nextval from seq transformation to target PM_PRIMARYKEY attribute in INSERT pipe line as shown in below screen. 
  • Now drag lkp_PM_PRIMARYKEY, out_DUMMAY_DT from UPDATE group of router transformation and connect to update strategy transformation. 
Now in update strategy transformation set the property as DD_UPDATE. 
  •  Finally the mapping will be like below. 
The  data will be available on CUST table is  
Assuming this is the first time we are running mapping so there won’t be any data in target. 

If you are running mapping on 2nd Jul data in target looks like below. 
  • So for all this records inactive date is null. Means all are active records.
  • After this run assume that data changed in source on 2nd. Changed data in source looks like below.
  • In the above data for first two records there are no changes after last refresh so there is now change in update date. But for record with CUST_ID 80003 CITY, STATE changed from previous day to today. So update date changed from 1St Jul to 2nd Jul.
  • If you run same job on 3rd Jul, then target data looks like below.
  • In target today two records get inserted. One is new record which is with CUST_ID 80004. And another record which is changed record 80003.
  • So in target for CUST_ID 80003 we have two records one is inactivated (PM_PRIMARYKEY=3) and another one is active record (PM_PRIMARYKEY=5). From this you can identify for over period of time what is the active record for particular customer.

Powered by Blogger.