Talend SCD Type 2 Manual

In this section, we show you how to design Talend SCD Type 2 or Slowly Changing Dimension manually. It is one of the important interview questions one might face.

For this Talend SCD type 2 manual demo, we used the same source and destination table that we used previously. Please refer to the Talend SCD example article.

The following are the list of steps required to perform the Talend SCD type 2 manually.

  • We need SKey or Surrogate Key. Every time a new record insert into a table, SKey has to increment. So, we need the highest or maximum SKey value from the destination table.
  • We have to insert new records into the Emp_SCD2 table.
  • Next, compare the new record with the existing table record to check whether it performs a new insert or update. So, we need to perform a lookup.
  • After every update of the old value, we have to freshly insert the new record with the same EmpID and new SKey value.
  • Next, we have to update the scd_end date of the old record with the current date.

Talend SCD Type 2 Manual Example

First, we used one tDBInput and written the SQL Select statement query to get the Maximum surrogate key value. Here, we used the ISNULL function to replace the NULL return value with zero. When you are inserting the very first record into the table, SELECT MAX(SKey) FROM Emp_SCD2 returns NULL value. To prevent this, we used the ISNULL along with the MAX function.

Configure Source Connection and query table

Click on the Edit Schema and add the following column of integer type. It is because the return value of the above select statement is a single value.

Edit Schema and add a single column of int

Let me create a context variable to store the Maximum SKey value. By this, we can use this context variable in any component within this job. I suggest you add the default value is zero, which we did afterward.

Create one Context variable

Next, we added the tJavaRow component and assigned the input row value to the context variable. Here is where we store the maximum surrogate key value in the Talend context variable.

Talend SCD Type 2 Manual 4

On the subject OK trigger, let me add another tDBInput. This component is the actual input source table.

Talend SCD Type 2 Manual 5

Drag and drop the tMap and connect the tDBInput_1 main row to tMap. Next, add the tDBOutput to insert or load records into the destination table. While connecting tMap with tDBOuput, it will ask you to assign a name for that output. Here, we have given Fresh_Insert as the name.

Add tMap Output

Let me leave the tDBOuput settings to default because we change them later.

Talend SCD Type 2 Manual 7

Double-click on the tMap to open the properties window. Please click on the Automap! Button to map the columns.

Talend SCD Type 2 Manual 8

scd_start means current time or record insert time. So, let me use the Talend getCurrentDate function to get the current date.

scd_end date might change based on the organization. For the Talend SCD type 2 Manual demo purpose, we are using 31-12-2999. Please don’t forget to use the Talend parseDate function to parse or convert the string to the Date data type.

Talend SCD Type 2 Manual 9

There is a sequencing method under the Talend Numeric section to insert sequential numbers from 1 to N at specified steps, the same as the SQL Auto Increment. We want to insert the sequential numbers from the maximum surrogate key so that we can place the context.SKey as the second argument.

Talend SCD Type 2 Manual Sequencing Method

We forgot to add one to the Select statement so, let’s add the same. Select Max(SKey) gives you the maximum value, and the sequence will start from the maximum value, but it has to start from the next value. Remember, you can also change the sequence function code to add one number.

Talend SCD Type 2 Manual 11

Let me change the action to Insert and check the schema. Next, we are running this Talend SCD type 2 Manual job to check whether we are inserting new records.

Configure tDBOutput

Within the SQL Management Studio, you can see the records with the Surrogate key, scd_start, and end date.

Table Records

Until now, we have been performing a simple insert. To understand whether it is a new record or an old record with new values, we need to perform the lookup. So, let me add the Emp_SCD2 table as the lookup table to the tMap component. I mean, we are implementing a join.

Talend SCD Type 2 Manual 14

Double-click on the tMap, and add EmpID as the common key column. Next, change the Join Model to Inner Join, and match Model to Unique Math. By this, we can identify the old and new records.

Talend SCD Type 2 Manual 15

Click on the Customize icon, and change the Catch lookup inner join option from false to true. It will capture the inner join fail records so that we can pass them to another output.

Talend SCD Type 2 Custom 16

Let me add another tDBOutput connected to the Emp_SCD2 table, and assign the tMap output name as Insert Existing with New columns.

Assign the tMap to tDBOutput

Within the tMap properties, we used the Auto map to connect all the columns. Next, we change the scd_start to the Current date, scd_end to 2999, and SKey to sequential values. Everything is the same as the Fresh_Insert option.

Talend SCD Type 2 Custom 18

Please click on the Expression and add the following Expression. Here, the equals method compares the old Income value with the New Income value. ! means if both are not equal, then perform this insertion.

Talend SCD Type 2 Custom 19

Check all the options of the tDBOutput, and run the Talend SCD type 2 Manual job. Before running this job, we changed the Income of EmpID 3 to 666666.

Configure the tDBOutput of the custom

You can see that it has inserted the new record with updated income.

Result Table

However, we have to update the scd_end date of the old record. For this, we added one more tDBOutput that points to the Emp_SCD2 table. Next, we connected it to tMap and assigned the Update_Dates as the output name.

Talend SCD Type 2 Custom 22

Within the tMap, change the scd_end date to the current date for the Key and EmpID. Please don’t forget to add the following Expression to check which record is changed and which one to replace.

Talend SCD Type 2 Custom 23

Please change the section on data to update and check the columns using the Edit Schema button.

Talend SCD Type 2 Custom 24

We completed the Talend SCD type 2 Manual job creation. Let me truncate the Emp_SCD2 table and restore the default values of the Talend_Unite table. From the below Talend SCD type 2 screenshot, you can see the 15 records inserted freshly into the table.

Talend SCD Type 2 Custom 25

You can see the same in the SQL table.

Destination table

Let me change the income of the EmpID 15 to 999999 and run the job. You can see that one record inserted with a new value, and one update has happened.

Talend SCD Type 2 Custom 27

Within the SQL Management studio, you can see the new record and old record with an updated date.

Result set

Until now, we have been working with one column change. However, you can use logical or || to capture multiple changes. To demonstrate the same, we used the Sales column. Please change the Expression of both Update_dates and Insert Exsiting_with_NewValues.

Talend SCD Type 2 Custom 29

Let me change the Sales value of one record and the Income value of another record and run the Talend SCD type 2 Manual job.

Please see the output.

Final Result