Talend SCD

The Talend SCD also called Slowly Changing Dimension, can be accomplished by the tDBSCD field. We can use Talend SCD to keep track of changes made on a table or keep the history of table data.

To demonstrate the Talend SCD, we are using the table that we got from the tUnite job. So, please refer to the same to understand the source.

NOTE: We have also created a few tables to store the Talend SCD output in the Database. For all the tables, the basic structure is the same as the source table and added few extra columns like SKey (int), scd_version(int), scd_active(int), scd_start(date), and scd_end(date) columns.

Talend SCD or tDBSCD Example

First, drag and drop the tDBConnection, tDBCommit, and tDBOutput from the Palette to the Job design space. Next, we have selected the Talend_Unite table from the Repository schema. Please refer to the Connect Talend to SQL article.

Configure the Server and query the Table

Please drag and drop the tDBSCD from the DB Common folder to the Job Design. There is a DB specific one as well, and you can use that.

Talend tDBSCD too for SCD 3

Please select the Database you are working with. Here, we choose the Microsoft SQL Server and click the Apply button.

Next, connect the tDBInput with Talend tDBSCD. From the below screenshot, you can see the fields you must fill in the Component tab. If you have the existing connection, use the same otherwise, please fill in all those details manually or using context groups.

Talend tDBSCD too for SCD Components tab

Here, we are using the existing connection that we established in the tDBConnections field. So, we selected the tDBConnection_1 from the Component list.

Talend tDBSCD too uses existing connection for SCD

Next, click on the SCD Editor button, which will open the following Talend window. All the available columns are placed in the Unused section.

Talend SCD Type 1 Example

The Talend SCD Type 1 or Slowly Changing Dimension Type 1 doesn’t keep the history.

Source Keys: Please place the key column or primary key column in this section.

Component Editor

Surrogate Keys: If you have any numeric column representing the Surrogate key, please use that one. Here, we are creating a new column named SKey, and its value will be either

  • Input filed: Any input column
  • Auto Increment: The value will automatically increment.
  • Routine: You can also use routines.
  • Table max + 1: Table maximum value plus one.

Let me select the last option for this Talend SCD type 1. Next, drag and drop the columns that you want to track for the changes. In this example, we want to track the changes in all the columns, so we added all the column names in the Type 1 fields section.

Talend tDBSCD tool Component Editor for SCD

Until now, we haven’t specified the Target table where we want to keep the history or SCD Type 1 result. So, please click the Browse button and choose the table from the window. Here, we are selecting the Emp_SCD1 table that we created earlier.

Configure Destination Table

Click on the Edit Schema button to check the fields. Here, SKey is the surrogate key that we used.

Edit Schema to check Columns

Let us run the Talend SCD Type 1 Job.

Run the Talend tDBSCD tool for SCD Job

Let me open Management Studio to see the result by opening the SQL Emp_SCD1 table.

Check the table

As you can see from the above image, it dumped the records from the Talend_Unite (TU) table to this table along with the Skey column.

Let me change a few values of the TU table and run the Talend SCD Type 1 Job.

Find the Find the changes

From the below screenshot, you can see the changes.

updated table

Please use the SCD Type 1 if you don’t want to keep the data history.

Talend SCD Type 2 Example

The Talend SCD Type 2 or Slowly Changing Dimension Type 2 keeps the history of the changes. Keeping the history means it will keep the old row as it is, and add an extra row with new values. To find out the difference between old and new values, it adds two additional columns called scd_start and scd_end date.

  • scd_start: By default, it assigns the starting time of the Job or the date of the first entry of the record.
  • scd_end: For the first entry, the value will be NULL. Whenever the changes are made to this row, this value will update the present date. Next, a new row will be created with the current date as scd_start and Null as scd_end date.

Please click the OK button to complete the Talend SCD type 2 settings.

Talend tDBSCD tool for SCD type 2

Let me change or select the Emp_SCD2 table from the Repository. Next, click on the Edit Schema button to see the extra columns create by SCD type 2.

Talend tDBSCD tool for SCD Schema settings

Let me run the Talend SCD type 2 Job and see the result. Here, you can see the extra columns where scd_end is NULL.

Null Dates

Here, we are changing the Education values and Income values. Next, run the Talend SCD type 2 Job.

Change the Original

Now you can see the two new rows with the updated values. Notice that the scd_end date of both these original rows has been updated to the current date. It means the yearly Income of EmpID 3 was 50000 for scd_start to scd_end, and it was updated to 111111 on 2020-05-29. Remember, we did all this at the same time, so you might not notice the date difference. However, if you allow a day or change the input date format to include time, then you can see the change.

Talend tDBSCD tool for SCD updates

Within the Talend SCD Type 2, we have two more Versioning options. They are 

  • Version (scd_version): It assigns integer value 1 to n, where 1 = first inserted row, 2 = first change of the row, 3 = second change, etc.
  • Active (scd_active): It assigns one to the active or latest row and 0 to the previous or old row.

You can use either one of them or both based on the requirement. Here, we are using both, so we checked the version and active.

Talend tDBSCD tool for SCD Versioning

Next, we chose a new Emp_SCD2Version table from the Repository. Please click on the Edit schema to see the new columns added by the SCD type 2 versioning. Click OK and then run the Talend SCD2 versioning job.

Talend tDBSCD tool for SCD Versioning Schema

The following screenshot shows the data that we got primarily.

Type 2 Primary Table

We change the EmpID FirstName to Suresh and the EmpID 8th occupation to Support. Next, run the Talend SCD Type 2 version job.

Type 2 Table Changes

From the below screenshot, you can see the output. Here, the version number of the new rows is 2, and the scd_active of the old row becomes zero, and the new one becomes 1. By seeing the version, you can understand the number of times that value has changed over some time. And by seeing the scd_active, which one is the latest one?

Type 2 Result Table after changes

Let me show you what happens when we again change the EmpID 5 row. (Education = Degree). See, SCD Type 2 added one more row with version number 3 and scd_active as 1.

Type 2 Version Final Table

Talend SCD Type 3 Example

The Talend SCD Type 3 or Slowly Changing Dimension Type 3 keeps the history of the last two changes. I mean the current value and the previous value.

To keep the history, we have to drag the required fields into the Type 3 fields section.

By dragging the fields, Talend will create a current (actual columns), and previous values. To demonstrate the Talend SCD Type 3, we are using the Occupation, Sales, and Yearly Income columns.

Talend tDBSCD tool for SCD Type 3 Component Editor

Next, we chose a new Emp_SCD3 table from the Repository. Please click on the Edit schema to see the new column SKey added by the SCD type 3. Please run the Talend SCD3 job.

Talend tDBSCD tool for SCD Type 3 Scheme Editor

The result of the Talend SCD Type 3 will create three extra columns for previous Occupation, Income, and Sales. Another column for the SKey.

Type 3 First Generated Table

Here, we made a few changes to those three columns, as we showed below. Next, run the Talend SCD type 3 job.

Type 3 changes in the original table

The result shows the New record in the original column, the previous value in the previous_Occupation column, etc.

Talend tDBSCD tool for SCD Type 3 Changes in the table

In the above Talend SCD example, we used only a few columns for the SCD Type 3. For this, we kept the remaining columns in the Unused section. To get those columns, please place them in the Type 0 fields. All the columns placed under this section remain unchanged.

Handling Talend tDBSCD tool for SCD Type 3 remaining columns

To accommodate all the columns, we selected another table called Emp_SCD3AllColumns. You can check the schema as well.

Talend tDBSCD tool for SCD Type 3 unused columns schema

Let me run this Talend SCD type 3 for all the columns. Let’s make some changes to the original table and run the SCD Type 3 job. If you notice, we changed the LastName and Education columns as well.

Type 3 Result Table Changes

Now you can see all the columns in the table. Notice that the LastName and Education column’s values remain unchanged.

Type 3 Result Table