Talend SCD

The Talend SCD, also called Slowly Changing Dimension, can be accomplished by the tDBSCD field. We can use Talend SCD to keep track of changes made on a table or keeping the history of table data.

To demonstrate the Talend SCD, we are using the following table that we got from the tUnite job.

Talend SCD 1

NOTE: We have also created few tables to store the Talend SCD output in the Database. For the all the tables, the basic structure is same as the source table, and added few extra columns like SKey (int), scd_version(int), scd_active(int), scd_start(date), and scd_end(date) columns.

Talend SCD or tDBSCD Example

First, drag and drop the tDBConnection, tDBCommit, and tDBOutput from Palette to Job design space. Next, we have selected the Talend_Unite table from the Repository schema. Please refer to the Connect Talend to SQL article.

Talend SCD 2

Please drag and drop the tDBSCD from DB Common folder to Job design. There is a DB specific one as well, and you can use that.

Talend SCD 3

Please select the Database you are working with. Here, we are choosing the Microsoft SQL Server and click the Apply button.

Talend SCD 4

Next, connect the tDBInput with Talend tDBSCD. From the below screenshot, you can see the fields that you have to fill in the Component tab. If you have the existing connection, use the same otherwise, please fill all those details manually or using context groups.

Talend SCD 5

Here, we are using the an existing connection that we established in the tDBConnections field. So, we selected the tDBConnection_1 from the Component list.

Talend SCD 6

Next, click on the SCD Editor button will open the following Talend window. All the available columns placed in the Unused section.

Talend SCD 7

Talend SCD Type 1 Example

The Talend SCD Type 1 or Slowly Changing Dimension Type 1 doesn’t keep the history.

Source Keys: Please place the key column or primary key column in this section.

Talend SCD 8

Surrogate Keys: If you have any numeric column representing the Surrogate key, please use that one. Here, we are creating a new column name SKey, and its value will be either

  • Input filed: Any input column
  • Auto Increment: The value will automatically increment.
  • Routine: You can also use routines.
  • Table max + 1: Table maximum value plus one.

Let me select the last option fo this Talend SCD type 1.

Talend SCD 9

Next, drag and drop the columns that you want to track for the changes. In this example, we want to track the changes of all the columns, so we added all the column names in the Type 1 fields section.

Talend SCD 10

Until now, we haven’t specified the Target table where we want to keep the history or SCD Type 1 result. So, please click the Browse button and choose the table from the window. Here, we are selecting the Emp_SCD1 table that we created earlier.

Talend SCD 11

Click on the Edit Schema button to check the fields. Here, SKey is the surrogate key that we used.

Talend SCD 12

Let us run the Talend SCD Type 1 Job.

Talend SCD 13

Let me open Management Studio to see the result by opening the SQL Emp_SCD1 table.

Talend SCD 14

As you can see from the above image, it dumped the records from Talend_Unite (TU) table to this table along with the Skey column.

Let me change a few values of the TU table and run the Talend SCD Type 1 Job.

Talend SCD 15

From the below screenshot, you can see the changes.

Talend SCD 16

Please use the SCD Type 1 if you don’t want to keep the data history.

Talend SCD Type 2 Example

The Talend SCD Type 2 or Slowly Changing Dimension Type 2 keeps the history of the changes. Keeping the history means it will keep the old row as it is, and adds an extra row with new values. To find out the difference between old and new values, it adds two additional columns called scd_start and scd_end date.

  • scd_start: By default, it assigns the starting time of Job or date of the first entry of the record.
  • scd_end: For the first entry, the value will be NULL. Whenever the changes made to this row, this value will update the present date. Next, a new row will create with the current date as scd_start and Null as scd_end date.

Please click the OK button to complete the Talend SCD type 2 settings.

Talend SCD 17

Let me change or select the Emp_SCD2 table from Repository. Next, click on the Edit Schema button to see the extra columns create by SCD type 2.

Talend SCD 18

Let me run the Talend SCD type 2 Job and see the result. Here, you can see the extra columns where scd_end is NULL.

Talend SCD 19

Here, we are changing the Education values and Income value. Next, run the Talend SCD type 2 Job.

Talend SCD 20

Now you can see the two new rows with the updated values. Notice that the scd_end date of both these original rows has updated to the current date. It means, Yearly Income of EmpID 3 was 50000 for scd_start to scd_end, and it has updated to 111111 on 2020-05-29. Remember, we did all this at the same time, so you might not notice the date difference. However, if you allow a day or change the input date format to include time, then you can see the change.

Talend SCD 21

Within the Talend SCD Type 2, we have two more Versioning options. They are 

  • Version (scd_version): It assigns integer value 1 to n, where 1 = first inserted row, 2 = first change of the row, 3 = second change, etc.
  • Active (scd_active): It assigns one to the active or latest row and 0 to the previous or old row.

You can use either one of them or both based on the requirement. Here, we are using both, so we checked version and active.

Talend SCD 22

Next, we chose a new Emp_SCD2Version table from the Repository. Please click on the Edit schema to see the new columns added by the SCD type 2 versioning. Click OK and then run the Talend SCD2 versioning job.

Talend SCD 23

The following screenshot shows the data that we got primarily.

Talend SCD 24

We change the EmpID FirstName as Suresh and EmpID 8th occupation as Support. Next, run the Talend SCD Type 2 version job.

Talend SCD 25

From the below screenshot, you can see the output. Here, the version number of the new rows is 2, and the scd_active of the old row becomes zero, and the new one becomes 1. By seeing the version, you can understand the number of times that value has changed over some time. And by seeing the scd_active, which one is the latest one.

Talend SCD 26

Let me show you what happens when we again change the EmpID 5 row. (Education = Degree).

Talend SCD 27

See, SCD Type 2 added one more row with version number 3 and scd_active as 1.

Talend SCD 28

Talend SCD Type 3 Example

The Talend SCD Type 3 or Slowly Changing Dimension Type 3 keeps the history of the last two changes. I mean the current value and the previous value.

To keep the history, we have to drag the required fields into the Type 3 fields section.

By dragging the fields, Talend will create a current (actual columns), and previous values. To demonstrate the Talend SCD Type 3, we are using the Occupation, Sales, and Yearly Income columns.

Talend SCD 29

Next, we chose a new Emp_SCD3 table from the Repository. Please click on the Edit schema to see the new column SKey added by the SCD type 3. Please run the Talend SCD3 job.

Talend SCD 30

And the result of the Talend SCD Type 3 is

Talend SCD 31

Here, we made a few changed for those three columns, as we showed below. Next, run the Talend SCD type 3 job.

Talend SCD 32

The result shows the New record in the original column, the previous value in the previous_Occupation column, etc.

Talend SCD 33

In the above Talend SCD example, we used only a few columns for the SCD Type 3. For this, we kept the remaining columns in the Unused section. To get those columns, please place them in the Type 0 fields. All the columns placed under this section remain unchanged.

Talend SCD 34

To accommodate all the columns, we selected another table called Emp_SCD3AllColumns. You can check the schema as well.

Talend SCD 35

Let me run this Talend SCD type 3 for all the columns

Talend SCD 36

Let do some changes to the original table and run the SCD Type 3 job. If you notice, we changed the LastName and Education columns as well.

Talend SCD 37

Now you can see all the columns in the table. Notice that the LastName and Education column’s values remain unchanged.

Talend SCD 38