The Talend SCD, also called Slowly Changing Dimension, can be accomplished by the tDBSCD field. We can use Talend SCD to keep track of changes made on a table or keeping the history of table data.
To demonstrate the Talend SCD, we are using the following table that we got from the tUnite job.
NOTE: We have also created few tables to store the Talend SCD output in the Database. For the all the tables, the basic structure is same as the source table, and added few extra columns like SKey (int), scd_version(int), scd_active(int), scd_start(date), and scd_end(date) columns.
Talend SCD or tDBSCD Example
First, drag and drop the tDBConnection, tDBCommit, and tDBOutput from Palette to Job design space. Next, we have selected the Talend_Unite table from the Repository schema. Please refer to the Connect Talend to SQL article.
Please drag and drop the tDBSCD from DB Common folder to Job design. There is a DB specific one as well, and you can use that.
Please select the Database you are working with. Here, we are choosing the Microsoft SQL Server and click the Apply button.
Next, connect the tDBInput with Talend tDBSCD. From the below screenshot, you can see the fields that you have to fill in the Component tab. If you have the existing connection, use the same otherwise, please fill all those details manually or using context groups.
Here, we are using the an existing connection that we established in the tDBConnections field. So, we selected the tDBConnection_1 from the Component list.
Next, click on the SCD Editor button will open the following Talend window. All the available columns placed in the Unused section.
Talend SCD Type 1 Example
The Talend SCD Type 1 or Slowly Changing Dimension Type 1 doesn’t keep the history.
Source Keys: Please place the key column or primary key column in this section.
Surrogate Keys: If you have any numeric column representing the Surrogate key, please use that one. Here, we are creating a new column name SKey, and its value will be either
- Input filed: Any input column
- Auto Increment: The value will automatically increment.
- Routine: You can also use routines.
- Table max + 1: Table maximum value plus one.
Let me select the last option fo this Talend SCD type 1.
Next, drag and drop the columns that you want to track for the changes. In this example, we want to track the changes of all the columns, so we added all the column names in the Type 1 fields section.
Until now, we haven’t specified the Target table where we want to keep the history or SCD Type 1 result. So, please click the Browse button and choose the table from the window. Here, we are selecting the Emp_SCD1 table that we created earlier.
Click on the Edit Schema button to check the fields. Here, SKey is the surrogate key that we used.
Let us run the Talend SCD Type 1 Job.
Let me open Management Studio to see the result by opening the SQL Emp_SCD1 table.
As you can see from the above image, it dumped the records from Talend_Unite (TU) table to this table along with the Skey column.
From the below screenshot, you can see the changes.
Please use the SCD Type 1 if you don’t want to keep the data history.
Talend SCD Type 2 Example
The Talend SCD Type 2 or Slowly Changing Dimension Type 2 keeps the history of the changes. Keeping the history means it will keep the old row as it is, and adds an extra row with new values. To find out the difference between old and new values, it adds two additional columns called scd_start and scd_end date.
- scd_start: By default, it assigns the starting time of Job or date of the first entry of the record.
- scd_end: For the first entry, the value will be NULL. Whenever the changes made to this row, this value will update the present date. Next, a new row will create with the current date as scd_start and Null as scd_end date.
Please click the OK button to complete the Talend SCD type 2 settings.
Let me change or select the Emp_SCD2 table from Repository. Next, click on the Edit Schema button to see the extra columns create by SCD type 2.
Let me run the Talend SCD type 2 Job and see the result. Here, you can see the extra columns where scd_end is NULL.
Here, we are changing the Education values and Income value. Next, run the Talend SCD type 2 Job.
Now you can see the two new rows with the updated values. Notice that the scd_end date of both these original rows has updated to the current date. It means, Yearly Income of EmpID 3 was 50000 for scd_start to scd_end, and it has updated to 111111 on 2020-05-29. Remember, we did all this at the same time, so you might not notice the date difference. However, if you allow a day or change the input date format to include time, then you can see the change.
Within the Talend SCD Type 2, we have two more Versioning options. They are
- Version (scd_version): It assigns integer value 1 to n, where 1 = first inserted row, 2 = first change of the row, 3 = second change, etc.
- Active (scd_active): It assigns one to the active or latest row and 0 to the previous or old row.
You can use either one of them or both based on the requirement. Here, we are using both, so we checked version and active.
Next, we chose a new Emp_SCD2Version table from the Repository. Please click on the Edit schema to see the new columns added by the SCD type 2 versioning. Click OK and then run the Talend SCD2 versioning job.
The following screenshot shows the data that we got primarily.
We change the EmpID FirstName as Suresh and EmpID 8th occupation as Support. Next, run the Talend SCD Type 2 version job.
From the below screenshot, you can see the output. Here, the version number of the new rows is 2, and the scd_active of the old row becomes zero, and the new one becomes 1. By seeing the version, you can understand the number of times that value has changed over some time. And by seeing the scd_active, which one is the latest one.
Let me show you what happens when we again change the EmpID 5 row. (Education = Degree).
See, SCD Type 2 added one more row with version number 3 and scd_active as 1.
Talend SCD Type 3 Example
The Talend SCD Type 3 or Slowly Changing Dimension Type 3 keeps the history of the last two changes. I mean the current value and the previous value.
To keep the history, we have to drag the required fields into the Type 3 fields section.
By dragging the fields, Talend will create a current (actual columns), and previous values. To demonstrate the Talend SCD Type 3, we are using the Occupation, Sales, and Yearly Income columns.
Next, we chose a new Emp_SCD3 table from the Repository. Please click on the Edit schema to see the new column SKey added by the SCD type 3. Please run the Talend SCD3 job.
And the result of the Talend SCD Type 3 is
Here, we made a few changed for those three columns, as we showed below. Next, run the Talend SCD type 3 job.
The result shows the New record in the original column, the previous value in the previous_Occupation column, etc.
In the above Talend SCD example, we used only a few columns for the SCD Type 3. For this, we kept the remaining columns in the Unused section. To get those columns, please place them in the Type 0 fields. All the columns placed under this section remain unchanged.
To accommodate all the columns, we selected another table called Emp_SCD3AllColumns. You can check the schema as well.
Let me run this Talend SCD type 3 for all the columns
Let do some changes to the original table and run the SCD Type 3 job. If you notice, we changed the LastName and Education columns as well.
Now you can see all the columns in the table. Notice that the LastName and Education column’s values remain unchanged.