In this article, we are going to explain how to remove duplicates using Sorter Transformation in Informatica with an example. For this example, we are going to use the below show data
From the above screenshot, you can observe that we have 60398 records with many duplicates. Our task is to select distinct records or remove duplicates using sorter transformation in Informatica.
Remove Duplicates Using Sorter Transformation in Informatica
Before we start removing duplicates using sorter, connect to Informatica repository service. For this, we have to provide the Informatica Admin Console credentials. So, Please provide the appropriate Username and Password and click on the Connect button.
TIP: Here, provide the Admin Username and password that you specified while installing the Informatica Server.
Step 1: Creating Source Definition
Once you connected successfully, Please navigate to Source Analyzer and define our Sources. In this example, we are using the already created SQL table (Internet Sales) as our source definition. Please refer Database Source in Informatica to understand the steps involved in creating source definition
Step 2: Creating a Target Definition
Please navigate to Target Designer and define the Target(s). In this example, we are using the already created SQL table (Fact Internet Sales) as our target definition. Please refer Create Informatica Target table using Source Definition to understand the steps involved in creating a target definition
Step 3: Remove Duplicates using Sorter Transformation in Informatica Mapping
To create a new mapping, Please navigate to the Mappings menu in Menu Bar and select the Create.. option. It will open the Mapping Name window. Here, you have to write a unique name for this Mapping and click the OK button.
TIP: Please refer Informatica Mapping article to understand the steps involved in creating Mapping.
Next, Drag and drop the [Internet Sales] source definition from Sources folder to the mapping designer and create a sorter transformation. The sorter transformation requires some data to remove the duplicates. So, we have to connect the Source definition with the transformation using the Autolink.. option.
TIP: Please refer to Sorter Transformation in Informatica article to understand the steps involved in configuring Sorter transformation.
Double click on the Sorter transformation to configure the properties. From the below screenshot, you can observe the list of available properties in Properties tab. In this example, we want to remove duplicate records so, checkmark the Distinct property.
TIP: This will act as SQL SELECT DISTINCT Statement
Next, If you observe the Ports tab, Sorter Transformation is sorting all the input columns in ascending order. Remember, you don’t have to do anything here.
Click the OK button to finish configuring the properties.
Next, Drag and drop the [Duplicate Sorter Transformation in Informatica] target definition from Targets folder to the mapping designer and connect the Sorter Transformation with the target definition. Please use the Autolink.. option to connect them.
Before we close the Mapping, Let us validate the mapping by going to Mapping Menu bar and selecting the Validate option.
Step 4: Creating a Workflow
Once we finish creating the Mapping we have to create the workflow to remove Duplicates using Sorter Transformation in Informatica. PowerCenter Workflow manager provides two approaches to create a workflow.
In this example, we created the Workflow manually. Once we created the workflow, our next step is to create a session.
NOTE: We strictly recommend to refer Informatica Workflow article to understand the steps involved in creating Workflow manually.
Step 4(a): Creating Session
There are two types of sessions in Informatica:
- Non-reusable Session Task: Please refer Session in Informatica article
- Reusable Session Task: Please refer Reusable Session in Informatica article
For this example, we are going to create Non-reusable Session. To create Non-reusable Session in Informatica, Please navigate to Tasks Menu and select the Create option to open the Create Task window. Here you have to choose the Session as Task type (default) and enter a unique name for the session.
Once you click on the Create button, a new window called Mappings will open. Here we have to select the Mapping you want to associate with this session. From the below screenshot, you can observe that we are choosing the Mapping that we created earlier (in Step 3).
Although we have to configure sources, Target, and some common properties, we are not going to explain them here. Please refer to Session in Informatica article to understand the properties.
From the below screenshot, you can observe that the Remove Duplicates Using Sorter Transformation in Informatica workflow is a valid one. Now, let us start the Workflow by navigating to the Workflows menu and selecting the Start Workflow option.
Once you select the Start Workflow option, the Informatica PowerCenter Workflow monitor will be opened to monitor the workflow. From the below screenshot, you can observe that our workflow executed without any errors.
Let us open the SQL Server Management Studio to check whether we successfully remove duplicates / transferred the distinct records from Adventure Works DW to the Informatica target database.
From the above screenshot, you can observe that there are only 158 distinct records out of 60398. Let see the same by writing the Distinct query in SQL