In this article, we are going to explain, How to remove duplicates using Sorter Transformation in Informatica with an example. For this example, we are going to use the below show data
From the above screenshot, you can observe that we have 60398 records with many duplicates. Our task is to select distinct records or remove duplicates using sorter transformation in Informatica.
Remove Duplicates Using Sorter Transformation in Informatica
Before we start doing anything, First connect to Informatica repository service. In order to connect with the Repository service, we have to provide the Informatica Admin Console credentials. So, Please provide the appropriate Username and Password and click on Connect button as shown below.
TIP: Here you have to provide the Admin Username and password that you specified while installing the Informatica Server.
Step 1: Creating Source Definition
Once you connected successfully, Please navigate to Source Analyzer and define our Sources. In this example, we are using the already created SQL table (Internet Sales) as our source definition. Please refer Database Source in Informatica to understand the steps involved in creating source definition
Step 2: Creating a Target Definition
Please navigate to Target Designer and define the Target(s). In this example, we are using the already created SQL table (Fact Internet Sales) as our target definition. Please refer Create Informatica Target table using Source Definition to understand the steps involved in creating a target definition
Step 3: Creating Mapping
In order to create a new mapping, Please navigate to Mappings menu in Menu Bar and select the Create.. option. This will open the Mapping Name window as shown below. Here, you have to write a unique name for this mapping and click OK button.
TIP: Please refer Informatica Mapping article to understand the steps involved in creating Mapping.
Next, Drag and drop the [Internet Sales] source definition from Sources folder to the mapping designer and create sorter transformation. In order to remove duplicates, sorter transformation requires some data so, we have to connect the Source definition with the transformation using the Autolink.. option.
TIP: Please refer Sorter Transformation in Informatica article to understand the steps involved in configuring Sorter transformation.
Double click on the Sorter transformation to configure the properties. From the below screenshot you can observe the list of available properties in Properties tab. In this example, we want to remove duplicate records so, checkmark the Distinct property.
TIP: This will act as SQL SELECT DISTINCT Statement
Next, If you observe the Ports tab, Sorter Transformation is sorting all the input columns in ascending order. Remember, you don’t have to do anything here.
Click OK button to finish configuring the properties.
Next, Drag and drop the [Duplicate Sorter Transformation in Informatica] target definition from Targets folder to the mapping designer and connect the Sorter Transformation with the target definition. Please use the Autolink.. option to connect them.
Before we close the Mapping, Let us validate the mapping by going to Mapping Menu bar and selecting the Validate option.
Step 4: Creating a Workflow
Once we finish creating the Mapping we have to create the workflow for it. PowerCenter Workflow manager provides two approaches to create a workflow.
In this example, we created the Workflow manually. Once we created the workflow, our next step is to create a session.
NOTE: We strictly recommend to refer Informatica Workflow article to understand the steps involved in creating Workflow manually.
Step 4(a): Creating Session
There are two types of sessions in Informatica:
- Non-reusable Session Task: Please refer Session in Informatica article
- Reusable Session Task: Please refer Reusable Session in Informatica article
For this example, we are going to create Non-reusable Session. In order to create Non-reusable Session in Informatica, Please navigate to Tasks Menu and select the Create option to open the Create Task window. Here you have to select the Session as Task type (default) and enter a unique name for the session.
Once you click on the Create button, a new window called Mappings will be opened. Here we have to select the mapping you want to associate with this session. From the below screenshot you can observe that we are selecting the mapping that we created earlier (in Step 3).
Although we have to configure sources, Target and some common properties, we are not going to explain them here. Please refer Session in Informatica article to understand the properties.
From the below screenshot you can observe that the Remove Duplicates Using Sorter Transformation in Informatica workflow is a valid one. Now, Let us start the Workflow by navigating to Workflows menu and selecting the Start Workflow option.
Once you select the Start Workflow option, Informatica PowerCenter Workflow monitor will be opened to monitor the workflow. From the below screenshot you can observe that our workflow is executed without any errors.
Let us open the SQL Server Management Studio to check whether we successfully remove duplicates / transferred the distinct records from Adventure Works DW to the Informatica target database.
From the above screenshot, you can observe that there are only 158 distinct records out of 60398. Let see the same by writing the Distinct query in SQL
Thank You for Visiting Our Blog