The Fuzzy Lookup Transformation in SSIS is used to replace the wrongly typed words with correct words. Unlike Lookup, the Fuzzy Lookup transformation in SSIS uses fuzzy matching to find one or more close matches in the reference table and replace the source data with reference data.
The Fuzzy Lookup Transformation in SSIS is important in real-time. For example, while entering the product information, sometimes, we may enter the data with spelling mistakes. While doing Lookup Transformation, due to these wrongly typed words, we can’t match the source data with a lookup table.
In these situations, SSIS Fuzzy Lookup Transformation will look for the nearest matching right word and replace the wrong value with the correct word. Please remember the following points before working with SSIS Fuzzy lookup transformation.
- It only uses OLE DB Connection Manager to establish a connection with the Reference table.
- This transformation uses the Reference table to compare the wrong word in source data with the correct word in the reference table.
- Reporting Service allows us to use only DT_WSTR and DT_STR Data type columns for Fuzzy lookup Matching.
- This transformation creates temporary tables and indexes in the SQL Server database at runtime.
TIP: Please refer to Fuzzy Grouping Transformation to understand the grouping technique in SQL Server integration services
Fuzzy Lookup Transformation in SSIS Example
In this example, we are going to show how this Fuzzy lookup transformation works on the source data. The below screenshot is our source data.
Reference table is
STEP 1: Open BIDS and Drag and drop the data flow task from the toolbox to control flow. Next, rename it Fuzzy Lookup Transformation in SSIS.
Double click on it, and it will open the data flow tab.
STEP 2: Drag and drop OLE DB Source from toolbox to data flow region. Double click on the OLE DB Source in the data flow region will open the connection manager settings and provides an option to select SQL Table or statement.
Here we selected the Employees database and the previously shown table as our OLE DB source database and table.
STEP 3: Click on the columns tab to verify the columns. In this tab, we can uncheck the unwanted columns also.
STEP 4: Double click on the Fuzzy Lookup Transformation will open the Editor. In the reference table tab, we have to configure the connection manager and also select the table. For this, this tab has multiple options.
- OLE DB Connection Manager: Fuzzy Lookup Transformation only supports OLE DB Connection Manager to establish a connection with the Reference table. So, create a connection using the OLE DB Connection Manager.
- Generate new index: SSIS Fuzzy Lookup transformation will generate a new index on the referenced table to use.
- Reference table name: From the drop-down list, select the reference table from the OLE DB connection
- Store new index: If you want to save the new index on the table, select this option.
- New index name: If you have chosen the Store New index option, write the index name here.
- Maintain stored index: If you want the SQL Server to maintain the newly created index using the Store New index option, checkmark this option.
- Use existing index: If you want to use the existing (already created) index on the table, then select this option.
- Name of an existing index: Please select the existing index from the drop-down list
From the above screenshot, you can observe that we configured the OLE DB connection manager to the Employees database and then selected the Fuzzy Lookup Table as a reference table.
STEP 5: Checkmark the columns if you wish to pass through and Join the source column (Country Name) to the reference table column name (again, Country Name in this example). If you want to change the output column, then change the alias name as we turned in the below screenshot.
If you require both changed and unchanged country name columns in the destination folder, checkmark the pass-through option country name column
STEP 6: In the advanced tab, we can configure the similarity threshold by dragging the bar pointer to the required limit. Available options in the Advanced tab are as follows:
- A maximum number of matches to output per lookup: Please specify the Maximum number of matches the Transformation may return. The default is 1.
- Similarity threshold: The similarity threshold ranges between 0 and 1, where 1 is an exact match. The SSIS Fuzzy Lookup Transformation Editor provides a slider to adjust the similarity between 0 and 1. If the similarity threshold is closer to 1, the source column should match more accurately to reference data.
Click ok to finish configuring the Fuzzy Lookup Transformation in SSIS. Now drag and drop the OLE DB destination into the data flow region.
The Fuzzy Lookup Transformation Editor generated two extra columns along with the existing columns
- _Similarity: The similarity score is between 0 and 1. It will show how accurate the source data matched with the fuzzy lookup data. For instance, 0.80 means source data is 80% matched with a destination.
- _Confidence: Confidence Score shows how much confident Fuzzy lookup Transformation is about the best match in the fuzzy lookup table. For instance, 0.50 means Fuzzy lookup Transformation is 50% sure that Australia is the best match for Australi.
STEP 7: Now, we have to provide the Server, database, and table details of the destination. So double click on the OLE DB Destination and provide the required information.
Here we selected the following database as our destination database and selected the destination table.
STEP 8: Click on the Mappings tab to check whether the Fuzzy lookup source columns are exactly mapped to the destination columns.
Click ok to finish designing the Fuzzy Lookup transformation in the SSIS package. Let us run the package.
Let’s open the Management Studio and select the required table to check the transformation results.
SELECT [Cid] ,[Changed CountryName] ,[_Similarity] ,[_Confidence] ,[_Similarity_CountryName] FROM [Fuzzy Lookup Transformation Output]