The Fuzzy Lookup Transformation in SSIS is used to replace the wrongly typed words with correct words. Unlike Lookup Transformation, the Fuzzy Lookup transformation in SSIS uses fuzzy matching to find one or more close matches in the reference table and replace the source data with reference data.
The Fuzzy Lookup Transformation in SSIS is an important transformation in real-time. For example, while entering the product information, sometimes, we may enter the data with spelling mistakes. While doing Lookup Transformation, due to these wrongly typed words, we can’t match the source data with a lookup table. In these situations, SSIS Fuzzy Lookup Transformation will look for the nearest matching right word and replace the wrong value with the correct word. Please remember the following points before working with SSIS Fuzzy lookup
- SSIS Fuzzy Lookup Transformation only use OLE DB Connection Manager to establish a connection with the Reference table
- This transformation uses the Reference table to compare the wrong word in source data with the correct word in the reference table.
- SSIS Fuzzy Lookup allows us to use only DT_WSTR and DT_STR Data type column for Fuzzy Matching.
- This transformation creates temporary tables and indexes in the SQL Server database at runtime.
TIP: Please refer Fuzzy Grouping Transformation to understand the fuzzy grouping technique in SQL Server integration services
Fuzzy Lookup Transformation in SSIS Example
In this example, we are going to show how this SSIS Fuzzy lookup transformation works on the source data. The below screenshot is our source data
Fuzzy Transformation Lookup reference table is
STEP 1: Open BIDS and Drag and drop the data flow task from the toolbox to control flow. Next, rename it as Fuzzy Lookup Transformation in SSIS.
Double click on it and it will open the data flow tab.
STEP 2: Drag and drop OLE DB Source from toolbox to data flow region. Double click on OLE DB source in the data flow region will open the connection manager settings and provides an option to select SQL Table or SQL statement.
Here we selected the Employees database and previously shown Fuzzy Source table as our OLE DB source database and table.
STEP 3: Click on the columns tab to verify the columns. In this tab, we can uncheck the unwanted columns also.
Drag and drop the Fuzzy Lookup Transformation in the SSIS toolbox to the data flow region and connect the OLE DB source output arrow to it.
STEP 4: Double click on the SSIS Fuzzy Lookup Transformation will open the Fuzzy Lookup Transformation Editor. In the reference table tab, we have to configure the connection manager and also select the fuzzy lookup table. For this, this tab has multiple options.
- OLE DB Connection Manager: SSIS Fuzzy Lookup Transformation only supports OLE DB Connection Manager to establish a connection with the Reference table. So, create a connection using the OLE DB Connection Manager.
- Generate new index: SSIS Fuzzy Lookup transformation will generate a new index on the referenced table to use for the Fuzzy lookup.
- Reference table name: From the drop-down list select the reference table from the OLE DB connection
- Store new index: If you want to save the new index on the Fuzzy lookup table, select this option.
- New index name: If you have chosen the Store New index option, write the index name here.
- Maintain stored index: If you want the SQL Server to maintain the newly created index using the Store New index option, checkmark this option.
- Use existing index: If you want to use the existing (already created) index on the fuzzy lookup table then select this option
- Name of an existing index: Please select the existing index from the drop-down list
From the above screenshot, you can observe that we configured the OLE DB connection manager to Employees database and then selected the Fuzzy Lookup Table as a reference table.
STEP 5: Checkmark the columns if you wish to pass through and Join the source column (Country Name) to the reference table column name (again Country Name in this example). If you want to change the output column, then change the alias name as we turned in the below screenshot
If you required both changed and unchanged country name columns in the destination folder, checkmark the pass-through option country name column
STEP 6: In the advanced tab, we can configure the similarity threshold by dragging the bar pointer to the required limit. Available options in the Advanced tab are as follows:
- Maximum number of matches to output per lookup: Please specify the Maximum number of matches the SSIS Fuzzy Lookup Transformation may return. The default is 1.
- Similarity threshold: The similarity threshold ranges between 0 and 1, where 1 is an exact match. The SSIS Fuzzy Lookup Transformation Editor provides a slider to adjust the similarity between 0 and 1. If the similarity threshold is closer to 1, the source column should match more accurately to reference data.
Click ok to finish configuring the Fuzzy lookup Transformation in SSIS. Now drag and drop the OLE DB destination into the data flow region.
The SSIS Fuzzy lookup Transformation Editor generated two extra columns along with the existing columns
- _Similarity: The similarity score is between 0 and 1. It will show how accurate the source data matched with the fuzzy lookup data. For instance, 0.80 means source data is 80% matched with a destination.
- _Confidence: Confidence Score shows how much confident Fuzzy lookup Transformation is about the best match in the fuzzy lookup table. For instance, 0.50 means Fuzzy lookup Transformation is 50% sure that Australia is the best match for Australi.
STEP 7: Now, we have to provide the Server, database, and table details of the destination. So double click on the OLE DB Destination and provide the required information
Here we selected the [SSIS Tutorials] database as our destination database and selected the destination table.
STEP 8: Click on the Mappings tab to check whether the SSIS Fuzzy lookup source columns exactly mapped to the destination columns.
Click ok to finish designing the Fuzzy Lookup transformation in SSIS package. Let us run the package
Let’s open the SQL Server Management Studio and select the required table to check the results
SELECT [Cid] ,[Changed CountryName] ,[_Similarity] ,[_Confidence] ,[_Similarity_CountryName] FROM [SSIS Tutorials].[dbo].[Fuzzy Lookup Transformation Output]