The Fuzzy Lookup Transformation in SSIS is used to replace the wrongly typed words with correct words. Unlike Lookup Transformation, Fuzzy Lookup transformation in SSIS uses fuzzy matching to find one or more close matches in the reference table and replace the source data with reference data.
The Fuzzy Lookup Transformation in SSIS is very important transformation in real-time. For example, while entering the product information sometimes we may enter the data with spelling mistakes. While doing Lookup Transformation, due to this wrongly typed words we can’t match the source data with lookup table. In these situations SSIS Fuzzy Lookup Transformation will look for the nearest matching correct word and replace the wrong value with correct word. Please remember the following points before working with SSIS Fuzzy lookup
- SSIS Fuzzy Lookup Transformation only use OLE DB Connection Manager to establish connection with the Reference table
- This transformation uses Reference table to compare wrong word in source data with the correct word in the reference table.
- SSIS Fuzzy Lookup allows us to use only DT_WSTR and DT_STR Data type column for Fuzzy Matching.
- This transformation creates temporary tables and indexes in the SQL Server database at runtime.
TIP: Please refer Fuzzy Grouping Transformation to understand the fuzzy grouping technique in SQL Server integration services
Fuzzy Lookup Transformation in SSIS Example
In this example we are going to show, how this SSIS Fuzzy lookup transformation works on the source data. Below screenshot is our source data
Fuzzy Transformation Lookup reference table is
STEP 1: Open BIDS and Drag and drop the data flow task from the toolbox to control flow and rename it as Fuzzy Lookup Transformation in SSIS.
Double click on it and it will open the data flow tab.
STEP 2: Drag and drop OLE DB Source from toolbox to data flow region. Double click on OLE DB source in the data flow region will open the connection manager settings and provides an option to select SQL Table or SQL statement.
Here we selected the Employees database and previously shown Fuzzy Source table as our OLE DB source database and table.
STEP 3: Click on columns tab to verify the columns. In this tab we can uncheck the unwanted columns also.
Drag and drop the Fuzzy Lookup Transformation in SSIS toolbox to data flow region and connect the OLE DB source output arrow to it.
STEP 4: Double click on the SSIS Fuzzy Lookup Transformation will open the Fuzzy Lookup Transformation Editor. In the reference table tab, We have to configure the connection manager and also select the fuzzy lookup table. To do so, this tab has multiple options.
- OLE DB Connection Manager: SSIS Fuzzy Lookup Transformation only supports OLE DB Connection Manager to establish connection with the Reference table. So create connection using OLE DB Connection Manager.
- Generate new index: SSIS Fuzzy Lookup transformation will generate new index on the reference table to use for the Fuzzy lookup.
- Reference table name: From the drop down list select the reference table from the OLE DB connection
- Store new index: If you want to save the new index on the Fuzzy lookup table then select this option.
- New index name: If you have chosen Store New index option then write the index name here.
- Maintain stored index: If you want the SQL Server to maintain the newly created index using Store New index option then check mark this option.
- Use existing index:If you want to use the existing (already created) index on the fuzzy lookup table then select this option
- Name of an existing index: Please select the existing index from the drop down list
From the above screenshot you can observe that, We configured the OLE DB connection manager to Employees database and then selected the Fuzzy Lookup Table as a reference table.
STEP 5: Check mark the columns if you wish to pass through and Join the source column (Country Name) to the reference table column name (again Country Name in this example). If you want to change the output column then change the alias name like we changed in the below screenshot
If you required both changed and unchanged country name columns in the destination folder, check mark the pass through option country name column
STEP 6: In the advanced tab, we can configure the similarity threshold by dragging the bar pointer to required threshold. Available options in the Advanced tab are as follows:
- Maximum number of matches to output per lookup: Please specify the Maximum number of matches the SSIS Fuzzy Lookup Transformation may return. The default is 1.
- Similarity threshold: Similarity threshold ranges between 0 and 1 where 1 is exact match. The SSIS Fuzzy Lookup Transformation Editor provides slider to adjust the similarity between 0 and 1. If similarity threshold is closer to 1 then source column should match more accurately to reference data.
Click ok to finish configuring the Fuzzy lookup Transformation in SSIS. Now drag and drop the OLE DB destination into the data flow region.
The SSIS Fuzzy Transformation Editor generated 2 extra columns along with the existing columns
- _Similarity: Similarity score is between 0 and 1. It will show you, How much accurate the source data is matched with the fuzzy lookup data. For instance, 0.80 means source data is 80% matched with destination.
- _Confidence: Confidence Score show you, How much confident Fuzzy lookup Transformation is about the best match in fuzzy lookup table. For instance, 0.50 means Fuzzy lookup Transformation is 50% confident that, Australia is best match for Australi.
STEP 7: Now we have to provide Server, database and table details of the destination. So double click on the OLE DB Destination and provide the required information
Here we selected the [SSIS Tutorials] database as our destination database and selected the destination table.
STEP 8: Click on Mappings tab to check whether the source columns are exactly mapped to the destination columns.
Click ok to finish designing the Fuzzy Lookup transformation in SSIS package. Let us run the package
Let’s open the SQL Server Management Studio and select the required table to check the results
SELECT [Cid] ,[Changed CountryName] ,[_Similarity] ,[_Confidence] ,[_Similarity_CountryName] FROM [SSIS Tutorials].[dbo].[Fuzzy Lookup Transformation Output]