The Fuzzy Grouping Transformation in SSIS is used to replace the wrongly typed words with correct words. For example, if 98 people typed the country name as India and two people types as Indi, then SSIS Fuzzy Grouping Transformation will replace the Indi with India.
Before we start configuring the SSIS Fuzzy Grouping Transformation, let us see some important properties of this:
- This transformation uses the Grouping technique to replace the wrong word in source data with the correct word.
- This transformation allows us to use only DT_WSTR and DT_STR Data type column for Fuzzy Matching, and Exact matching can apply to any data type except DT_TEXT, DT_NTEXT, and DT_IMAGE.
- It creates temporary tables and indexes in the SQL Server database at runtime.
- This transformation will only use the OLE DB Connection Manager to establish a connection to store the temporary tables and indexes.
- To configure the transformation, you must select the Match Type (Fuzzy or Exact) for an input columns
TIP: Fuzzy Grouping Transformation in SSIS has one input and one output. It does not support an error output.
This SSIS transformation is the same as the Fuzzy Lookup Transformation. However, SSIS Fuzzy Grouping Transformation does not require any reference table to correct the data. It will use the grouping technique to check for the wrongly typed words (type mistakes) and correct them.
Fuzzy Grouping Transformation in SSIS Example
In this example, we are going to show you how to configure Fuzzy Grouping Transformation in SQL Server Integration Services to correct the wrongly typed words (Typo mistakes)
Data we are going to use for this SSIS Fuzzy Grouping Transformation is:
STEP 1: Open BIDS and Drag and drop the data flow task from the toolbox to control flow and rename it as Fuzzy Grouping Transformation in SSIS.
Double click on it, and SSIS will open the data flow tab.
STEP 2: Drag and drop OLE DB Source, Fuzzy Grouping transformation from the SSIS toolbox to the data flow region
STEP 3: Double click on OLE DB source in the data flow region will open the connection manager settings and provides an option to select SQL Table or space to write our SQL statement.
Here we selected the Employees database as our source database and [Fuzzy Source] as SQL table.
STEP 4: Click on the columns tab to verify the columns. In this tab, we can uncheck the unwanted columns.
STEP 5: Right-click on the SSIS Fuzzy Grouping Transformation will open the Fuzzy Grouping Transformation Editor window to configure it.
Within the Connection Manager tab, we have to configure the OLE DB connection Manager setting. Within this source, SSIS Fuzzy Grouping Transformation will create a temporary table and indexes to perform the Fuzzy Grouping operation.
STEP 6: Within the Columns Tab, We have to configure the match Type. You will need to select the columns that you want to group as Fuzzy Match Type and other columns as Exact match. For this example, the Country name is a string, and we want to find the wrong values in this column. So, change the match type to Fuzzy, and Cid is the Int value, so the match type is Exact.
Two parameters are important when you are Configuring a fuzzy grouping Transformation. They are Numerals and Comparison Flags.
From the below screenshot, you can see the available options in this.
- Ignore case: If we check mark this option, Fuzzy Grouping will ignore the case. Both XYZ and xyz will be the same.
- Ignore kana type: This option of SSIS Fuzzy Grouping ignores the difference between the Japanese hiragana and katakana.
- Ignore non spacing characters: If we check mark this option then Fuzzy Grouping will ignore the difference between the diacritics and character
- Ignore character width: If we checkmark this option, SSIS Fuzzy Grouping ignores the difference between single-byte character and double-byte character.
- Ignore symbols: SSIS Fuzzy Grouping will ignore the difference between the letters and symbols (white spaces, punctuations, currency symbols, and mathematical symbols). For example, *xy treated the same as xy
- Sort punctuation as symbols: Fuzzy Grouping will all the punctuation symbols (except apostrophe and hyphen) write before the letters. For example, .xyz will sort before the xyz
In this option, We have to specify the significance of starting and ending numerals while comparing the column data. For example, if leading numerals are significant, “93 New lands Street” will not be grouped with “99 New lands Street”
|Neither||Numerals at the starting and ending positions ignored while grouping.|
|Leading||Numerals at the starting position will ignore while grouping.|
|Trailing||Numerals at the ending positions ignored while grouping.|
|LeadingAndTrailing||Numerals at the starting and ending positions will not ignore while grouping.|
STEP 7: Within the Advanced Tab, we have to configure the Similarity Threshold. The similarity threshold ranges between 0 and 1, where 1 is an exact match. The SSIS Fuzzy Grouping Transformation Editor provides a slider to adjust the similarity between 0 and 1. If the similarity threshold is closer to 1, the source column should match more accurately to reference data.
In this example, We are doing Fuzzy Grouping on Country Name and find the fuzzy match. If we give a Similarity threshold as 0.76, the string column values should match more than 76%, then only it will treat as a valid record.
The SSIS Fuzzy Grouping Transformation produces additional columns along with existing columns, and they are:
- Input key column name: Provide the Unique name for Input Key Column. key_inis the default name.
- Output key column name: Provide the Unique name for Output Key Column. key_Outis the default name.
- Similarity score column name: Provide the Unique name for the similarity score column. _score is a value between 0 and 1. It will indicate the similarity of the input row to the canonical row.
For the time being, we left them to default values and Click ok to finish configuring the SSIS Fuzzy Grouping Transformation. Now drag and drop the OLE DB destination into the data flow region.
STEP 8: Now, we have to provide the Server, database, and table details of the destination. So double-click on the OLE DB Destination and provide the required information
Here we selected the [SSIS Tutorials] database from SQL Server as our destination database and [Fuzzy Grouping Transformation Output] table as our destination table.
STEP 9: Click on the Mappings tab to check whether the source columns exactly mapped to the destination columns.
STEP 10: Click ok to finish designing the SSIS Fuzzy Grouping Transformation package. Let us run the package