The Fuzzy Grouping Transformation in SSIS is used to replace the wrongly typed words with correct words. This transformation is same as Fuzzy Lookup Transformation but Fuzzy Grouping Transformation does not require any reference table to correct the data. It will use the grouping technique to check for the wrongly typed words (type mistakes) and correct them. For example, if 98 people typed country name as India and 2 people types as Indi then Fuzzy Grouping Transformation will replace the Indi with India.
Before we start configuring the Fuzzy Grouping Transformation, let us see some important properties of this:
- This transformation uses Grouping technique to replace the wrong word in source data with the correct word.
- This transformation allows us to use only DT_WSTR and DT_STR Data type column for Fuzzy Matching and Exact matching can be applied to any data type except DT_TEXT, DT_NTEXT, and DT_IMAGE.
- It creates temporary tables and indexes in the SQL Server database at runtime.
- This transformation will only use OLE DB Connection Manager to establish connection to store the temporary tables and indexes.
- To configure the transformation, you must select the Match Type (Fuzzy or Exact) for an input columns
TIP: Fuzzy Grouping Transformation has one input and one output. It does not support an error output.
Fuzzy Grouping Transformation in SSIS Example
In this example we are going to show you, How to configure Fuzzy Grouping Transformation in SQL Server Integration Services to correct the wrongly typed words (Typo mistakes)
Data we are going to use for this Fuzzy Grouping Transformation is:
STEP 1: Open BIDS and Drag and drop the data flow task from the toolbox to control flow and rename it as Fuzzy Grouping Transformation in SSIS 2014.
Double click on it and it will open the data flow tab.
STEP 2: Drag and drop OLE DB Source, Fuzzy Grouping transformation from toolbox to data flow region
STEP 3: Double click on OLE DB source in the data flow region will open the connection manager settings and provides an option to select SQL Table or space to write our SQL statement.
Here we selected the Employees database as our source database and [Fuzzy Source] as SQL table.
STEP 4: Click on columns tab to verify the columns. In this tab we can uncheck the unwanted columns also.
TIP: If we don’t want any column then there is no point to add it in to your SQL command.
STEP 5: Right click on the Fuzzy Grouping Transformation will open the Fuzzy Grouping Transformation Editor window to configure it.
Within the Connection Manager tab we have to configure the OLE DB connection Manager setting. Within this source, Fuzzy Grouping Transformation will create temporary table and indexes to perform the Fuzzy Grouping operation.
STEP 6: Within the Columns Tab, We have to configure the match Type. You will need to select the columns that you want to group as Fuzzy Match Type and other columns as Exact match. For this example, Country name is string and we want to find the wrong values in this column. So, change the match type to Fuzzy and Cid is the Int value so the match type is Exact.
There are two parameters which are very important when you are Configuring fuzzy grouping Transformation and they are: Numerals and Comparison Flags.
From the below screenshot you can see the available options in this.
- Ignore case: If we check mark this option then Fuzzy Grouping will ignore the case. Both XYZ and xyz will be same.
- Ignore kana type: If we check mark this option then Fuzzy Grouping will ignore the difference between the hiragana and katakana in Japanese.
- Ignore non spacing characters: If we check mark this option then Fuzzy Grouping will ignore the difference between the diacritics and character
- Ignore character width: If we check mark this option then Fuzzy Grouping will ignore the difference between single-byte character and double-byte character.
- Ignore symbols: If we check mark this option then Fuzzy Grouping will ignore the difference between the letters and symbols (white spaces, punctuations, currency symbols and mathematical symbols). For example, *xy is treated same as xy
- Sort punctuation as symbols: If we check mark this option then Fuzzy Grouping will all the punctuation symbols (except apostrophe and hyphen) write before the letters. For example .xyz will sort before the xyz
In this option, We have to specify the significance of starting and ending numerals while comparing the column data. For example, if leading numerals are significant, “93 New lands Street” will not be grouped with “99 New lands Street”
|Neither||Numerals at the starting and ending position will be ignored while grouping.|
|Leading||Numerals at the starting position will be ignored while grouping.|
|Trailing||Numerals at the ending position will be ignored while grouping.|
|LeadingAndTrailing||Numerals at the starting and ending position will not be ignored while grouping.|
STEP 7: Within the Advanced Tab we have to configure the Similarity Threshold. Similarity threshold ranges between 0 and 1 where 1 is exact match. The Fuzzy Grouping Transformation Editor provides slider to adjust the similarity between 0 and 1. If similarity threshold is closer to 1 then source column should match more accurately to reference data. In this example, We are doing Fuzzy Grouping on Country Name and find the fuzzy match. If we give Similarity threshold as 0.76 the string column values should match more than 76% then only it will treat as a valid record.
The Fuzzy Grouping Transformation produces additional columns along with existing columns and they are:
- Input key column name: Provide the Unique name for Input Key Column. key_inis the default name.
- Output key column name: Provide the Unique name for Output Key Column. key_Outis the default name.
- Similarity score column name: Provide the Unique name for the similarity score column. _score is a value between 0 and 1. This will indicate the similarity of the input row to the canonical row.
For the time being we left them to default values and Click ok to finish configuring the Fuzzy Grouping Transformation. Now drag and drop the OLE DB destination into the data flow region.
STEP 8: Now we have to provide Server, database and table details of the destination. So double-click on the OLE DB Destination and provide the required information
Here we selected the [SSIS Tutorials] database from SQL Server as our destination database and [Fuzzy Grouping Transformation Output] table as our destination table.
STEP 9: Click on Mappings tab to check whether the source columns are exactly mapped to the destination columns.
STEP 10: Click ok to finish designing Fuzzy Grouping Transformation package. Let us run the package
Thank you for Visiting Our Blog