Tutorial Gateway

  • C
  • C#
  • Java
  • Python
  • SQL
  • MySQL
  • Js
  • BI Tools
    • Informatica
    • Talend
    • Tableau
    • Power BI
    • SSIS
    • SSRS
    • SSAS
    • MDX
    • R Tutorial
    • Alteryx
    • QlikView
  • More
    • C Programs
    • C++ Programs
    • Go Programs
    • Python Programs
    • Java Programs

Fuzzy Grouping Transformation in SSIS

by suresh

The Fuzzy Grouping Transformation in SSIS is used to replace the wrongly typed words with correct words. For example, if 98 people typed the country name as India and two people types as Indi, then SSIS Fuzzy Grouping Transformation will replace the Indi with India.

Before we start configuring the SSIS Fuzzy Grouping Transformation, let us see some important properties of this:

  • This transformation uses the Grouping technique to replace the wrong word in source data with the correct word.
  • This transformation allows us to use only DT_WSTR and DT_STR Data type column for Fuzzy Matching, and Exact matching can apply to any data type except DT_TEXT, DT_NTEXT, and DT_IMAGE.
  • It creates temporary tables and indexes in the SQL Server database at runtime.
  • This transformation will only use the OLE DB Connection Manager to establish a connection to store the temporary tables and indexes.
  • To configure the transformation, you must select the Match Type (Fuzzy or Exact) for an input columns

TIP: Fuzzy Grouping Transformation in SSIS has one input and one output. It does not support an error output.

This SSIS transformation is the same as the Fuzzy Lookup Transformation. However, SSIS Fuzzy Grouping Transformation does not require any reference table to correct the data. It will use the grouping technique to check for the wrongly typed words (type mistakes) and correct them.

Fuzzy Grouping Transformation in SSIS Example

In this example, we are going to show you how to configure Fuzzy Grouping Transformation in SQL Server Integration Services to correct the wrongly typed words (Typo mistakes)

Data we are going to use for this SSIS Fuzzy Grouping Transformation is:

SSIS Fuzzy Grouping Source Data

STEP 1: Open BIDS and Drag and drop the data flow task from the toolbox to control flow and rename it as Fuzzy Grouping Transformation in SSIS.

SSIS Fuzzy Grouping Transformation 1

Double click on it, and SSIS will open the data flow tab.

STEP 2: Drag and drop OLE DB Source, Fuzzy Grouping transformation from the SSIS toolbox to the data flow region

SSIS Fuzzy Grouping Transformation 2

STEP 3: Double click on OLE DB source in the data flow region will open the connection manager settings and provides an option to select SQL Table or space to write our SQL statement.

SSIS Fuzzy Grouping Transformation 3

Here we selected the Employees database as our source database and [Fuzzy Source] as SQL table.

STEP 4: Click on the columns tab to verify the columns. In this tab, we can uncheck the unwanted columns.

SSIS Fuzzy Grouping Transformation 4

STEP 5: Right-click on the SSIS Fuzzy Grouping Transformation will open the Fuzzy Grouping Transformation Editor window to configure it.

Within the Connection Manager tab, we have to configure the OLE DB connection Manager setting. Within this source, SSIS Fuzzy Grouping Transformation will create a temporary table and indexes to perform the Fuzzy Grouping operation.

SSIS Fuzzy Grouping Transformation 5

STEP 6: Within the Columns Tab, We have to configure the match Type. You will need to select the columns that you want to group as Fuzzy Match Type and other columns as Exact match. For this example, the Country name is a string, and we want to find the wrong values in this column. So, change the match type to Fuzzy, and Cid is the Int value, so the match type is Exact.

Fuzzy Grouping Transformation in SSIS 6

Two parameters are important when you are Configuring a fuzzy grouping Transformation. They are Numerals and Comparison Flags.

Comparison Flags

From the below screenshot, you can see the available options in this.

  • Ignore case: If we check mark this option, Fuzzy Grouping will ignore the case. Both XYZ and xyz will be the same.
  • Ignore kana type: This option of SSIS Fuzzy Grouping ignores the difference between the Japanese hiragana and katakana.
  • Ignore non spacing characters: If we check mark this option then Fuzzy Grouping will ignore the difference between the diacritics and character
  • Ignore character width: If we checkmark this option, SSIS Fuzzy Grouping ignores the difference between single-byte character and double-byte character.
  • Ignore symbols: SSIS Fuzzy Grouping will ignore the difference between the letters and symbols (white spaces, punctuations, currency symbols, and mathematical symbols). For example, *xy treated the same as xy
  • Sort punctuation as symbols: Fuzzy Grouping will all the punctuation symbols (except apostrophe and hyphen) write before the letters. For example, .xyz will sort before the xyz
Fuzzy Grouping Transformation in SSIS 7

Numerals

In this option, We have to specify the significance of starting and ending numerals while comparing the column data. For example, if leading numerals are significant, “93 New lands Street” will not be grouped with “99 New lands Street”

ValueDescription
NeitherNumerals at the starting and ending positions ignored while grouping.
LeadingNumerals at the starting position will ignore while grouping.
TrailingNumerals at the ending positions ignored while grouping.
LeadingAndTrailingNumerals at the starting and ending positions will not ignore while grouping.
Fuzzy Grouping Transformation in SSIS 8

STEP 7: Within the Advanced Tab, we have to configure the Similarity Threshold. The similarity threshold ranges between 0 and 1, where 1 is an exact match. The SSIS Fuzzy Grouping Transformation Editor provides a slider to adjust the similarity between 0 and 1. If the similarity threshold is closer to 1, the source column should match more accurately to reference data.

In this example, We are doing Fuzzy Grouping on Country Name and find the fuzzy match. If we give a Similarity threshold as 0.76, the string column values should match more than 76%, then only it will treat as a valid record.

Fuzzy Grouping Transformation in SSIS 9

The SSIS Fuzzy Grouping Transformation produces additional columns along with existing columns, and they are:

  • Input key column name: Provide the Unique name for Input Key Column. key_inis the default name.
  • Output key column name: Provide the Unique name for Output Key Column. key_Outis the default name.
  • Similarity score column name: Provide the Unique name for the similarity score column. _score is a value between 0 and 1. It will indicate the similarity of the input row to the canonical row.

For the time being, we left them to default values and Click ok to finish configuring the SSIS Fuzzy Grouping Transformation. Now drag and drop the OLE DB destination into the data flow region.

STEP 8: Now, we have to provide the Server, database, and table details of the destination. So double-click on the OLE DB Destination and provide the required information

Fuzzy Grouping Transformation in SSIS 10

Here we selected the [SSIS Tutorials] database from SQL Server as our destination database and [Fuzzy Grouping Transformation Output] table as our destination table.

STEP 9: Click on the Mappings tab to check whether the source columns exactly mapped to the destination columns.

Fuzzy Grouping Transformation in SSIS 11

STEP 10: Click ok to finish designing the SSIS Fuzzy Grouping Transformation package. Let us run the package

Fuzzy Grouping Transformation in SSIS 12

Placed Under: SSIS

  • What is SSIS
  • Install SQL Server Data Tools
  • Learn SSIS in 28 Days
  • Create a SSIS Project
  • Create SSIS Package
  • SSIS Connection Manager
  • OLE DB Connection Manager
  • SSIS ADO Connection Manager
  • ADO.NET Connection Manager
  • SSIS Cache Connection Manager
  • SSIS Excel Connection Manager
  • SSIS File Connection Manager
  • SSIS FTP Connection Manager
  • SSIS SMO Connection Manager
  • Source Assistance in SSIS
  • Flat File Source in SSIS
  • OLE DB Source in SSIS
  • Excel Source in SSIS
  • ADO.NET Source in SSIS
  • SSIS FLAT FILE Destination
  • SSIS OLE DB Destination
  • SSIS ADO.NET Destination
  • SSIS Transformations
  • SSIS Audit Transformation
  • SSIS Aggregate Transformation
  • SSIS Aggregate Advanced Mode
  • SSIS Cache Transformation
  • SSIS Character Map
  • SSIS Conditional Split
  • SSIS Copy Column
  • SSIS Data Conversion
  • SSIS Derived Column
  • SSIS Export Column
  • SSIS Fuzzy Grouping
  • SSIS Fuzzy Lookup
  • SSIS Import Column
  • SSIS Lookup Introduction
  • Lookup – OLE DB Connection
  • SSIS Lookup in Full Cache Mode
  • SSIS Lookup – Case Sensitivity
  • SSIS Merge Transformation
  • SSIS Merge Join Transformation
  • SSIS Merge Join – Left Outer Join
  • SSIS Merge – Right Outer Join
  • SSIS Merge Join – Full Outer Join
  • SSIS Multicast Transformation
  • SSIS OLEDB Command
  • OLEDB Command – Delete Data
  • OLEDB Command-Update Data
  • SSIS Percentage Sampling
  • SSIS Pivot Transformation 2008
  • SSIS Pivot Transformation
  • SSIS Row Count Transformation
  • SSIS Row Sampling
  • Script Component as Source
  • Script Component as Destination
  • SSIS Script as Transformation
  • SSIS Sort Transformation
  • SSIS SCD Type 0
  • SSIS SCD Type 1
  • SSIS SCD Type 2
  • SSIS Term Lookup
  • SSIS Term Extraction Intro
  • Term Extraction – Extract Nouns
  • SSIS Extract Noun Phrases
  • Extract Nouns & Noun Phrases
  • Term Extraction – Exclusion Tab
  • SSIS Unpivot Transformation
  • SSIS Union All Transformation
  • SSIS For Loop Container
  • SSIS ForEach File Enumerator
  • SSIS ForEach SMO Enumerator
  • SSIS ForEach Variable
  • SSIS Foreach NodeList
  • Foreach ADO.NET Schema Rowset Enumerator
  • SSIS Bulk Insert Task
  • SSIS Data Profiling Task
  • Execute T-SQL Statement Task
  • SSIS Execute SQL Task Intro
  • SSIS Execute SQL Task Example
  • Execute SQL Task- Single Rowset
  • Execute SQL Task – Full Row Set
  • SSIS Execute Package Task
  • Execute Packages in SQL Server
  • Execute Packages in File System
  • SSIS Execute Package Project Reference
  • SSIS File System Task
  • SSIS File System Task- Copy Files
  • File System Task-Copy Directory
  • File System Task – Delete Files
  • File System Task – Delete Folder
  • File System Task -Move Directory
  • SSIS File System Task -Move File
  • File System – Move Multiple files
  • File System Task – Rename File
  • File System Task – Set Attributes
  • SSIS FTP TASK
  • SSIS Create Local Directory
  • SSIS Create Remote Directory
  • SSIS FTP Task Send Files
  • SSIS FTP – Send Multiple Files
  • SSIS FTP Task Delete Local Files
  • FTP TASK Delete Local Directory
  • FTP Task Delete Remote files
  • SSIS Delete Remote Directory
  • SSIS FTP Task Receive Files
  • SSIS FTP Receive Multiple Files
  • SSIS Script Task
  • Transfer SQL Server Objects Task
  • Transfer SQL Table Structures
  • Transfer SQL Tables with Data
  • Transfer SQL Stored Procedures
  • Transfer User Defined Functions
  • Transfer SQL Views in SSIS
  • SSIS Web Service Task
  • SSIS XML Task-Validate XML File
  • Transform XML File using XSLT
  • XML Task-XML files Differences
  • Create SSIS Catalog
  • Package Deployment using BIDS
  • Deploy Package Using SQL
  • Deploy using SQL Server Wizard
  • SSIS Breakpoints
  • SSIS Checkpoints
  • SSIS Error Handling
  • SSIS Event Handlers
  • SSIS Transactions
  • SSIS Logging
  • SSIS Parameters
  • SSIS Package Configuration
  • Configure using SQL Server
  • Config using Registry Entry
  • Conf with Environment Variable
  • SSIS XML Configuration File
  • XML Configuration File Part 2
  • SSIS Package Protection Level
  • SSIS Incremental Load
  • Incremental Load Example 2
  • SSIS Remove Double Quotes

Copyright © 2021 · All Rights Reserved by Suresh

About Us | Contact Us | Privacy Policy