As a Data warehouse
Administrator or ETL developer You will often come a cross to manage the Slowly
Changing Dimensions (SCD).
In this postI am going to provide you
the steps and guidance needed to manage Slowly Changing Dimension with Slowly
Changing Dimension Transformation in data flow task with an example.
Slowly Changing Dimensions (SCD)
Dimension is a term in data
management and data warehousing. It's the logical groupings of data such as Designation,
geographical location, customer or product information. With Slowly Changing
Dimensions (SCDs), data changes slowly rather than changing on a time-based,
regular schedule.
There are different type of SCD Types. Few are
SCD Type 1
SCD Type 2
SCD Type 1 :- Change the data in the dimension table.
-Easy to implement
-but the history is lost
e.g. Employee Designation Changes
In
Type 1 Slowly Changing Dimension, the new information simply overwrites the
original information. In other words, no history is kept.
Consider
the following Customer Dimension table:
After
Christina moved from Illinois to California, the new information replaces the
new record, and we have the following table:
SCD
Type 2 - Add new records/ rows to the dimension table
that contain the new data.
-leaves
history as it is.
-but
this causes the table to grow fast
In
Type 2 Slowly Changing Dimension, a new record is added to the table to
represent the new information. Therefore, both the original and the new record
will be present. The new record gets its own primary key.
Consider
the following Customer Dimension table:
After
Christina moved from Illinois to California on 15-JAN-2003, we add the new
information as a new row into the table:
SCD Type 3 - Add
new fields/ columns to the dimension
-helps keep part of the history
-the size of the table is not affected
-used when changes occur rarely
In
Type 3 Slowly Changing Dimension, there will be two columns to indicate the
particular attribute of interest, one indicating the original value, and one
indicating the current value.
Consider
the following Customer dimension:
After Christina moved from Illinois to California on 15th, the original information gets updated.
If Christina moved back to Illinois from California on 25th:
Using
Slowly Changing Dimension Transformation
Let’s
first create a Employee table, which will have source data and add some data to
it; as you can notice below, I have added the EmployeeCode column, which is a
primary key at source and will work as a business key.
CREATE TABLE [dbo].[EMPLOYEE_SRC](
[EmployeeCode] CHAR(8) PRIMARY KEY,
[Designation] [varchar](50) NULL,
[Location] [varchar](50) NULL,
)
Insert below two rows to the table.
INSERT INTO [dbo].[EMPLOYEE_SRC] ([EmployeeCode], [Designation], [Location])
VALUES
('E0000001', 'Engineer', 'USA'),
('E0000002', 'Engineer', 'USA')
Now creates the Dimension table to capture the historical data.
CREATE TABLE [dbo].[DimEmployee](
[EmployeeId] [int] IDENTITY(1,1) NOT NULL,
[EmployeeCode] CHAR(8),
[Designation] [varchar](50) NULL,
[Location] [varchar](50) NULL,
[EffectiveDate] [date] NULL,
[ExpirationDate] [date] NULL,
[CurrentFlag] [char](1) NULL,
CONSTRAINT [PK_DimEmployee] PRIMARY KEY CLUSTERED ([EmployeeId] ASC)
)
Then you can create the SSIS project using BIDS. Then Drag and drop the Data Flow task to Control Flow tab. Then go to the Data Flow task tab and Create OLE DB Source and connect with [EMPLOYEE_SRC] table. Refer the below screenshot.
Then Drag and Drop the Slowly Changing Dimension component to Data Flow Tab. Then Connect with OLE DB Source.
Double click On the Slowly Changing Dimension component and click on the Next Button.
Click Next.Set EmployeeCode as Business key.
Give the Slowly Dimension Columns and Click Next as Below screen.
Set the Fixed and Changing Attribute. Click Next.
In the Historical Attribute Option Select the 'Use start and end dates to identify current and expired records'. Give relevant fields as below. Click Next.
Unchecked the Enable inferred member support check box.Click Next
Click the Finish button on the Slowly Changing Dimension Wizard.
Here
is what you will see in the data flow task:
Then Execute the Package and Inserted two rows will be available in the Dimension Table.
Only Two rows available in the Database.
Database Output.
Then Execute the Below Update scrip to updated the Employee (E0000001) in the Above table.
UPDATE [dbo].[EMPLOYEE_SRC]
SET [Designation] = 'Senior Engineer'
WHERE [EmployeeCode] = 'E0000001'
Again Execute the Package and output will be as below.only one row will be transfor from Employee table to Employee Dimension.
Database Output.
So it has captured history data and present data also.
But, this mechanism has
some limitation. Main limitation is
It can be very slow
especially for larger dimensions as there is no caching of the lookup data.
In my Next Post i'll Explain how to over come this limitation.
MSBI : SQL Server Integration Service (SSIS) - Slowly Changing Dimension Transformation
Reviewed by Pubudu Dewagama
on
11:26:00 PM
Rating:
ReplyDeleteThrough this post, I know that your good knowledge in playing with all the pieces was very helpful. I notify that this is the first place where I find issues I've been searching for. You have a clever yet attractive way of writing on Msbi online training
Msbi online training Hyderabad
Msbi online training India
Msbi online course
Msbi certification training
ReplyDeleteInteresting blog which attracted me more.Spend a worthful time.keep updating more.
This is important thing because when we are running a business, first we have to make our focusing area and what we are ready for serve among the people.
Thank you and i am looking forward your more information here
Visit:Best power bi training in Hyderabad