Problem with SCD — Neal Waterstreet

Problem

A common (and very well documented) best practice for performance is to specify the columns in a SQL statement instead of using SELECT *. This applies to SSIS in the Data Flow Task, but there are other reasons to avoid this too.

At a recent client the current member row was being recreated daily for a SCD 2 dimension. The data in the source rarely changed, so it was obvious that something was wrong.

My first step was to look at the code, including the Conditional Split task. All of the logic there looked correct – all of the comparisons appeared to be functioning as intended and NULL values were being accounted for in the code.

I then checked some of the columns to determine if there was a problem with spaces by using the RTRIM() and LTRIM() functions. That reminded me that there is still not a TRIM() function. All of this still looked to be functioning correctly.

Finally I checked the Source and Destination Connection Managers and found they were configured to use tables as a source. The end result of this was that the current member rows were not filtered in the Destination and all rows were incuded.

Solution

I changed the Destination SQL statement to select only the current member rows selected in the WHERE clause. This resolved the issue and the SCD 2 no longer created duplicate rows.

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

Leave a Reply Cancel reply