How to Organize Data Using an Unpivot Function

When you are constructing your data warehouse you will likely encounter data from multiple data sources, some of which have data in undesirable formats. For example, some database tables might be populated from Excel sheets that weren’t exactly designed to fit cleanly into a table, much less a data warehouse table. A budgeting Excel sheet that needs to be imported may look something like this:

Image

This budget data will display as an identical table and structure if no transformations are made during the import. As you can see, this is not an extensible design and it definitely doesn’t belong in a data warehouse without some alterations. As the budget years are added the ETL process will need to be changed to handle the new columns. This will also grow the table to an unmanageable size over a period of many years.

An easy solution to this dilemma is to use the UNPIVOT function to restructure the data and organize it in a manner that can be used in your data warehouse. After you have extracted the data in this budget Excel sheet into a staging table you will be able to reorganize it. Using our basic table example the UNPIVOT function would be:

Image

The PIVOT function will take the rows in a table and pivot them into columns. So intuitively, the UNPIVOT function will do the opposite; it takes the columns in a table and pivots them into rows. After running the above UNPIVOT function your data set will become:Image

This seems easy enough, but remember each column that is to be pivoted into a row must have matching data types and sizes. When your data types and sizes don’t match exactly you must first convert all of your columns in your select statement to have a matching data type before running the UNPIVOT function. Mismatched data types could be the result of the source data being incorrectly assigned inconsistent data types. The easiest way that I have found to accomplish this is to first use a common table expression (CTE) to convert the data and then use your CTE as the source for your selection in your UNPIVOT function. 

Using the above example let’s assume that the 2012 data is a varchar(25) and the other yearly data is a float data type. The new UNPIVOT script would look like:

Image

This method is easy to read and understand, so for coding clarity I like to use it. I’m sure there are multiple ways to solve this type of issue so please feel free to comment and share any other ways that you have found that work well for you.

I hope this has been helpful, have a great day!

Exactly what is a Slowly Changing Dimension (SCD)?

Just the sound of the phrase “slowly changing dimension“ can sound quite intimidating when uttered in a technical meeting, but you might be shocked at how simple this concept really is. In fact, it has nothing to do with shifts or alterations in the universe at all. J

If you are involved in data warehousing you should have a firm grasp of SCDs as they will be necessary to structure your data in a meaningful way that takes advantage of the capabilities of a data warehouse; specifically when you wish to store historical data rather than simply overwriting a record. So, let’s break it down:

Slowly Changing Dimensions: Dimensions in your DW that include attributes that will gradually change over a period of time.

There are three types:

Type I – Overwrite old values: The old record is completely over written by the new record. Type I is used when you have no desire to store the historical attributes of the record.

Type II – Create an additional record: The old record will be kept and a new record will be inserted. Type II is used when you wish to store the historical attributes of the record.

Type III – Create new fields: This type creates new fields in the record to maintain the history and the current values of the specific attributes in the record.

Below are examples of how to use each type. In this example the price of a product is changing.

Type I: The old values are over written

Product price in 2011

Product price in 2012

Type II: A new record is created.

Product price in 2011

Product price in 2012

As you can see in the above example there are a few problems with the structure of the table. When using a type II SCD you will need to add a couple of basic fields to effectively track the historical records. There are several possibilities, but the idea is to be able to differentiate between the two records. Also you may notice that there are two identical IDs for this record. That is OK. This is still the same product so a common identifier is necessary. We will be able to narrow down our selection criteria by choosing the active record. A more realistic type II record would look like the below.

Product price in 2012

Type III: New fields are inserted

Product price in 2011

Product price in 2012

There are many scenarios in business where each of these SCD types will be useful. The most common are types I and II because type III can only store a limited amount of historical data.

I hope that this explanation has helped to simplify your understanding of how to use slowly changing dimensions in data warehousing.