Business Intelligence:
Dimensional modeling is the design concept used by many data warehouse designers to build their data warehouse. Dimensional model is the underlying data model used by many of the commercial OLAP products available today in the market. In this model, all data is contained in two types of tables called Fact Table and Dimension Table.
Fact table contains the measurements or metrics or facts of business processes. If your business process is Sales, then a measurement of this business process such as "monthly sales number" is captured in the fact table. In addition to the measurements, the only other things a fact table contains are foreign keys for the dimension tables.
Context of the measurements are represented in dimension tables. You can also think of the context of a measurement as the characteristics such as who, what, where, when, how of a measurement (subject ). In your business process Sales, the characteristics of the 'monthly sales number' measurement can be a Location (Where), Time (When), Product Sold (What).
The Dimension Attributes are the various columns in a dimension table. In the Location dimension, the attributes can be Location Code, State, Country, Zip code. Generally the Dimension Attributes are used in report labels, and query constraints such as where Country='USA'. The dimension attributes also contain one or more hierarchical relationships.
Before designing
your data warehouse, you need to decide what this data warehouse
contains. Say if you want to build a data warehouse containing monthly
sales numbers across multiple store locations, across time and across
products then your dimensions are:
Location
Time
Product
Each dimension table contains data for one dimension. In the above example you get all your store location information and put that into one single table called Location. Your store location data may be spanned across multiple tables in your OLTP system (unlike OLAP), but you need to de-normalize all that data into one single table.
In Online Transaction Processing (OLTP), the database is designed to achieve efficient transactions such as INSERT and UPDATE. This is very different from the OLAP design.
Unlike OLAP, normalization is very important to reduce duplicates and also cut down on the size of the data. our OLTP schema may look like this:
|
Field Name |
Type |
|
Loc_Id |
INTEGER (4) |
|
Loc_Code |
VARCHAR (5) |
|
Loc_Name |
VARCHAR (30) |
|
State_Id |
INTEGER (4) |
|
Country_Id |
INTEGER (4) |
|
Field Name |
Type |
|
Sate_Id |
INTEGER (4) |
|
State_Name |
VARCHAR (50) |
|
Field Name |
Type |
|
Country_Id |
INTEGER (4) |
|
Country_Name |
VARCHAR (50) |

In order to query for all locations that are in country 'USA' we will have to join these three tables. The SQL will look like:
SELECT * FROM Locations, States, Countries where Locations.State_Id = States.State_Id AND Locations.Country_id=Countries.Country_Id and Country_Name='USA'
Dimensional modeling allows only one table per dimension. But your OLTP data spans across multiple tables as described. So we need de-normalize the OLTP schema and export into your Dimension Tables.
For example, for the location dimension, you achieve this by joining the three OLTP tables and inserting the data into the single Location table.
Your Location Table will
look like this:
Location Dimension Table Schema
|
Field Name |
Type |
|
Dim_Id |
INTEGER (4) |
|
Loc_Code |
VARCHAR (4) |
|
Name |
VARCHAR (50) |
|
State_Name |
VARCHAR (20) |
|
Country_Name |
VARCHAR (20) |
All Dimension tables contain a key column called the dimension key. In this example Dim_Id is our dimension Id. This is the unique key into our Location dimension table.
The actual data in your
Location Table may look like this
Location Dimension Table Data
|
Dim_Id |
Loc_Code |
Name |
State_Name |
Country_Name |
|
1001 |
IL01 |
Chicago Loop |
Illinois |
USA |
|
1002 |
IL02 |
Arlington Hts |
Illinois |
USA |
|
1003 |
NY01 |
Brooklyn |
New York |
USA |
|
1004 |
TO01 |
Toronto |
Ontario |
Canada |
|
1005 |
MX01 |
Mexico City |
Distrito Federal |
Mexico |
You may notice that some of the information is repeated in the above dimension table. The State Name and Country Name are repeated through out the table. You may feel that this is waste of data space and against the normalization principles. But in dimensional modeling this type of design makes the querying very optimized and reduces the query times. Also we will learn later that in a typical data warehouse, the dimension tables make up only 10 to 15 % of the storage as the fact table is by far the largest table and takes up the rest of the storage allocation.
After de-normalization,
your Time table will look like this:
Time Dimension Table Schema
|
Field Name |
Type |
|
Dim_Id |
INTEGER (4) |
|
Month |
SMALL INTEGER (2) |
|
Month_Name |
VARCHAR (3) |
|
Quarter |
SMALL INTEGER (4) |
|
Quarter_Name |
VARCHAR (2) |
|
Year |
SMALL INTEGER (2) |
The actual data in your
Time Table may will look like this:
Time Dimension Data
|
TM _Dim_Id |
TM _Month |
TM _Month_Name |
TM _Quarter |
TM _Quarter_Name |
TM_Year |
|
1001 |
1 |
Jan |
1 |
Q1 |
2003 |
|
1002 |
2 |
Feb |
1 |
Q1 |
2003 |
|
1003 |
3 |
Mar |
1 |
Q1 |
2003 |
|
1004 |
4 |
Apr |
2 |
Q2 |
2003 |
|
1005 |
5 |
May |
2 |
Q2 |
2003 |
After de-normalization,
your Product table will look like this:
Product Dimension Table Schema
|
Field Name |
Type |
|
Dim_Id |
INTEGER (4) |
|
SKU |
VARCHAR (10) |
|
Name |
VARCHAR (30) |
|
Category |
VARCHAR (30) |
In this table PR_Dim_Id
is our dimension
Id. This is the unique key into our Product dimension table.
The actual data in your Product Table may look like this:
Product Dimension Table Data
|
Dim_Id |
SKU |
Name |
Category |
|
1001 |
DOVE6K |
Dove Soap 6Pk |
Sanitary |
|
1002 |
MLK66F# |
Skim Milk 1 Gal |
Dairy |
|
1003 |
SMKSAL55 |
Smoked Salmon 6oz |
Meat |
Fact table contains the actual business process measurements or metrics called facts. Usually these facts are numeric.
These facts are generally Additive.
Some times the facts are semi additive such as balances
Some times they are non additive such as unit price
In the above example where you are building a data warehouse of monthly sales in dollars, your fact table will contain the actual sales numbers, one row per month. In addition to the data itself, you will have the foreign keys for the various dimensions in this row.
Granularity or Grain of Fact Table
The level of detail of
the fact table is known as the grain of the fact table. In this example the
grain of your fact table is monthly sales number per location per product.
Your Fact Table will look like this
MonthlySales Fact Table Schema
|
Field Name |
Type |
|
TM_Dim_Id |
INTEGER (4) |
|
PR_ Dim_Id |
INTEGER (4) |
|
LOC_ Dim_Id |
INTEGER (4) |
|
Sales |
INTEGER (4) |
In this table the
combination of all three dimension table foreign keys make up the primary key
in the fact table. (TM_Dim_Id, PR_ Dim_Id, LOC_ Dim_Id) is our primary key.
This is the unique key into our Sales fact table.
The actual data in your MonthlySales Table will look like this:
MonthlySales Fact Table Data
|
TM_Dim_Id |
PR_ Dim_Id |
LOC_ Dim_Id |
Sales |
|
1001 |
1001 |
1003 |
435677 |
|
1002 |
1002 |
1001 |
451121 |
|
1003 |
1001 |
1003 |
98765 |
|
1001 |
1004 |
1001 |
65432 |
What Is A Star Schema?
If you carefully look at our new dimensional modeled schema, it will look like this:

Advantages of Star Schema
· Star Schema is very easy to understand, even for non technical business managers
· Star Schema provides better performance and smaller query times
· Star schema is easily extensible and will handle future changes easily
A Typical SQL Query Template for the Sales Schema will look like:
--- Select the
measurements that you want to aggregate using SUM clause
SELECT P.Name,
SUM(F.Sales)
--- JOIN the FACT
table with Dimension Tables
FROM Sales F, Time T, Product P, Location L
WHERE F.TM_Dim_Id = T.Dim_Id
AND F.PR_Dim_Id = P.Dim_Id
AND F.LOC_Dim_Id = L.Dim_Id
--- Constrains the
Dimension Attributes
AND T.Month='Jan' AND T.Year='2003' AND L.Country_Name='USA'
-- finally the 'group by'
clause identifies the aggregation level. In this example you are aggregating
-- all sales within a product category.
GROUP BY P.Category

You can tell from the above report that all your dimensions are the row and
column headings in the report. The facts are the numeric numbers in the
report.
Advantages of Star Schema
· Star Schema is very easy to understand, even for non technical business managers
· Star Schema provides better performance and smaller query times
· Star schema is easily extensible and will handle future changes easily
If we did not de-normalize our dimensions into one table each, then the schema would have looked like this

You can see, this looks like a snow flake, hence this type of schema is called Snowflake Schema. General rule of thumb is keep away from snow flake schemas as even though they may save you some space, they will cost a lot in terms of query times.
1. First identify the business process.
In this step you will determine what is your business process that your data warehouse represents. This process will be the source of your metrics or measurements.
2. Identify the Grain
You will determine what does one row of fact table mean. In the previous example you have decided that your grain is 'monthly sales per location per product'.
3. Identify the Dimensions
Your dimensions should be descriptive (SQL VARCHAR or CHARACTER) as much as possible and confirm to your grain.
4. Finally Identify the facts
In this step you will identify what are your measurements (or metrics or facts). The facts should be numeric and should confirm to the grain defines in step 2.