Sunday 8 March 2015

Kimball vs Inmon data warehouse models

This article is copied from search business intelligence blog

Bill Inmon’s enterprise data warehouse approach (the top-down design): A normalized data model is designed first. Then the dimensional data marts, which contain data required for specific business processes or specific departments are created from the data warehouse.

Ralph Kimball’s dimensional design approach (the bottom-up design): The data marts facilitating reports and analysis are created first; these are then combined together to create a broad data warehouse.

Kimball vs Inmon data warehouse models
Which one to pick ?

Insurance: It is vital to get the overall picture with respect to individual clients, groups, history of claims, mortality rate tendencies, demography, profitability of each plan and agents, etc. All aspects are inter-related and therefore suited for the Inmon’s approach.

Marketing: This is a specialized division, which does not call for enterprise warehouse. Only data marts are required. Hence, Kimball’s approach is suitable.

CRM in banks: The focus is on parameters such as products sold, up-sell and cross-sell at a customer-level. It is not necessary to get an overall picture of the business. For example, there is no need to link a customer’s details to the treasury department dealing with forex transactions and regulations. Since the scope is limited, you can go for Kimball’s method. However, if the entire processes and divisions in the bank are to be linked, the obvious choice is Inmon’s design vs. Kimball’s.

Manufacturing: Multiple functions are involved here, irrespective of the budget involved. Thus, where there is a systemic dependency as in this case, an enterprise model is required. Hence Inmon’s method is ideal.

SQL joins explained

It is always a confusing topic for SQL people. Joins are explained in an understandable manner by Coding Horror blog. I have copied the same content here so that I can read my own blog whenever I get confused with Joins.
Assume we have the following two tables. Table A is on the left, and Table B is on the right. We'll populate them with four records each.
id name       id  name
-- ----       --  ----
1  Pirate     1   Rutabaga
2  Monkey     2   Pirate
3  Ninja      3   Darth Vader
4  Spaghetti  4   Ninja

Let's join these tables by the name field in a few different ways and see if 
we can get a conceptual match to those nifty Venn diagrams. 
SELECT * FROM TableA
INNER JOIN TableB
ON TableA.name = TableB.name

id  name       id   name
--  ----       --   ----
1   Pirate     2    Pirate
3   Ninja      4    Ninja


Inner join produces only the set of records that match in both Table A and Table B.


SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name

id    name       id    name
--    ----       --    ----
1     Pirate     2     Pirate
2     Monkey     null  null
3     Ninja      4     Ninja
4     Spaghetti  null  null
null  null       1     Rutabaga
null  null       3     Darth Vader


Full outer join produces the set of all records in Table A and Table B,
with matching records from both sides where available.
 If there is no match, the missing side will contain null.


SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name

id  name       id    name
--  ----       --    ----
1   Pirate     2     Pirate
2   Monkey     null  null
3   Ninja      4     Ninja
4   Spaghetti  null  null

Left outer join produces a complete set of records from Table A, with the
matching records (where available) in Table B. If there is no match,
the right side will contain null.



SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableB.id IS null

id  name       id     name
--  ----       --     ----
2   Monkey     null   null
4   Spaghetti  null   null

To produce the set of records only in Table A, but not in Table B,
we perform the same left outer join, then exclude the records
we don't want from the right side via a where clause



SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableA.id IS null
OR TableB.id IS null

id    name       id    name
--    ----       --    ----
2     Monkey     null  null
4     Spaghetti  null  null
null  null       1     Rutabaga
null  null       3     Darth Vader

To produce the set of records unique to Table A and Table B, we perform the same 
full outer join, then exclude the records we don't want from both sides via a where clause



There's also a cartesian product or cross join, which as far as I can tell, can't be expressed as a Venn diagram:
SELECT * FROM TableA
CROSS JOIN TableB
This joins "everything to everything", resulting in 4 x 4 = 16 rows, far more than we had in the original sets. If you do the math, you can see why this is a very dangerous join to run against large tables.

Dimensional Modelling, Facts and Measurements

Dim=Dimension tale, Fact= Fact table

Dimensional modeling

Dimensional modeling always uses the concepts of facts, and dimensions. Facts are typically numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc.
More at Wikipedia

What is fact table?

A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A date column, a store column, and a sales amount column.

What is dimensional table?

Dimensional table maintains information that represents attributes of the business, contains relatively static data. These tables are connected to fact table through foreign key. These are smaller than fact tables.
  • A fact is a measurement.
  • Facts are based on dimensions.
  • Facts are usually additive but may not be true all the time.
  • Dimension table contains business elements.
  • These tables supply the measurements to the fact tables.
  •  Multiple fact tables may reference the dimension tables.
As shown in the diagram the fact table contains measurement fields along with foreign keys pointing to the dimensions