Wednesday, March 7, 2018

Dynamically Create Data Warehouse (DWH) Dimension - Star Schema Dimension

Today I am sharing a procedure by which we can easily create a dimension for a data warehouse. this is very much dynamic and you know dynamic things are not so easy BUT I have tried to create something user-friendly.

first of all, we have to know about DWH dimension types. it can be FLAT or PARENT-CHILD (hierarchical).

NOTE: This Dynamic procedure is only to create PARENT-CHILD (hierarchical) dimension. if you try you can also create a part of the FLAT dimension.

FLAT Dimension: which dimension table contains flat information with some ID for main/PK column and other important columns. see below table ...

PRODUCT_ID
PRODUCT
MARKET_ID
MARKET
MANUFACTURER
P000001
PARA
M000001
A8A
ACI
P000002
NAPA
M000002
A7A
SQARE

PARENT-CHILD / Hierarchical Dimension: which dimension table contains all code in one column as PK and maintain a relationship with other attribute using parent code as a parent-child hierarchy. see below ..

PRODUCT_CODE
PARENT_CODE
DESCRIPTION
LEVEL_NO
LEVEL_NAME
MNUFACTURAR
N000001
N000001
TOTAL MARKET
1
TOTAL

M000001
N000001
A8A
2
MARKET

M000002
N000001
A7A
2
MARKET

P000001
M000001
PARA
3
PRODUCT
ACI
P000002
M000002
NAPA
3
PRODUCT
SQARE

NOTE: This Dynamic procedure is only to create PARENT-CHILD / Hierarchical dimension. if you try you can also create a part of the FLAT dimension.

All is you have to pass some parameters. A parameter can be a SQL or CASE statements.

Below is the code of this PROC -- also you can download the PROC (click here)
-----------------------------------
GLOBAL_DB.dbo.CREATE_DIM_TABLE

you can place this proc in a database and can use this from any database just make sure the use of tergat db name in table name.



@DIM_SQL
SQL for the Hierarchy and main dimension;
like select level1, level2, level3…  from DB_NAME.dbo.Table_Name; Hierarchy sequence is must
use full table name with db name like DB_NAME.dbo.Table_Name
column name would be the LEVEL_NAME
@ATTRB_SQL
SQL for additional columns in final dimension; internal Synonym of this SQL is "B"
You can say additional attributes like Manufacturer.
DECSRIPTION_COLUMN is must; like below to add additional attributes only available in level3
select level3 as DESCRIPTION_COLUMN, Man from DB_NAME.dbo.Table_Name
@DIM_TABLE_NAME
Full Table Name like DB_NAME.dbo.Table_Name
@ITEM_CODE_COL_NAME
Name for Child Column which is primary key; like PRODUCT_ID
@PARENT_CODE_COL_NAME
Name for parent column to maintain relation of hierarchy
@DESCRIPTION_COL_NAME
Name for Description column, which will contain the values from Hierarchy SQL (DIM_SQL)
@LEVEL_NO_COL_NAME
Name for LEVEL Number column of Hierarchy level no
@LEVEL_NAME_COL_NAME
Name for LEVEL Name column of Hierarchy
@ITEM_CODE_RULE
you can change ITEM_CODE here using CASE, subSQL etc by using synonym "A"
@PARENT_CODE_RULE
you can change PARENT_CODE here using CASE, subSQL etc by using synonym "A"
@LEVEL_NO_RULE
you can change LEVEL_NO here using CASE, subSQL etc by using synonym "A"
@LEVEL_NAME_RULE
you can change LEVEL_NAME here using CASE, subSQL etc by using synonym "A"
@DESCRIPTION_COL_RULE
you can change Description by using Synonym "A" & "B"
@CREATE_ID_TABLE_ALSO
if you want Intermediate FLAT_DIM, set it 1 ; 
internally this table has the synonym "A"
@PRINT_SQL
if you like to see all SQL generated by this proc; set it 1

Example:::

Let we have below table to create our Dimansion.

PRODUCT
ATC3
MANUFACTURER
PARA
A8A
ACI
NAPA
A7A
SQARE

now define the hierarchy..

Level1 = TOTAL MARKET
Level2 = ATC3 = MARKET
Level3 = PRODUCT

so, our output will be as below

PRODUCT_CODE
PARENT_CODE
DESCRIPTION
LEVEL_NO
LEVEL_NAME
MNUFACTURAR
T000001
T000001
TOTAL MARKET
1
TOTAL

M000001
T000001
A8A
2
MARKET

M000002
T000001
A7A
2
MARKET

P000001
M000001
PARA - ACI
3
PRODUCT
ACI
P000002
M000002
NAPA - SQARE
3
PRODUCT
SQARE


now CALL the proc to create a Table as our output !!

GLOBAL_DB.dbo.CREATE_DIM_TABLE        @DIM_SQL = 'select ''TOTAL MARKET'' as TOTAL, ATC3 as MARKET, PRODUCT as PRODUCT from TARGET_DB.dbo.TName ',
   @ATTRB_SQL = 'select PRODUCT as DESCRIPTIONMNUFACTURAR from TARGET_DB.dbo.TName ',
   @DIM_TABLE_NAME = 'TARGET_DB.dbo.PROD_DIM',
   @ITEM_CODE_COL_NAME = 'PRODUCT_CODE',
   @PARENT_CODE_COL_NAME = 'PARENT_CODE',
   @DESCRIPTION_COL_NAME = 'DESCRIPTION',
   @LEVEL_NO_COL_NAME = 'LEVEL_NO',
   @LEVEL_NAME_COL_NAME = 'LEVEL_NAME'
   @ITEM_CODE_RULE = ' case LEVEL_NO when 1 then ''T'' when 2 then ''M'' else ''P'' end + right (''0000000''+PRODUCT_CODE,8) ',
   @PARENT_CODE_RULE = ' case LEVEL_NO when 1 then ''T'' when 2 then ''T'' else ''M'' end + right (''0000000''+PARENT_CODE,8) ',
  -- @LEVEL_NO_RULE = 'NA',
  -- @LEVEL_NAME_COL_NAME = 'NA',
  @DESCRIPTION_COL_RULE = ' A.DESCRIPTION + '' - ''+B.MNUFACTURAR ',
  @CREATE_ID_TABLE_ALSO = 0,
  @PRINT_SQL = 0;
   


Below is the code of this PROC -- also you can download the PROC (click here)
---------------------------------------------


CREATE PROCEDURE [DBO].[CREATE_DIM_TABLE] @DIM_SQL VARCHAR(MAX)
 , @ATTRB_SQL VARCHAR(MAX) = 'NA'
 , @DIM_TABLE_NAME VARCHAR(1000)
 , @ITEM_CODE_COL_NAME VARCHAR(1000) = 'ITEM_CODE'
 , @PARENT_CODE_COL_NAME VARCHAR(100) = 'PARENT_CODE'
 , @DESCRIPTION_COL_NAME VARCHAR(100) = 'DESCRIPTION_COLUMN'
 , @LEVEL_NO_COL_NAME VARCHAR(100) = 'LEVEL_NUMBER'
 , @LEVEL_NAME_COL_NAME VARCHAR(100) = 'LEVEL_NAME'
 , @ITEM_CODE_RULE VARCHAR(1000) = 'NA'
 , @PARENT_CODE_RULE VARCHAR(1000) = 'NA'
 , @LEVEL_NO_RULE VARCHAR(1000) = 'NA'
 , @LEVEL_NAME_RULE VARCHAR(1000) = 'NA'
 , @DESCRIPTION_COL_RULE VARCHAR(1000) = 'NA'
 , @CREATE_ID_TABLE_ALSO BIT = 0
 , @PRINT_SQL BIT = 0
AS
BEGIN
 DECLARE @DB_NAME VARCHAR(100) = isnull(PARSENAME(@DIM_TABLE_NAME, 3), DB_NAME());
 DECLARE @Drop_SQL NVARCHAR(2000);
 SET NOCOUNT ON
 DECLARE @isTable BIT = 'False'

 IF (
   @ITEM_CODE_RULE = 'NA'
   OR @ITEM_CODE_RULE = ''
   )
  SET @ITEM_CODE_RULE = @ITEM_CODE_COL_NAME

 IF (
   @PARENT_CODE_RULE = 'NA'
   OR @PARENT_CODE_RULE = ''
   )
  SET @PARENT_CODE_RULE = @PARENT_CODE_COL_NAME

 IF (
   @LEVEL_NAME_RULE = 'NA'
   OR @LEVEL_NAME_RULE = ''
   )
  SET @LEVEL_NAME_RULE = @LEVEL_NAME_COL_NAME 

IF (
   @LEVEL_NO_RULE = 'NA'
   OR @LEVEL_NO_RULE = ''
   )
  SET @LEVEL_NO_RULE = @LEVEL_NO_COL_NAME 

 IF (
   @DESCRIPTION_COL_RULE = 'NA'
   OR @DESCRIPTION_COL_RULE = ''
   )
  SET @DESCRIPTION_COL_RULE = 'a.' + @DESCRIPTION_COL_NAME

 IF (@ATTRB_SQL = '')
  SET @ATTRB_SQL = 'NA'

 DECLARE @metaDataTableSQL VARCHAR(MAX) = 'select * into ##tmpMataDataScript from ' + CASE @isTable
   WHEN 'True'
    THEN @DIM_SQL
   ELSE ' ( ' + @DIM_SQL + ' )'
   END + ' a'
 DECLARE @mataDataSQL VARCHAR(MAX) = 'select * from ##tmpMataDataScript'

 IF object_id('tempdb..##tmpMataDataScript') IS NOT NULL
 BEGIN
  DROP TABLE ##tmpMataDataScript
 END

 DECLARE @Columns TABLE (
  TABLE_QUALIFIER VARCHAR(500)
  , TABLE_OWNER VARCHAR(500)
  , TABLE_NAME VARCHAR(500)
  , COLUMN_NAME VARCHAR(500)
  , DATA_TYPE VARCHAR(500)
  , TYPE_NAME VARCHAR(500)
  , PRECISION VARCHAR(500)
  , LENGTH VARCHAR(500)
  , SCALE VARCHAR(500)
  , RADIX VARCHAR(500)
  , NULLABLE VARCHAR(500)
  , REMARKS VARCHAR(500)
  , COLUMN_DEF VARCHAR(500)
  , SQL_DATA_TYPE VARCHAR(500)
  , SQL_DATETIME_SUB VARCHAR(500)
  , CHAR_OCTET_LENGTH VARCHAR(500)
  , ORDINAL_POSITION VARCHAR(500)
  , IS_NULLABLE VARCHAR(500)
  , SS_DATA_TYPE VARCHAR(500)
  )

 EXEC (@metaDataTableSQL)

 INSERT INTO @Columns
 EXEC tempdb..sp_columns @table_name = N'##tmpMataDataScript'

 UPDATE @Columns
 SET COLUMN_NAME = QUOTENAME(COLUMN_NAME)

 DECLARE @Table_ID TABLE (
  ORDINAL_POSITION INT
  , COLUMN_NAME VARCHAR(1000)
  , COLUMN_GROUP VARCHAR(1000)
  , CONDITION VARCHAR(5000)
  , LEFT_COLUMN_NAME AS ('a' + cast(ORDINAL_POSITION AS VARCHAR) + '.' + QUOTENAME('ia' + cast(ORDINAL_POSITION AS VARCHAR) + '_' + COLUMN_NAME + '_code')
   )
  , LEFT_JOIN AS (
   ' LEFT JOIN (
 SELECT cast(ROW_NUMBER() OVER (
    ORDER BY ' + COLUMN_GROUP + ') AS VARCHAR) ' + QUOTENAME('ia' + cast(ORDINAL_POSITION AS 
     VARCHAR) + '_' + COLUMN_NAME + '_code ') + ',' + COLUMN_GROUP + ' from Data ia' + cast(ORDINAL_POSITION AS VARCHAR) + ' group by ' + COLUMN_GROUP + 
   ' ) ' + 'a' + cast(ORDINAL_POSITION AS VARCHAR) + ' ON ' + CONDITION
   )
  , UNION_ALL AS (
   'SELECT ' + QUOTENAME(COLUMN_NAME + '_CODE') + ' AS @ITEM_CODE_COL_NAME, ' + QUOTENAME('LEVEL' + cast((ORDINAL_POSITION - 1) AS 
     VARCHAR) + '_CODE') + ' as @PARENT_CODE_COL_NAME, ' + COLUMN_NAME + ' AS @DESCRIPTION_COL_NAME, ''' + cast(ORDINAL_POSITION AS VARCHAR) + '''' + 
   ' AS @LEVEL_NO_COL_NAME, ' + '''' + replace(replace(COLUMN_NAME, '[', ''), ']', '') + '''' + ' AS @LEVEL_NAME_COL_NAME ' + ' from ' + '@DIM_TABLE_NAME' + 
   ' GROUP BY ' + QUOTENAME('LEVEL' + cast((ORDINAL_POSITION - 1) AS VARCHAR) + '_CODE') + ' , ' + QUOTENAME(COLUMN_NAME + '_CODE') + 
   ',' + COLUMN_NAME
   )
  )

 INSERT INTO @Table_ID (
  ORDINAL_POSITION
  , COLUMN_NAME
  )
 SELECT ORDINAL_POSITION
  , COLUMN_NAME
 FROM @Columns
 ORDER BY ORDINAL_POSITION

 UPDATE @Table_ID
 SET COLUMN_GROUP = substring((
    SELECT ',' + COLUMN_NAME
    FROM @Table_ID b
    WHERE a.ORDINAL_POSITION >= b.ORDINAL_POSITION
    FOR XML path('')
    ), 2, 50000)
  , CONDITION = substring((
    SELECT ' AND  a.' + COLUMN_NAME + '=' + 'a' + cast(ORDINAL_POSITION AS VARCHAR) + '.' + COLUMN_NAME
    FROM @Table_ID b
    WHERE a.ORDINAL_POSITION >= b.ORDINAL_POSITION
    FOR XML path('')
    ), 5, 50000)
 FROM @Table_ID a

 DECLARE @FINAL_QUERY VARCHAR(MAX) = 'WITH Data as (' + @DIM_SQL + ') select '
 DECLARE @left_join VARCHAR(MAX) = ''
 DECLARE @FINAL_QUERY_SELECT VARCHAR(MAX) = ''

 SELECT @FINAL_QUERY_SELECT = @FINAL_QUERY_SELECT + ', a.' + a.COLUMN_NAME + ', ' + a.LEFT_COLUMN_NAME + ' AS ' + QUOTENAME(a.COLUMN_NAME + '_CODE')
  , @left_join = @left_join + a.LEFT_JOIN
 FROM @Table_ID a

 SET @FINAL_QUERY = @FINAL_QUERY + substring(@FINAL_QUERY_SELECT, 2, 50000) + ' INTO ' + @DIM_TABLE_NAME + '_ID' + ' from Data a ' + @left_join
 
 SET @Drop_SQL = 'IF EXISTS (
   SELECT *
   FROM ' + @DB_NAME + '.sys.objects
   WHERE object_id = OBJECT_ID(''' + @DIM_TABLE_NAME + '' + 
  '_ID'')
    AND type IN (N''U'')
   )
  EXEC (''DROP TABLE ' + @DIM_TABLE_NAME + '' + '_ID'')';

 EXEC (@Drop_SQL)

 IF @PRINT_SQL = 1
  PRINT @FINAL_QUERY

 EXEC (@FINAL_QUERY)

 SET @FINAL_QUERY_SELECT = ''

 SELECT @FINAL_QUERY_SELECT = @FINAL_QUERY_SELECT + ' UNION ALL ' + a.UNION_ALL
 FROM @Table_ID a

 SET @FINAL_QUERY_SELECT = SUBSTRING(@FINAL_QUERY_SELECT, 12, 50000)

 SELECT @FINAL_QUERY_SELECT = CASE 
   WHEN a.ORDINAL_POSITION = 1
    THEN replace(replace(@FINAL_QUERY_SELECT, 'LEVEL0', a.COLUMN_NAME + ']'), 'LEVEL1', a.COLUMN_NAME + ']')
   ELSE replace(@FINAL_QUERY_SELECT, 'LEVEL' + cast(ORDINAL_POSITION AS VARCHAR), a.COLUMN_NAME + ']')
   END
 FROM @Table_ID a

 DECLARE @DIM_TABLE_NAME_INT VARCHAR(1050) = @DIM_TABLE_NAME + CASE @ATTRB_SQL
   WHEN 'NA'
    THEN ''
   ELSE '_INT'
   END

 SET @FINAL_QUERY_SELECT = 
  'select @ITEM_CODE_RULE as @ITEM_CODE_COL_NAME,@PARENT_CODE_RULE as @PARENT_CODE_COL_NAME,@DESCRIPTION_COL_NAME,@LEVEL_NO_RULE as @LEVEL_NO_COL_NAME,@LEVEL_NAME_RULE as @LEVEL_NAME_COL_NAME INTO ' 
  + @DIM_TABLE_NAME_INT + ' from ( ' + replace(@FINAL_QUERY_SELECT, '@DIM_TABLE_NAME', @DIM_TABLE_NAME + '_ID') + 
  ' ) a ORDER BY cast (@LEVEL_NO_COL_NAME as int),' + CASE rtrim(LTRIM(@ITEM_CODE_RULE))
   WHEN rtrim(LTRIM(@PARENT_CODE_RULE))
    THEN ''
   ELSE CASE 
     WHEN @PARENT_CODE_RULE = @PARENT_CODE_COL_NAME
      THEN 'cast(@PARENT_CODE_RULE as int), '
     ELSE '@PARENT_CODE_RULE, '
     END
   END + CASE 
   WHEN @ITEM_CODE_RULE = @ITEM_CODE_COL_NAME
    THEN ' cast(@ITEM_CODE_RULE as int)'
   ELSE ' @ITEM_CODE_RULE'
   END

 SET @Drop_SQL = 'IF EXISTS (
   SELECT *
   FROM ' + @DB_NAME + '.sys.objects
   WHERE object_id = OBJECT_ID(''' + @DIM_TABLE_NAME_INT + 
  ''')
    AND type IN (N''U'')
   )
  EXEC (''DROP TABLE ' + @DIM_TABLE_NAME_INT + ''')';

 EXEC (@Drop_SQL)

 SET @FINAL_QUERY_SELECT = REPLACE(@FINAL_QUERY_SELECT, '@ITEM_CODE_COL_NAME', @ITEM_CODE_COL_NAME)
 SET @FINAL_QUERY_SELECT = replace(@FINAL_QUERY_SELECT, '@PARENT_CODE_COL_NAME', @PARENT_CODE_COL_NAME)
 SET @FINAL_QUERY_SELECT = replace(@FINAL_QUERY_SELECT, '@DESCRIPTION_COL_NAME', @DESCRIPTION_COL_NAME)
 SET @FINAL_QUERY_SELECT = replace(@FINAL_QUERY_SELECT, '@LEVEL_NO_COL_NAME', @LEVEL_NO_COL_NAME)
 SET @FINAL_QUERY_SELECT = replace(@FINAL_QUERY_SELECT, '@LEVEL_NAME_COL_NAME', @LEVEL_NAME_COL_NAME)
 SET @FINAL_QUERY_SELECT = replace(@FINAL_QUERY_SELECT, '@ITEM_CODE_RULE', @ITEM_CODE_RULE)
 SET @FINAL_QUERY_SELECT = replace(@FINAL_QUERY_SELECT, '@PARENT_CODE_RULE', @PARENT_CODE_RULE)
 SET @FINAL_QUERY_SELECT = replace(@FINAL_QUERY_SELECT, '@LEVEL_NO_RULE', @LEVEL_NO_RULE)
 SET @FINAL_QUERY_SELECT = replace(@FINAL_QUERY_SELECT, '@LEVEL_NAME_RULE', @LEVEL_NAME_RULE)

 IF @PRINT_SQL = 1
  PRINT @FINAL_QUERY_SELECT

 EXEC (@FINAL_QUERY_SELECT)

 SET @Drop_SQL = 'IF EXISTS (
   SELECT *
   FROM ' + @DB_NAME + '.sys.objects
   WHERE object_id = OBJECT_ID(''' + @DIM_TABLE_NAME + '' + 
  '_ID'')
    AND type IN (N''U'')
   )
   AND ' + cast(@CREATE_ID_TABLE_ALSO AS VARCHAR) + ' = 0
  EXEC (''DROP TABLE ' + @DIM_TABLE_NAME + '' + 
  '_ID'')';

 EXEC (@Drop_SQL)

 IF @DIM_TABLE_NAME <> @DIM_TABLE_NAME_INT
 BEGIN
  SET @metaDataTableSQL = 'select * into ##tmpMataDataScript from ' + ' ( ' + @ATTRB_SQL + ' )' + ' a'
  SET @mataDataSQL = 'select * from ##tmpMataDataScript'

  IF object_id('tempdb..##tmpMataDataScript') IS NOT NULL
  BEGIN
   DROP TABLE ##tmpMataDataScript
  END
 
  DELETE
  FROM @Columns

  EXEC (@metaDataTableSQL)

  INSERT INTO @Columns
  EXEC tempdb..sp_columns @table_name = N'##tmpMataDataScript'

  UPDATE @Columns
  SET COLUMN_NAME = QUOTENAME(COLUMN_NAME)

  SET @FINAL_QUERY_SELECT = 'select ' + 'a.' + @ITEM_CODE_COL_NAME + ', a.' + @PARENT_CODE_COL_NAME + ',' + @DESCRIPTION_COL_RULE + ' as ' + @DESCRIPTION_COL_NAME + 
   ', a.' + @LEVEL_NO_COL_NAME + ', a.' + @LEVEL_NAME_COL_NAME

  SELECT @FINAL_QUERY_SELECT = @FINAL_QUERY_SELECT + ', b.' + a.COLUMN_NAME
  FROM @Columns a
  WHERE a.COLUMN_NAME NOT LIKE '%' + @DESCRIPTION_COL_NAME + '%'
  ORDER BY a.ORDINAL_POSITION

  SET @FINAL_QUERY_SELECT = @FINAL_QUERY_SELECT + ' into ' + @DIM_TABLE_NAME + ' from ' + @DIM_TABLE_NAME_INT + ' a ' + ' left join (' + @ATTRB_SQL + ') b on a.' + 
   @DESCRIPTION_COL_NAME + ' = b.' + @DESCRIPTION_COL_NAME + ' order by cast(a.' + @LEVEL_NO_COL_NAME + ' as int) , ' + CASE 
    WHEN @PARENT_CODE_RULE = @PARENT_CODE_COL_NAME
     THEN 'cast(' + 'a.' + @PARENT_CODE_COL_NAME + ' as int), '
    ELSE 'a.' + @PARENT_CODE_COL_NAME + ' ,'
    END + CASE 
    WHEN @ITEM_CODE_RULE = @ITEM_CODE_COL_NAME
     THEN 'cast(' + 'a.' + @ITEM_CODE_COL_NAME + ' as int) '
    ELSE 'a.' + @ITEM_CODE_COL_NAME
    END

  IF @PRINT_SQL = 1
   PRINT (@FINAL_QUERY_SELECT)

  SET @Drop_SQL = 'IF EXISTS (
   SELECT *
   FROM ' + @DB_NAME + '.sys.objects
   WHERE object_id = OBJECT_ID(''' + @DIM_TABLE_NAME + 
   ''')
    AND type IN (N''U'')
   )
  EXEC (''DROP TABLE ' + @DIM_TABLE_NAME + ''')';

  EXEC (@Drop_SQL)

  EXEC (@FINAL_QUERY_SELECT)

  SET @Drop_SQL = 'IF EXISTS (
   SELECT *
   FROM ' + @DB_NAME + '.sys.objects
   WHERE object_id = OBJECT_ID(''' + @DIM_TABLE_NAME_INT + 
   ''')
    AND type IN (N''U'')
   )
  EXEC (''DROP TABLE ' + @DIM_TABLE_NAME_INT + ''')';

  EXEC (@Drop_SQL)
 END
END

No comments:

Post a Comment