In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by a centralized fact table which is connected to multiple dimensions. “Snowflaking” is a method of normalising the dimension tables in a star schema. When it is completely normalised along all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle. The principle behind snowflaking is normalisation of the dimension tables by removing low cardinality attributes and forming separate tables.
The snowflake schema is similar to the star schema. However, in the snowflake schema, dimensions are normalized into multiple related tables, whereas the star schema’s dimensions are denormalized with each dimension represented by a single table. A complex snowflake shape emerges when the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and the child tables have multiple parent tables (“forks in the road”).
In “Part 09: Star Schema”, I showed a simple Star Schema for Sales with products, dates and stores. This star schema is now extended to a Snowflake Schema in order to show the differences between the two schemas.
The snowflake schema is in the same family as the star schema logical model. In fact, the star schema is considered a special case of the snowflake schema. The snowflake schema provides some advantages over the star schema in certain situations, including:
- Some OLAP multidimensional database modeling tools are optimized for snowflake schemas.
- Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.
The primary disadvantage of the snowflake schema is that the additional levels of attribute normalization adds complexity to source query joins, when compared to the star schema.
Snowflake schemas, in contrast to flat single table dimensions, have been heavily criticised. Their goal is assumed to be an efficient and compact storage of normalised data but this is at the significant cost of poor performance when browsing the joins required in this dimension. This disadvantage may have reduced in the years since it was first recognized, owing to better query performance within the browsing tools.
When compared to a highly normalized transactional schema, the snowflake schema’s denormalization removes the data integrity assurances provided by normalized schemas. Data loads into the snowflake schema must be highly controlled and managed to avoid update and insert anomalies.
About Tobias Riedner
Tobias Riedner foundet WYCDWD.com in 2015. He works and worked as innovator, consultant, analyst and educator in the fields of business intelligence and data warehousing. He learned a lot from the best consultants, managers und educators in the past and shares his knowledge worldwide. He works for a steady growing traditional company which is a leader in industry 4.0.