In today’s article, we shall dive deeper into Snowflake micro-partitions and the unique benefits they bring to the table.
Partitioning is not a new concept in the database world. Leading RDBMSs such as Oracle and Netezza have offered it under one name or another. But have you ever wondered how Snowflake maintains its performance even though it exposes very few tuning tools to the end user? If so, you are asking the right question.
Snowflake's architecture was designed from scratch for performance, and several of its features support that goal. Let's understand them in detail.
Partitioning in traditional databases vs Snowflake
Traditional databases allow the partitioning of large tables where the data in the table is divided into segments, called partitions, that make it easier to manage and query your data. This is referred to as static partitioning which requires you to include a partitioning clause in the CREATE TABLE statement to create a partitioned table.
Static partitions carry overhead costs, require constant user maintenance, and are difficult to optimize.
The Snowflake Data Platform implements a powerful and unique form of partitioning called micro-partitioning, which offers all the benefits of static partitioning while also optimizing query performance through metadata, requiring near-zero management, and supporting auto-clustering for extremely large tables.
Micro-partitioning is automatically performed on all Snowflake tables. Tables are transparently partitioned using the ordering of the data as it is inserted/loaded. This is also known as a natural clustering of tables.
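To make natural clustering concrete, here is a minimal Python sketch. It is not Snowflake's implementation: the `split_into_micro_partitions` helper is hypothetical, and it cuts chunks by row count, whereas real micro-partitions are sized by data volume. The only point it illustrates is that rows are grouped strictly in the order they arrive.

```python
from datetime import date, timedelta


def split_into_micro_partitions(rows, rows_per_partition):
    """Group rows into chunks strictly in arrival order (hypothetical helper)."""
    return [
        rows[i:i + rows_per_partition]
        for i in range(0, len(rows), rows_per_partition)
    ]


def date_range(partition):
    """Return the (min, max) order date held in one chunk."""
    dates = [row["order_date"] for row in partition]
    return min(dates), max(dates)


# Rows arrive in date order, as from a daily load: 3 rows per day for 4 days.
start = date(2022, 2, 1)
rows = [
    {"order_date": start + timedelta(days=d), "quantity": q}
    for d in range(4)
    for q in (1, 2, 3)
]

partitions = split_into_micro_partitions(rows, rows_per_partition=6)
ranges = [date_range(p) for p in partitions]
for lo, hi in ranges:
    print(lo, "to", hi)
```

Because the rows were loaded in date order, each chunk ends up covering its own date range without overlap. If the same rows were loaded in random order, every chunk could span all four days, and a date filter would no longer be able to skip any of them.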
What is Partitioning?
Data warehouses store large volumes of data. Occasionally they keep historical data for many years due to regulatory mandates. At the same time, the consumer rarely needs to query all data. In most cases, they are interested in data of recent days/weeks or months.
Let's understand partitioning through an example.
SELECT Item_type, State, SUM(Quantity)
FROM orders  -- hypothetical table name
WHERE order_placed_date BETWEEN '2022-02-01' AND '2022-03-14'
  AND Item_type = 'PEN'
GROUP BY Item_type, State;
For the above query, if the table is not partitioned and there is no index on the order placed date, the database might scan the whole table even though the filter covers only about six weeks of data. If the table is partitioned on the order placed date, the database scans only the partitions that overlap the requested date range. This optimization is called partition pruning.
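Partition pruning can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular database implements it: the table is assumed to be partitioned by calendar month, and `build_partitions`, `months_between`, and `pruned_scan` are hypothetical helper names.

```python
from datetime import date


def build_partitions(rows):
    """Place each row in the partition for its (year, month)."""
    partitions = {}
    for row in rows:
        d = row["order_placed_date"]
        partitions.setdefault((d.year, d.month), []).append(row)
    return partitions


def months_between(start, end):
    """List every (year, month) key touched by the inclusive date range."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


def pruned_scan(partitions, start, end):
    """Read only partitions that can hold matching rows; count rows read."""
    scanned = 0
    matches = []
    for key in months_between(start, end):
        for row in partitions.get(key, []):
            scanned += 1
            if start <= row["order_placed_date"] <= end:
                matches.append(row)
    return matches, scanned


# One order on the 10th of every month in 2022.
rows = [
    {"order_placed_date": date(2022, m, 10), "item_type": "PEN", "quantity": m}
    for m in range(1, 13)
]
partitions = build_partitions(rows)
matches, scanned = pruned_scan(partitions, date(2022, 2, 1), date(2022, 3, 14))
print(f"scanned {scanned} of {len(rows)} rows, matched {len(matches)}")
```

With this sample data the scan reads only the February and March partitions, so it touches 2 of the 12 rows instead of all of them.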
What are Snowflake Micro-Partitions?
While we do not explicitly specify partitions in Snowflake, all data is automatically loaded into partitions, which Snowflake calls micro-partitions.
Micro-partitions are contiguous units of storage in Snowflake. Every table is automatically divided into micro-partitions, which are the smallest units of storage in Snowflake.
Each micro-partition contains between 50 MB and 500 MB of uncompressed data, organized in a columnar fashion. The data is compressed during loading, so the size actually stored is smaller.
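The practical effect of columnar organization can be shown with a small Python sketch. It is a simplified model, not Snowflake's storage format: `to_columnar` and `sum_column` are hypothetical helpers, and the sketch assumes every row has the same set of columns.

```python
def to_columnar(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_column(columnar_partition, column):
    """Aggregate one column; the other columns are never read."""
    return sum(columnar_partition[column])


rows = [
    {"item_type": "PEN", "state": "CA", "quantity": 5},
    {"item_type": "PEN", "state": "NY", "quantity": 7},
    {"item_type": "INK", "state": "CA", "quantity": 2},
]
partition = to_columnar(rows)
total = sum_column(partition, "quantity")
print(total)  # 14
```

Storing each column's values together means a query that references only `quantity` does not need to read `item_type` or `state`, and values of the same type sitting next to each other tend to compress well.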
Unlike tables in traditional data warehouses that typically have a relatively small number of partitions (for example, 1 partition per day and per product), a table in Snowflake can easily have millions of partitions.
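Snowflake maintains metadata about each micro-partition, such as the range of values for each column, the number of distinct values, and additional properties used for optimization and efficient query processing.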
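One way to picture per-partition metadata is the Python sketch below. It is an assumption-laden illustration rather than Snowflake's internal format: the statistics chosen (minimum, maximum, and distinct count per column) and the helper names `column_metadata` and `may_contain` are hypothetical.

```python
def column_metadata(partition):
    """Compute min, max, and distinct count for every column in a chunk."""
    metadata = {}
    for name in partition[0]:
        values = [row[name] for row in partition]
        metadata[name] = {
            "min": min(values),
            "max": max(values),
            "distinct": len(set(values)),
        }
    return metadata


def may_contain(metadata, column, value):
    """Return False only when the min/max range proves the value is absent."""
    stats = metadata[column]
    return stats["min"] <= value <= stats["max"]


partitions = [
    [{"item_type": "BOOK", "quantity": 3}, {"item_type": "INK", "quantity": 9}],
    [{"item_type": "PAPER", "quantity": 4}, {"item_type": "PEN", "quantity": 1}],
]
all_metadata = [column_metadata(p) for p in partitions]

# Decide which chunks to read for item_type = 'PEN' without opening any of them.
to_scan = [
    i for i, meta in enumerate(all_metadata)
    if may_contain(meta, "item_type", "PEN")
]
print(to_scan)  # [1]
```

The first chunk's `item_type` values run from BOOK to INK, so the metadata alone rules out PEN and the chunk is skipped. A range check like this can produce false positives (a chunk whose range includes the value but whose rows do not), but never false negatives, which is what makes it safe for pruning.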
What are the benefits of Snowflake Micro-Partitions?
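The benefits of Snowflake's approach to partitioning table data include the following: micro-partitions are derived automatically, so users do not need to define or maintain them; their small size allows fine-grained pruning of very large tables; and columns are stored and compressed independently, so a query scans only the columns it references.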
Thank you for reading this article. I believe that by now you know how micro-partitions are generated automatically in Snowflake and what benefits they bring.
Stay tuned for more articles related to the Snowflake data cloud.
Database architect, SnowPro Certified, Trainer.