SnowPro Core Exam cheat sheet — part 1 (October 2024)

Nasta Khralovich
7 min read · Oct 22, 2024


As I prepared for my SnowPro Core COF-C02 exam, I gathered some notes from the official Snowflake documentation. Hope you find them helpful too. Part 2 is available here and part 3 is here.


What you’ll find below:

  • Snowflake distinct layers
  • Snowflake editions
  • Snowflake tools and interfaces
  • Tables & Views
  • Clustering & micro-partitions
  • Data storage monitoring
  • Data sharing, Tasks & Streams

1. Snowflake AI Data Cloud Features and Architecture (25% of exam questions)

Snowflake distinct layers

The layers are separated, so each can be scaled independently.

  • Database Storage Layer (both structured and semi-structured data; files are compressed on load; loading data in is free, but unloading/replication to another region or cloud costs money).
  • Query Processing Layer (query processing by virtual warehouses). Also referred to as the Compute Layer or Virtual Warehouse Layer.
  • Cloud Services Layer (authentication, metadata storage and management, query planning, compilation and optimization, access control, data security, infrastructure management).

All three layers are deployed and managed entirely on a selected cloud platform (AWS, Azure, GCP).

Snowflake editions

There are 4 editions and a new Snowflake release each week. Each edition in the hierarchy includes everything the previous one has, plus the features listed in parentheses:

  • Standard (time travel up to 1 day, fail-safe of 7 days, network policies, federated authentication, SSO, OAuth, MFA, automatic encryption of all data, resource monitors, UDFs, secure data sharing and database replication).
  • Enterprise (column-level and row-level security, time travel up to 90 days, materialized views, multi-cluster warehouses, search optimization, data classification).
  • Business Critical (failover and failback, private links, Tri-Secret Secure, support for Protected Health Information data — requires an agreement).
  • Virtual Private Snowflake (dedicated metadata store and pool of compute resources in warehouses, highest security, a separate Snowflake environment isolated from other Snowflake accounts).

Snowflake tools and interfaces

  • Snowsight — the Snowflake web UI (worksheets, Query Profile, charts, monitoring, tags, cost management).
  • SnowSQL — a CLI client that executes SQL commands and DML/DDL operations, including loading/unloading. Available for Linux, Windows and macOS.
  • Connectors: Spark, Kafka.
  • Drivers: ODBC, JDBC, .NET, Go, PHP, Python, Node.js
  • Snowpark — an API, works with Java, Python and Scala.
  • SnowCD — a connectivity diagnostic tool.
  • AI&ML: Copilot (free to use, English only), Cortex LLM functions (all accounts in select regions), Snowflake ML. More here.
  • Snowflake partners and 3rd-party tools. You can also see the current partners in your Snowflake account (Data Products -> Partner Connect).

Tables

All data in Snowflake is stored in database tables.

  • Permanent — the default table type; exists until explicitly dropped; 7-day fail-safe; time travel is 1 day by default and can be up to 90 days in Enterprise edition and above. Changes to a table are not automatically propagated to views created on that table.
  • Transient — exists until explicitly dropped; 1-day time travel; cheaper than permanent tables because they have no fail-safe.
  • Temporary — belongs to a specific database and schema but exists only for the duration of a session; no fail-safe; 1-day time travel; can’t be converted to other table types.
  • External — read-only (no DML, but you can run queries and joins); Snowflake does not manage or store this data (only some metadata); querying may be slower (create a materialized view over it to work faster).

Hybrid, Iceberg tables — read more here. Data storage considerations — here.
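
As a quick illustration, here is a minimal sketch of creating the first three table types; the table and column names are made up for the example:

```sql
-- Permanent table (default type); retention above 1 day needs Enterprise edition or above
CREATE TABLE sales (id INT, amount NUMBER(10, 2))
  DATA_RETENTION_TIME_IN_DAYS = 90;

-- Transient table: no fail-safe period, cheaper to store
CREATE TRANSIENT TABLE sales_staging (id INT, amount NUMBER(10, 2));

-- Temporary table: visible only in the current session
CREATE TEMPORARY TABLE sales_tmp (id INT, amount NUMBER(10, 2));
```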

Views

  • Non-materialized — any query expression that returns a valid result can be used to create one. Results are not stored, so performance is slower compared to materialized views.
  • Materialized — pre-computed and stored for later use (faster to query than the base table, but costs are higher). They make sense when the base table is big and does not change frequently, but the needed query output is small. These views are maintained by Snowflake automatically.
  • Secure — the view definition is not visible to other users, and the underlying data can’t be seen. Details of the view are not exposed in the Query Profile, even for the owner. Both materialized and non-materialized views can be secure, and they can be converted back to non-secure. Slower than non-secure views.
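
A minimal sketch of the three view flavors, assuming a hypothetical `orders` table:

```sql
-- Non-materialized view: just a saved query, results are not stored
CREATE VIEW v_orders AS
  SELECT order_id, amount FROM orders;

-- Materialized view: pre-computed and auto-maintained by Snowflake (Enterprise+)
CREATE MATERIALIZED VIEW mv_customer_totals AS
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders
  GROUP BY customer_id;

-- Secure view: definition and underlying details are hidden from other users
CREATE SECURE VIEW sv_orders AS
  SELECT order_id, amount FROM orders;
```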

Micro-partitions

  • Each partition has 50–500 MB of uncompressed data.
  • They are created automatically for every table, and you don’t need to maintain them. Micro-partitioning happens when the data is loaded.
  • Groups of rows are mapped into individual micro-partitions, and within each micro-partition the data is organized by column.
  • Columns are stored and compressed independently.
  • Pruning — Snowflake reads the metadata of each micro-partition first and skips the partitions that cannot match the query, which returns data faster.
  • Metadata is stored by Snowflake and includes min and max values of each column, the number of distinct values and some additional properties used in optimization.
  • All DML operations (UPDATE, DELETE, MERGE, etc.) perform better thanks to micro-partitions.
  • Micro-partitions are immutable, which is what enables features such as Time Travel.

Data clustering

  • Clustering is essentially the co-location of similar rows in the same micro-partitions to optimize query performance. Costs: compute and storage.
  • If a clustering key is defined, Snowflake performs clustering and re-clustering automatically. Automatic clustering is applied to every table with a clustering key, but it does not always start immediately. It can also be suspended or resumed at any time.
  • Use clustering on large tables (multiple terabytes of data, many micro-partitions), when the clustering depth of a table is large, or when queries are running slower than expected.
  • Best columns for a clustering key: those most actively used in selective filters and/or frequently used in join predicates. The cardinality of such columns should be neither very high nor very low.
  • Clustering keys can be changed or dropped at any time by executing the `ALTER TABLE <table_name> DROP CLUSTERING KEY` command.
  • Clustering depth shows how well-clustered a table is (the smaller the depth, the better the table is clustered); see the sketch below.
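
A minimal sketch of working with a clustering key, assuming a hypothetical `events` table:

```sql
-- Define (or change) the clustering key
ALTER TABLE events CLUSTER BY (event_date, country);

-- Check how well the table is clustered
SELECT SYSTEM$CLUSTERING_DEPTH('events');
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date, country)');

-- Suspend/resume automatic re-clustering, or drop the key entirely
ALTER TABLE events SUSPEND RECLUSTER;
ALTER TABLE events RESUME RECLUSTER;
ALTER TABLE events DROP CLUSTERING KEY;
```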

Data storage monitoring

Budgets

  • Account-level monitoring and notification of credit usage for: compute pools, databases, materialized views, pipes, tables, tasks, schemas, warehouses and serverless features.
  • Set a monthly limit and get an email notification when it is exceeded.
  • You may have one account budget and up to 100 custom budgets.

Resource monitors

  • A monitor can be set up on the account and warehouse levels, but it tracks only credits consumed by warehouses.
  • You can set a credit limit for a specified interval or date range. You can also suspend a warehouse at a specific date and time, regardless of whether a credit quota threshold has been reached.
  • Only users with the ACCOUNTADMIN role can create a resource monitor, but users with appropriate privileges can view/modify monitors.
  • Each resource monitor can have the following actions: Notify (up to 5), Notify & Suspend (one), Notify & Suspend Immediately (one); at least one action should be defined.
  • An account can have only one account-level monitor, and each warehouse can have only one monitor, but one monitor may control several warehouses.
  • Email and Snowsight notifications.
  • By default, a resource monitor starts tracking assigned warehouses as soon as it is created, at the current timestamp.
  • To resume a suspended warehouse, at least one of the following must happen: a new monthly billing cycle begins; the credit quota is increased; the credit threshold is increased; the monitor is dropped or removed from the warehouse.
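
A minimal sketch of a resource monitor (the quota, thresholds and names are made up); run as ACCOUNTADMIN:

```sql
-- Monthly credit quota with all three action types
CREATE RESOURCE MONITOR monthly_limit
  WITH CREDIT_QUOTA = 100
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 75  PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND
           ON 110 PERCENT DO SUSPEND_IMMEDIATE;

-- Attach it at the account level or to a specific warehouse
ALTER ACCOUNT SET RESOURCE_MONITOR = monthly_limit;
ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_limit;
```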

Streams

  • Record data manipulation changes made to a table (inserts, updates, deletes) and metadata about those changes.
  • Can be created for: views, standard, dynamic, event, Iceberg, external and directory tables.
  • Types of streams: standard, append-only, insert-only.
  • Use the MAX_DATA_EXTENSION_TIME_IN_DAYS parameter to prevent streams from becoming stale (max 90 days, default 14 days).
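
A minimal sketch, assuming hypothetical `orders` and `orders_audit` tables:

```sql
-- Standard stream: captures inserts, updates and deletes
CREATE STREAM orders_stream ON TABLE orders;

-- Append-only stream: captures inserts only
CREATE STREAM orders_append_stream ON TABLE orders APPEND_ONLY = TRUE;

-- Consuming the stream in a DML statement advances its offset
INSERT INTO orders_audit
  SELECT order_id, METADATA$ACTION, METADATA$ISUPDATE
  FROM orders_stream;
```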

Tasks

  • Use for automating and scheduling.
  • A task can execute: a single SQL statement, a call to a stored procedure, or procedural logic using Snowflake Scripting.
  • Run a sequence of tasks with task graphs: max 1,000 tasks in a graph, and a single task can have max 100 parents and 100 children.
  • You need to run `ALTER TASK <name> RESUME` after creation; otherwise the task will never be triggered (see the sketch below).
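
A minimal sketch of a scheduled task; the warehouse, table names and cron schedule are placeholders:

```sql
-- A task running a single SQL statement every day at 06:00 UTC
CREATE TASK refresh_daily_sales
  WAREHOUSE = analytics_wh
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  INSERT INTO daily_sales
    SELECT CURRENT_DATE, SUM(amount) FROM orders;

-- Tasks are created suspended; resume it or it will never run
ALTER TASK refresh_daily_sales RESUME;
```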

Sharing data

Each option is read-only for consumers, and no physical data copies are made or transferred between accounts. Only the original Data Provider can share the data.

  • Secure Data Sharing

Shares require both accounts to be in the same cloud and region. Data is shared without copying. The provider pays for storing the shared data, while consumers pay for the compute they use to query it (with reader accounts, the provider pays for compute as well); creating a share is free. You may create a share either via Snowsight or SQL commands (`CREATE SHARE`); see the sketch below.

A list of objects that can be shared.

Share data with consumers who don’t have their own Snowflake accounts by creating managed reader accounts. You can share your data with an unlimited number of consumers. Cross-region sharing uses replication.

New objects added to a share and data updates become immediately available to all consumers.
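
A minimal provider/consumer sketch; the database, schema, table and account names are placeholders:

```sql
-- Provider side: create the share and grant access to objects
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = my_org.consumer_account;

-- Consumer side: create a read-only database from the share
CREATE DATABASE shared_sales FROM SHARE provider_acct.sales_share;
```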

  • Listing

Share data/apps, includes monetization and metadata features.

Can be shared across regions (which may cause latency and additional costs). Can be either free or paid, available publicly (via Marketplace) or privately (to specific consumers).

When publishing a listing on the Snowflake Marketplace, the listing is automatically replicated across regions, but the data itself must be replicated manually.

  • Data Exchange

Create a private exchange for selected accounts to share and access data within a controlled group. Can be shared across regions/clouds. The Data Exchange Administrator controls who joins and what data is shared.

The CREATE DATA EXCHANGE LISTING privilege allows a user to submit listings for approval/publishing and to modify listing properties.

Pipes

  • Snowpipe loads data from files continuously (as soon as they arrive in a stage) using the COPY INTO command.
  • Stage -> Pipe -> Table.
  • Recommended file size: roughly 100–250 MB (or larger) compressed.
  • Two ways to use it: REST API or automatic Snowpipe.
  • All data types are supported.
  • Snowflake automatically manages the compute required to execute the pipe’s COPY INTO statements (serverless).
  • Pipes won’t delete staged files automatically. Execute the REMOVE command to delete them.
  • Charges have per-second/per-core granularity.
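
A minimal sketch of an auto-ingest pipe; the stage, target table and file format are placeholders, and AUTO_INGEST assumes cloud event notifications are already configured:

```sql
-- Pipe that continuously copies new files from an external stage
CREATE PIPE sales_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw_sales
  FROM @sales_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Check pipe status; staged files must be removed manually once loaded
SELECT SYSTEM$PIPE_STATUS('sales_pipe');
REMOVE @sales_stage PATTERN = '.*\.csv';
```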

That’s all for now. I’m planning to post about the other domains of this exam in the next few days, along with my learning strategy and some general tips. Part 2 is available here.

Like this post? Buy me a coffee.
