SnowPro Core Exam cheat sheet — part 2 (October 2024)

Nasta Khralovich
10 min read · Oct 27, 2024

--

As I prepared for my SnowPro Core COF-C02 exam, I gathered some notes from the official Snowflake documentation. Hope you find them helpful too. Find the first part here, and the third part here.

Photo by Aaron Burden on Unsplash

What you’ll find below:

  • Roles
  • Authentication
  • Tags & Data Classification
  • Row-level and column-level security
  • Access control options
  • Warehouses
  • Cache
  • Cloning
  • Query Profile & Query History

2. Account Access and Security (20%)

Roles

RBAC (Role-based Access Control) — access privileges are assigned to roles, which are in turn assigned to users.

DAC (Discretionary Access Control) — each object has an owner, who can in turn grant access to that object.

System-defined roles:

  • GLOBALORGADMIN (ORGADMIN for a multi-account organization).
  • ORGADMIN (create and view/show all accounts of an organization, usage information for organization) — is separated from other roles in the hierarchy.
  • ACCOUNTADMIN (top-level role in the system, the first role when creating an account, should be granted to at least two users but as few as possible, highly recommended to have MFA enabled, can create resource monitors; avoid using this role to create objects or run automated scripts)
  • SYSADMIN (topmost for custom roles, best for creating databases, warehouses and all database objects, managed by ACCOUNTADMIN)
  • SECURITYADMIN (has global MANAGE GRANTS privilege to grant/revoke privileges globally, can create, monitor and manage users and roles, can use network policy)
  • USERADMIN (best for creating/managing roles and users, the ACCOUNTADMIN should create two users with the USERADMIN role after creating an account)
  • PUBLIC (is granted automatically to every new user)

Custom roles

Ownership and inheritance

  • When a role is dropped, it cannot be recovered, only recreated.
  • A role cannot be dropped if it has the OWNERSHIP privilege on a shared database.
  • Ownership of any objects owned by the dropped role is transferred to the role that executes the DROP ROLE command.
  • Access control considerations.
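A minimal sketch of the recommended pattern for custom roles (object and role names here are hypothetical): create the role, attach it under SYSADMIN so it joins the role hierarchy, and remember that dropping it is irreversible.

```sql
-- Create a custom role and place it under SYSADMIN (recommended practice)
CREATE ROLE analyst;
GRANT ROLE analyst TO ROLE SYSADMIN;

-- Grant the role to a user
GRANT ROLE analyst TO USER alice;

-- Dropping is unrecoverable; objects owned by the dropped role
-- are transferred to the role executing the DROP
DROP ROLE analyst;
```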

Tags

  • A tag is a schema-level object that you attach to database objects to provide additional metadata or information.
  • Tag-based masking policy: a tag is used to carry a masking policy and automatically apply it to matching columns within the object it’s assigned to. Can be applied at the database, schema, or table level. Each tag can support one masking policy for each data type.
  • When cloning a database or schema, tags stored within that database or schema are also cloned.
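A short sketch of how tags are created and attached, and how a tag can carry a masking policy (all names below are hypothetical):

```sql
-- A tag is a schema-level object; ALLOWED_VALUES restricts its values
CREATE TAG cost_center ALLOWED_VALUES 'finance', 'engineering';

-- Attach the tag to an object
ALTER WAREHOUSE my_wh SET TAG cost_center = 'finance';

-- Tag-based masking: the policy auto-applies to matching columns
-- wherever the tag is set (one policy per data type per tag)
ALTER TAG cost_center SET MASKING POLICY mask_string;
```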

Data classification

  • Helps to identify and tag columns with specific data types or categories, such as personal data, quasi-identifiers, or sensitive information.
  • Use via Snowsight or SQL commands.
  • Needed privileges: SELECT, USAGE on a warehouse, APPLY TAG for classification process.
  • Ignores columns with only NULLs and with unsupported data types (ARRAY, BINARY, GEOGRAPHY, OBJECT, VARIANT, VECTOR).
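Classification via SQL can be sketched roughly like this (table name hypothetical); `SYSTEM$CLASSIFY` analyzes the columns and, with `auto_tag`, applies the system classification tags automatically:

```sql
-- Classify a table and automatically tag classified columns
CALL SYSTEM$CLASSIFY('hr_db.public.employees', {'auto_tag': true});
```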

Row/column-level security

Column-level security

  • Apply a masking policy to a column within a table or view.
  • Includes two features: Dynamic data masking (to selectively mask plain-text data in a specific column) and External tokenization (tokenize data before loading to Snowflake — so a user won’t see a real value).
  • You can apply the masking policy to one or more table/view columns with the matching data type.
  • Ownership of a table does not automatically grant the ability to change masking policies, since a policy is a separate schema-level object. A role needs the OWNERSHIP privilege on the masking policy and the USAGE privilege on the parent database and schema.
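A minimal dynamic data masking sketch (role, table, and column names are hypothetical): the policy body is an expression that returns the masked or unmasked value depending on context.

```sql
-- Define a masking policy: analysts see the value, everyone else sees stars
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST') THEN val
    ELSE '*****'
  END;

-- Apply it to a column with a matching data type
ALTER TABLE users MODIFY COLUMN email SET MASKING POLICY email_mask;
```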

Row-level security

  • Controls which rows in a table or view a user can see when they run a query based on conditions or user roles.
  • Contains an expression that evaluates conditions using Snowflake functions (e.g., checking user roles or context) to determine if a row is visible.
  • At runtime, Snowflake converts the table or view into a dynamic secure view based on the row access policy, filtering rows dynamically.
  • When multiple row access policies exist (e.g., on both a table and a view based on that table), the table’s policy executes first. If the view has its own policy, it runs after the table’s policy.
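A minimal row access policy sketch (names hypothetical): the policy is a boolean expression evaluated per row at query time.

```sql
-- Rows are visible to ADMIN, or when the row's region is EMEA
CREATE ROW ACCESS POLICY region_policy AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ADMIN' OR region = 'EMEA';

-- Attach the policy to a table column
ALTER TABLE sales ADD ROW ACCESS POLICY region_policy ON (region);
```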

Secure views/functions

Authentication

Network policy

  • Allow/deny access based on an IP address or a range of IP addresses (IPv4 only). ALLOWED_IP_LIST is required, BLOCKED_IP_LIST is optional.
  • Uses CIDR notations for IP ranges.
  • Applied on two levels: account and user. To apply/activate, you must be ACCOUNTADMIN or SECURITYADMIN, or have the ATTACH POLICY privilege; to create, you need SECURITYADMIN, ACCOUNTADMIN, or a role with the CREATE NETWORK POLICY privilege.
  • You can define many policies, but only one may be applied at a time. If an IP address is in both the allowed and blocked lists, the blocked list takes precedence.
  • If you specify an allowed rule, all other similar identifiers (like other IP addresses) are automatically blocked unless explicitly allowed.
  • Use MINS_TO_BYPASS_NETWORK_POLICY to bypass a network policy for a set number of minutes, contact Snowflake support to set it.
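A sketch of a network policy using CIDR notation (policy name and addresses are made up); note that the blocked list wins when an address appears in both:

```sql
-- Allow a /24 range but block one address inside it
CREATE NETWORK POLICY corp_policy
  ALLOWED_IP_LIST = ('192.168.1.0/24')
  BLOCKED_IP_LIST = ('192.168.1.99');

-- Activate at the account level (only one policy active at a time)
ALTER ACCOUNT SET NETWORK_POLICY = corp_policy;

-- Or at the user level
ALTER USER alice SET NETWORK_POLICY = corp_policy;
```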

MFA

  • An integrated Snowflake feature (powered by Duo Security).
  • Enabled at the account level automatically and requires no management.
  • Users are not automatically enrolled; any Snowflake user can self-enroll through the web interface.
  • Snowflake strongly recommends that all ACCOUNTADMIN users have MFA enabled.
  • ACCOUNTADMINs may disable MFA for a user temporarily or permanently, or enforce a user to enroll in MFA.
  • MFA token caching is allowed with the following drivers: Snowflake CLI, ODBC, JDBC, Python connector. Can be combined with connection caching in federated single sign-on.
  • MINS_TO_BYPASS_MFA: the number of minutes MFA is suspended for a user so they can log in.
  • DISABLE_MFA: cancels a user's enrollment in MFA.
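Both properties are set on the user object; a quick sketch (user name hypothetical):

```sql
-- Let a locked-out user log in without MFA for 10 minutes
ALTER USER alice SET MINS_TO_BYPASS_MFA = 10;

-- Permanently cancel the user's MFA enrollment
ALTER USER alice SET DISABLE_MFA = TRUE;
```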

Federated authentication / SSO

  • Federated environment helps to separate authentication from user access.
  • SAML 2.0 e.g. Okta.

Key-pair authentication — not available in Snowsight; uses a JSON Web Token (JWT) signed with a private/public RSA key pair.

OAuth — Snowflake and External, authentication from a 3rd party service or application.

Access control privileges

  • ATTACH POLICY: to activate a network policy by associating it with your account.
  • APPLY MASKING POLICY: to set a column-level security masking policy on a table or view.
  • APPLY ROW ACCESS POLICY: to add and drop a row access policy on a table or view.
  • OPERATE: to suspend/resume a warehouse, a task, a dynamic table etc.
  • OWNERSHIP: to drop, alter, and grant or revoke access to an object.
  • MANAGE GRANTS: to grant or revoke privileges on any object as if the invoking role were the owner of the object.
  • MANAGE WAREHOUSES: to perform operations that require the MODIFY, MONITOR, and OPERATE privileges on warehouses in the same account.
  • MONITOR: to see details within an object (user, resource monitor, warehouse, database, schema, task).
  • MODIFY: to change the settings or properties of an object (resource monitor, warehouse, listing, database, schema).
  • SELECT: to execute a SELECT statement on the table/view.
  • SHOW GRANTS: to list all access control privileges that have been explicitly granted to roles, users, and shares.
    OF ROLE — to list all users and roles to which the role has been granted.
    TO ROLE — to list all privileges and roles granted to the role.
  • UPDATE: to execute an UPDATE command on the table.
  • USAGE: to execute USE <object> and SHOW <objects> commands on warehouses, listings, databases, schemas, external stages, files, UDFs, stored procedures, and functions.

Find a list of all privileges here.
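A few of these privileges and commands in action (role, warehouse, and table names are hypothetical):

```sql
-- Grant object privileges to a role
GRANT USAGE ON WAREHOUSE my_wh TO ROLE analyst;
GRANT SELECT ON TABLE my_db.public.orders TO ROLE analyst;

-- Who has this role?
SHOW GRANTS OF ROLE analyst;

-- What does this role have?
SHOW GRANTS TO ROLE analyst;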

3. Performance Concepts (15%)

Warehouses

  • Sizes: XS → 6XL (XS uses 1 credit per hour, S uses 2, M uses 4 — doubling with each size up).
  • Pay for: warehouse size, how long it runs (per second billing, minimum 60 sec), how many warehouses are used.
  • Examples of operations that require a warehouse: COPY INTO statements (loading data), query execution (not reading from the cache), DML operations (INSERT, UPDATE, DELETE, MERGE, COPY).
  • Warehouses can be created and destroyed instantly, can be resized and stopped at any time, but only new queries will benefit from the resizing.
  • Concurrency: set the MAX_CONCURRENCY_LEVEL parameter (8 by default) to define the max number of queries allowed to run simultaneously.
  • Performance: increase size of a warehouse to run complex queries, increase number of warehouses to run more queries at the same time.
  • You can create an unlimited number of warehouses.
  • To view warehouse usage over time: the WAREHOUSE_METERING_HISTORY view, Snowsight’s billing tab.
  • Examples of operations that require a running warehouse: executing a stored procedure, querying data from a materialized view, performing DML operations like updating rows, loading/unloading data into tables.
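A sketch of creating and resizing a warehouse with the options discussed above (warehouse name hypothetical):

```sql
CREATE WAREHOUSE my_wh
  WAREHOUSE_SIZE = 'MEDIUM'    -- 4 credits/hour
  AUTO_SUSPEND = 300           -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;  -- don't start running on creation

-- Resize at any time; only new queries benefit from the new size
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';
```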

Single-cluster warehouse

  • Has only a single compute cluster and cannot scale out.
  • Auto-suspend, auto-resume options are available while creating, active by default.
  • If not enough resources, queries are queued.
  • If a suspended warehouse is resized, new resources are provisioned when resumed.
  • AUTO_SUSPEND — if 0, the warehouse is never suspended; 10 minutes by default, 5 minutes recommended.
  • A warehouse starts running automatically after creation, unless the INITIALLY_SUSPENDED parameter is set to TRUE.
  • MAX_CONCURRENCY_LEVEL: use to limit the number of queries running in parallel. The default is 8.
  • STATEMENT_QUEUED_TIMEOUT_IN_SECONDS: the number of seconds an SQL statement remains queued before it is cancelled by the system. Default: 0 (no timeout). Can be set for a warehouse and/or a session; the lowest non-zero value is enforced.
  • STATEMENT_TIMEOUT_IN_SECONDS: the number of seconds after which a running SQL statement is cancelled by the system. Default: 172800 (2 days). Can be set for a warehouse and/or a session; the lowest non-zero value is enforced.
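Both timeouts can be set at the warehouse or session level; a quick sketch (warehouse name hypothetical):

```sql
-- Cancel statements queued longer than 60 seconds on this warehouse
ALTER WAREHOUSE my_wh SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 60;

-- Cap statement runtime at 1 hour for the current session
ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;
```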

Multi-cluster warehouses

  • Scaling up: XS → 6XL (manual). Scaling out (auto-scale mode): e.g., 1 M cluster → 3 M clusters (max 10 clusters; performed automatically).
  • Maximum 10 clusters, minimum 1.
  • When reducing the size of a warehouse the compute resources are removed only when they are no longer being used to execute any current statements so ongoing queries are not interrupted.
  • Scaling policy: Standard (the first cluster starts immediately, each next one waits 20 seconds; the least busy cluster stops after 2–3 checks) or Economy (a cluster starts only if there are enough queries to keep it busy for 6 minutes; the least busy cluster stops after 5–6 checks).
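A multi-cluster warehouse in auto-scale mode can be sketched like this (name hypothetical); Snowflake adds and removes clusters between MIN and MAX as the load changes:

```sql
CREATE WAREHOUSE mc_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1        -- auto-scale mode (MIN < MAX)
  MAX_CLUSTER_COUNT = 3        -- up to 10 clusters allowed
  SCALING_POLICY = 'STANDARD';
```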

Cache

Users can’t see each other’s query results in the interface, but the result cache can reuse a result generated by one user’s query and serve it to another user running the same query.

Metadata cache

  • Holds metadata and statistics. Is never dropped and always has up-to-date stats.
  • Stored in the Cloud Service layer to help query optimization (so no warehouse used when performing MIN, COUNT etc.)
  • Stores information about staged files and their ingestion status for 64 days.

Result cache

  • Holds results of every query executed in the past 24 hours as if result was a table, but if reused, the result can be accessible for 31 days. After 31 days the result is purged, and the next time the query is submitted, a new result is generated and persisted.
  • Results are available to all users if the underlying data has not changed (otherwise the caches are automatically invalidated). Functions like CURRENT_TIME cause expiration.
  • User can be different, but ROLE must be the same.
  • Does not incur storage costs.
  • Will not be reused if: the query is modified; underlying data has changed; the query uses external functions; the query selects from hybrid tables; the persisted result for the previous query is no longer available.
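Result-cache behavior can be observed and controlled from a session; a quick sketch (table name hypothetical):

```sql
-- Disable result reuse for benchmarking (session parameter)
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Re-query a previous result set via the result cache
SELECT * FROM orders WHERE amount > 100;
SELECT COUNT(*) FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```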

Warehouse cache

  • Holds raw data locally (sometimes called “Local” or “SSD” or “Data Cache”). Is available as long as the warehouse is active.
  • Larger warehouses have larger cache size, holds data used by SQL queries.
  • Can serve partial data from the cache and fetch the remainder from centralized (remote) storage.
  • Any columns accessed during the query are cached (helps to optimize and speed up queries).

Query Profile

  • Shows details and statistics for the overall query and graphical representation of the query processing plan.
  • Areas that may help in optimization: bytes spilled to local/remote storage (spilling), number of partitions scanned vs. total partitions (pruning).
  • If a Query Profile shows a UnionAll operator with an extra Aggregate operator on top, it means that the query is performing a UNION operation without removing duplicate rows (UNION without ALL).

Query History

There are three ways to see your query history:

  • Query History page (monitors all queries in your account and shows details of these queries, including each step of execution; you can see queries from the last 14 days, and results can be filtered and downloaded — but you can download only results produced during the last 24 hours; you can view other people’s queries but not their results).
  • QUERY_HISTORY view (placed in the ACCOUNT_USAGE schema of the SNOWFLAKE database).
  • QUERY_HISTORY family of table functions (in the INFORMATION_SCHEMA).
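The two SQL routes can be sketched like this; note the ACCOUNT_USAGE view has latency but longer retention, while the INFORMATION_SCHEMA table function is real-time but limited:

```sql
-- Table function: recent queries for the current session/account
SELECT query_id, query_text, execution_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 10));

-- Account usage view: queries from the last 7 days
SELECT query_id, query_text, warehouse_name
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP());
```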

Cloning

Zero-copy cloning

  • If created as a clone of a permanent table, the new clone shares micro-partitions with the original table, and no additional storage is used. All micro-partitions are fully shared between the original and the clone at the moment of cloning.
  • Zero-copy cloned tables are fully writable; changes made to the clone (or the original) after cloning create new micro-partitions owned only by the object that was changed.
  • Helps to reduce storage costs because data is not stored twice, but when you add, delete or modify rows in a cloned table or create a new populated table in a cloned schema, it will contribute to the overall data storage cost.
  • Objects that can be cloned: databases, schemas, tables (but not external!), external stages, pipes that use external stages, tasks, streams, roles, file formats, Java UDFs.
  • To create a clone, your role should have these privileges.
  • A cloned table does not include the load history of the source table.
  • A database or schema clone includes all child objects active at the time the statement is executed or at the specified time/point in the past.
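Cloning syntax, including a Time Travel clone, can be sketched like this (object names hypothetical):

```sql
-- Zero-copy clone of a table: instant, shares micro-partitions
CREATE TABLE orders_clone CLONE orders;

-- Clone a whole database as it was one hour ago (Time Travel)
CREATE DATABASE dev_db CLONE prod_db AT (OFFSET => -3600);
```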

Cloning considerations

  • When a database is cloned, it replicates all granted privileges on the corresponding child objects.
  • If the COPY GRANTS parameter is used, then the new object inherits any explicit access privileges granted on the original table but does not inherit any future grants defined for the object type in the schema.
  • The clone is created almost instantly, and you don’t need to re-run the SELECT statement.
  • For databases, cloning a database clones all the schemas and other objects in the database. For schemas, cloning a schema clones all objects in that schema.
  • When cloning a database, schema, or table, a snapshot of the data in each table is taken and made available to the clone. The snapshot represents the state of the source data either at the time the statement is executed or at the specified time/point in the past (using Time Travel).
  • Read more here.

The third and last part is available here, and the first part is here.
Like this post? Buy me a coffee.
