Warehouse Considerations — Snowflake Documentation (2024)

.

Related Topics

  • Working with Resource Monitors

  • Data Loading Considerations

This topic provides general guidelines and best practices for using virtual warehouses in Snowflake to process queries. It does not provide specific or absolute numbers, values,or recommendations because every query scenario is different and is affected by numerous factors, including number of concurrent users/queries, number of tables being queried, and data size andcomposition, as well as your specific requirements for warehouse availability, latency, and cost.

It also does not cover warehouse considerations for data loading, which are covered in another topic (see the sidebar).

The keys to using warehouses effectively and efficiently are:

  1. Experiment with different types of queries and different warehouse sizes to determine the combinations that best meet your specific query needs and workload.

  2. Don’t focus on warehouse size. Snowflake utilizes per-second billing, so you can run larger warehouses (Large, X-Large, 2X-Large, etc.) and simply suspend them when not in use.

Note

These guidelines and best practices apply to both single-cluster warehouses, which are standard for all accounts, and multi-cluster warehouses,which are available in Snowflake Enterprise Edition (and higher).

In this Topic:

  • How are Credits Charged for Warehouses?

  • How Does Query Composition Impact Warehouse Processing?

  • How Does Warehouse Caching Impact Queries?

  • Creating a Warehouse

    • Selecting an Initial Warehouse Size

    • Automating Warehouse Suspension

    • Automating Warehouse Resumption

  • Scaling Up vs Scaling Out

    • Warehouse Resizing Improves Performance

    • Multi-cluster Warehouses Improve Concurrency

How are Credits Charged for Warehouses?

Credit charges are calculated based on:

  • The warehouse size.

  • The number of clusters (if using multi-cluster warehouses).

  • The length of time the compute resources in each cluster runs.

For example:

X-Small

Bills 1 credit per full, continuous hour that each cluster runs; each successive size generally doubles the number of computeresources per warehouse.

4X-Large

Bills 128 credits per full, continuous hour that each cluster runs.

Note the following:

  • When compute resources are provisioned for a warehouse:

    • The minimum billing charge for provisioning compute resources is 1 minute (i.e. 60 seconds).

    • There is no benefit to stopping a warehouse before the first 60-second period is over because the credits have alreadybeen billed for that period.

    • After the first 60 seconds, all subsequent billing for a running warehouse is per-second (until all its compute resources are shut down).Three examples are provided below:

      • If a warehouse runs for 30 to 60 seconds, it is billed for 60 seconds.

      • If a warehouse runs for 61 seconds, it is billed for only 61 seconds.

      • If a warehouse runs for 61 seconds, shuts down, and then restarts and runs for less than 60 seconds, it is billed for 121 seconds (60 + 1 + 60).

  • Resizing a warehouse provisions additional compute resources for each cluster in the warehouse:

    • This results in a corresponding increase in the number of credits billed for the warehouse (while the additional compute resources arerunning).

    • The additional compute resources are billed when they are provisioned (i.e. credits for the additional resources are billed relativeto the time when the warehouse was resized).

      Preview Feature — Open

      Larger (5X-Large and 6X-Large) warehouse sizes are generally available inAWS regions and in preview in US Government and Azure regions.

    • Resizing between a 5XL or 6XL warehouse to a 4XL or smaller warehouse results in a brief period during which the customer ischarged for both the new warehouse and the old warehouse while the old warehouse is quiesced.

  • Credit usage is displayed in hour increments. With per-second billing, you will see fractional amounts for credit usage/billing.

How Does Query Composition Impact Warehouse Processing?

The compute resources required to process a query depends on the size and complexity of the query. For the most part, queries scale linearly with regards to warehouse size, particularly forlarger, more complex queries. When considering factors that impact query processing, consider the following:

  • The overall size of the tables being queried has more impact than the number of rows.

  • Query filtering using predicates has an impact on processing, as does the number of joins/tables in the query.

Tip

To achieve the best results, try to execute relatively hom*ogeneous queries (size, complexity, data sets, etc.) on the same warehouse; executing queries of widely-varying size and/orcomplexity on the same warehouse makes it more difficult to analyze warehouse load, which can make it more difficult to select the best size to match the size, composition, and number ofqueries in your workload.

How Does Warehouse Caching Impact Queries?

Each warehouse, when running, maintains a cache of table data accessed as queries are processed by the warehouse. This enables improvedperformance for subsequent queries if they are able to read from the cache instead of from the table(s) in the query. The size of the cacheis determined by the compute resources in the warehouse (i.e. the larger the warehouse and, therefore, more compute resources in thewarehouse), the larger the cache.

This cache is dropped when the warehouse is suspended, which may result in slower initial performance for some queries after the warehouse is resumed. As the resumed warehouse runs and processesmore queries, the cache is rebuilt, and queries that are able to take advantage of the cache will experience improved performance.

Keep this in mind when deciding whether to suspend a warehouse or leave it running. In other words, consider the trade-off between saving credits by suspending a warehouse versus maintaining thecache of data from previous queries to help with performance.

Creating a Warehouse

When creating a warehouse, the two most critical factors to consider, from a cost and performance perspective, are:

  • Warehouse size (i.e. available compute resources)

  • Manual vs automated management (for starting/resuming and suspending warehouses).

The number of clusters in a warehouse is also important if you are using Snowflake Enterprise Edition (or higher) andmulti-cluster warehouses. For more details, see Scaling Up vs Scaling Out (in this topic).

Selecting an Initial Warehouse Size

The initial size you select for a warehouse depends on the task the warehouse is performing and the workload it processes. For example:

  • For data loading, the warehouse size should match the number of files being loaded and the amount of data in each file. For more details, see Planning a Data Load.

  • For queries in small-scale testing environments, smaller warehouses sizes (X-Small, Small, Medium) may be sufficient.

  • For queries in large-scale production environments, larger warehouse sizes (Large, X-Large, 2X-Large, etc.) may be more cost effective.

However, note that per-second credit billing and auto-suspend give you the flexibility to start with larger sizes and then adjust the size to match your workloads. You can always decrease the sizeof a warehouse at any time.

Also, larger is not necessarily faster for smaller, more basic queries. Small/simple queries typically do not need an X-Large (or larger) warehouse because they do not necessarily benefit from theadditional resources, regardless of the number of queries being processed concurrently. In general, you should try to match the size of the warehouse to the expected size and complexity of thequeries to be processed by the warehouse.

Tip

Experiment by running the same queries against warehouses of multiple sizes (e.g. X-Large, Large, Medium). The queries you experiment with should be of a size and complexity that you know willtypically complete within 5 to 10 minutes (or less).

Automating Warehouse Suspension

Warehouses can be set to automatically suspend when there’s no activity after a specified period of time. Auto-suspend is enabled by specifying the time period (minutes, hours, etc.) of inactivityfor the warehouse.

We recommend setting auto-suspend according to your workload and your requirements for warehouse availability:

  • If you enable auto-suspend, we recommend setting it to a low value (e.g. 5 or 10 minutes or less) because Snowflake utilizes per-second billing. This will help keep your warehouses from running(and consuming credits) when not in use.

    However, the value you set should match the gaps, if any, in your query workload. For example, if you have regular gaps of 2 or 3 minutes between incoming queries, it doesn’t make sense to setauto-suspend to 1 or 2 minutes because your warehouse will be in a continual state of suspending and resuming (if auto-resume is also enabled) and each time it resumes, you are billed for theminimum credit usage (i.e. 60 seconds).

  • You might want to consider disabling auto-suspend for a warehouse if:

    • You have a heavy, steady workload for the warehouse.

    • You require the warehouse to be available with no delay or lag time. Warehouse provisioning is generally very fast (e.g. 1 or 2seconds); however, depending on the size of the warehouse and the availability of compute resources to provision, it can take longer.

Important

If you chose to disable auto-suspend, please carefully consider the costs associated with running a warehouse continually, even when the warehouse is not processing queries. The costscan be significant, especially for larger warehouses (X-Large, 2X-Large, etc.).

To disable auto-suspend, you must explicitly select Never in the web interface, or specify 0 or NULL in SQL.

Automating Warehouse Resumption

Warehouses can be set to automatically resume when new queries are submitted.

We recommend enabling/disabling auto-resume depending on how much control you wish to exert over usage of a particular warehouse:

  • If cost and access are not an issue, enable auto-resume to ensure that the warehouse starts whenever needed. Keep in mind that there might be a short delay in the resumption of the warehousedue to provisioning.

  • If you wish to control costs and/or user access, leave auto-resume disabled and instead manually resume the warehouse only when needed.

Scaling Up vs Scaling Out

Snowflake supports two ways to scale warehouses:

  • Scale up by resizing a warehouse.

  • Scale out by adding clusters to a multi-cluster warehouse (requires Snowflake Enterprise Edition orhigher).

Warehouse Resizing Improves Performance

Resizing a warehouse generally improves query performance, particularly for larger, more complex queries. It can also help reduce thequeuing that occurs if a warehouse does not have enough compute resources to process all the queries that are submitted concurrently. Notethat warehouse resizing is not intended for handling concurrency issues; instead, use additional warehouses to handle the workload or use amulti-cluster warehouse (if this feature is available for your account).

Snowflake supports resizing a warehouse at any time, even while running. If a query is running slowly and you have additional queries of similar size and complexity that you want to run on the samewarehouse, you might choose to resize the warehouse while it is running; however, note the following:

  • As stated earlier about warehouse size, larger is not necessarily faster; for smaller, basic queries that are already executing quickly,you may not see any significant improvement after resizing.

  • Resizing a running warehouse does not impact queries that are already being processed by the warehouse; the additional compute resources,once fully provisioned, are only used for queued and new queries.

    Preview Feature — Open

    Larger (5X-Large and 6X-Large) warehouse sizes are generally available inAWS regions and in preview in US Government and Azure regions.

  • Resizing between a 5XL or 6XL warehouse to a 4XL or smaller warehouse results in a brief period during which the customer is chargedfor both the new warehouse and the old warehouse while the old warehouse is quiesced.

Tip

Decreasing the size of a running warehouse removes compute resources from the warehouse. When the computer resources are removed, thecache associated with those resources is dropped, which can impact performance in the same way that suspending the warehouse can impactperformance after it is resumed.

Keep this in mind when choosing whether to decrease the size of a running warehouse or keep it at the current size. In other words, thereis a trade-off with regards to saving credits versus maintaining the cache.

Multi-cluster Warehouses Improve Concurrency

Enterprise Edition Feature

To inquire about upgrading to Enterprise Edition, please contact Snowflake Support.

Multi-cluster warehouses are designed specifically for handling queuing and performance issues related to large numbers of concurrent users and/orqueries. In addition, multi-cluster warehouses can help automate this process if your number of users/queries tend to fluctuate.

When deciding whether to use multi-cluster warehouses and the number of clusters to use per multi-cluster warehouse, consider thefollowing:

  • If you are using Snowflake Enterprise Edition (or a higher edition), all your warehouses should be configured as multi-cluster warehouses.

  • Unless you have a specific requirement for running in Maximized mode, multi-cluster warehouses should be configured to run in Auto-scalemode, which enables Snowflake to automatically start and stop clusters as needed.

  • When choosing the minimum and maximum number of clusters for a multi-cluster warehouse:

    Minimum

    Keep the default value of 1; this ensures that additional clusters are only started as needed. However, ifhigh-availability of the warehouse is a concern, set the value higher than 1. This helps ensure multi-cluster warehouse availabilityand continuity in the unlikely event that a cluster fails.

    Maximum

    Set this value as large as possible, while being mindful of the warehouse size and corresponding credit costs. For example, anX-Large multi-cluster warehouse with maximum clusters = 10 will consume 160 credits in an hour if all 10 clusters runcontinuously for the hour.

Warehouse Considerations — Snowflake Documentation (2024)
Top Articles
Latest Posts
Article information

Author: Errol Quitzon

Last Updated:

Views: 6674

Rating: 4.9 / 5 (59 voted)

Reviews: 90% of readers found this page helpful

Author information

Name: Errol Quitzon

Birthday: 1993-04-02

Address: 70604 Haley Lane, Port Weldonside, TN 99233-0942

Phone: +9665282866296

Job: Product Retail Agent

Hobby: Computer programming, Horseback riding, Hooping, Dance, Ice skating, Backpacking, Rafting

Introduction: My name is Errol Quitzon, I am a fair, cute, fancy, clean, attractive, sparkling, kind person who loves writing and wants to share my knowledge and understanding with you.