Snowflake: Getting sampled rows from a table in Snowflake [SAMPLE, TABLESAMPLE]

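Two of the most useful tools for working with large datasets in Snowflake are the SAMPLE and TABLESAMPLE clauses.

SAMPLE and TABLESAMPLE are synonyms and can be used interchangeably. Both are written after a table name in the FROM clause and return a random subset of that table's rows. You can ask for a fixed number of rows, such as SAMPLE (10 ROWS), or for a percentage of rows, such as SAMPLE (10). In the percentage form, each row has a 10% probability of being included.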

In this article, we’ll take a closer look at how these clauses work and how you can use them to better understand your data.

Getting Started with Snowflake’s SAMPLE Clause

The SAMPLE clause in Snowflake is a simple yet powerful tool for randomly selecting rows from a table. To get a fixed number of rows, put the count followed by the ROWS keyword in the parentheses after SAMPLE. Without ROWS, Snowflake treats the number as a percentage. SAMPLE (10) therefore means "about 10% of the rows", not "10 rows".

Here’s an example:

SELECT *
FROM freshers_in
SAMPLE (10 ROWS);

In this example, we’re using the SAMPLE clause to select 10 random rows from the freshers_in table. If the table contains at least 10 rows, Snowflake returns exactly 10 of them, chosen at random. If it contains fewer, the whole table is returned.

The SAMPLE clause can be combined with other clauses such as WHERE and ORDER BY to further refine your results. SAMPLE is part of the FROM clause, so it goes directly after the table name and before WHERE and ORDER BY. For example:

SELECT *
FROM freshers_in SAMPLE (5 ROWS)
WHERE age >= 25
ORDER BY salary DESC;

In this example, Snowflake first picks 5 random rows from the freshers_in table. It then keeps only those where age is greater than or equal to 25 and orders them by salary in descending order. Because the filter is applied to the sample, the query can return fewer than 5 rows.
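You may want exactly 5 rows drawn from the rows where age is 25 or more. One option is to filter first in a subquery and then sample the result. The sketch below assumes the same freshers_in table with age and salary columns. Snowflake's documentation shows SAMPLE applied to a subquery, but it is worth confirming that fixed-size sampling on a derived table behaves as you expect in your account.

```sql
-- Filter first, then sample: up to 5 random rows among rows with age >= 25
SELECT *
FROM (
    SELECT *
    FROM freshers_in
    WHERE age >= 25
) SAMPLE (5 ROWS)
ORDER BY salary DESC;
```

If fewer than 5 rows satisfy the filter, all of the matching rows are returned.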

Getting Started with Snowflake’s TABLESAMPLE Clause

TABLESAMPLE is a synonym for SAMPLE in Snowflake, so the two keywords are interchangeable. This section uses it to select a percentage of rows instead of a fixed number. You give a value between 0 and 100, and each row is included with that probability. This is useful when you want to get a sense of the overall distribution of your data but don’t need to look at every single row.

Here’s an example:

SELECT *
FROM freshers_in
TABLESAMPLE (10);

In this example, we’re using the TABLESAMPLE clause to select about 10% of the rows from the freshers_in table. Snowflake will return a random subset that makes up approximately 10% of the total rows in the table.

Note that the percentage you specify with TABLESAMPLE is approximate. By default, Snowflake uses Bernoulli (row) sampling, which includes each row independently with the given probability. As a result, the number of rows returned varies from run to run and may be somewhat higher or lower than the specified percentage.
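Snowflake also lets you name the sampling method explicitly. BERNOULLI (also spelled ROW) is the default described above. SYSTEM (also spelled BLOCK) keeps or drops whole blocks of rows with the given probability instead. Block sampling is typically faster on large tables but can give a less even sample, especially on small tables. A minimal comparison on the freshers_in table:

```sql
-- Row-level sampling (the default): each row is kept with 10% probability
SELECT *
FROM freshers_in TABLESAMPLE BERNOULLI (10);

-- Block-level sampling: each block of rows is kept with 10% probability
SELECT *
FROM freshers_in TABLESAMPLE SYSTEM (10);
```

SYSTEM/BLOCK applies only to percentage samples. A fixed-size sample such as SAMPLE (10 ROWS) uses row sampling.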

TABLESAMPLE can also be combined with WHERE and ORDER BY. As with SAMPLE, it must come directly after the table name. For example:

SELECT *
FROM freshers_in TABLESAMPLE (5)
WHERE age >= 25
ORDER BY salary DESC;

In this example, Snowflake samples approximately 5% of the rows in the freshers_in table. It then keeps the sampled rows where age is greater than or equal to 25 and orders the results by salary in descending order. Each row is kept independently of its age, so the result is roughly 5% of the rows that match the filter.
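Each run draws a new random sample, so two executions of the same query usually return different rows. For a repeatable percentage sample, for example in a test, Snowflake accepts a SEED (or REPEATABLE) value. With the same seed and unchanged table data, the same rows come back. The seed 42 below is an arbitrary choice. Seeds are not supported for fixed-size samples or when sampling views and subqueries, so this sketch applies to percentage sampling of a table.

```sql
-- Repeatable ~10% sample: the same seed returns the same rows
-- as long as the table's data has not changed (42 is an arbitrary seed)
SELECT *
FROM freshers_in TABLESAMPLE (10) SEED (42);
```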

Snowflake’s SAMPLE and TABLESAMPLE clauses, two names for the same feature, are powerful tools for working with large datasets. You may need to randomly sample a fixed number of rows or select a percentage of rows that gives you a sense of the overall distribution of your data. Either way, these clauses make it easy to get the information you need quickly and efficiently.

Author: user