Amazon Redshift interview questions

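11. What is a cluster in Redshift?
A cluster is the core unit of operation in an Amazon Redshift data warehouse. Each Redshift cluster is composed of two main components: compute nodes and a leader node. Each compute node has its own dedicated CPU, memory, and disk storage; compute nodes store data and execute queries, and one cluster can contain many of them. The leader node manages communication with client applications, parses queries, builds execution plans, and coordinates the compute nodes.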
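
As a rough illustration, the layout of a running cluster can be inspected through the STV_SLICES system table. Each compute node is divided into slices, and this query counts them per node. It is only a sketch; the result depends entirely on the node type and node count of the cluster it runs on.

```sql
-- Count the slices on each compute node of the cluster.
SELECT node, COUNT(slice) AS slices
FROM stv_slices
GROUP BY node
ORDER BY node;
```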

12. What do you suggest when working with Amazon Redshift?
Compress columns to reduce I/O, which is one of the most costly operations when running queries. The Redshift documentation gives good guidance on choosing an encoding for each column type.
Sort large tables so that most queries can be answered with a limited table scan. A date is usually a good choice for fact tables (the event date, for example). Compound sort keys are worth evaluating, but benchmark the gains or losses on your most important queries first.
Do not compress distribution or sort keys, so that filtering by block remains efficient. This may be surprising, but consider an extreme case: if compression packs all sort key values into a single block while another column spans 100 blocks, filtering on a sort key value cannot skip anything, and all 100 blocks of the other column are read anyway.
Try to use the same distribution key across fact tables, even if that means adding a missing key to some of them. Including that key in joins between fact tables (even when the model does not strictly require it) helps Redshift recognize that the join can be performed locally on every slice.
Regularly VACUUM and ANALYZE tables, and remember to reclaim space if deletes or updates occur frequently on some tables. Once a week should be enough.
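
The tips above can be sketched in SQL roughly as follows. The table name fact_sales, its columns, and the chosen encodings are hypothetical and only illustrate the idea; the right keys and encodings depend on your own data and queries.

```sql
-- Hypothetical fact table; names and encodings are illustrative only.
CREATE TABLE fact_sales (
    customer_id BIGINT        ENCODE raw,   -- distribution key, left uncompressed
    event_date  DATE          ENCODE raw,   -- sort key, left uncompressed
    product_id  INTEGER       ENCODE az64,  -- compressed to reduce I/O
    channel     VARCHAR(32)   ENCODE lzo,
    amount      DECIMAL(12,2) ENCODE az64
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (event_date);

-- Ask Redshift to suggest encodings based on a sample of the loaded data.
ANALYZE COMPRESSION fact_sales;

-- Periodic maintenance: re-sort rows and reclaim space, then refresh statistics.
VACUUM FULL fact_sales;
ANALYZE fact_sales;
```

If a second fact table also uses customer_id as its distribution key, adding customer_id to the join condition between the two tables lets matching rows be joined on the slice where they already live.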

13. What is Amazon Redshift ODBC?
The ODBC driver for Amazon Redshift provides a high-performance, feature-rich connectivity solution that lets ODBC-based applications access Amazon Redshift from Windows, macOS, and Linux, in both 32-bit and 64-bit versions.

14. How does Amazon Redshift keep my data secure?
Amazon Redshift encrypts and keeps your data secure in transit and at rest using industry-standard encryption techniques. To keep data secure in transit, Amazon Redshift supports SSL-enabled connections between your client application and your Redshift data warehouse cluster. To keep your data secure at rest, Amazon Redshift encrypts each block using hardware-accelerated AES-256 as it is written to disk.

15. What is the COPY command?
The COPY command loads data into a table from an external data source. The target table must already exist in the database, and it can be temporary or persistent. COPY appends the new input data to any existing rows in the table. The general syntax is:
COPY table_name [ column_list ] FROM data_source CREDENTIALS access_credentials [options]
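
A minimal sketch of that syntax in use, assuming CSV files with a header row stored in Amazon S3. The table, column list, bucket path, and IAM role ARN are all placeholders, not values from any real setup.

```sql
-- Load CSV files from a (hypothetical) S3 prefix into an existing table.
COPY fact_sales (customer_id, event_date, product_id, channel, amount)
FROM 's3://example-bucket/sales/'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/ExampleRedshiftRole'
CSV
IGNOREHEADER 1;
```

Because COPY appends, running the same statement twice loads the rows twice; truncate or deduplicate the target table first if a reload is intended.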

Author: user
