Aggregate functions are functions that are used to compute against a "returned column of…
Category: article
How to choose the right database for your service – SQL vs NoSQL – Diagrammatic explanation
Structured data suits tabular datastores, semi-structured data suits NoSQL stores, and unstructured data suits Blob Storage. [We will…
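As a rough illustration of this rule of thumb, the hypothetical helper below (`suggest_store` is not from any library) guesses a store from the shape of a single record. It is a crude heuristic and only a sketch: real choices also depend on query patterns, scale, and consistency needs.

```python
from typing import Any


def suggest_store(record: Any) -> str:
    """Hypothetical heuristic: map a record's shape to a storage category."""
    if isinstance(record, (bytes, bytearray)):
        # Raw bytes (images, video, PDFs) have no schema to query.
        return "blob storage"
    if isinstance(record, dict):
        # Nested dicts or lists suggest a document (semi-structured) shape.
        if any(isinstance(v, (dict, list)) for v in record.values()):
            return "NoSQL"
        # A flat dict maps cleanly onto fixed table columns.
        return "tabular (SQL)"
    if isinstance(record, tuple):
        # A fixed-length tuple behaves like a row with a known schema.
        return "tabular (SQL)"
    raise TypeError(f"unsupported record type: {type(record).__name__}")


print(suggest_store((1, "Alice", "alice@example.com")))
print(suggest_store({"id": 1, "tags": ["new", "vip"]}))
print(suggest_store(b"\x89PNG..."))
```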
What is the use of the “Content-type: application/json” header in AWS API Gateway?
You must include “Content-type: application/json” in the message header when submitting a POST request with a JSON body to my…
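When sending such a request from code, the header is set alongside the JSON-encoded body. The sketch below uses Python's standard `urllib.request` and only builds the request without sending it; the invoke URL and the `build_json_post` helper are hypothetical placeholders. For REST APIs, API Gateway can also use the content type to pick a request mapping template, which is one reason the header matters.

```python
import json
import urllib.request

# Hypothetical invoke URL; replace with your own API Gateway stage URL.
API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/items"


def build_json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Declares that the body is JSON so the API can parse it as such.
        headers={"Content-Type": "application/json"},
    )


req = build_json_post(API_URL, {"name": "widget", "qty": 3})
# urllib.request.urlopen(req) would actually send it.
```

Note that `urllib` normalizes header names internally, so `req.get_header("Content-type")` is how the stored value is read back.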
What is an AWS API Gateway? Simply explained – Basics
A server known as an API gateway serves as a single point of access for a collection of microservices. It…
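The "single point of access" idea can be sketched as a routing table that maps request paths to backend services. The service names, ports, and the `resolve` helper below are hypothetical, and a managed gateway such as AWS API Gateway also handles concerns like authorization and throttling; this only shows prefix-based routing.

```python
from typing import Dict, Optional

# Hypothetical routing table: path prefix -> internal microservice base URL.
ROUTES: Dict[str, str] = {
    "/users": "http://users-service:8080",
    "/orders": "http://orders-service:8080",
    "/payments": "http://payments-service:8080",
}


def resolve(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend URL the gateway would forward `path` to, or None."""
    # Try longer prefixes first so a more specific route wins.
    for prefix in sorted(routes, key=len, reverse=True):
        # Match the prefix exactly or as a whole path segment ("/users/42").
        if path == prefix or path.startswith(prefix + "/"):
            return routes[prefix] + path
    return None


print(resolve("/users/42"))   # http://users-service:8080/users/42
print(resolve("/inventory"))  # None
```

Clients only ever talk to the gateway's address; which microservice answers is decided by the table.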
How to transform columns into a list of objects [arrays] on top of group by in PySpark – collect_list and collect_set
In this article we will see how to return a set of objects in an array with or without duplicate…
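In PySpark this is typically written as `df.groupBy("dept").agg(F.collect_list("name"), F.collect_set("name"))` after `from pyspark.sql import functions as F`. To show the difference between the two without a Spark session, the plain-Python sketch below emulates their results on hypothetical `(dept, name)` rows; it is not PySpark itself. Spark returns arrays for both, and the order of collected elements is not guaranteed after a shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical rows standing in for a DataFrame with columns (dept, name).
rows: List[Tuple[str, str]] = [
    ("sales", "amy"),
    ("sales", "bob"),
    ("sales", "amy"),
    ("hr", "cat"),
]


def group_collect_list(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Emulates groupBy(key).agg(collect_list(value)): duplicates are kept."""
    out: Dict[str, List[str]] = defaultdict(list)
    for key, value in rows:
        out[key].append(value)
    return dict(out)


def group_collect_set(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Emulates groupBy(key).agg(collect_set(value)): duplicates are dropped."""
    out: Dict[str, Set[str]] = defaultdict(set)
    for key, value in rows:
        out[key].add(value)
    return dict(out)


print(group_collect_list(rows))  # {'sales': ['amy', 'bob', 'amy'], 'hr': ['cat']}
print(sorted(group_collect_set(rows)["sales"]))  # ['amy', 'bob']
```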
How to create a table from a CSV file and write SQL on top of it in Spark (Sample code)
In this article you will see how to read a CSV file using PySpark, how to control the header…
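In PySpark the usual flow is `spark.read.csv(path, header=True)`, then `df.createOrReplaceTempView("people")`, then `spark.sql("SELECT ...")`. The same idea (a header row supplies column names, then plain SQL runs over the table) can be sketched without Spark using Python's built-in `csv` and `sqlite3` modules as a stand-in. This is a swapped-in technique for illustration only; the `csv_to_table` helper and the sample data are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List

# Hypothetical sample data; the first line is the header row.
CSV_TEXT = """id,name,city
1,Asha,Pune
2,Ben,Leeds
3,Chen,Pune
"""


def csv_to_table(conn: sqlite3.Connection, table: str, text: str,
                 header: bool = True) -> List[str]:
    """Load CSV text into a new table and return the column names used."""
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        # First row supplies the column names, like header=True in Spark.
        columns, data = rows[0], rows[1:]
    else:
        # Without a header, reuse Spark's default names: _c0, _c1, ...
        columns, data = [f"_c{i}" for i in range(len(rows[0]))], rows
    col_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    # Values stay as strings, similar to Spark's CSV reader without inferSchema.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    return columns


conn = sqlite3.connect(":memory:")
csv_to_table(conn, "people", CSV_TEXT)
pune = conn.execute(
    "SELECT name FROM people WHERE city = ? ORDER BY id", ("Pune",)
).fetchall()
print(pune)  # [('Asha',), ('Chen',)]
```

With `header=False`, the header line is loaded as an ordinary data row, which is the same pitfall to watch for in Spark.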
What is a PPA in Ubuntu?
PPA (Personal Package Archives): PPAs are software repositories created specifically for Ubuntu users and are simpler to install…
In AWS EC2, how to check your Ubuntu version using command-line commands
lsb_release: LSB stands for Linux Standard Base. The lsb_release command prints certain LSB and distribution information…
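From the shell, `lsb_release -a` prints all fields and `lsb_release -rs` prints just the release number. On Ubuntu the same details are also kept in `/etc/os-release`, a plain `KEY=value` file, so a script can read them directly. The sketch below is a minimal parser for that format; `parse_os_release` and the sample text are hypothetical, and on Python 3.10+ the standard `platform.freedesktop_os_release()` does this for you.

```python
import shlex
from typing import Dict

# Hypothetical sample in the /etc/os-release format (KEY=value per line).
SAMPLE_OS_RELEASE = '''NAME="Ubuntu"
VERSION_ID="22.04"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
ID=ubuntu
'''


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release style KEY=value lines into a dict."""
    info: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split strips shell-style quotes around the value.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


info = parse_os_release(SAMPLE_OS_RELEASE)
print(info["VERSION_ID"])  # 22.04
```

On a real instance you would pass in the file contents, e.g. `parse_os_release(open("/etc/os-release").read())`.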
Convert data from PySpark DataFrame columns to Row format, or get the elements of columns in a row
pyspark.sql.functions.collect_list(col): an aggregate function that returns a list of objects, including duplicates. To retrieve the data from the PySpark…
How to give an empty string as accepted_values in DBT
Jinja filter: as_text. If we give '' [two single quotes], this will be considered as…
How to check whether a substring exists in a string in Shell/Unix scripting
This is a common use case when we need to check for a substring in a log or some other file. …