Structured data is for tabular datastores. Semi-structured data is for NoSQL. Unstructured data is for…
Author: user
Descriptive vs Diagnostic vs Predictive vs Prescriptive: 4 Types of Analytics
a. Descriptive: This tells you what happened in the past. You take data from the past and report…
Excel shortcuts for daily tasks
Shortcuts that you can use for your daily tasks: 1. CTRL+A: Select All; 2. CTRL+C: Copy all Cells in Highlighted…
What are the optimization techniques you can apply in Apache Hive?
1. Partitioning: Partitioning divides the data into smaller segments. These are created by logical grouping based on…
How to replace a value with another value in a column of a PySpark DataFrame?
In PySpark we can replace a value in one column, a value in multiple columns, or multiple values in a column to…
How to drop nulls in a DataFrame: PySpark
For most data cleansing, the first thing you may need to do is drop the nulls in the…
Why can’t sqitch init snowflake determine the Snowflake account name?
Databases currently supported by Sqitch, a database change management tool, include Snowflake’s Cloud Data Warehouse as well as PostgreSQL 8.4+, SQLite…
In Spark, how to replace null values for all columns or for each column separately – PySpark (na.fill)
Spark API: pyspark.sql.DataFrameNaFunctions.fill. Syntax: fill(value, subset=None). value: “value” can only be int, long, float, string, bool or…
How to create an array containing a column repeated count times – PySpark
To repeat array elements k times in PySpark, we can use the function below. Function: pyspark.sql.functions.array_repeat. array_repeat is a…
OOPS interview questions for freshers and experienced candidates
1. What is OOPS? OOPS stands for Object-Oriented Programming System, in which programs are considered as a collection…
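A small Python illustration of that idea, not taken from the original post: the `Shape`, `Rectangle`, and `Circle` classes are hypothetical. Each object bundles its own data with the methods that act on it, and the caller treats different objects uniformly through a shared `area()` method.

```python
import math


class Shape:
    """Base class: every shape has a name and can report its area."""

    def __init__(self, name: str) -> None:
        self.name = name

    def area(self) -> float:
        raise NotImplementedError  # subclasses supply the behaviour


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        super().__init__("rectangle")  # inheritance: reuse the base initializer
        self._width = width  # leading underscore marks internal state by convention
        self._height = height

    def area(self) -> float:
        return self._width * self._height


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        super().__init__("circle")
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


# Polymorphism: the same call works on any Shape object.
shapes = [Rectangle(2, 3), Circle(1)]
total = sum(s.area() for s in shapes)
print([s.name for s in shapes], total)
```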
AI for Solving Quantitative Reasoning Problems – Minerva
Google AI Introduces Minerva: A Natural Language Processing (NLP) Model for Solving Mathematical Questions. Solving mathematical and scientific questions was…