During building of git pipeline you may need to add some values which you dont…
Tag: PySpark
Shell : Sync a folder from one Linux instance to another Linux instance in a specific path.
In Linux-based operating systems, it is common to synchronize files between different instances. One way to do this is by…
Shell : Script that iterates all it directory and sub directory and find the word search and print the complete file path.
One common task is searching for a specific word in all files in a directory and its subdirectories. This is…
Shell : Script that takes a file name as an argument and checks if it’s readable or not.
One common task is checking if a file is readable before proceeding with a script. This is useful to prevent…
Shell : Script that checks if a file exists or not and outputs a message accordingly.
One common task is checking if a file exists before proceeding with a script. This is useful to prevent errors…
PySpark : Format phone numbers in a specific way using PySpark
In this article, we’ll be working with a PySpark DataFrame that contains a column of phone numbers. We’ll use PySpark’s…
PySpark : PySpark to extract specific fields from XML data
XML data is commonly used in data exchange and storage, and it can contain complex hierarchical structures. PySpark provides a…
PySpark : Replacing special characters with a specific value using PySpark.
Working with datasets that contain special characters can be a challenge in data preprocessing and cleaning. PySpark provides a simple…
PySpark : Dataset has column that contains a string with multiple values separated by a delimiter.Count the number of occurrences of each value using PySpark.
Counting the number of occurrences of each value in a string column with multiple values separated by a delimiter is…
PySpark : Dataset has datetime column. Need to convert this column to a different timezone.
Working with datetime data in different timezones can be a challenge in data analysis and modeling. PySpark provides a simple…
PySpark : Dataset with columns contain duplicate values, How to to keep only the last occurrence of each value.
Duplicate values in a dataset can cause problems for data analysis and modeling. It is often necessary to remove duplicates…