PySpark : How to write Scala code in the Spark shell?


To write Scala code in the Spark shell, start it by running the command “spark-shell” in your terminal. This launches an interactive session and gives you a prompt where you can enter Scala code directly.
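For instance, once the shell has started you should see a scala> prompt where ordinary Scala expressions are evaluated immediately (a minimal illustration, assuming spark-shell is on your PATH):

scala> println("Hello from the Spark shell")

The output appears right away, just as in the standard Scala REPL, and a SparkContext is already available as the variable sc.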

For example, you can create an RDD (Resilient Distributed Dataset) by using the following command:

val data = sc.parallelize(1 to 100)

This will create an RDD with the numbers 1 to 100.
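To quickly verify the contents, you can look at the first few elements with the standard take action (a small optional check, not required for the steps below):

data.take(5)

This returns Array(1, 2, 3, 4, 5), the first five elements of the RDD.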

You can then perform various operations on the RDD, such as filtering or mapping:

val evens = data.filter(_ % 2 == 0)

This filters the RDD to only include even numbers.

val squared = evens.map(x => x*x)

This maps over the RDD, squaring each number.
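To inspect the result of the two transformations, you can again take a few values (another small check using the standard take action):

squared.take(5)

This returns Array(4, 16, 36, 64, 100), the squares of the first five even numbers.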

You can also perform actions on the RDD, such as counting the number of elements:

val count = squared.count()

This counts the number of elements in the RDD.
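Since the numbers 1 to 100 contain 50 even values, this particular count evaluates to 50, which you can confirm by printing it:

println(count)

This prints 50 to the console.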

You can also save the RDD to disk with the following command:

squared.saveAsTextFile("output.txt")

This saves the RDD to a directory named “output.txt”, with the data written as one part file per partition.
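If you later want to read the saved data back, you can point sc.textFile at the same path (a small sketch; textFile is a standard SparkContext method):

val reloaded = sc.textFile("output.txt")

This loads the saved part files back as an RDD of strings.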

You can also run Scala code from a file in the Spark shell. Create a Scala file, say ‘example.scala’, and then load it with the following command:

:load example.scala
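As an illustration, example.scala might contain something like the following (the file name and its contents are just placeholders for your own code):

val names = sc.parallelize(Seq("spark", "scala", "shell"))
println(names.count())

When the file is loaded, each line is executed in order, exactly as if you had typed it at the prompt.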

All of the above commands are executed in the Spark shell, not in the standalone Scala REPL.
