To write Scala code in the Spark shell, start it by running the command “spark-shell” in your terminal. This launches an interactive shell with a scala> prompt where you can enter Scala code.
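Once the shell is up, a SparkContext is already available as sc (and, in Spark 2.x and later, a SparkSession as spark), so a quick sanity check is to query it before running anything else:
sc.version   // prints the Spark version of your installation
sc.appName   // "Spark shell"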
For example, you can create an RDD (Resilient Distributed Dataset) by using the following command:
val data = sc.parallelize(1 to 100)
This will create an RDD with the numbers 1 to 100.
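RDD transformations are lazy, so nothing is computed yet; to peek at a few elements you can run the take action (the values shown are what this data would produce):
data.take(5)            // Array(1, 2, 3, 4, 5)
data.getNumPartitions   // number of partitions, depends on your default parallelism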
You can then perform various operations on the RDD, such as filtering or mapping:
val evens = data.filter(_ % 2 == 0)
This filters the RDD to only include even numbers.
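The underscore in the filter is Scala shorthand for a one-argument function; if that syntax is unfamiliar, the same filter can be written with an explicit parameter (evensExplicit is just an illustrative name):
val evensExplicit = data.filter(n => n % 2 == 0)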
val squared = evens.map(x => x*x)
This maps over the RDD, squaring each element.
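Because transformations return new RDDs, the two steps above can also be chained into a single expression (squaredChained is just an illustrative name):
val squaredChained = data.filter(_ % 2 == 0).map(x => x * x)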
You can also perform actions on the RDD, such as counting the number of elements:
val count = squared.count()
This counts the number of elements in the RDD. Because count is an action, it triggers the actual computation and returns the result (50 here, since there are 50 even numbers between 1 and 100) to the driver.
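Other actions work the same way and also trigger computation; for this particular pipeline they would return the values shown in the comments:
squared.first()         // 4
squared.take(3)         // Array(4, 16, 36)
squared.reduce(_ + _)   // 171700, the sum of the squares of the even numbers up to 100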
You can also save the RDD to disk with the following command:
squared.saveAsTextFile("output.txt")
Despite the .txt name, this creates a directory called “output.txt” containing one part file per partition (plus a _SUCCESS marker); the call fails if the path already exists.
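To read the data back later, you can point sc.textFile at the same path; the saved lines come back as strings, so convert them if you need numbers again:
val reloaded = sc.textFile("output.txt").map(_.toInt)
reloaded.count()   // 50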
You can also run Scala code from a file in the Spark shell. Create a file, say ‘example.scala’, and then load it with the following command:
:load example.scala
All of the above commands are executed in the Spark shell, not in the plain Scala REPL.
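For reference, a minimal example.scala that :load could pick up might look like the sketch below (the file name and contents are illustrative; because the lines are replayed inside the Spark shell, sc is already defined):
// example.scala - replayed line by line in the Spark shell
val nums = sc.parallelize(1 to 100)
val squares = nums.filter(_ % 2 == 0).map(x => x * x)
println(s"count = ${squares.count()}")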
Important Spark URLs to refer to