Apache Spark interview questions

PySpark @ Freshers.in

140. What is the function of cancelAllJobs() in Apache Spark?
It cancels all jobs that have been scheduled or are running.

141. What does cancelJobGroup(groupId) do in Apache Spark?
It cancels the active jobs for the specified group. The group ID is the one assigned earlier with SparkContext.setJobGroup; see that method for more information.
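
As a minimal sketch of how setJobGroup and cancelJobGroup fit together, the helper below runs a job in a worker thread under a group ID and cancels that group if the job outlives a timeout. The helper name run_cancellable and its parameters are hypothetical, and `sc` is assumed to be an existing SparkContext. Depending on the PySpark version, a plain Python thread may not map to its own JVM thread, so treat this as an illustration rather than a production pattern.

```python
import threading


def run_cancellable(sc, group_id, job, timeout_s):
    """Run job(sc) under group_id; cancel the group if it exceeds timeout_s.

    Hypothetical helper: `sc` is assumed to be a live SparkContext and
    `job` a callable that triggers one or more Spark actions.
    """
    outcome = {}

    def target():
        # Jobs started from this thread are tagged with group_id.
        sc.setJobGroup(group_id, "cancellable demo job", interruptOnCancel=True)
        try:
            outcome["result"] = job(sc)
        except Exception as exc:  # a cancelled job surfaces as an exception
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # Still running after the timeout: cancel every active job in the group.
        sc.cancelJobGroup(group_id)
        worker.join()
    return outcome
```

With a real context, a call might look like run_cancellable(sc, "nightly", lambda sc: sc.parallelize(range(10)).count(), 30). To stop everything regardless of group, sc.cancelAllJobs() could be called instead of sc.cancelJobGroup(group_id).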

142. What is clearFiles() in Apache Spark?
It clears the job’s list of files added by addFile or addPyFile so that they are not downloaded to any new nodes.

143. What is defaultMinPartitions in Apache Spark?
It is the default minimum number of partitions for Hadoop RDDs when the user does not specify one.

144. What is defaultParallelism in Apache Spark?
It is the default level of parallelism to use when the user does not specify one (e.g. for reduce tasks).
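
The two defaults above can be inspected side by side. The sketch below is illustrative only: the function name partition_defaults is hypothetical, and `sc` is assumed to be an existing SparkContext. It relies on parallelize falling back to defaultParallelism when no slice count is given; Spark derives defaultMinPartitions as the smaller of defaultParallelism and 2.

```python
def partition_defaults(sc, data):
    """Report the context's partition defaults and how parallelize uses them.

    Hypothetical helper: `sc` is assumed to be a live SparkContext.
    """
    implicit = sc.parallelize(data)      # no slice count: uses defaultParallelism
    explicit = sc.parallelize(data, 3)   # explicit slice count overrides the default
    return {
        "defaultParallelism": sc.defaultParallelism,
        "defaultMinPartitions": sc.defaultMinPartitions,
        "implicit_partitions": implicit.getNumPartitions(),
        "explicit_partitions": explicit.getNumPartitions(),
    }
```

Calling partition_defaults(sc, range(100)) on a local context would typically show implicit_partitions equal to defaultParallelism, while explicit_partitions stays at 3.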

145. What is dump_profiles(path) in Apache Spark?
It dumps the collected profile stats into the directory given by path.
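
Profile stats are only collected when Python profiling is switched on through the spark.python.profile setting at context creation. The sketch below, with the hypothetical name dump_profiles_if_enabled, checks that setting before calling dump_profiles; `sc` is assumed to be an existing SparkContext.

```python
def dump_profiles_if_enabled(sc, path):
    """Dump profile stats to `path` only when Python profiling is enabled.

    Hypothetical helper: `sc` is assumed to be a live SparkContext that was
    created with (or without) spark.python.profile set to "true".
    Returns True if a dump was requested, False otherwise.
    """
    setting = sc.getConf().get("spark.python.profile", "false")
    enabled = setting.lower() == "true"
    if enabled:
        # Writes the stats collected so far into the directory `path`.
        sc.dump_profiles(path)
    return enabled
```

Profiles exist only for work that has already run, so this would normally be called after at least one action (such as count()) has completed on an RDD with Python functions.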

146. What is emptyRDD() in Apache Spark?
It creates an RDD that has no partitions or elements.
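
One place an empty RDD can be handy is as a fallback when there is nothing to combine. The sketch below is an assumption-laden illustration: the name union_or_empty is hypothetical, and `sc` is assumed to be an existing SparkContext.

```python
def union_or_empty(sc, rdds):
    """Union a list of RDDs, falling back to an empty RDD for an empty list.

    Hypothetical helper: `sc` is assumed to be a live SparkContext and
    `rdds` a (possibly empty) list of RDDs.
    """
    if not rdds:
        # Nothing to combine: return an RDD with no partitions or elements.
        return sc.emptyRDD()
    return sc.union(rdds)
```

On a real context, the RDD returned by sc.emptyRDD() reports isEmpty() as True and getNumPartitions() as 0, so downstream code can handle it like any other RDD.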

Author: user
