PySpark: cannot import name 'RowMatrix' from 'pyspark.ml.linalg'

PySpark @ Freshers.in

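The RowMatrix class has never been part of the pyspark.ml.linalg module. It belongs to the older RDD-based MLlib API and is defined in pyspark.mllib.linalg.distributed, which is why the import from pyspark.ml.linalg fails.

RowMatrix has not been removed in PySpark 3.0 or later. The RDD-based pyspark.mllib package has been in maintenance mode since Spark 2.0, but it still ships with Spark, so RowMatrix remains importable from pyspark.mllib.linalg.distributed. The pyspark.ml.linalg module only provides local (non-distributed) types such as DenseVector, SparseVector, DenseMatrix and SparseMatrix. These are not replacements for RowMatrix, because they hold all of their data on a single machine.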

If your data is small enough to fit in the driver's memory and a local (non-distributed) matrix is sufficient, you can use the DenseMatrix or SparseMatrix classes from pyspark.ml.linalg. Here's an example of how to create a DenseMatrix from an RDD of row vectors by collecting the rows to the driver:

import numpy as np
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors, DenseMatrix
# create a SparkSession object
spark = SparkSession.builder.appName("DenseMatrixExample").getOrCreate()
# create an RDD of row vectors
rows = spark.sparkContext.parallelize([
    Vectors.dense([1.0, 2.0, 3.0]),
    Vectors.dense([4.0, 5.0, 6.0]),
    Vectors.dense([7.0, 8.0, 9.0])
])
# collect the row vectors to the driver as a 2-D NumPy array (one row per vector)
data = np.array(rows.map(lambda x: x.toArray()).collect())
# DenseMatrix expects a flat array of values; isTransposed=True marks them as row-major
mat = DenseMatrix(numRows=data.shape[0], numCols=data.shape[1], values=data.ravel(), isTransposed=True)
# print the DenseMatrix
print(mat)

In this example, we first create an RDD of row vectors using the parallelize method. We then convert each row vector to an array with toArray, collect the results to the driver, and stack them into a two-dimensional NumPy array. The DenseMatrix constructor expects a flat array of numRows * numCols values, so passing a list of arrays directly fails. We therefore flatten the array with ravel and pass isTransposed=True to tell the constructor that the values are laid out row by row. Finally, we print the DenseMatrix. Keep in mind that collect brings every row to the driver, so this approach only suits matrices that fit in a single machine's memory.

Author: user
