Issue in CH2-01-Generating Records Using DBKS Labs Datagen.py related to JVM dependency due to Spark Connect #90

@tamaskerekjarto

Description

After running the WriteJasonFile function in cell 12 of the notebook Chapter 2: Designing Databricks Day One/Project: Streaming Transactions/CH2-01-Generating Records Using DBKS Labs Datagen.py, I get the following error message:
[JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute _jdf is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.

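After some research, it looks like the reduce call in generateRecordSet() is causing the issue: it passes the unbound method pyspark.sql.dataframe.DataFrame.unionByName, which relies on the JVM-backed _jdf attribute that Spark Connect DataFrames do not support.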
I had to change

    return reduce(pyspark.sql.dataframe.DataFrame.unionByName, recordSet)

to

    result = recordSet[0]
    for df in recordSet[1:]:
        result = result.unionByName(df)
    return result
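
If keeping the reduce style is preferred, a possible alternative is to call unionByName on each instance through a lambda rather than through the pyspark.sql.dataframe.DataFrame class. This is only a sketch: the helper name union_all_by_name is hypothetical (it is not in the notebook), and it assumes recordSet is a non-empty list of DataFrames with matching column names. It should behave the same as the loop above, since both call the method on the objects the session actually returned.

```python
from functools import reduce


def union_all_by_name(frames):
    """Fold a non-empty list of DataFrames into one via unionByName.

    Hypothetical helper, not part of the notebook. Calling the method on
    each instance (left.unionByName) lets Python dispatch to whichever
    DataFrame class the session returns, instead of hard-coding
    pyspark.sql.dataframe.DataFrame.
    """
    if not frames:
        raise ValueError("frames must contain at least one DataFrame")
    # Equivalent to the explicit loop: ((f0 ∪ f1) ∪ f2) ∪ ...
    return reduce(lambda left, right: left.unionByName(right), frames)
```

Inside generateRecordSet(), the last line would then become return union_all_by_name(recordSet).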
