Issue in CH2-01-Generating Records Using DBKS Labs Datagen.py related to JVM dependency due to Spark Connect

After running the `WriteJasonFile` function in cell 12 of the `Chapter 2: Designing Databricks Day One/Project: Streaming Transactions/CH2-01-Generating Records Using DBKS Labs Datagen.py` notebook, I get the following error message:
``[JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jdf` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.``

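After some research, it looks like the `reduce` call in the `generateRecordSet()` function is causing the issue. It passes the classic `pyspark.sql.dataframe.DataFrame.unionByName` method, which appears to rely on the JVM-backed `_jdf` attribute named in the error and is therefore not supported by Spark Connect.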
I had to change 
`return reduce(pyspark.sql.dataframe.DataFrame.unionByName, recordSet)` 
to
`result = recordSet[0]
for df in recordSet[1:]:
  result = result.unionByName(df)
return result`
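
If keeping `reduce` is preferred, a possible alternative is to call `unionByName` on each instance instead of referencing the classic `DataFrame` class directly. This is only a sketch: `union_all_by_name` and `StubFrame` are hypothetical names that are not in the notebook, and `StubFrame` stands in for a real DataFrame so the example runs without a Spark session. I have not verified this variant against the notebook itself.

```python
from functools import reduce


def union_all_by_name(frames):
    """Union DataFrame-like objects pairwise via their own unionByName method."""
    if not frames:
        raise ValueError("frames must contain at least one DataFrame")
    # Calling the method on the instance lets Python dispatch to whichever
    # DataFrame class is in use, instead of hard-coding the classic one.
    return reduce(lambda left, right: left.unionByName(right), frames)


class StubFrame:
    """Hypothetical stand-in for a DataFrame, used only to exercise the helper."""

    def __init__(self, rows):
        self.rows = list(rows)

    def unionByName(self, other):
        # Mimics a union by concatenating the rows of both frames.
        return StubFrame(self.rows + other.rows)


combined = union_all_by_name([StubFrame([1]), StubFrame([2, 3]), StubFrame([4])])
print(combined.rows)
```

In `generateRecordSet()` this would presumably amount to `return reduce(lambda left, right: left.unionByName(right), recordSet)`, which should behave the same as the explicit loop above.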


Issue in CH2-01-Generating Records Using DBKS Labs Datagen.py related to JVM dependency due to Spark Connect #90

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue in CH2-01-Generating Records Using DBKS Labs Datagen.py related to JVM dependency due to Spark Connect #90

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions