After running the WriteJasonFile function in cell 12 of the Chapter 2: Designing Databricks Day One/Project: Streaming Transactions/CH2-01-Generating Records Using DBKS Labs Datagen.py notebook, I get the following error message:
[JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute _jdf is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.
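After some research, it looks like the reduce call in generateRecordSet() is causing the issue: it passes the unbound pyspark.sql.dataframe.DataFrame.unionByName method, which relies on the JVM-backed _jdf attribute, and that attribute is not available under Spark Connect.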
I had to change
return reduce(pyspark.sql.dataframe.DataFrame.unionByName, recordSet)
to
result = recordSet[0]
for df in recordSet[1:]:
    result = result.unionByName(df)
return result
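For anyone who would rather keep the reduce style, here is a minimal sketch that should behave the same as the loop. The helper name union_all is my own and not from the notebook, and it assumes recordSet is a non-empty list of DataFrames. The idea is to call unionByName on each instance instead of naming the classic DataFrame class, so Python picks whichever DataFrame implementation the session actually returned.

```python
from functools import reduce


def union_all(frames):
    """Union a non-empty list of DataFrames by column name.

    Hypothetical helper: the name and the empty-list check are assumptions,
    not part of the original notebook.
    """
    if not frames:
        raise ValueError("union_all needs at least one DataFrame")
    # Calling the method on the instance (left.unionByName) avoids hard-coding
    # pyspark.sql.dataframe.DataFrame, so the same code can run on a classic
    # session or a Spark Connect session.
    return reduce(lambda left, right: left.unionByName(right), frames)
```

Inside generateRecordSet() the last line would then become return union_all(recordSet). I have only reasoned through this, so treat it as an untested alternative to the loop.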