BDSGOLD-301. Enable test_pyspark_shell test to pass on a fresh cluster with spark-2.3.2 as default. #14
anil-altiscale wants to merge 1 commit into branch-2.3.2-alti
Conversation
alee-altiscale left a comment
I have some questions on this investigation :)
- hdfs dfs -put $spark_home/README.md spark/test/
+ # Including spark README.md in test_data to differentiate from sparkexample README.md
+ hdfs dfs -put "$spark_test_dir/test_data/README.md" spark/test/
I notice that README.md exists in both https://github.com/Altiscale/sparkexample/blob/branch-2.3.2-alti/README.md and https://github.com/Altiscale/spark/blob/branch-2.3.2-alti/README.md, although the one in sparkexample is pretty empty, with a single line of content.
The subject is misleading as well: is this really "enabling" the test? I thought this test case always runs; it is enabled all the time: https://github.com/Altiscale/sparkexample/blob/branch-2.3.2-alti/run_all_test.kerberos.sh#L11
Yep, the README.md file is present in both repositories. This has more to do with the Spark RPMs: the test picks up README.md from the $spark_home/ directory, so ideally it should be present inside /opt/alti-spark-2.3.2/. I am guessing that while setting up the environment, the install script (https://github.com/Altiscale/sparkbuild/blob/branch-2.3.2-alti/scripts/justinstall.sh) does not copy the README file, as the specs do for 1.6.1 (https://github.com/Altiscale/sparkbuild/blob/branch-1.6-alti/scripts/spark.spec#L365) and 2.1.1 (https://github.com/Altiscale/sparkbuild/blob/branch-2.1.1-alti/scripts/spark.spec#L314).
I thought we did not want to mess with the Spark RPM, and we discussed that we could include it as part of the sparkexample RPM. Also, I wanted the README.md to have some significant content and to avoid a conflict with the already existing sparkexample README, hence I included it in the test_data directory.
We could also include a change to copy over the README.md in the sparkbuild repo, but that means changes to the Spark RPM.
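If the fix were made on the sparkbuild side instead, the install step would just need to copy README.md into the Spark install root, as the 1.6.1 and 2.1.1 specs linked above already do. A minimal sketch; both paths are assumptions (/opt/alti-spark-2.3.2 is taken from the comment above, and `./spark-src` stands in for the build checkout):

```shell
#!/bin/sh
# Hypothetical addition to sparkbuild's install step: copy README.md
# into the install root so the test finds it at $spark_home/README.md.
set -e

spark_src=${spark_src:-./spark-src}                 # assumed build checkout
spark_home=${spark_home:-./opt/alti-spark-2.3.2}    # assumed install root

mkdir -p "$spark_src" "$spark_home"
# stand-in README so the sketch is self-contained
[ -f "$spark_src/README.md" ] || echo "Apache Spark" > "$spark_src/README.md"

cp "$spark_src/README.md" "$spark_home/README.md"
```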
For the subject: yep, I think I need some different wording.
pushd `pwd`
cd $spark_home
hdfs dfs -mkdir -p spark/test/
hdfs dfs -put $spark_home/README.md spark/test/
Will removing this break the test suite on existing clusters?
As far as I know, test suites are often run after a maintenance to verify the status of a cluster.
test_pyspark_shell.sh places README.md from the /opt/spark-x.x.x directory into an HDFS directory, which is used by pyspark_shell_examples.py to create TF vectors. README.md is now included inside the test_data directory to avoid a conflict with the README.md in the test_spark directory.
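For context on the TF-vector step: pyspark_shell_examples.py feeds the README.md tokens to Spark's HashingTF, which is essentially per-term frequency counting. A plain shell stand-in over a toy README (the file contents here are an assumption, not the actual README):

```shell
#!/bin/sh
# Stand-in for the TF step: count term frequencies in a README-like file.
# HashingTF additionally hashes each term to a fixed-size feature index;
# here we just count raw terms to show the frequencies being computed.
set -e

echo "spark is fast and spark is general" > README.md

# one token per line, then count occurrences of each term
tr ' ' '\n' < README.md | sort | uniq -c | sort -rn > tf.txt
cat tf.txt
```

Terms that repeat ("spark", "is") get higher counts, which is what differentiating README contents between spark and sparkexample matters for: an empty one-line README would produce a near-empty vector.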