While running through the exercise code, I found the following issues:
Data Exploration using Spark SQL page:
- "parquetFile" has been deprecated, so the code should be changed to:
wikiData = sqlCtx.read.parquet("data/wiki_parquet")
Explore In-Memory Data Store Tachyon page:
- the "tachyon" folder is now a subfolder of spark
- TACHYON_WORKER_MEMORY_SIZE is already set to 1GB
- When I try to format the storage using the command "tachyon format", the class tachyon.Format cannot be found.
To fix this, set:
export TACHYON_JARS="$TACHYON_HOME/../lib/tachyon-assemblies-${VERSION}-jar-with-dependencies.jar"
- the command "tachyon runTests" fails all the tests
- In the section "Run Spark on Tachyon", the command "./bin/spark-shell" is specific to Scala. It should be generalized for users of other languages, e.g. Python.
Querying compressed RDDs with Succinct Spark page:
- Correct "articleIds.count" to say "articleIdsRDD.count"
- "val succinctWikiKV = wikiKV.map(t => (t._1, t._2.getBytes).succinctKV" is missing a closing parenthesis for the map call. It should read "val succinctWikiKV = wikiKV.map(t => (t._1, t._2.getBytes)).succinctKV".
- Should combine
val wikiKV2 = sc.textFile("data/succinct/wiki-large.txt")
.map(_.split('|'))
.map(t => (t(0).toLong, t(1)))
into one line
val wikiKV2 = sc.textFile("data/succinct/wiki-large.txt").map(_.split('|')).map(t => (t(0).toLong, t(1)))
- Change
val wikiSuccinctKV2 = sc.succinctKV[Long]("data/succinct/succinct-wiki-large")
wikiSuccinctKV2.count
to
val succinctWikiKV2 = sc.succinctKV[Long]("data/succinct/succinct-wiki-large")
succinctWikiKV2.count
- Change "val articleIdsRDD3= succinctWikiKV3.regexSearch("(stanford|berkeley).edu")" to "val articleIdsRDD3= succinctWikiKV2.regexSearch("(stanford|berkeley).edu")"
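For Python users, the parse-then-search flow from the Succinct items above can be roughly sketched without Spark or Succinct. This is only an illustration in plain Python: the sample records, `parse_line`, and `regex_search` are made up for the example and are not part of the Succinct API, and Python's `re` dialect may differ from the one Succinct uses. It also shows that, at least in Python's `re`, the unescaped "." in "(stanford|berkeley).edu" matches any character.

```python
import re

# Hypothetical sample records in the "id|text" layout implied by split('|').
sample_lines = [
    "1|Contact: alice@stanford.edu",
    "2|No university address here",
    "3|See www.berkeley.edu for details",
    "4|Typo domain berkeleyXedu",
]


def parse_line(line):
    """Mirror .map(_.split('|')).map(t => (t(0).toLong, t(1)))."""
    fields = line.split("|")
    return int(fields[0]), fields[1]


def regex_search(pairs, pattern):
    """Return the keys whose value contains a match for the pattern."""
    compiled = re.compile(pattern)
    return [key for key, value in pairs if compiled.search(value)]


wiki_kv = [parse_line(line) for line in sample_lines]

# Pattern as written in the exercise: "." matches any single character.
unescaped_ids = regex_search(wiki_kv, r"(stanford|berkeley).edu")
# Variant with the dot escaped so that it only matches a literal ".".
escaped_ids = regex_search(wiki_kv, r"(stanford|berkeley)\.edu")

print(unescaped_ids)  # [1, 3, 4]
print(escaped_ids)    # [1, 3]
```

Record 4 is only matched by the unescaped pattern, so if the exercise intends a literal dot, escaping it may be worth considering, assuming Succinct's regex syntax supports the escape.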