ampcamp6: parquet read & tachyon config bugs #207

@tranlm

Running through the exercise code, here are some issues I found:

Data Exploration using Spark SQL page:

  1. "parquetFile" has been deprecated and the resulting code should be changed to
    wikiData = sqlCtx.read.parquet("data/wiki_parquet")
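If the exercise page has to keep working on older Spark builds as well, one option is a small wrapper that prefers the reader API and falls back to the deprecated call. This is only a sketch: `load_parquet` is a hypothetical helper name, not part of the exercises or of Spark, and it assumes the context object exposes either `read.parquet` or `parquetFile`.

```python
def load_parquet(sql_ctx, path):
    """Load a Parquet dataset, preferring the non-deprecated reader API.

    `load_parquet` is a hypothetical helper, not part of Spark or the exercises.
    """
    # Newer SQLContext objects expose a DataFrameReader as `.read`.
    reader = getattr(sql_ctx, "read", None)
    if reader is not None:
        return reader.parquet(path)
    # Fall back to the deprecated method on contexts without `.read`.
    return sql_ctx.parquetFile(path)
```

In the exercise this would be used as `wikiData = load_parquet(sqlCtx, "data/wiki_parquet")`.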

Explore In-Memory Data Store Tachyon page:

  1. the "tachyon" folder is now a subfolder of spark
  2. TACHYON_WORKER_MEMORY_SIZE is already set at 1GB
  3. When I try to format the storage using the command "tachyon format", class tachyon.Format cannot be found.
    To fix:
    export TACHYON_JARS="$TACHYON_HOME/../lib/tachyon-assemblies-${VERSION}-jar-with-dependencies.jar"
  4. the command "tachyon runTests" fails all the tests
  5. In the section "Run Spark on Tachyon", the command "./bin/spark-shell" is specific to Scala. It should be generalized for users of other languages, e.g. Python

Querying compressed RDDs with Succinct Spark page:

  1. Correct "articleIds.count" to say "articleIdsRDD.count"
  2. "val succinctWikiKV = wikiKV.map(t => (t._1, t._2.getBytes).succinctKV" is missing a closing parenthesis, i.e. ")". It should read "val succinctWikiKV = wikiKV.map(t => (t._1, t._2.getBytes)).succinctKV".
  3. Should combine
val wikiKV2 = sc.textFile("data/succinct/wiki-large.txt")
    .map(_.split('|'))
    .map(t => (t(0).toLong, t(1)))

into one line

val wikiKV2 = sc.textFile("data/succinct/wiki-large.txt").map(_.split('|')).map(t => (t(0).toLong, t(1)))
  4. Change
val wikiSuccinctKV2 = sc.succinctKV[Long]("data/succinct/succinct-wiki-large")
wikiSuccinctKV2.count

to

val succinctWikiKV2 = sc.succinctKV[Long]("data/succinct/succinct-wiki-large")
succinctWikiKV2.count
  5. Change "val articleIdsRDD3= succinctWikiKV3.regexSearch("(stanford|berkeley).edu")" to "val articleIdsRDD3= succinctWikiKV2.regexSearch("(stanford|berkeley).edu")"
