Changes from all commits
Commits
301 commits
acf68c1
Merge branch 'master' into testSVD
Dec 2, 2016
2dc0d7e
[SPARK-18324][ML][DOC] Update ML programming and migration guide for …
yanboliang Dec 3, 2016
a9cbfc4
[SPARK-18690][PYTHON][SQL] Backward compatibility of unbounded frames
zero323 Dec 3, 2016
c7c7265
[SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT
rxin Dec 3, 2016
7c33b0f
[SPARK-18362][SQL] Use TextFileFormat in implementation of CSVFileFormat
JoshRosen Dec 3, 2016
553aac5
[SPARK-18586][BUILD] netty-3.8.0.Final.jar has vulnerability CVE-2014…
srowen Dec 3, 2016
d1312fb
[SPARK-18685][TESTS] Fix URI and release resources after opening in t…
HyukjinKwon Dec 3, 2016
5761973
[SPARK-18638][BUILD] Upgrade sbt, Zinc, and Maven plugins
weiqingy Dec 3, 2016
4a3c096
[SPARK-18582][SQL] Whitelist LogicalPlan operators allowed in correla…
nsyca Dec 3, 2016
3477718
[SPARK-18081][ML][DOCS] Add user guide for Locality Sensitive Hashing…
Yunni Dec 4, 2016
edb0ad9
[MINOR][README] Correct Markdown link inside readme
linbojin Dec 4, 2016
e463678
[SPARK-18091][SQL] Deep if expressions cause Generated SpecificUnsafe…
Dec 4, 2016
d9eb4c7
[SPARK-18661][SQL] Creating a partitioned datasource table should not…
ericl Dec 4, 2016
62947a6
change the way of genearting random matrix.
Dec 5, 2016
b019b3a
[SPARK-18643][SPARKR] SparkR hangs at session start when installed as…
felixcheung Dec 5, 2016
e9730b7
[SPARK-18702][SQL] input_file_block_start and input_file_block_length
rxin Dec 5, 2016
bdfe7f6
[SPARK-18625][ML] OneVsRestModel should support setFeaturesCol and se…
zhengruifeng Dec 5, 2016
eb8dd68
[SPARK-18279][DOC][ML][SPARKR] Add R examples to ML programming guide.
yanboliang Dec 5, 2016
410b789
[MINOR][DOC] Use SparkR `TRUE` value and add default values for `Stru…
dongjoon-hyun Dec 5, 2016
2460128
[SPARK-18694][SS] Add StreamingQuery.explain and exception to Python …
zsxwing Dec 5, 2016
01a7d33
[SPARK-18711][SQL] should disable subexpression elimination for Lambd…
cloud-fan Dec 5, 2016
5a92dc7
[DOCS][MINOR] Update location of Spark YARN shuffle jar
nchammas Dec 5, 2016
18eaabb
[SPARK-18719] Add spark.ui.showConsoleProgress to configuration docs
nchammas Dec 5, 2016
3ba69b6
[SPARK-18634][PYSPARK][SQL] Corruption and Correctness issues with ex…
viirya Dec 6, 2016
1b2785c
[SPARK-18729][SS] Move DataFrame.collect out of synchronized block in…
zsxwing Dec 6, 2016
bb57bfe
[SPARK-18657][SPARK-18668] Make StreamingQuery.id persists across res…
tdas Dec 6, 2016
2398fde
[SPARK-18720][SQL][MINOR] Code Refactoring of withColumn
gatorsmile Dec 6, 2016
508de38
[SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values …
Dec 6, 2016
4af142f
[SPARK-18722][SS] Move no data rate limit from StreamExecution to Pro…
zsxwing Dec 6, 2016
772ddbe
[SPARK-18572][SQL] Add a method `listPartitionNames` to `ExternalCata…
Dec 6, 2016
b8c7b8d
[SPARK-18672][CORE] Close recordwriter in SparkHadoopMapReduceWriter …
HyukjinKwon Dec 6, 2016
7863c62
[SPARK-18721][SS] Fix ForeachSink with watermark + append
zsxwing Dec 6, 2016
381ef4e
[SPARK-18634][SQL][TRIVIAL] Touch-up Generate
hvanhovell Dec 6, 2016
05d416f
[SPARK-18740] Log spark.app.name in driver logs
peterableda Dec 6, 2016
cb1f10b
[SPARK-18714][SQL] Add a simple time function to SparkSession
rxin Dec 6, 2016
1ef6b29
[SPARK-18671][SS][TEST] Added tests to ensure stability of that all S…
tdas Dec 6, 2016
fac5b75
[SPARK-18374][ML] Incorrect words in StopWords/english.txt
YY-OnCall Dec 6, 2016
eeed38e
[SPARK-18744][CORE] Remove workaround for Netty memory leak
zsxwing Dec 6, 2016
bd9a4a5
[SPARK-18652][PYTHON] Include the example data and third-party licens…
lins05 Dec 6, 2016
7f31d37
[SPARK-18697][BUILD] Upgrade sbt plugins
weiqingy Dec 6, 2016
a8ced76
[SPARK-18171][MESOS] Show correct framework address in mesos master w…
lins05 Dec 6, 2016
81e5619
[SPARK-18662] Move resource managers to separate directory
foxish Dec 7, 2016
4cc8d89
Revert "[SPARK-18697][BUILD] Upgrade sbt plugins"
srowen Dec 7, 2016
539bb3c
[SPARK-18734][SS] Represent timestamp in StreamingQueryProgress as fo…
tdas Dec 7, 2016
01c7c6b
Update Spark documentation to provide information on how to create Ex…
c-sahuja Dec 7, 2016
08d6441
Closes stale & invalid pull requests.
rxin Dec 7, 2016
5c6bcdb
[SPARK-18671][SS][TEST-MAVEN] Follow up PR to fix test for Maven
tdas Dec 7, 2016
81efa90
change the way of generating the random matrix and store the transpos…
Dec 7, 2016
90b59d1
[SPARK-18686][SPARKR][ML] Several cleanup and improvements for spark.…
yanboliang Dec 7, 2016
b828027
[SPARK-18701][ML] Fix Poisson GLM failure due to wrong initialization
actuaryzhang Dec 7, 2016
79f5f28
[SPARK-18678][ML] Skewed reservoir sampling in SamplingUtils
srowen Dec 7, 2016
c496d03
[SPARK-18208][SHUFFLE] Executor OOM due to a growing LongArray in Byt…
Dec 7, 2016
f1fca81
[SPARK-17760][SQL] AnalysisException with dataframe pivot when groupB…
aray Dec 7, 2016
dbf3e29
[SPARK-18764][CORE] Add a warning log when skipping a corrupted file
zsxwing Dec 7, 2016
bb94f61
[SPARK-18762][WEBUI] Web UI should be http:4040 instead of https:4040
sarutak Dec 7, 2016
ff85dd7
modify spectralNormEst for generating random vector and normalizing t…
Dec 7, 2016
edc87e1
[SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDa…
zsxwing Dec 7, 2016
70b2bf7
[SPARK-18754][SS] Rename recentProgresses to recentProgress
marmbrus Dec 7, 2016
bec0a92
[SPARK-18654][SQL] Remove unreachable patterns in makeRootConverter
Dec 8, 2016
aad1120
[SPARK-18633][ML][EXAMPLE] Add multiclass logistic regression summary…
wangmiao1981 Dec 8, 2016
9ab725e
[SPARK-18758][SS] StreamingQueryListener events from a StreamingQuery…
tdas Dec 8, 2016
8225361
[SPARK-18705][ML][DOC] Update user guide to reflect one pass solver f…
sethah Dec 8, 2016
9725549
[SPARK-18326][SPARKR][ML] Review SparkR ML wrappers API for 2.1
yanboliang Dec 8, 2016
330fda8
Close stale pull requests.
rxin Dec 8, 2016
b47b892
[SPARK-18774][CORE][SQL] Ignore non-existing files when ignoreCorrupt…
zsxwing Dec 8, 2016
9bf8f3c
[SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide
yanboliang Dec 8, 2016
7f3c778
[SPARK-18718][TESTS] Skip some test failures due to path length limit…
HyukjinKwon Dec 8, 2016
6a5a725
[SPARK-18667][PYSPARK][SQL] Change the way to group row in BatchEvalP…
viirya Dec 8, 2016
b44d1b8
[SPARK-18662][HOTFIX] Add new resource-managers directories to SparkL…
Dec 8, 2016
ed8869e
[SPARK-8617][WEBUI] HistoryServer: Include in-progress files during c…
seyfe Dec 8, 2016
3c68944
[SPARK-16589] [PYTHON] Chained cartesian produces incorrect number of…
aray Dec 8, 2016
c3d3a9d
[SPARK-18590][SPARKR] build R source package when making distribution
felixcheung Dec 8, 2016
26432df
[SPARK-18751][CORE] Fix deadlock when SparkContext.stop is called in …
zsxwing Dec 8, 2016
f43fb9b
Use different seed for generating random matrix and vectors in partia…
Dec 8, 2016
5f894d2
[SPARK-18760][SQL] Consistent format specification for FileFormats
rxin Dec 8, 2016
3261e25
Close stale PRs.
rxin Dec 8, 2016
202fcd2
[SPARK-18590][SPARKR] Change the R source build to Hadoop 2.6
shivaram Dec 8, 2016
4727395
Set different seed with different partition.
Dec 8, 2016
2198f30
seed to Long.
Dec 8, 2016
458fa33
[SPARK-18776][SS] Make Offset for FileStreamSource corrected formatte…
tdas Dec 9, 2016
4ac8b20
[SPARKR][PYSPARK] Fix R source package name to match Spark version. R…
shivaram Dec 9, 2016
86a9603
[SPARK-18349][SPARKR] Update R API documentation on ml model summary
wangmiao1981 Dec 9, 2016
9338aa4
[SPARK-18697][BUILD] Upgrade sbt plugins
weiqingy Dec 9, 2016
934035a
Copy the SparkR source package with LFTP
shivaram Dec 9, 2016
83e7f55
refactor the code - parallelization.
Dec 9, 2016
c074c96
Copy pyspark and SparkR packages to latest release dir too
felixcheung Dec 9, 2016
67587d9
[SPARK-18637][SQL] Stateful UDF should be considered as nondeterministic
Dec 9, 2016
b162cc0
[MINOR][CORE][SQL][DOCS] Typo fixes
jaceklaskowski Dec 9, 2016
fd48d80
[SPARK-17822][R] Make JVMObjectTracker a member variable of RBackend
mengxr Dec 9, 2016
be5fc6e
[MINOR][SPARKR] Fix SparkR regex in copy command
shivaram Dec 9, 2016
b08b500
[SPARK-18620][STREAMING][KINESIS] Flatten input rates in timeline for…
maropu Dec 9, 2016
d60ab5f
[SPARK-18745][SQL] Fix signed integer overflow due to toInt cast
kiszk Dec 9, 2016
cf33a86
[SPARK-4105] retry the fetch or stage if shuffle block is corrupt
Dec 9, 2016
d2493a2
[SPARK-18812][MLLIB] explain "Spark ML"
mengxr Dec 10, 2016
3e11d5b
[SPARK-18807][SPARKR] Should suppress output print for calls to JVM m…
felixcheung Dec 10, 2016
63c9159
[SPARK-18811] StreamSource resolution should happen in stream executi…
brkyvz Dec 10, 2016
fbbdedf
V in partailSVD now is BlockMatrix.
Dec 10, 2016
c517256
[SPARK-17460][SQL] Make sure sizeInBytes in Statistics will not overflow
huaxingao Dec 10, 2016
f3a3fed
[MINOR][DOCS] Remove Apache Spark Wiki address
dongjoon-hyun Dec 10, 2016
3a3e65a
[SPARK-18606][HISTORYSERVER] remove useless elements while searching
WangTaoTheTonic Dec 10, 2016
422a45c
[SPARK-18766][SQL] Push Down Filter Through BatchEvalPython (Python UDF)
gatorsmile Dec 10, 2016
1143248
[SPARK-3359][DOCS] Fix greater-than symbols in Javadoc to allow build…
michalsenkyr Dec 10, 2016
e094d01
[SPARK-18803][TESTS] Fix JarEntry-related & path-related test failure…
HyukjinKwon Dec 10, 2016
a29ee55
[SPARK-18815][SQL] Fix NPE when collecting column stats for string/bi…
Dec 11, 2016
9abd05b
[SQL][MINOR] simplify a test to fix the maven tests
cloud-fan Dec 11, 2016
f60ffe7
[SPARK-18809] KCL version to 1.6.2 on master
boneill42 Dec 11, 2016
c802ad8
[SPARK-18628][ML] Update Scala param and Python param to have quotes
krishnakalyan3 Dec 11, 2016
83a4289
[SPARK-18790][SS] Keep a general offset history of stream batches
Dec 12, 2016
70ffff2
[DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Dec 12, 2016
586d198
[SPARK-15844][CORE] HistoryServer doesn't come up if spark.authentica…
steveloughran Dec 12, 2016
bf42c2d
[SPARK-16297][SQL] Fix mapping Microsoft SQLServer dialect
meknio Dec 12, 2016
476b34c
[SPARK-18752][HIVE] "isSrcLocal" value should be set from user query.
Dec 12, 2016
90abfd1
[SPARK-18681][SQL] Fix filtering to compatible with partition keys of…
wangyum Dec 12, 2016
8a51cfd
[SPARK-18810][SPARKR] SparkR install.spark does not work for RCs, sna…
felixcheung Dec 12, 2016
bc59951
[SPARK-18773][CORE] Make commons-crypto config translation consistent.
Dec 13, 2016
417e45c
[SPARK-18796][SS] StreamingQueryManager should not block when startin…
zsxwing Dec 13, 2016
2aa16d0
[SPARK-18797][SPARKR] Update spark.logit in sparkr-vignettes
wangmiao1981 Dec 13, 2016
46d30ac
[SPARK-18717][SQL] Make code generation for Scala Map work with immut…
aray Dec 13, 2016
096f868
[MINOR][CORE][SQL] Remove explicit RDD and Partition overrides
jaceklaskowski Dec 13, 2016
d53f18c
[SPARK-18675][SQL] CTAS for hive serde table should work for all hive…
cloud-fan Dec 13, 2016
fb3081d
[SPARK-13747][CORE] Fix potential ThreadLocal leaks in RPC when using…
zsxwing Dec 13, 2016
f280ccf
[SPARK-18835][SQL] Don't expose Guava types in the JavaTypeInference …
Dec 13, 2016
5572ccf
[SPARK-17932][SQL][FOLLOWUP] Change statement `SHOW TABLES EXTENDED` …
jiangxb1987 Dec 13, 2016
43298d1
[SPARK-18840][YARN] Avoid throw exception when getting token renewal …
jerryshao Dec 13, 2016
e57e393
[SPARK-18715][ML] Fix AIC calculations in Binomial GLM
actuaryzhang Dec 13, 2016
9e8a9d7
[SPARK-18471][MLLIB] In LBFGS, avoid sending huge vectors of 0
Dec 13, 2016
aebf44e
[SPARK-18816][WEB UI] Executors Logs column only ran visibility check…
ajbozarth Dec 13, 2016
c68fb42
[SPARK-18834][SS] Expose event time stats through StreamingQueryProgress
tdas Dec 13, 2016
594b14f
[SPARK-18793][SPARK-18794][R] add spark.randomForest/spark.gbt to vig…
mengxr Dec 14, 2016
ae5b2d3
[SPARK-18746][SQL] Add implicit encoder for BigDecimal, timestamp and…
weiqingy Dec 14, 2016
3ae63b8
[SPARK-18752][SQL] Follow-up: add scaladoc explaining isSrcLocal arg.
Dec 14, 2016
e104e55
[SPARK-18588][TESTS] Ignore KafkaSourceStressForDontFailOnDataLossSuite
zsxwing Dec 14, 2016
f2ddabf
[MINOR][SPARKR] fix kstest example error and add unit test
wangmiao1981 Dec 14, 2016
3e307b4
[SPARK-18566][SQL] remove OverwriteOptions
cloud-fan Dec 14, 2016
cccd643
[SPARK-18814][SQL] CheckAnalysis rejects TPCDS query 32
nsyca Dec 14, 2016
ac013ea
[SPARK-18846][SCHEDULER] Fix flakiness in SchedulerIntegrationSuite
squito Dec 14, 2016
ba4aab9
[SPARK-18730] Post Jenkins test report page instead of the full conso…
liancheng Dec 14, 2016
c6b8eb7
[SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in classpaths in co…
HyukjinKwon Dec 14, 2016
169b9d7
[SPARK-18830][TESTS] Fix tests in PipedRDDSuite to pass on Windows
HyukjinKwon Dec 14, 2016
89ae26d
[SPARK-18753][SQL] Keep pushed-down null literal as a filter in Spark…
HyukjinKwon Dec 14, 2016
5d79947
[SPARK-18853][SQL] Project (UnaryNode) is way too aggressive in estim…
rxin Dec 14, 2016
1ac6567
[SPARK-18852][SS] StreamingQuery.lastProgress should be null when rec…
zsxwing Dec 14, 2016
7862742
[SPARK-18795][ML][SPARKR][DOC] Added KSTest section to SparkR vignettes
jkbradley Dec 14, 2016
ffdd1fc
[SPARK-18854][SQL] numberedTreeString and apply(i) inconsistent for s…
rxin Dec 15, 2016
3243885
[SPARK-18865][SPARKR] SparkR vignettes MLP and LDA updates
wangmiao1981 Dec 15, 2016
8db4d95
[SPARK-18703][SQL] Drop Staging Directories and Data Files After each…
gatorsmile Dec 15, 2016
d6f11a1
[SPARK-18856][SQL] non-empty partitioned table should not report zero…
cloud-fan Dec 15, 2016
5d510c6
[SPARK-18869][SQL] Add TreeNode.p that returns BaseType
rxin Dec 15, 2016
ec0eae4
[SPARK-18875][SPARKR][DOCS] Fix R API doc generation by adding `DESCR…
dongjoon-hyun Dec 15, 2016
7d858bc
[SPARK-18849][ML][SPARKR][DOC] vignettes final check update
felixcheung Dec 15, 2016
93cdb8a
[SPARK-8425][CORE] Application Level Blacklisting
squito Dec 15, 2016
01e14bf
[SPARK-17910][SQL] Allow users to update the comment of a column
jiangxb1987 Dec 15, 2016
4f7292c
[SPARK-18870] Disallowed Distinct Aggregations on Streaming Datasets
tdas Dec 15, 2016
68a6dc9
[SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource
zsxwing Dec 15, 2016
0917c8e
[SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_se…
brkyvz Dec 15, 2016
32ff964
[SPARK-8425][SCHEDULER][HOTFIX] fix scala 2.10 compile error
squito Dec 15, 2016
9c7f83b
[SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListenerSuite: single…
brkyvz Dec 15, 2016
9634018
[MINOR] Only rename SparkR tar.gz if names mismatch
shivaram Dec 16, 2016
5a44f18
[MINOR] Handle fact that mv is different on linux, mac
shivaram Dec 16, 2016
172a52f
[SPARK-18892][SQL] Alias percentile_approx approx_percentile
rxin Dec 16, 2016
78062b8
[SPARK-18845][GRAPHX] PageRank has incorrect initialization value tha…
aray Dec 16, 2016
d7f3058
[SPARK-18850][SS] Make StreamExecution and progress classes serializable
zsxwing Dec 16, 2016
53ab8fb
[SPARK-18742][CORE] Clarify that user-defined BroadcastFactory is not…
Dec 16, 2016
dc2a4d4
[SPARK-18108][SQL] Fix a schema inconsistent bug that makes a parquet…
maropu Dec 16, 2016
836c95b
[SPARK-18723][DOC] Expanded programming guide information on wholeTex…
michalsenkyr Dec 16, 2016
f7a574a
[SPARK-18708][CORE] Improvement/improve docs in spark context file
Dec 16, 2016
ed84cd0
[MINOR][BUILD] Fix lint-check failures and javadoc8 break
HyukjinKwon Dec 16, 2016
1169db4
[SPARK-18897][SPARKR] Fix SparkR SQL Test to drop test table
dongjoon-hyun Dec 16, 2016
295db82
[SPARK-17769][CORE][SCHEDULER] Some FetchFailure refactoring
markhamstra Dec 16, 2016
4faa8a3
[SPARK-18904][SS][TESTS] Merge two FileStreamSourceSuite files
zsxwing Dec 16, 2016
2bc1c95
[SPARK-18895][TESTS] Fix resource-closing-related and path-related te…
HyukjinKwon Dec 17, 2016
6d2379b
[SPARK-18485][CORE] Underlying integer overflow when create ChunkedBy…
uncleGen Dec 17, 2016
38fd163
[SPARK-18849][ML][SPARKR][DOC] vignettes final check reorg
felixcheung Dec 17, 2016
c0c9e1d
[SPARK-18918][DOC] Missing </td> in Configuration page
gatorsmile Dec 18, 2016
1e5c51f
[SPARK-18827][CORE] Fix cannot read broadcast on disk
wangyum Dec 18, 2016
7db09ab
[SPARK-18356][ML] KMeans should cache RDD before training
ZakariaHili Dec 19, 2016
2448285
[SPARK-18700][SQL] Add StripedLock for each table's relation in cache
xuanyuanking Dec 19, 2016
7a75ee1
[SPARK-18921][SQL] check database existence with Hive.databaseExists …
cloud-fan Dec 19, 2016
70d495d
[SPARK-18624][SQL] Implicit cast ArrayType(InternalType)
jiangxb1987 Dec 19, 2016
4cb4941
[SPARK-18836][CORE] Serialize one copy of task metrics in DAGScheduler
shivaram Dec 19, 2016
5857b9a
[SPARK-18928] Check TaskContext.isInterrupted() in FileScanRDD, JDBCR…
JoshRosen Dec 20, 2016
fa829ce
[SPARK-18761][CORE] Introduce "task reaper" to oversee task killing i…
JoshRosen Dec 20, 2016
f923c84
[SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor the error check…
cloud-fan Dec 20, 2016
150d26c
Tiny style improvement.
rxin Dec 20, 2016
95c95b7
[SPARK-18281] [SQL] [PYSPARK] Remove timeout for reading data through…
viirya Dec 20, 2016
caed893
[SPARK-18927][SS] MemorySink for StructuredStreaming can't recover fr…
brkyvz Dec 20, 2016
047a9d9
[SPARK-18576][PYTHON] Add basic TaskContext information to PySpark
holdenk Dec 20, 2016
b2dd8ec
[SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance
brkyvz Dec 21, 2016
24c0c94
[SPARK-18949][SQL] Add recoverPartitions API to Catalog
gatorsmile Dec 21, 2016
ba4468b
[SPARK-18923][DOC][BUILD] Support skipping R/Python API docs
dongjoon-hyun Dec 21, 2016
b7650f1
[SPARK-18947][SQL] SQLContext.tableNames should not call Catalog.list…
cloud-fan Dec 21, 2016
1a64388
[SPARK-18951] Upgrade com.thoughtworks.paranamer/paranamer to 2.6
yhuai Dec 21, 2016
607a1e6
[SPARK-18894][SS] Fix event time watermark delay threshold specified …
tdas Dec 21, 2016
ccfe60a
[SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.ba…
zsxwing Dec 21, 2016
078c71c
[SPARK-18954][TESTS] Fix flaky test: o.a.s.streaming.BasicOperationsS…
zsxwing Dec 21, 2016
354e936
[SPARK-18775][SQL] Limit the max number of records written per file
rxin Dec 21, 2016
95efc89
[SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happen…
zsxwing Dec 21, 2016
afd9bc1
[SPARK-17807][CORE] split test-tags into test-JAR
ryan-williams Dec 22, 2016
83a6ace
[SPARK-18234][SS] Made update mode public
tdas Dec 22, 2016
b41ec99
[SPARK-18528][SQL] Fix a bug to initialise an iterator of aggregation…
maropu Dec 22, 2016
7e8994f
[SPARK-18903][SPARKR] Add API to get SparkUI URL
felixcheung Dec 22, 2016
afe3651
[FLAKY-TEST] InputStreamsSuite.socket input stream
brkyvz Dec 22, 2016
e1b43dc
[BUILD] make-distribution should find JAVA_HOME for non-RHEL systems
felixcheung Dec 22, 2016
ff7d82a
[SPARK-18908][SS] Creating StreamingQueryException should check if lo…
zsxwing Dec 22, 2016
7c5b7b3
[SQL] Minor readability improvement for partition handling code
rxin Dec 22, 2016
2e861df
[DOC] bucketing is applicable to all file-based data sources
rxin Dec 22, 2016
f489339
[SPARK-18953][CORE][WEB UI] Do now show the link to a dead worker on …
dongjoon-hyun Dec 22, 2016
4186aba
[SPARK-18922][TESTS] Fix more resource-closing-related and path-relat…
HyukjinKwon Dec 22, 2016
76622c6
[SPARK-16975][SQL][FOLLOWUP] Do not duplicately check file paths in d…
HyukjinKwon Dec 22, 2016
2615100
[SPARK-18973][SQL] Remove SortPartitions and RedistributeData
rxin Dec 22, 2016
31da755
[SPARK-18975][CORE] Add an API to remove SparkListener
jerryshao Dec 22, 2016
ce99f51
[SPARK-18537][WEB UI] Add a REST api to serve spark streaming informa…
Dec 22, 2016
4133880
refactor spectralNormEst.
Dec 22, 2016
a248ee4
initial base.
hl475 Aug 26, 2016
181f581
clean up the code and comments.
hl475 Sep 1, 2016
66aebbf
clean up the code and comments again.
hl475 Sep 2, 2016
3e6b820
modify the code to address Mark's comments.
hl475 Sep 4, 2016
7e54a9d
modify the code to address Mark's comments: partialSVD will do orthon…
hl475 Sep 5, 2016
4013d2f
modify tallSkinnySVD: do tallSkinnyQR twice instead of once to have p…
hl475 Sep 5, 2016
7c506f7
Add inline comments to RowMatrix.scala
hl475 Sep 6, 2016
8cee1d7
Use tallSkinnySVD for orthonormal in partialSVD; add isGram and ifTwi…
hl475 Sep 9, 2016
a7eaf2f
Add more inline comments to tallSkinnySVD.
hl475 Sep 10, 2016
cab070b
Modify the code according to Mark's comment: add parameter ifTwice to…
hl475 Sep 11, 2016
6867ec3
remove code duplication from lastStep; add unit tests of testing para…
hl475 Sep 12, 2016
427f10e
change value of rCond.
hl475 Sep 12, 2016
8607a05
Refactoring the code.
hl475 Sep 15, 2016
7eeaee1
add computeSVD to unit tests just for comparison.
hl475 Sep 15, 2016
bc770fe
refactoring the code.
hl475 Sep 16, 2016
ad3d2a9
refactoring the code and add another unit test of different combinati…
hl475 Sep 21, 2016
0605b28
refactoring the code to address Mark's comment.
hl475 Sep 22, 2016
a370f40
refactoring the code to pass the code style check.
hl475 Sep 30, 2016
79338d2
refactoring the test code and tobreeze/asbreeze
hl475 Oct 4, 2016
7283868
refactoring the unit test and adding comment to toBreeze v.s. asBreeze
hl475 Oct 5, 2016
d635684
add an svd example to spark
hl475 Oct 12, 2016
654a309
add two examples
hl475 Oct 22, 2016
c75866d
refactoring the test code and tobreeze/asbreeze
hl475 Oct 4, 2016
897affc
Refactoring the code:
hl475 Nov 13, 2016
d19a858
modify the code to address Mark's comments.
hl475 Nov 16, 2016
9ef0b95
modify inline comments in RowMatrix to address Mark's comments.
hl475 Nov 16, 2016
87129b5
add shuffle.scala
Dec 2, 2016
1581eb3
change the way of genearting random matrix.
Dec 5, 2016
fd20abb
change the way of generating the random matrix and store the transpos…
Dec 7, 2016
aca28ee
modify spectralNormEst for generating random vector and normalizing t…
Dec 7, 2016
188c42d
Use different seed for generating random matrix and vectors in partia…
Dec 8, 2016
4df7f29
Set different seed with different partition.
Dec 8, 2016
dada488
seed to Long.
Dec 8, 2016
c227a42
refactor the code - parallelization.
Dec 9, 2016
fe18b13
V in partailSVD now is BlockMatrix.
Dec 10, 2016
05c2e6d
refactor spectralNormEst.
Dec 22, 2016
4b46cbe
Merge branch 'testSVD' of https://github.com/hl475/svd into testSVD
Dec 22, 2016
2 changes: 1 addition & 1 deletion R/CRAN_RELEASE.md
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@ To release SparkR as a package to CRAN, we would use the `devtools` package. Ple

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.

Note that while `check-cran.sh` is running `R CMD check`, it is doing so with `--no-manual --no-vignettes`, which skips a few vignettes or PDF checks - therefore it will be preferred to run `R CMD check` on the source package built manually before uploading a release.
Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips several vignette and PDF checks; it is therefore preferable to run `R CMD check` on a manually built source package before uploading a release. Also note that for the CRAN checks on PDF vignettes to succeed, the `qpdf` tool must be installed (to install it, e.g. `yum -q -y install qpdf`).
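The manual pre-upload check recommended above can be sketched as a short shell session. This is a hedged sketch, not part of the diff: the SparkR version string (2.1.0) and the `--as-cran` flag are illustrative assumptions, and the `command -v` guards simply let the sketch no-op on machines without R or `qpdf`.

```shell
# Minimal sketch of the manual pre-upload check, assuming a Spark checkout
# layout and a SparkR 2.1.0 tarball; R/qpdf invocations are guarded so the
# sketch degrades to a no-op where those tools are missing.
command -v qpdf >/dev/null || echo "qpdf missing: PDF vignette checks would fail"
if command -v R >/dev/null && [ -d R ]; then
  (
    cd R
    ./check-cran.sh                             # builds the tarball (skips manual/vignette checks)
    R CMD check --as-cran SparkR_2.1.0.tar.gz   # full check, including vignettes and the PDF manual
  )
fi
MANUAL_CHECK_SKETCH=done
```

The guards are a convenience for trying the flow anywhere; on a real release machine both R and `qpdf` would be present and the commands would run for real.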

To upload a release, we would need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script along with comments on the status of any `WARNING` (there should not be any) or `NOTE`. As part of `check-cran.sh` and the release process, the vignettes are built - make sure `SPARK_HOME` is set and the Spark jars are accessible.

19 changes: 18 additions & 1 deletion R/check-cran.sh
@@ -34,8 +34,9 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

# Install the package (this is required for code in vignettes to run when building it later)
# Build the latest docs, but not vignettes, which is built with the package next
$FWDIR/create-docs.sh

@@ -82,4 +83,20 @@ else
# This will run tests and/or build vignettes, and require SPARK_HOME
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
fi

# Install source package to get it to generate vignettes rds files, etc.
if [ -n "$CLEAN_INSTALL" ]
then
echo "Removing lib path and installing from source package"
LIB_DIR="$FWDIR/lib"
rm -rf $LIB_DIR
mkdir -p $LIB_DIR
"$R_SCRIPT_PATH/"R CMD INSTALL SparkR_"$VERSION".tar.gz --library=$LIB_DIR

# Zip the SparkR package so that it can be distributed to worker nodes on YARN
pushd $LIB_DIR > /dev/null
jar cfM "$LIB_DIR/sparkr.zip" SparkR
popd > /dev/null
fi

popd > /dev/null
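The new `CLEAN_INSTALL` branch in the hunk above can be exercised as follows. This is a hedged sketch: any non-empty value triggers the branch (the script only tests `[ -n "$CLEAN_INSTALL" ]`), and the guard is only there so the sketch runs anywhere.

```shell
# Hedged sketch: trigger the new clean-install path, which wipes R/lib,
# reinstalls SparkR from the built tarball, and zips it with `jar cfM`
# for distribution to worker nodes on YARN.
if command -v R >/dev/null && [ -x R/check-cran.sh ]; then
  CLEAN_INSTALL=1 ./R/check-cran.sh
fi
CLEAN_INSTALL_SKETCH=done
```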
2 changes: 1 addition & 1 deletion R/install-dev.sh
@@ -46,7 +46,7 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

# Generate Rd files if devtools is installed
"$R_SCRIPT_PATH/"Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }'
3 changes: 3 additions & 0 deletions R/pkg/.Rbuildignore
@@ -1,5 +1,8 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.lintr$
^cran-comments\.md$
^NEWS\.md$
^README\.Rmd$
^src-native$
^html$
13 changes: 6 additions & 7 deletions R/pkg/DESCRIPTION
@@ -1,28 +1,27 @@
Package: SparkR
Type: Package
Title: R Frontend for Apache Spark
Version: 2.1.0
Date: 2016-11-06
Title: R Frontend for Apache Spark
Description: The SparkR package provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "shivaram@cs.berkeley.edu"),
person("Xiangrui", "Meng", role = "aut",
email = "meng@databricks.com"),
person("Felix", "Cheung", role = "aut",
email = "felixcheung@apache.org"),
person(family = "The Apache Software Foundation", role = c("aut", "cph")))
License: Apache License (== 2.0)
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: http://spark.apache.org/contributing.html
Depends:
R (>= 3.0),
methods
Suggests:
knitr,
rmarkdown,
testthat,
e1071,
survival,
knitr,
rmarkdown
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
survival
Collate:
'schema.R'
'generics.R'
3 changes: 2 additions & 1 deletion R/pkg/NAMESPACE
@@ -3,7 +3,7 @@
importFrom("methods", "setGeneric", "setMethod", "setOldClass")
importFrom("methods", "is", "new", "signature", "show")
importFrom("stats", "gaussian", "setNames")
importFrom("utils", "download.file", "object.size", "packageVersion", "untar")
importFrom("utils", "download.file", "object.size", "packageVersion", "tail", "untar")

# Disable native libraries till we figure out how to package it
# See SPARKR-7839
@@ -16,6 +16,7 @@ export("sparkR.stop")
export("sparkR.session.stop")
export("sparkR.conf")
export("sparkR.version")
export("sparkR.uiWebUrl")
export("print.jobj")

export("sparkR.newJObject")
7 changes: 4 additions & 3 deletions R/pkg/R/SQLContext.R
@@ -634,7 +634,7 @@ tableNames <- function(x, ...) {
cacheTable.default <- function(tableName) {
sparkSession <- getSparkSession()
catalog <- callJMethod(sparkSession, "catalog")
callJMethod(catalog, "cacheTable", tableName)
invisible(callJMethod(catalog, "cacheTable", tableName))
}

cacheTable <- function(x, ...) {
@@ -663,7 +663,7 @@ cacheTable <- function(x, ...) {
uncacheTable.default <- function(tableName) {
sparkSession <- getSparkSession()
catalog <- callJMethod(sparkSession, "catalog")
callJMethod(catalog, "uncacheTable", tableName)
invisible(callJMethod(catalog, "uncacheTable", tableName))
}

uncacheTable <- function(x, ...) {
@@ -686,7 +686,7 @@ uncacheTable <- function(x, ...) {
clearCache.default <- function() {
sparkSession <- getSparkSession()
catalog <- callJMethod(sparkSession, "catalog")
callJMethod(catalog, "clearCache")
invisible(callJMethod(catalog, "clearCache"))
}

clearCache <- function() {
@@ -730,6 +730,7 @@ dropTempTable <- function(x, ...) {
#' If the view has been cached before, then it will also be uncached.
#'
#' @param viewName the name of the view to be dropped.
#' @return TRUE if the view is dropped successfully, FALSE otherwise.
#' @rdname dropTempView
#' @name dropTempView
#' @export
6 changes: 3 additions & 3 deletions R/pkg/R/context.R
@@ -87,8 +87,8 @@ objectFile <- function(sc, path, minPartitions = NULL) {
#' in the list are split into \code{numSlices} slices and distributed to nodes
#' in the cluster.
#'
#' If size of serialized slices is larger than spark.r.maxAllocationLimit or (200MB), the function
#' will write it to disk and send the file name to JVM. Also to make sure each slice is not
#' If size of serialized slices is larger than spark.r.maxAllocationLimit or (200MB), the function
#' will write it to disk and send the file name to JVM. Also to make sure each slice is not
#' larger than that limit, number of slices may be increased.
#'
#' @param sc SparkContext to use
@@ -379,5 +379,5 @@ spark.lapply <- function(list, func) {
#' @note setLogLevel since 2.0.0
setLogLevel <- function(level) {
sc <- getSparkContext()
callJMethod(sc, "setLogLevel", level)
invisible(callJMethod(sc, "setLogLevel", level))
}
38 changes: 27 additions & 11 deletions R/pkg/R/install.R
@@ -79,19 +79,28 @@ install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
dir.create(localDir, recursive = TRUE)
}

packageLocalDir <- file.path(localDir, packageName)

if (overwrite) {
message(paste0("Overwrite = TRUE: download and overwrite the tar file",
"and Spark package directory if they exist."))
}

releaseUrl <- Sys.getenv("SPARKR_RELEASE_DOWNLOAD_URL")
if (releaseUrl != "") {
packageName <- basenameSansExtFromUrl(releaseUrl)
}

packageLocalDir <- file.path(localDir, packageName)

# can use dir.exists(packageLocalDir) under R 3.2.0 or later
if (!is.na(file.info(packageLocalDir)$isdir) && !overwrite) {
fmt <- "%s for Hadoop %s found, with SPARK_HOME set to %s"
msg <- sprintf(fmt, version, ifelse(hadoopVersion == "without", "Free build", hadoopVersion),
packageLocalDir)
message(msg)
if (releaseUrl != "") {
message(paste(packageName, "found, setting SPARK_HOME to", packageLocalDir))
} else {
fmt <- "%s for Hadoop %s found, setting SPARK_HOME to %s"
msg <- sprintf(fmt, version, ifelse(hadoopVersion == "without", "Free build", hadoopVersion),
packageLocalDir)
message(msg)
}
Sys.setenv(SPARK_HOME = packageLocalDir)
return(invisible(packageLocalDir))
} else {
@@ -104,7 +113,12 @@ install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
if (tarExists && !overwrite) {
message("tar file found.")
} else {
robustDownloadTar(mirrorUrl, version, hadoopVersion, packageName, packageLocalPath)
if (releaseUrl != "") {
message("Downloading from alternate URL:\n- ", releaseUrl)
downloadUrl(releaseUrl, packageLocalPath, paste0("Fetch failed from ", releaseUrl))
} else {
robustDownloadTar(mirrorUrl, version, hadoopVersion, packageName, packageLocalPath)
}
}

message(sprintf("Installing to %s", localDir))
@@ -182,16 +196,18 @@ getPreferredMirror <- function(version, packageName) {
}

directDownloadTar <- function(mirrorUrl, version, hadoopVersion, packageName, packageLocalPath) {
packageRemotePath <- paste0(
file.path(mirrorUrl, version, packageName), ".tgz")
packageRemotePath <- paste0(file.path(mirrorUrl, version, packageName), ".tgz")
fmt <- "Downloading %s for Hadoop %s from:\n- %s"
msg <- sprintf(fmt, version, ifelse(hadoopVersion == "without", "Free build", hadoopVersion),
packageRemotePath)
message(msg)
downloadUrl(packageRemotePath, packageLocalPath, paste0("Fetch failed from ", mirrorUrl))
}

isFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
downloadUrl <- function(remotePath, localPath, errorMessage) {
isFail <- tryCatch(download.file(remotePath, localPath),
error = function(e) {
message(sprintf("Fetch failed from %s", mirrorUrl))
message(errorMessage)
print(e)
TRUE
})
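The `SPARKR_RELEASE_DOWNLOAD_URL` branch added to `install.R` can be driven from the shell. This is a hedged sketch: the URL below is a placeholder assumption, not a real release location, and the guard lets the sketch no-op without R.

```shell
# Hedged sketch: point install.spark() at an explicit tarball via the new
# SPARKR_RELEASE_DOWNLOAD_URL environment variable (placeholder URL), so
# the download skips mirror resolution entirely.
export SPARKR_RELEASE_DOWNLOAD_URL="https://example.org/spark/spark-2.1.0-bin-hadoop2.7.tgz"
if command -v Rscript >/dev/null; then
  Rscript -e 'SparkR::install.spark()' || true   # uses the alternate URL path
fi
ALT_URL_SET="${SPARKR_RELEASE_DOWNLOAD_URL:+yes}"
```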