Maintaining an order #21695

Closed
nagpall wants to merge 1 commit into apache:branch-2.3 from nagpall:patch-spark-8614

nagpall commented Jul 2, 2018

What is the problem?

In both IndexedRowMatrix.computeSVD and IndexedRowMatrix.multiply, the indices are dropped before the corresponding RowMatrix methods are called.
For IndexedRowMatrix.multiply, I have observed that the ordering within each partition is preserved, but rows seem to get mixed up between partitions. For example, for the input:

part1Index1 part1Vector1
part1Index2 part1Vector2
part2Index1 part2Vector1
part2Index2 part2Vector2

I got:

part2Index1 part1Vector1
part2Index2 part1Vector2
part1Index1 part2Vector1
part1Index2 part2Vector2
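The symptom above is consistent with indices and product rows being re-paired purely by position after the multiplication. The following is a minimal simulation, not Spark's actual code path: plain Python lists stand in for RDD partitions, `pair_by_position` is a hypothetical helper, and it assumes (for illustration only) that the two sides disagree on partition order while keeping the order inside each partition.

```python
# Simulation only: plain Python lists stand in for RDD partitions.
# pair_by_position is a hypothetical helper, not a Spark API.


def pair_by_position(index_parts, vector_parts):
    """Re-attach indices to vectors purely by position, as a zip would."""
    flat_indices = [i for part in index_parts for i in part]
    flat_vectors = [v for part in vector_parts for v in part]
    return list(zip(flat_indices, flat_vectors))


index_parts = [["part1Index1", "part1Index2"], ["part2Index1", "part2Index2"]]
vector_parts = [["part1Vector1", "part1Vector2"], ["part2Vector1", "part2Vector2"]]

# Assumption: the index side comes back with its two partitions swapped,
# while the order inside each partition is unchanged.
swapped_indices = [index_parts[1], index_parts[0]]

for idx, vec in pair_by_position(swapped_indices, vector_parts):
    print(idx, vec)
```

Running it prints the same four mismatched pairs reported above. Each pair is correct within its partition, but the partitions are attached to the wrong indices.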

More details are in the JIRA issue:
https://issues.apache.org/jira/browse/SPARK-8614

What changes were proposed in this pull request?

Instead of converting the IndexedRowMatrix to a RowMatrix and losing the indices, this patch keeps it as an IndexedRowMatrix. For each row, it takes the index and the row vector, multiplies the vector by the matrix, and places the result at the same index.
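The approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch itself: lists of `(index, vector)` pairs stand in for the partitions of an IndexedRowMatrix, the local matrix is dense and row-major, and `vec_times_matrix` and `multiply_indexed` are hypothetical names.

```python
# Sketch only: lists of (index, vector) pairs stand in for the partitions of
# an IndexedRowMatrix; the helper names are hypothetical, not Spark APIs.
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # dense, row-major, shape n x k
Partition = List[Tuple[int, Vector]]


def vec_times_matrix(v: Vector, b: Matrix) -> Vector:
    """Return the length-k row vector v * B for an n x k matrix B."""
    if len(v) != len(b):
        raise ValueError(f"Dimension mismatch: {len(v)} vs {len(b)}")
    k = len(b[0]) if b else 0
    return [sum(v[i] * b[i][j] for i in range(len(v))) for j in range(k)]


def multiply_indexed(partitions: List[Partition], b: Matrix) -> List[Partition]:
    """Multiply every row by B inside its own record, so the index never leaves it."""
    return [[(idx, vec_times_matrix(vec, b)) for idx, vec in part] for part in partitions]


partitions = [[(0, [1.0, 2.0]), (1, [3.0, 4.0])], [(2, [5.0, 6.0])]]
b = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]

product = multiply_indexed(partitions, b)
# Even if the partitions are consumed in a different order, each index stays
# attached to its own product row.
by_index = dict(pair for part in reversed(product) for pair in part)
print(by_index)
```

Because the index and its product are produced in the same record, correctness no longer depends on two separately computed collections agreeing on partition order. In Spark terms, this presumably corresponds to a single `map` over the `rows` RDD that emits a new `IndexedRow(row.index, ...)` per input row.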

How was this patch tested?

With these changes, all unit tests in the mllib module pass.

Please review http://spark.apache.org/contributing.html before opening a pull request.

AmplabJenkins commented

Can one of the admins verify this patch?

srowen (Member) commented Jul 2, 2018

As noted above, please see http://spark.apache.org/contributing.html

nagpall closed this Jul 2, 2018

tygert commented Jul 2, 2018

As requested, I can comment here: the issue cited is indeed a problem, and is fixed in hl475/svd#1

Labels: None yet
Projects: None yet

4 participants