Skip to content

Conversation

@dgnorton
Copy link
Contributor

The TakeSnapshot() function in _vendor/raft/server.go was failing at s.stateMachine.Save() and exiting without resetting s.pendingShapshot = nil. That left the raft server in an invalid state.

s.stateMachine.Save() was failing with the error message "gob: encodeReflectValue: nil element", which comes from the Save() function in cluster/cluster_configuration.go. This is a separate issue.

@dgnorton dgnorton self-assigned this Oct 30, 2014
@otoolep
Copy link
Contributor

otoolep commented Oct 30, 2014

Interesting. This PR:

goraft/raft#242

is pretty close to the same thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the lock, influxdb attempts to take multiple snapshots at the same time. If you comment those lines out, rebuild influxdb, and then run it & influxdb_stress (described above) at the same time...you'll see the messages in the log.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it, I was thinking that might be it from the title of the PR. I guess I might have put the lock inside TakeSnapshot(). This is kind of a matter of taste though -- depends on one thinks about TakeSnapshot().

@dgnorton
Copy link
Contributor Author

@otoolep it was related. There were actually two spots that needed to be reset to nil.

@jvshahid
Copy link
Contributor

Do you know why this gob: encodeReflectValue: nil element is happening ?

@dgnorton
Copy link
Contributor Author

@jvshahid no but I think it's a separate issue.

@jvshahid
Copy link
Contributor

Agree, I was just wondering why it's failing to save the state in the first place. I'll merge this in a bit.

@dgnorton dgnorton closed this in 4b59ddd Oct 31, 2014
@dgnorton dgnorton removed the review label Oct 31, 2014
@dgnorton dgnorton deleted the fix-943 branch December 9, 2014 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants