-
Notifications
You must be signed in to change notification settings - Fork 108
Ported object-file IO onto a safe codec to mitigate gadget chain attack #640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ported object-file IO onto a safe codec to mitigate gadget chain attack #640
Conversation
…ck that runs arbitrary bytecode inside the Wayang JVM
...orms/wayang-spark/src/main/java/org/apache/wayang/spark/operators/SparkObjectFileSource.java
Outdated
Show resolved
Hide resolved
...orms/wayang-spark/src/main/java/org/apache/wayang/spark/operators/SparkObjectFileSource.java
Outdated
Show resolved
Hide resolved
...tforms/wayang-spark/src/main/java/org/apache/wayang/spark/operators/SparkObjectFileSink.java
Outdated
Show resolved
Hide resolved
juripetersen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor stylistic changes could be done here to match the style we have in our codebase and improve readability a bit.
wayang-api/wayang-api-python/pom.xml
Outdated
| <spark.version>2.4.0</spark.version> | ||
| <scala.mayor.version>2.12</scala.mayor.version> | ||
| <giraph.version>1.2.0-hadoop2</giraph.version> | ||
| <python.worker.tests.skip>true</python.worker.tests.skip> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you adding this because the tests fail locally for you?
I don't think this should be addressed in this PR, it has nothing to do with the python API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It fails int he git workflow ... maybe you can fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our other recent PRs don't seem to have that problem. @mspruc, what is your take on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has been opened with it in the initial commit, it is kind of hard for me to comment on it without having the stack trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove and run again, maybe the pipeline had a issue.
|
What about our usage of pickle.loads in: |
|
One general question that remains for me: If there is none, why do we even keep offering the legacy serialization? Given that it poses as a potential security threat, it should be removed if there is no overhead. |
|
Tested with Apache Flink, there is no overhead from source and sink's side. It's just to use the serializer - physically, like computation power and memory, it adds the burden to Wayang's processing when filtering, but I think it's marginal tbh, |
@2pk03 Do you know if this also needs to be changed? |
the pickle.loads(decoded_udf) mirrors how PySpark workers hydrate the UDF sent from the JVM. That payload comes from our own driver, right? |
…ationMode and SparkObjectFileSink.encodeBuffer to inline the BytesWritable construction for readability
|
Yeah, this is from our own driver, I guess by reasoning it should be fine |
Port object-file IO onto a safe codec
ObjectFileSerializationMode+ helper to switch between legacy Java serialization and a JSON-based format, with unit tests covering both pathsObjectInputStreamMotivation: the legacy path deserializes untrusted SequenceFile payloads via
ObjectInputStream.readObject, letting an attacker ship a gadget chain that runs arbitrary bytecode inside the Wayang JVM. That RCE can be used to execute system commands, exfiltrate data, or tamper with jobs, so we move to JSON by default and require explicit opt-in for the old codec.Replaces PR #638