-
Notifications
You must be signed in to change notification settings - Fork 102
At/optimizations #135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 1.x
Are you sure you want to change the base?
At/optimizations #135
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,7 @@ | ||
| package zstd | ||
|
|
||
| /* | ||
| #cgo CFLAGS: -O3 | ||
| #include "zstd.h" | ||
| */ | ||
| import "C" | ||
|
|
@@ -19,6 +20,7 @@ type Ctx interface { | |
|
|
||
| // CompressLevel is the same as Compress but you can pass a compression level | ||
| CompressLevel(dst, src []byte, level int) ([]byte, error) | ||
| CompressLevelDirect(dst, src []byte, level int) ([]byte, error) | ||
|
|
||
| // Decompress src into dst. If you have a buffer to use, you can pass it to | ||
| // prevent allocation. If it is too small, or if nil is passed, a new buffer | ||
|
|
@@ -32,14 +34,14 @@ type ctx struct { | |
| } | ||
|
|
||
| // Create a new ZStd Context. | ||
| // When compressing/decompressing many times, it is recommended to allocate a | ||
| // context just once, and re-use it for each successive compression operation. | ||
| // This will make workload friendlier for system's memory. | ||
| // Note : re-using context is just a speed / resource optimization. | ||
| // It doesn't change the compression ratio, which remains identical. | ||
| // Note 2 : In multi-threaded environments, | ||
| // use one different context per thread for parallel execution. | ||
| // | ||
| // When compressing/decompressing many times, it is recommended to allocate a | ||
| // context just once, and re-use it for each successive compression operation. | ||
| // This will make workload friendlier for system's memory. | ||
| // Note : re-using context is just a speed / resource optimization. | ||
| // It doesn't change the compression ratio, which remains identical. | ||
| // Note 2 : In multi-threaded environments, | ||
| // use one different context per thread for parallel execution. | ||
| func NewCtx() Ctx { | ||
| c := &ctx{ | ||
| cctx: C.ZSTD_createCCtx(), | ||
|
|
@@ -54,6 +56,37 @@ func (c *ctx) Compress(dst, src []byte) ([]byte, error) { | |
| return c.CompressLevel(dst, src, DefaultCompression) | ||
| } | ||
|
|
||
| func (c *ctx) CompressLevelDirect(dst, src []byte, level int) ([]byte, error) { | ||
| bound := CompressBound(len(src)) | ||
| if cap(dst) >= bound { | ||
| dst = dst[0:bound] // Reuse dst buffer | ||
| } else { | ||
| dst = make([]byte, bound) | ||
| } | ||
|
|
||
| // We need unsafe.Pointer(&src[0]) in the Cgo call to avoid "Go pointer to Go pointer" panics. | ||
| // This means we need to special case empty input. See: | ||
| // https://github.com/golang/go/issues/14210#issuecomment-346402945 | ||
| var cWritten C.size_t | ||
| if len(src) == 0 { | ||
| cWritten = CompressCtx(c.cctx, unsafe.Pointer(&dst[0]), C.size_t(len(dst)), unsafe.Pointer(nil), C.size_t(0), C.int(level)) | ||
| } else { | ||
| cWritten = CompressCtx(c.cctx, unsafe.Pointer(&dst[0]), C.size_t(len(dst)), unsafe.Pointer(&src[0]), C.size_t(len(src)), C.int(level)) | ||
| } | ||
| written := int(cWritten) | ||
| // Check if the return is an Error code | ||
| if err := getError(written); err != nil { | ||
| return nil, err | ||
| } | ||
| return dst[:written], nil | ||
| } | ||
|
|
||
| // ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize, int compressionLevel | ||
| // | ||
| //go:linkname CompressCtx ZSTD_compressCCtx | ||
| //go:noescape | ||
| func CompressCtx(ctx *C.ZSTD_CCtx, dst unsafe.Pointer, dc C.size_t, src unsafe.Pointer, sc C.size_t, level C.int) C.size_t | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting idea 😅 . I haven't verified this, but if this completely bypasses the runtime's cgo machinery, I could imagine a number of bad things to happen:
Generally speaking, I'd says there is no presumption of innocence here. This is code likely to by guilty until we can proof otherwise 🙈.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for these specific concerns! I'd definitely like to understand this more - obviously CGO exists for a reason 😄
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not specific to cgo, but this talk explains the P-M-G abstraction in the scheduler and why/how non-go function calls (syscalls/cgo - they are treated mostly the same) require special treatment: https://youtu.be/-K11rY57K7k?t=980&si=wFEQWUPe3NayCjBh tl;dr: a syscall (or cgo call) might block off-cpu (network, sleep, etc.), so if a lot of goroutine would make such calls, it would cause a lot of wasted CPU. Therefor cgo/syscalls get special treatement by the runtime so they don't end up blocking other goroutines from executing. I can try to explain his a bit more when I find time, but this might be a good starting point for finding articles on the subject.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just remembered that I had this bookmarked which you'll probably find interesting: https://words.filippo.io/rustgo/ I also realized I forgot to mention one of the most important issues (that might explain the crashes you get in CI) 😅: C requires large immovable stacks. When using cgo, the Go runtime takes care of switching from a goroutine stack (which is small and movable) to a system stack (which is large and immovable). Without this switch, you're running C code on a goroutine stack, which will lead to stack overflows and other nasty problems. So yeah, I think this is likely a dead-end 😞 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In addition to the issues you already mentioned, this is also bypassing the ABI translation. Go has an internal calling convention which is different than the C calling convention, and is not guaranteed to be stable across releases. I can cook up examples that break on my laptop, e.g. make a C function which takes 9 parameters and call it this way. On my arm64 laptop, the 9th argument is junk since the C and Go ABIs use different registers after the 8th argument. |
||
|
|
||
| func (c *ctx) CompressLevel(dst, src []byte, level int) ([]byte, error) { | ||
| bound := CompressBound(len(src)) | ||
| if cap(dst) >= bound { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 If we aren't already compiling (our external?) zstd library with -O3, this is probably a good idea. But I don't know if that's the case.