47 changes: 40 additions & 7 deletions zstd_ctx.go
@@ -1,6 +1,7 @@
package zstd

/*
#cgo CFLAGS: -O3
Member:
👍 If we aren't already compiling (our external?) zstd library with -O3, this is probably a good idea. But I don't know if that's the case.

#include "zstd.h"
*/
import "C"
@@ -19,6 +20,7 @@ type Ctx interface {

	// CompressLevel is the same as Compress but you can pass a compression level
	CompressLevel(dst, src []byte, level int) ([]byte, error)
	// CompressLevelDirect is like CompressLevel but calls ZSTD_compressCCtx
	// directly via go:linkname, bypassing the runtime's cgo machinery
	CompressLevelDirect(dst, src []byte, level int) ([]byte, error)

	// Decompress src into dst. If you have a buffer to use, you can pass it to
	// prevent allocation. If it is too small, or if nil is passed, a new buffer
@@ -32,14 +34,14 @@ type ctx struct {
}

// Create a new ZStd Context.
//
// When compressing/decompressing many times, it is recommended to allocate a
// context just once, and re-use it for each successive compression operation.
// This will make workload friendlier for system's memory.
// Note : re-using context is just a speed / resource optimization.
// It doesn't change the compression ratio, which remains identical.
// Note 2 : In multi-threaded environments,
// use one different context per thread for parallel execution.
func NewCtx() Ctx {
	c := &ctx{
		cctx: C.ZSTD_createCCtx(),
@@ -54,6 +56,37 @@ func (c *ctx) Compress(dst, src []byte) ([]byte, error) {
	return c.CompressLevel(dst, src, DefaultCompression)
}

func (c *ctx) CompressLevelDirect(dst, src []byte, level int) ([]byte, error) {
	bound := CompressBound(len(src))
	if cap(dst) >= bound {
		dst = dst[0:bound] // Reuse dst buffer
	} else {
		dst = make([]byte, bound)
	}

	// We need unsafe.Pointer(&src[0]) in the Cgo call to avoid "Go pointer to Go pointer" panics.
	// This means we need to special case empty input. See:
	// https://github.com/golang/go/issues/14210#issuecomment-346402945
	var cWritten C.size_t
	if len(src) == 0 {
		cWritten = CompressCtx(c.cctx, unsafe.Pointer(&dst[0]), C.size_t(len(dst)), unsafe.Pointer(nil), C.size_t(0), C.int(level))
	} else {
		cWritten = CompressCtx(c.cctx, unsafe.Pointer(&dst[0]), C.size_t(len(dst)), unsafe.Pointer(&src[0]), C.size_t(len(src)), C.int(level))
	}
	written := int(cWritten)
	// Check if the return is an Error code
	if err := getError(written); err != nil {
		return nil, err
	}
	return dst[:written], nil
}

// ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize, int compressionLevel
//
//go:linkname CompressCtx ZSTD_compressCCtx
//go:noescape
func CompressCtx(ctx *C.ZSTD_CCtx, dst unsafe.Pointer, dc C.size_t, src unsafe.Pointer, sc C.size_t, level C.int) C.size_t
Member:
Interesting idea 😅 .

I haven't verified this, but if this completely bypasses the runtime's cgo machinery, I could imagine a number of bad things to happen:

  • Long GC STW pauses because the runtime is unable to preempt goroutines that are currently in a zstd call.
  • The runtime incorrectly trying to do async preemption on a goroutine running C code, which might cause data corruption, crashes, etc.
  • All Ps (GOMAXPROCs) calling zstd, causing significant latency on other goroutines.

Generally speaking, I'd say there is no presumption of innocence here. This code is likely to be guilty until we can prove otherwise 🙈.

Author:

Thanks for these specific concerns! I'd definitely like to understand this more - obviously CGO exists for a reason 😄

Member (@felixge, Jan 10, 2024):

It's not specific to cgo, but this talk explains the P-M-G abstraction in the scheduler and why/how non-go function calls (syscalls/cgo - they are treated mostly the same) require special treatment: https://youtu.be/-K11rY57K7k?t=980&si=wFEQWUPe3NayCjBh

tl;dr: a syscall (or cgo call) might block off-cpu (network, sleep, etc.), so if a lot of goroutines were to make such calls, it would cause a lot of wasted CPU. Therefore cgo/syscalls get special treatment by the runtime so they don't end up blocking other goroutines from executing.

I can try to explain this a bit more when I find time, but this might be a good starting point for finding articles on the subject.

Member:

Just remembered that I had this bookmarked which you'll probably find interesting: https://words.filippo.io/rustgo/

I also realized I forgot to mention one of the most important issues (that might explain the crashes you get in CI) 😅:

C requires large immovable stacks. When using cgo, the Go runtime takes care of switching from a goroutine stack (which is small and movable) to a system stack (which is large and immovable).

Without this switch, you're running C code on a goroutine stack, which will lead to stack overflows and other nasty problems.

So yeah, I think this is likely a dead-end 😞

@nsrip-dd (Jan 11, 2024):

> I haven't verified this, but if this completely bypasses the runtime's cgo machinery

In addition to the issues you already mentioned, this also bypasses the ABI translation. Go has an internal calling convention which is different from the C calling convention, and is not guaranteed to be stable across releases.

I can cook up examples that break on my laptop, e.g. make a C function which takes 9 parameters and call it this way. On my arm64 laptop, the 9th argument is junk since the C and Go ABIs use different registers after the 8th argument.


func (c *ctx) CompressLevel(dst, src []byte, level int) ([]byte, error) {
	bound := CompressBound(len(src))
	if cap(dst) >= bound {
28 changes: 28 additions & 0 deletions zstd_test.go
@@ -341,6 +341,34 @@ func TestSmallPayload(t *testing.T) {

}

func BenchmarkCompress(b *testing.B) {
	data := make([]byte, 1024)
	for i := range data {
		data[i] = byte(i)
	}
	dest := make([]byte, CompressBound(len(data)))
	b.Run("cgo", func(b *testing.B) {
		b.SetBytes(int64(len(data)))
		ctx := NewCtx()
		for i := 0; i < b.N; i++ {
			_, err := ctx.CompressLevel(dest, data, 3)
			if err != nil {
				b.Fail()
			}
		}
	})
	b.Run("direct", func(b *testing.B) {
		b.SetBytes(int64(len(data)))
		ctx := NewCtx()
		for i := 0; i < b.N; i++ {
			_, err := ctx.CompressLevelDirect(dest, data, 3)
			if err != nil {
				b.Fail()
			}
		}
	})
}

func BenchmarkCompression(b *testing.B) {
	if raw == nil {
		b.Fatal(ErrNoPayloadEnv)