Skip to content

S3 support#7

Closed
or-shachar wants to merge 1 commit intobradfitz:mainfrom
or-shachar:s3-support
Closed

S3 support#7
or-shachar wants to merge 1 commit intobradfitz:mainfrom
or-shachar:s3-support

Conversation

@or-shachar
Copy link

My humble attempt to do #6

Signed-off-by: or-shachar <or.shachar@wiz.io>
Copy link

@klauspost klauspost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was going to write my own, but saw you had already made the bulk of the work.

Comment on lines +99 to +103
body, err := io.ReadAll(result.Body)
if err != nil {
return "", "", err
}
if err := json.Unmarshal(body, &av); err != nil {
Copy link

@klauspost klauspost Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stream the response:

Suggested change
body, err := io.ReadAll(result.Body)
if err != nil {
return "", "", err
}
if err := json.Unmarshal(body, &av); err != nil {
dec := json.NewDecoder(result.Body)
if err := dec.Decode(&av); err != nil {

(but maybe drop, see later)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if c.verbose {
log.Printf("error S3 get for %s: %v", actionKey, err)
}
return "", "", fmt.Errorf("unexpected S3 get for %s: %v", actionKey, err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "", "", fmt.Errorf("unexpected S3 get for %s: %v", actionKey, err)
return "", "", fmt.Errorf("unexpected S3 get for %s: %w", actionKey, err)

var ae smithy.APIError
if errors.As(err, &ae) {
code := ae.ErrorCode()
return code == "AccessDenied" || code == "NoSuchKey"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AccessDenied?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Not sure why I chose AccessDenied, maybe my original permissions were only to s3:GetObject without s3:ListBucket which results in AccessDenied in case key cannot be found.

if c.verbose {
log.Printf("error S3 get for %s: %v", outputKey, getOutputErr)
}
return "", "", fmt.Errorf("unexpected S3 get for %s: %v", outputKey, getOutputErr)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "", "", fmt.Errorf("unexpected S3 get for %s: %v", outputKey, getOutputErr)
return "", "", fmt.Errorf("unexpected S3 get for %s: %w", outputKey, getOutputErr)

We support S3 backend for caching.
You can connect to S3 backend by setting the following parameters:
- `GOCACHE_S3_BUCKET` - Name of S3 bucket
- `GOCACHE_AWS_REGION` - AWS Region of bucket

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `GOCACHE_AWS_REGION` - AWS Region of bucket
- `GOCACHE_S3_REGION` - AWS Region of bucket

I think having the same prefix makes it clearer. I suggest using the service name rather than the vendor name. (same goes for other env vars)

os := runtime.GOOS
// get current version of Go
ver := strings.ReplaceAll(strings.ReplaceAll(runtime.Version(), " ", "-"), ":", "-")
prefix := fmt.Sprintf("cache/%s/%s/%s/%s", cacheKey, arc, os, ver)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good behavior to make the cache prefix configurable. No biggie. Consider the default to be gocache to make it a bit easier to identify.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


putBody = outputResult.Body
}
diskPath, err = c.diskCache.Put(ctx, actionID, outputID, av.Size, putBody)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need help: Documentation indicates that modtime of the file is used for "something".

You may need some way to set the mod-time to the object write time. This should be available from S3.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please reference the relevant documentation?

OutputID: outputID,
Size: size,
}
avj, err := json.Marshal(av)
Copy link

@klauspost klauspost Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 PutObject calls per write seems like one to many. And a place where inconsistencies could creep in. Having 1 write will make it atomic, so probably that is a win here.

I suggest you store the object using the actionID only and add objectID as a custom header.

There is a 2KB limit on metadata, so maybe check if that is likely to be exceeded.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done on the main branch of my fork

Comment on lines +184 to +189
_, err = client.PutObject(ctx, &s3.PutObjectInput{
Bucket: &c.Bucket,
Key: &outputKey,
Body: &readerForS3,
ContentLength: size,
})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_, err = client.PutObject(ctx, &s3.PutObjectInput{
Bucket: &c.Bucket,
Key: &outputKey,
Body: &readerForS3,
ContentLength: size,
})
_, err = client.PutObject(ctx, &s3.PutObjectInput{
Bucket: &c.Bucket,
Key: &outputKey,
Body: &readerForS3,
ContentLength: size,
Metadata: map[string]string{"gocache-output-id": outputID},
})

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

})
}
if size > 0 && err == nil {
outputKey := c.outputKey(outputID)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
outputKey := c.outputKey(outputID)
outputKey := c.actionKey(actionID)

@klauspost
Copy link

@or-shachar @bradfitz Let me know if you are not interested in pursuing this, and I will just do a fork.

@or-shachar
Copy link
Author

Oh seeing this only now. I can probably work on this on Friday but not earlier unfortunately...

Thanks for the feedback! 🙏

@klauspost
Copy link

@or-shachar Wasn't pushing - just asking, since I didn't know if you were still interested after having it open for a year.

Either way, you can just ping me for S3 questions. And we can decide if we fork or not.

If you want a custom endpoint to test against, you can use https://play.min.io, access/secret key: minioadmin/minioadmin - It has TLS enabled as may be visible from the URL.

To create a bucket, using our commandline client, use the default config and mc mb play/shachar should create it. We wipe it on regular intervals, but should allow you to test.

@or-shachar
Copy link
Author

We use my flavor of S3 for more than a year. A major improvement over traditional CI cache.

@or-shachar
Copy link
Author

Hey @klauspost - thanks for your kind feedback... I did made some progress on my fork and started working on main branch there. I'll incorporate your feedback there.

Generally - I understand you want to abstract AWS terms so it can work with any S3 compatible API, right? Feel free to open another PR to the fork as I don't think @bradfitz is here and I'm not the maintainer of this repo.

@or-shachar
Copy link
Author

To avoid confusing - I'm closing this PR as we've made a lot of progress on my fork, main branch, since then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants