feat: Port to serverless GCS and add configurable folder depth (jules) #9
This commit introduces a Google Cloud Storage (GCS) serverless implementation alongside the existing MinIO setup. Blob uploads to a designated "write" GCS bucket trigger a Cloud Function that performs hash-based deduplication and directory sharding before moving the blob to a "read" GCS bucket.
Key changes and features:
1. **GCS Implementation (`gcs/` directory):**
* A new Go module `repos.se/minio-deduplication/v2/gcs` contains the GCS-specific logic.
* A Google Cloud Function (`gcs_transfer.HandleGCSEvent`) processes blob uploads.
* Uses SHA256 hashing and a 2-level directory sharding strategy (e.g., `aa/bb/`).
* Logging is implemented using `go.uber.org/zap`.
2. **Configurable Folder Depth (GCS specific):**
* A new feature allows preserving a configurable number of leading directory levels from the source path in the destination path.
* Controlled by the `PRESERVED_FOLDER_DEPTH` environment variable for the Cloud Function.
* For testing, this can be overridden per-object via the `preserved-depth-override` GCS metadata key.
* Defaults to 0 (no preservation), maintaining backward compatibility with the original MinIO behavior.
3. **Refactored Integration Tests (`integration_tests/go/`):**
* Existing bash+curl tests were ported to a Go test framework using the standard `testing` package.
* A `StorageService` interface abstracts backend operations (MinIO, GCS).
* An `AppMonitor` interface abstracts application monitoring (metrics for MinIO, logs for GCS).
* Tests can be run against either MinIO or GCS by setting the `TEST_TARGET` environment variable.
* Includes `TestBasicUploadAndTransfer` ported from `basic-flow.sh`.
* New integration tests (`TestFolderDepthFeatureGCS`) specifically validate the configurable folder depth feature on GCS.
4. **Unit Tests:**
* Unit tests for the GCS transfer logic (`gcs_transfer_test.go`) cover path generation, hashing, and the folder depth feature.
* The GCS code was refactored for better testability (e.g., using interfaces).
5. **Documentation (`gcs/README.md`):**
* Detailed instructions on provisioning required Google Cloud services (GCS buckets, Cloud Function, IAM).
* Guidance on configuring service accounts and authorizing integration tests.
* A sample `gcloud` deployment command for the Cloud Function.
Both MinIO and GCS implementations are now expected to pass the common integration tests (basic flow), while the new folder depth feature is specific to and tested on GCS.
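The precedence described for the folder depth setting (per-object metadata override first, then the `PRESERVED_FOLDER_DEPTH` environment variable, then the default of 0) could be sketched roughly as below. This is a minimal illustration, not the code from this PR: the function names `resolveDepth` and `parseDepth` are hypothetical, and treating invalid or negative values as "not set" is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// overrideKey is the GCS metadata key named in the PR description.
const overrideKey = "preserved-depth-override"

// parseDepth converts a raw string to a non-negative depth.
// Assumption: invalid or negative values are treated as "not set".
func parseDepth(raw string) (int, bool) {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return 0, false
	}
	return n, true
}

// resolveDepth (hypothetical helper) picks the effective depth:
// a valid per-object override wins, then the environment value, then 0.
// envValue would typically be os.Getenv("PRESERVED_FOLDER_DEPTH").
func resolveDepth(envValue string, metadata map[string]string) int {
	if raw, ok := metadata[overrideKey]; ok {
		if n, valid := parseDepth(raw); valid {
			return n
		}
	}
	if n, valid := parseDepth(envValue); valid {
		return n
	}
	return 0 // default: preserve no directory levels
}

func main() {
	fmt.Println(resolveDepth("1", map[string]string{overrideKey: "2"})) // override wins
	fmt.Println(resolveDepth("1", nil))                                 // environment value
	fmt.Println(resolveDepth("", nil))                                  // default
}
```

Passing the environment value in as a parameter, rather than reading it inside the function, keeps the sketch easy to unit test without touching process state.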
This was the task description given to Jules: I'd like a port of this project so that the same logic runs serverless on google cloud. A blob upload to a "write" bucket should trigger a move to a "read" bucket using the same hash based deduplication strategy and directory sharding. The feature branch should be named I want to use Go for this so we can reuse code if necessary. I want both implementations to pass the same integration tests. It's likely that the current bash+curl based tests must be ported to a test framework that can use a thin abstraction. I want the port to have one new feature that the golang+minio impl currently does not have: A configurable folder depth that should be preserved. If I set that to 1 the first directory level of the write path should be used for the read path as well, with the sharding placed under it. For example with value 2 a blob written to testbucket-write/xyz/foo/file.png with checksum 0123456789 should be placed at testbucket-read/xyz/foo/01/23/0123456789.png. The feature should be backwards compatible (default is to preserve zero levels) and have its own unit and integration tests. Please document how to provision the necessary features in google cloud and how to authorize tests to use these resources. I expect that you get integration tests up and running early on in the implementation process. If there's a choice of logging library I prefer zap with replace globals and zap.L() based statements.
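The path example in the task description can be worked through with a short sketch. The function name `destinationPath` is hypothetical and is not taken from the PR's code. Clamping the depth when the source path has fewer directory levels than requested is an assumption, since the task description does not cover that case. Object names here exclude the bucket name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// destinationPath (hypothetical helper) builds the object name in the read
// bucket from the source object name, its checksum, and the preserved depth.
func destinationPath(srcName, checksum string, preservedDepth int) string {
	dir, file := path.Split(srcName)
	ext := path.Ext(file) // keep the original file extension

	// Collect the non-empty directory segments of the source path.
	var dirs []string
	for _, seg := range strings.Split(dir, "/") {
		if seg != "" {
			dirs = append(dirs, seg)
		}
	}

	// Assumption: clamp the depth to the range [0, len(dirs)].
	if preservedDepth < 0 {
		preservedDepth = 0
	}
	if preservedDepth > len(dirs) {
		preservedDepth = len(dirs)
	}

	parts := append([]string{}, dirs[:preservedDepth]...)

	// Two-level sharding from the first four checksum characters (aa/bb/).
	if len(checksum) >= 4 {
		parts = append(parts, checksum[0:2], checksum[2:4])
	}
	parts = append(parts, checksum+ext)
	return strings.Join(parts, "/")
}

func main() {
	// Values from the task description: depth 2, checksum 0123456789.
	fmt.Println(destinationPath("xyz/foo/file.png", "0123456789", 2))
	// prints "xyz/foo/01/23/0123456789.png"

	// Depth 0 reproduces the original MinIO layout.
	fmt.Println(destinationPath("xyz/foo/file.png", "0123456789", 0))
	// prints "01/23/0123456789.png"
}
```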
This commit introduces a GitHub Actions workflow to automate testing for both unit and integration tests. The workflow (`.github/workflows/go-tests.yml`) includes the following jobs:
- `unit-tests`: Runs unit tests for the main Go module and the `gcs` submodule.
- `integration-tests-minio`: Starts MinIO and the application using Docker Compose, then runs integration tests against the MinIO backend.
- `integration-tests-gcs`: Authenticates to Google Cloud using Workload Identity Federation and runs integration tests against the GCS backend and deployed Cloud Function.

Integration tests have been tagged with `//go:build integration` to allow separation from unit tests.

The `gcs/README.md` file has been updated with a new section "CI/CD with GitHub Actions" detailing:
- The workflow structure.
- Instructions for running tests locally.
- Required GitHub Secrets for GCS integration tests (e.g., `GCP_PROJECT_ID`, `GCP_SA_EMAIL`, GCS bucket names, Cloud Function details).
- Notes on setting up Workload Identity Federation in GCP.
Updates the Go version used in the GitHub Actions workflow (`.github/workflows/go-tests.yml`) from '1.22' to '1.23'. This change ensures that the CI environment uses the same Go version as specified in your project's `go.mod` files (which indicate Go 1.23.x). This prevents potential compatibility issues where code using Go 1.23 features might fail to compile in a CI environment running an older Go version.