Conversation


@maciej maciej commented Dec 12, 2025

An attempt at reducing the disk space usage of the published module. See #1658 for the issue description.

The code was written by ChatGPT 5.1 Codex, using OpenCode with the following prompt:

```
The project is currently using test_data/largefile.txt for some of its tests.
It's a large ~70MiB file that when compressed with zstd (see the untracked changes) takes up only 15kiB.
In order to reduce the working directory size and the size of the published module change every instance
of where the largefile.txt is used in tests into the zstd equiv. Few things you'll have to remember:
the compress package is currently used as an indirect dependency, we'll need to run go mod tidy
to bring it into the direct deps section, tests cannot be run without setting up Snowflake DB;
I think we won't be able to do it here. Try reading from the zstd file wrapping any readers
with transparent zstd decompression. Try not to decompress the file in place.
```

The zstd-compressed file decompresses exactly to `largefile.txt`, but I nonetheless suggest that code reviewers verify this claim.

I did not set up tests locally for this change.

@maciej maciej requested a review from a team as a code owner December 12, 2025 12:24
@github-actions


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

```go
}

func openLargeTestFile(sourceDir string) (io.ReadCloser, error) {
	filePath := filepath.Join(sourceDir, "test_data", "largefile.txt.zstd")
```
Collaborator


As mentioned, if we want to fix it, I would prefer having randomly generated content instead of a compressed file. In my opinion, it would be best to:

  1. remove largefile.txt
  2. generate random content if the file does not exist
  3. save it in test_data dir
  4. add it to gitignore

Author


@sfc-gh-pfus OK, fair enough. I'm happy to contribute, but I'd appreciate it if you'd point me to how to run the tests (Snowflake setup); I'm a tourist here, just caring about small Go mod caches.

```
$ go mod why github.com/snowflakedb/gosnowflake
# github.com/snowflakedb/gosnowflake
[INTERNAL | REDACTED]
github.com/burningalchemist/sql_exporter/cmd/sql_exporter
github.com/burningalchemist/sql_exporter
github.com/snowflakedb/gosnowflake
```

We don't use the lib directly ourselves; it's a sql_exporter requirement. I'll fork or tweak sql_exporter if I need to.

Collaborator


Well, to run integration tests locally, you have to have a Snowflake account; a trial should be good enough. Our tests are disabled for external contributions by default, but when we have something close to a final version, I'll trigger them manually.

Our tests create a random schema in SF during init, but this can be disabled by setting the SKIP_SETUP=true env variable. Because your changes are mostly about what happens before a connection is made, my suggestion is (if you don't want to start a trial):

  1. Set SKIP_SETUP=true
  2. Temporarily add printing of the query text (to ensure file paths are correct); remove it when your change is ready
  3. Run a test
  4. Observe manually whether a correct file is created in the directory and matches the printed query text

Author

maciej commented Dec 19, 2025

I have read the CLA Document and I hereby sign the CLA
