split S3 files into smaller files to send large union file#77
Open
yuyashiraki wants to merge 3 commits intofacebookresearch:mainfrom
Conversation
Contributor
This pull request was exported from Phabricator. Differential Revision: D39219674
yuyashiraki pushed a commit to yuyashiraki/Private-ID that referenced this pull request on Sep 4, 2022
…esearch#77)

Summary:
Pull Request resolved: facebookresearch#77

# Context
We found that the AWS-SDK S3 API fails when we try to write more than 5 GB of data. This blocks capacity testing for a larger FARGATE container. In this diff, as mentioned in [the post](https://fb.workplace.com/groups/pidmatchingxfn/posts/493743615908631), we split the union file based on its number of rows.

# Description
We have made the following changes.
- Added a new arg `s3api_max_rows` to the private-id-multi-key-client and private-id-multi-key-server binaries. We use this to split a file for S3 upload.
- Added an optional arg `num_split` to save_id_map() and writer_helper(). When `num_split` is specified, the arg `path` is used as a prefix and files are saved as `{path}_0`, `{path}_1`, etc.
- In rpc_server.rs and client.rs, compute `num_split` based on `s3api_max_rows` and pass the `num_split` arg for S3 only. Then, for each split file, call copy_from_local().

Differential Revision: D39219674
fbshipit-source-id: 82dc1788b0d4db5cf9c3de07178b52a8cc11633c
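The split count described in the commit reduces to a ceiling division over the row count. A minimal sketch, assuming a hypothetical `num_split` helper (the actual code in rpc_server.rs and client.rs may be structured differently):

```rust
/// Hypothetical helper: how many split files are needed so that no
/// single file holds more than `s3api_max_rows` rows.
fn num_split(total_rows: usize, s3api_max_rows: usize) -> usize {
    // Ceiling division: e.g. 10 rows with a 4-row cap -> 3 files.
    (total_rows + s3api_max_rows - 1) / s3api_max_rows
}
```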
4ba74f7 to 4f28018
Summary:

# What
* Add unit tests for the encrypt and create_id_map functions on the partner side.
* Add a create_key function to create fixed keys for testing.
* The encrypt and create_id_map functions both use partner.private_keys.1 to encrypt.
* self_permutation also needs to be fixed when we test create_id_map().

# Why
* Need to improve code coverage.

Differential Revision: https://internalfb.com/D39127178
fbshipit-source-id: 22acb4c9d2d642b8df1348547098a7539f6ce7df
Summary:
Pull Request resolved: facebookresearch#76

# What
* Add unit tests for the save_id_map function on the partner side.
* The save_id_map function is called after create_id_map().
* Add a create_key function to create fixed keys for testing.
* The create_id_map function uses partner.private_keys.1 to encrypt.
* self_permutation also needs to be fixed when we test create_id_map().
* Create a temp file, pass its path to save_id_map(), and check that the string in the file is correct.

# Why
* Need to improve code coverage.

Differential Revision: D39142927
fbshipit-source-id: 82884647935873fe1f2feef5b061f3cc5385bba2
…esearch#77)

Summary:
Pull Request resolved: facebookresearch#77

# Context
We found that the AWS-SDK S3 API fails when we try to write more than 5 GB of data. This blocks capacity testing for a larger FARGATE container. In this diff, as mentioned in [the post](https://fb.workplace.com/groups/pidmatchingxfn/posts/493743615908631), we split the union file based on its number of rows.

# Description
We have made the following changes.
- Added a new arg `s3api_max_rows` to the private-id-multi-key-client and private-id-multi-key-server binaries. We use this to split a file for S3 upload.
- Added an optional arg `num_split` to save_id_map() and writer_helper(). When `num_split` is specified, the arg `path` is used as a prefix and files are saved as `{path}_0`, `{path}_1`, etc.
- In rpc_server.rs and client.rs, compute `num_split` based on `s3api_max_rows` and pass the `num_split` arg for S3 only. Then, for each split file, call copy_from_local().

Differential Revision: D39219674
fbshipit-source-id: 871df40d1a377ef8115422e39a868a26e09e027d
4f28018 to 48c8aa6
Summary:

Context

We found that the AWS-SDK S3 API fails when we try to write more than 5 GB of data. This blocks capacity testing for a larger FARGATE container.

In this diff, as mentioned in the post, we split the union file based on its number of rows.

Description

We have made the following changes.
- Added a new arg `s3api_max_rows` to the private-id-multi-key-client and private-id-multi-key-server binaries. We use this to split a file for S3 upload.
- Added an optional arg `num_split` to save_id_map() and writer_helper(). When `num_split` is specified, the arg `path` is used as a prefix and files are saved as `{path}_0`, `{path}_1`, etc.
- In rpc_server.rs and client.rs, compute `num_split` based on `s3api_max_rows` and pass it for S3 uploads only; for each split file, call copy_from_local().

Differential Revision: D39219674
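The split-and-name scheme above (`{path}_0`, `{path}_1`, …) can be sketched as follows. This is a hypothetical illustration, not the actual writer_helper() from this PR; it only writes plain lines to local files, with no S3 upload step:

```rust
use std::fs::File;
use std::io::{BufWriter, Result, Write};

/// Hypothetical sketch of row-based splitting: write `rows` across
/// `num_split` files named `{path}_0`, `{path}_1`, ... and return the paths.
fn write_splits(rows: &[String], path: &str, num_split: usize) -> Result<Vec<String>> {
    // Rows per file, rounded up so every row lands in some split.
    let per_file = (rows.len() + num_split - 1) / num_split;
    let mut paths = Vec::with_capacity(num_split);
    for (i, part) in rows.chunks(per_file).enumerate() {
        let split_path = format!("{}_{}", path, i);
        let mut w = BufWriter::new(File::create(&split_path)?);
        for line in part {
            writeln!(w, "{}", line)?;
        }
        w.flush()?;
        paths.push(split_path);
    }
    Ok(paths)
}
```

Per the PR description, each returned path would then be handed to copy_from_local() for its own S3 upload, keeping every object under the size cap.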