Uploading large files (>=1GB) takes too long to stream to GCS #3

@ntdkhiem

Description

Problem

When a user submits a large file (>=1GB) to the manager's endpoint, they have to wait a considerable amount of time before receiving a job ID, because the file takes too long to stream to GCS.

Solution

  • Option 1: Delegate streaming to a background task and return the job's ID immediately. Drawback: although the manager can accept new requests right away, its memory will take a heavy hit if multiple background jobs run at once.
  • Option 2: MapReduce. Distribute the chunks of the file across multiple managers. Each manager computes its own character frequency table and uploads its assigned chunk to the correct bucket. At the end, the master manager collects these tables and merges them into a final table. Drawback: far more complex, with a lot of coordination between managers.
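
To make the merge step in Option 2 concrete, here is a minimal single-process sketch of the idea. It is not the project's code: the function names (`split_into_chunks`, `count_chunk`, `merge_tables`) are hypothetical, the chunks are counted in a plain loop rather than on separate managers, and the GCS upload is left out entirely. It also assumes byte-level counting; with multi-byte encodings such as UTF-8, a character could be split across a chunk boundary, which real chunking would have to handle.

```python
from collections import Counter
from typing import Iterable, Iterator


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of at most chunk_size bytes (hypothetical helper)."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def count_chunk(chunk: bytes) -> Counter:
    """Map step: the frequency table one manager would build for its chunk."""
    # Iterating over bytes yields ints, so keys are byte values (0-255).
    return Counter(chunk)


def merge_tables(tables: Iterable[Counter]) -> Counter:
    """Reduce step: the master manager sums the per-chunk tables."""
    total: Counter = Counter()
    for table in tables:
        total.update(table)  # update() adds counts instead of replacing them
    return total


data = b"abracadabra"
# In the proposed design each chunk would go to a different manager;
# here the "managers" are simulated by a list comprehension.
partial_tables = [count_chunk(chunk) for chunk in split_into_chunks(data, 4)]
merged = merge_tables(partial_tables)
```

Because addition of counts is associative and commutative, the merged table does not depend on how the file was chunked or on the order in which managers report back. The coordination cost mentioned above lies in assigning chunks and collecting the tables, not in the merge itself.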
