Architecture Revision Journey #4
Open
Description
Current Architecture
- a user uploads a text file to the manager service.
- manager service streams the data straight to GCS while simultaneously calculating the character frequency table.
- once the manager service finishes, it streams the table to GCS and sends a message to Pub/Sub.
- a worker receives that message by listening to the assigned Pub/Sub topic.
- worker downloads the table and computes the Huffman coding tree.
- worker streams the data from GCS one UTF-8 encoded character at a time, encodes it using the Huffman algorithm, then streams the result back to GCS as a different object.
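A minimal sketch of the two computations in the steps above, in plain Python with no GCP calls: counting characters while chunks pass through the manager, and turning the resulting table into Huffman codes on the worker. The function names `count_chars` and `huffman_codes` are made up for illustration, and the chunk iterable stands in for whatever upload stream the manager actually reads.

```python
import codecs
import heapq
from collections import Counter
from itertools import count


def count_chars(chunks):
    """Count characters across byte chunks without buffering the whole file."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    freq = Counter()
    for chunk in chunks:
        # The incremental decoder holds back a multi-byte character that is
        # split across two chunks until its remaining bytes arrive.
        freq.update(decoder.decode(chunk))
        # Assumption: in the manager, the same chunk would also be forwarded
        # to GCS at this point.
    freq.update(decoder.decode(b"", final=True))
    return freq


def huffman_codes(freq):
    """Build a {character: bit string} table from a frequency table."""
    if not freq:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(n, next(tie), ch) for ch, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct character still needs a one-bit code.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two least frequent subtrees into one internal node.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tie), (left, right)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:  # leaf: a single character
            codes[node] = prefix
    return codes
```

Using an incremental decoder matters here because a chunk boundary can fall in the middle of a multi-byte UTF-8 character; decoding each chunk on its own would fail or miscount in that case.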
Goals
- highly scalable through the use of Pub/Sub.
- decoupled components.
- leverage managed services: GCS, Pub/Sub for durability, reliability, and availability.
Benefits:
- manager service skips writing the upload to the local file system, which saves a step and disk space.
- manager service avoids a second pass over the data by calculating the character frequencies during the upload.
- having Pub/Sub as a message queue allows the job to be asynchronous.
- overall, the user won't have to wait for the workflow to finish before retrieving a job's ID.
Drawbacks:
- what happens when streaming data to/from GCS fails?
- what happens when the manager service fails to upload the character frequency table to GCS?
- what happens if the manager service crashes before sending a message to Pub/Sub?
- streaming data to GCS has to go through the manager service because it needs to calculate the character frequency table. Can I skip this step? Is this too complex and too early?
- can I give only the workers access to GCP resources and keep the manager service from having any?
Architecture Draft 1 (I may be going overboard with this)
- a user uploads a text file to the manager service.
- manager service requests a signed URL from the worker (or an independent service), along with the job's ID, to stream the data to GCS.
- once the streaming has finished, GCS triggers a cloud function that calculates the character frequency table and sends a message to Pub/Sub.
- TBD
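A rough sketch of what the triggered function in this draft might look like, written so it runs without GCP. The `bucket` and `name` keys match the payload of a GCS object finalize event; everything else is an assumption for illustration: `read_text`, `write_text`, and `publish` are hypothetical callables standing in for the GCS and Pub/Sub clients, and the `.freq.json` suffix is an invented naming scheme.

```python
import json
from collections import Counter

TABLE_SUFFIX = ".freq.json"  # assumed naming scheme for frequency tables


def on_upload_finalized(event, read_text, write_text, publish):
    """Handle a GCS 'object finalized' event for an uploaded text file.

    read_text, write_text and publish are hypothetical callables injected by
    the caller; a real function would wrap the GCS and Pub/Sub clients.
    """
    bucket, name = event["bucket"], event["name"]
    if name.endswith(TABLE_SUFFIX):
        # Writing the table to the same bucket fires this trigger again,
        # so skip objects that this function produced itself.
        return None
    freq = Counter(read_text(bucket, name))
    table_name = name + TABLE_SUFFIX
    write_text(bucket, table_name, json.dumps(freq, ensure_ascii=False))
    # Publish only after the table is stored, so a worker never receives a
    # message that points at a missing table.
    publish(json.dumps({"bucket": bucket, "object": name, "table": table_name}))
    return table_name
```

The ordering (store the table, then publish) is one possible answer to the crash question raised under Drawbacks: if the function dies between the two steps, no worker acts on a half-finished job, and a retry of the same event simply overwrites the table.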