Architecture Revision Journey #4

@ntdkhiem

Current Architecture

  • a user uploads a text file to the manager service.
  • the manager service streams the data straight to GCS while simultaneously calculating a character frequency table.
  • once the upload finishes, the manager service streams the table to GCS and publishes a message to Pub/Sub.
  • a worker, subscribed to the assigned Pub/Sub topic, receives that message.
  • the worker downloads the table and builds the Huffman coding tree.
  • the worker streams the data from GCS, reads it one UTF-8 character at a time, encodes each character with the Huffman algorithm, then streams the result back to GCS as a different object.
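
To make the worker's two steps concrete, here is a minimal sketch of building Huffman codes from a character frequency table and encoding text with them. It is an illustration only, not the project's code: `build_codes` and `encode` are hypothetical names, the encoded output is a string of "0"/"1" characters instead of packed bytes, and everything runs in memory with no GCS access.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(freq):
    """Return {char: bitstring} for a {char: count} frequency table."""
    if not freq:
        return {}
    tie = count()  # unique tie-breaker so heapq never compares tree nodes
    # A leaf is a 1-character string; an internal node is a (left, right) tuple.
    heap = [(n, next(tie), ch) for ch, n in sorted(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct character still needs a 1-bit code.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two least frequent nodes into one internal node.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tie), (left, right)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


def encode(text, codes):
    """Encode text as a string of '0'/'1' characters (not packed bytes)."""
    return "".join(codes[ch] for ch in text)


if __name__ == "__main__":
    table = Counter("abracadabra")
    codes = build_codes(table)
    print(codes)
    print(encode("abracadabra", codes))
```

In the real worker the frequency table would come from the object the manager uploaded, and the bit string would have to be packed into bytes before it is streamed back to GCS.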

Goals

  • highly scalable through the use of Pub/Sub.
  • decoupled components.
  • leverage managed services (GCS, Pub/Sub) for durability, reliability, and availability.

Benefits:

  • the manager service skips a step and saves disk space because it never stores the data on the local file system.
  • the manager saves computing power by calculating character frequencies during the upload, so the data is read only once.
  • having Pub/Sub as a message queue allows the job to run asynchronously.
  • overall, the user receives a job's ID without waiting for the workflow to finish.
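
The single-pass idea behind the first two benefits can be sketched as follows: each chunk is forwarded to the destination and counted in the same loop, so nothing is written to local disk. This is a sketch under assumptions, not the manager's actual code: `stream_and_count` is a hypothetical name, and `source` and `sink` are any binary file-like objects (in the service, `sink` would be whatever writable stream the GCS client provides). An incremental decoder is used because a chunk boundary can fall in the middle of a multi-byte UTF-8 character.

```python
import codecs
import io
from collections import Counter


def stream_and_count(source, sink, chunk_size=64 * 1024):
    """Copy bytes from source to sink while counting decoded characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    freq = Counter()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)  # forward the raw bytes unchanged
        # The decoder buffers an incomplete trailing character until the
        # next chunk supplies the remaining bytes.
        freq.update(decoder.decode(chunk))
    # Flush the decoder; raises UnicodeDecodeError if the input was truncated.
    freq.update(decoder.decode(b"", final=True))
    return freq


if __name__ == "__main__":
    data = "héllo wörld".encode("utf-8")
    out = io.BytesIO()
    print(stream_and_count(io.BytesIO(data), out, chunk_size=3))
```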

Drawbacks:

  • what happens when streaming data to/from GCS fails?
  • what happens when the manager service fails to upload the character frequency table to GCS?
  • what happens if the manager service crashes before sending a message to Pub/Sub?
  • streaming data to GCS has to go through the manager service because it needs to calculate the character frequency table. Can I skip this step? Is this too complex and too early?
  • can I give only the workers access to GCP resources and keep the manager service from accessing them?
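
One possible starting point for the first two questions is to retry transient failures with exponential backoff and to always upload the table before publishing, so a message never points at a missing table. The sketch below is only an assumption about how that could look: `with_retries`, `finish_job`, `upload_table`, and `publish` are hypothetical names, and the callables stand in for the real GCS and Pub/Sub calls.

```python
import time


def with_retries(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


def finish_job(job_id, upload_table, publish):
    """Upload the frequency table first, then announce the job."""
    with_retries(lambda: upload_table(job_id))
    with_retries(lambda: publish(job_id))


if __name__ == "__main__":
    finish_job(
        "job-1",
        upload_table=lambda job_id: print("uploaded table for", job_id),
        publish=lambda job_id: print("published", job_id),
    )
```

Retries do not cover a crash between the two steps; in that case the table exists in GCS but no message was sent, which would still need a separate recovery mechanism.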

Architecture Draft 1 (I may be going overboard with this)

  • a user uploads a text file to the manager service.
  • the manager service sends the job's ID to the worker (or an independent service) and requests a signed URL for streaming the data to GCS.
  • once the streaming has finished, GCS triggers a cloud function that calculates the character frequency table and sends a message to Pub/Sub.
  • TBD
