001 - Publish service - Async #413
eveleighoj started this conversation in Open design proposal
Author(s) - @cpcundill
Status - Closed
Introduction
The first iteration of the Check service (originally named the Publish service) offers validation of data files via an
upload. Interaction with the user is currently synchronous: a user uploads their data file and waits for an immediate
validation response. The service needs to scale to support many more data sets and concurrent users, meaning that this
approach is likely to result in system failure and/or a poor user experience.
Causes of system failure might include exhaustion of memory or disk space when many large files are uploaded and validated concurrently.
Meanwhile, poor user experience might be caused by excessive wait times for validation of large files.
During development of the current service, it was observed that a data file of ~11MB was taking over 30 seconds to
process. Clearly, there are genuine scenarios where it will not be feasible to provide validation responses to end
users synchronously.
Detail
Overview
In order to address the problems outlined, the following changes are proposed:
By responding to validation requests asynchronously, the system is able to fulfil a user's request in a manner that
minimises the risk of system failure. Memory and disk space can be optimised for processing each individual request.
A database will be introduced to persist requests and validation results while a queue will be used to trigger
asynchronous processing of requests.
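The request lifecycle described above can be sketched as follows. This is a hypothetical illustration, not the actual implementation: sqlite3 stands in for the Postgres request database and a plain list stands in for the SQS queue, and the table/column names are assumptions.

```python
import json
import sqlite3
import uuid


def create_request(db, queue, params):
    """Persist a validation request and enqueue it for asynchronous processing."""
    request_id = str(uuid.uuid4())
    db.execute(
        "INSERT INTO request (id, status, params) VALUES (?, ?, ?)",
        (request_id, "NEW", json.dumps(params)),
    )
    db.commit()
    # In production this would be sqs.send_message(QueueUrl=..., MessageBody=...)
    queue.append(json.dumps({"request_id": request_id}))
    return request_id


# Stand-ins for the real Postgres database and SQS queue.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE request (id TEXT PRIMARY KEY, status TEXT, params TEXT)")
queue = []
rid = create_request(db, queue, {"uploaded_file": "data.csv"})
```

The API returns the request id to the user immediately; the heavy validation work happens later, driven by the queue.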
By uploading files directly to S3, large files will be handled by AWS's dedicated infrastructure designed for multipart
uploads. Gateways, web servers and APIs do not need to be scaled for file-based requests. In addition, allocated disk
space does not need to cater for storage of multiple concurrently uploaded files.
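A direct-to-S3 upload is typically granted via a presigned POST, which the browser can use without the file ever passing through the service's own servers. A minimal sketch using boto3's `generate_presigned_post` (the bucket name, key layout and size cap below are assumptions, not the actual configuration):

```python
import uuid


def object_key(request_id, filename):
    """Build a unique S3 key for an uploaded file (hypothetical layout)."""
    return f"uploads/{request_id}/{filename}"


def presigned_upload(bucket, request_id, filename, max_bytes=500 * 1024 * 1024):
    """Return a URL and form fields the browser can POST the file to directly."""
    import boto3  # imported lazily so the key helper works without boto3 installed

    s3 = boto3.client("s3")
    return s3.generate_presigned_post(
        Bucket=bucket,
        Key=object_key(request_id, filename),
        Conditions=[["content-length-range", 0, max_bytes]],  # cap upload size
        ExpiresIn=300,  # signed URL expires after five minutes
    )
```

The client-side JavaScript then POSTs the file to the returned URL with the returned fields, which is why JavaScript must be enabled for direct upload.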
Containers
Structure
The Publish Request API will manage request data and persist it to a Postgres database.
An SQS queue will be used to trigger requests to be fulfilled by the Publish Request Processor.
The Publish Request Processor will process just one request at a time meaning its CPU, memory and disk requirements
need only be as large as is necessary to process one file. Running multiple instances of the Processor allows the
system to process multiple requests concurrently.
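The one-request-at-a-time processor can be sketched as a simple consume loop. Here a stdlib `queue.Queue` stands in for SQS (the real processor would call `receive_message` with `MaxNumberOfMessages=1`), and the validation step is a stub:

```python
import json
import queue


def process_request(message):
    """Validate the uploaded file referenced by the message (stub)."""
    body = json.loads(message)
    # Real work: fetch the file from S3, run validation, write response rows.
    return {"request_id": body["request_id"], "status": "COMPLETE"}


def run_processor(q):
    """Consume messages one at a time so resources are sized for a single file."""
    results = []
    while True:
        try:
            message = q.get_nowait()  # the real loop would block on SQS long polling
        except queue.Empty:
            return results
        results.append(process_request(message))
        q.task_done()
```

Because each instance only ever holds one file in memory, scaling out is a matter of running more instances, not provisioning bigger ones.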
GOV.UK Notify will be used to send an email notification to users.
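Sending the notification would use the official `notifications-python-client` library. A sketch, noting that the template id and the personalisation field names below are hypothetical and must match a template configured in the Notify service:

```python
def build_personalisation(request_id, results_url):
    """Values substituted into the email template (hypothetical field names)."""
    return {"request_id": request_id, "results_url": results_url}


def notify_user(api_key, email_address, template_id, request_id, results_url):
    """Email the user via GOV.UK Notify once their request has been processed."""
    # Imported lazily so the personalisation helper works without the library.
    from notifications_python_client.notifications import NotificationsAPIClient

    client = NotificationsAPIClient(api_key)
    return client.send_email_notification(
        email_address=email_address,
        template_id=template_id,
        personalisation=build_personalisation(request_id, results_url),
    )
```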
Interaction
Testing
In order to validate the solution and arrive at the correct assignment of compute resources (e.g. CPU, memory and disk),
it will be necessary to undertake load testing. This testing will also help determine how many processor instances are
required to meet concurrency targets.
One possible method for the load testing could be using AWS's Distributed Load Testing pattern
(see https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/).
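Whichever tool is chosen, the drill is the same: many simulated users submitting checks concurrently while throughput and resource usage are measured. A stdlib-only sketch of that shape (the `submit_check` stub stands in for a real HTTP call to the service; user counts and file sizes are placeholders, not the targets to be identified):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def submit_check(file_size_mb):
    """Stub for an end-to-end check request; a real test would call the HTTP API."""
    time.sleep(0.01)  # stand-in for network and processing latency
    return "COMPLETE"


def drill(concurrent_users, requests_per_user, file_size_mb=11):
    """Run a batch of simulated requests and report elapsed wall-clock time."""
    jobs = [file_size_mb] * (concurrent_users * requests_per_user)
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        results = list(pool.map(submit_check, jobs))
    elapsed = time.perf_counter() - started
    return results, elapsed
```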
Implementation considerations
New AWS resources will need to be provisioned for:
New code repositories, GitHub pipelines, ECR image repositories and ECS tasks will be needed for:
For upload direct to S3, users will be required to have JavaScript enabled.
For load testing, it will be necessary to identify realistic figures for concurrent users and file sizes.
Prototype
A prototype of the design was put together in the following repositories and tagged:
Database design
In order to make the backend re-usable for other asynchronous workloads on the platform, a generic data model has
been designed for the Request Database which should satisfy the needs of most types of request.
The design allows for arbitrary request parameters in JSON format and arbitrary response data, again in JSON format, at
two levels of granularity. High-level data associated with a response can be stored in the response entity while
lower-level or repeating data associated with a response can be stored in the related response_detail entity.
It is envisaged that for some request types, response-level data will prove
sufficient, while other request types, such as datafile checks, will require row-level granularity and as such can make
use of one response_detail entity per row.
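The generic model described above can be illustrated with the following schema sketch. Table and column names are assumptions drawn from the description (request, response, response_detail, JSON parameter/response columns); sqlite3 is used here as a stand-in for Postgres, where the JSON columns would likely be `jsonb`:

```python
import sqlite3

# Hypothetical schema for the generic request model.
SCHEMA = """
CREATE TABLE request (
    id TEXT PRIMARY KEY,
    type TEXT NOT NULL,          -- e.g. 'check-datafile'
    status TEXT NOT NULL,
    params TEXT NOT NULL         -- arbitrary request parameters as JSON
);
CREATE TABLE response (
    id INTEGER PRIMARY KEY,
    request_id TEXT NOT NULL REFERENCES request(id),
    data TEXT                    -- high-level response data as JSON
);
CREATE TABLE response_detail (
    id INTEGER PRIMARY KEY,
    response_id INTEGER NOT NULL REFERENCES response(id),
    detail TEXT                  -- row-level data as JSON, one row per detail
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
```

A datafile check would then store one response row plus one response_detail row per data row, while simpler request types would populate the response entity alone.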
Design Comments/Questions
Main feedback from Owen Eveleigh was to make the Publish Request API and Processor (backend) part of the platform
system so that it can be re-used by other systems for handling large workloads asynchronously.