015 - Extract #417
eveleighoj
started this conversation in
Open design proposal
Are there any dependencies on other I.AI repos outside of plannin-extract and extract-app?
Introduction
The extract tool has been developed by I.AI and they are looking to MHCLG to take over the hosting and management of the tool. This ODP hopes to explain how we will host this infrastructure.
Status
Draft
Summary And Content
Extract is a tool for creating data from documents. This ODP may be too detailed, but it aims to lay out all of the components and work required to get this working in MHCLG infrastructure. There are four main technical things that need to happen:
I have attempted to explain the above and put key acceptance criteria against them in the key deliverables section below.
Contents
Key Deliverables
We want to migrate the current system developed by I.AI into MHCLG. There are two main components which we expect I.AI to deliver:
Extract
An open source Python library which provides tools to extract data from documents. This should be the primary entrypoint for others to gain value from the work completed by I.AI.
Acceptance Criteria:
Extract Service
A service which provides an API that other MHCLG systems can use to let users extract data from documents.
Acceptance Criteria:
In addition to the above, MHCLG and I.AI are working together to produce two additional pieces of work to support the usage of the extract service.
Authentication & Authorisation
Currently MHCLG does not have an authentication or authorisation solution ready for a product like extract. I.AI have an internal solution which is currently being used within the alpha, but it has limitations. This is a fairly unknown area and likely needs investigation of the requirements needed now and in the future.
Acceptance Criteria:
Application Code
A key part of using the extract service is generating both front-end and back-end code which can interact with the extract API. The functionality should be available through a single API. This code should be usable in any express.js application, including test apps and the providers service in planning data.
Acceptance Criteria:
Expected System
System Context
Above is the expected System context once extract has migrated to MHCLG. There are some key points to cover:
Key big open questions:
Extract API System
Now let's explore the containers required in the extract service:
And an infrastructure diagram which includes supporting infrastructure:
Questions:
Extract Containers
Here let's split the Extract architecture down by container, adding details for each one. Remember, a container isn't strictly a Docker container but a container as defined in the C4 model!
Extract API
I.AI repo: extract-app
digital-land repo:
This container is for a simple FastAPI application. It accepts requests and queues them into the task queue. Once requests are complete, it is responsible for retrieving details for the front-end application.
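To make the accept, queue, and retrieve flow concrete, here is a minimal sketch in plain Python. It is not the extract-app code: the class `ExtractApi`, its methods, the message fields `task_id` and `document_url`, and the in-memory queue and task store are all assumptions made for illustration. In the real service the queue would be SQS and the handlers would sit behind API routes.

```python
import queue
import uuid
from dataclasses import dataclass, field


@dataclass
class ExtractApi:
    """Hypothetical stand-in for the Extract API container."""

    # In-memory stand-in for the task queue (SQS in this proposal).
    task_queue: queue.Queue = field(default_factory=queue.Queue)
    # Task id -> record; a real service would use a persistent store.
    tasks: dict = field(default_factory=dict)

    def submit(self, document_url: str) -> str:
        """Accept a request, record it as queued, and enqueue it for a worker."""
        task_id = str(uuid.uuid4())
        self.tasks[task_id] = {"status": "queued", "result": None}
        self.task_queue.put({"task_id": task_id, "document_url": document_url})
        return task_id

    def complete(self, task_id: str, result: dict) -> None:
        """Hypothetical hook a worker would call when it finishes a task."""
        self.tasks[task_id] = {"status": "complete", "result": result}

    def get(self, task_id: str) -> dict:
        """Return the task details a front end would poll for."""
        return self.tasks[task_id]


api = ExtractApi()
task_id = api.submit("https://example.org/plan.pdf")
print(api.get(task_id)["status"])
```

The point of the split is that `submit` returns immediately with an identifier, so slow extraction work never blocks the request; the front end polls `get` until the status changes.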
Alerting & Monitoring
CI/CD
Performance & Scalability
Security
Extract Task Queue
I.AI repos: extract-app
digital-land repos:
The SQS queue where the API can drop tasks and the workers can pull tasks from.
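The drop and pull pattern can be sketched without AWS as below. `InMemoryQueue` is a hypothetical stand-in that mimics the receive-then-delete style of acknowledgement SQS uses; the class, the `drain` helper, and the message shape are assumptions for illustration, not the workers' actual code.

```python
import json
from collections import deque


class InMemoryQueue:
    """Stand-in for SQS: receive hands out a message, delete acknowledges it."""

    def __init__(self):
        self._pending = deque()
        self._in_flight = {}
        self._next_handle = 0

    def send(self, body: dict) -> None:
        # Messages are stored as JSON strings, as they would be on a real queue.
        self._pending.append(json.dumps(body))

    def receive(self):
        """Return (handle, body), or None when the queue is empty."""
        if not self._pending:
            return None
        self._next_handle += 1
        handle = str(self._next_handle)
        self._in_flight[handle] = self._pending.popleft()
        return handle, json.loads(self._in_flight[handle])

    def delete(self, handle: str) -> None:
        # Acknowledge a message so it is not handed out again.
        self._in_flight.pop(handle)


def drain(q, handler) -> int:
    """Pull tasks until the queue is empty, acknowledging each only after the handler returns."""
    processed = 0
    while (msg := q.receive()) is not None:
        handle, body = msg
        handler(body)
        q.delete(handle)
        processed += 1
    return processed
```

Deleting only after the handler returns means a task whose worker crashes mid-run is never acknowledged; on a real queue such a message would become visible again and be retried.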
Extract Worker
Segmenter
Georeferencer
Data Flow Diagram
STEP 1 - moving application side code into provide