Merge 20251016 by pgaetani · Pull Request #2 · Worth-NL/document-download-api

pgaetani · 2025-10-16T09:27:58Z

No description provided.

## 94.0.1 * Add `ruff.toml` to `MANIFEST.in` ## 94.0.0 * `version_tools.copy_config` will now copy `ruff.toml` instead of `pyproject.toml`. Apps should maintain their own `pyproject.toml`, if required * Replaces `black` formatter with `ruff format` * Upgrades `ruff` to version 0.9.2 ## 93.2.1 * Replaces symlink at `./pyproject.toml` with a duplicate file ## 93.2.0 * logging: add verbose eventlet context to app.request logs if request_time is over a threshold ## 93.1.0 * Introduce `NOTIFY_LOG_LEVEL_HANDLERS` config variable for separate control of handler log level ## 93.0.0 * BREAKING CHANGE: logging: all contents of `logging/__init__.py` have been moved to `logging/flask.py` because they all assume a flask(-like) environment and this way we don't implicitly import all of flask etc. every time anything under `logging` is imported. * Adds `request_cpu_time` to `app.request` logs when available. * Adds ability to track eventlet's switching of greenlets and annotate `app.request` logs with useful metrics when the flask config parameter `NOTIFY_EVENTLET_STATS` is set and the newly provided function `account_greenlet_times` is installed as a greenlet trace hook. *** Complete changes: alphagov/notifications-utils@92.1.1...94.0.1

Bump utils to 94.0.1

All our apps are continously deployed now, so we don’t need to specifically call it out when making PRs against this app.

Copies https://github.com/alphagov/notifications-api/pull/4376/files Also consolidates environment variables into a single block

Remove pull request template

Install uv from Docker registry

## 95.1.1 * Add `RUF100` rule to linter config (checks for inapplicable uses of `# noqa`) ## 95.1.0 * Adds log message and statsd metric for retried celery tasks ## 95.0.0 * Reverts 92.0.0 to restore new validation code *** Complete changes: alphagov/notifications-utils@94.0.1...95.1.1

Bump utils to 95.1.1

Fixes GHSA-g92j-qhmh-64v2 Not something that likely affects us, but clears down another Dependabot warning. This fix was released in sentry-sdk==2.8.0, then also backported to sentry-sdk==1.45.1. I think we could probably upgrade to Sentry >= 2 without making any changes, but more testing would be needed to validate this. So just going with the backport for now. Specifying an exact version because that’s our convention.

This makes sure that: - all the imports in `gunicorn_config.py` exist - it’s exporting the variables we expect it to

Bump Sentry SDK to 1.45.1

Add test for gunicorn config

## 97.0.3 * Bump minimum jinja2 version to 3.1.6 ## 97.0.2 * Fix external link redirect ## 97.0.1 * Fix QR code URL with '&' encoded as '&' ## 97.0.0 * for bilingual letters preview, only show barcodes on the first page (patch). * rename `include_notify_tag` argument to `includes_first_page` (major). ## 96.0.0 * BREAKING CHANGE: the `gunicorn_defaults` module has been moved to `gunicorn.defaults` to make space for gunicorn-related utils that don't have the restricted-import constraints of the gunicorn defaults. Imports of `notifications_utils.gunicorn_defaults` should be changed to `notifications_utils.gunicorn.defaults`. * Added `ContextRecyclingEventletWorker` custom gunicorn worker class. ## 95.2.0 * Implement `InsensitiveSet.__contains__` ## 95.1.2 * Fix to the number of arguments the `on_retry` of the `NotifyTask` class takes. *** Complete changes: alphagov/notifications-utils@95.1.1...97.0.3

Bump utils to 97.0.3

don't squash all store errors to 400 status code - if the intention was to prevent information leak, it failed to do this because it included the error string in the response. do this by making DocumentStoreError a superclass of other store exception types so exception handlers can continue to listen for the general DocumentStoreError. return 410 for keys that have a delete marker, 404 for documents with no such key, 403 for documents that are blocked. notably also catch NoSuchKey from check_for_blocked_document's get_object_tagging calls and turn this into a DocumentNotFound error, because in practise this will be the first problem a request will encounter when trying to access a missing document.

…ation gracefully instead of causing a crash this should simply return the metadata with the expiration missing, allowing the client to make up its own mind what to do. in most normal cases this is only used for informational display, and in case of a maintenance issue or misconfiguration it's better to have this information missing rather than being unable to serve any documents to anyone (a certain P1)

… app logic and Expiration metadata this check was previously done in document-download-frontend, but it really belongs in the document-download-api if anywhere, if only because the api's download endpoints are publicly accessible and a document denied access by just the frontend could always just be downloaded via the api. note this check still relies on the s3 bucket's lifecycle policies being in place and implicitly the s3 object's last modified timestamp. this will only "rescue" us in cases where a delete marker has not yet been added by the lifecycle rile for some reason.

The version of Gunicorn we are using is more than 18 months out of date[1] and has a high severity security vulnerabilites[2]. We have not updated the version on the API (and therefore the minimum version in utils[3]) because last time we tried (while still on PaaS) it had some performance issues, documented here[4]: > We originally pinned this due to eventlet v0.33 compatibility issues. > That was supposedly fixed in version v21.0.0 and we merged v21.2.0 for > a while. Until we ran a load test again, and identified that the > bumped version of gunicorn led to a 33%+ drop-off in > performance/requests per second that the API was able to handle. If a > version greater than 21.2.0 is released, and it either gives us > something we need or we think it addresses said performance issues, > make sure to run a load test in staging before releasing to > production. But document download does not serve anywhere near the same number of requests per second as the API, so we have already upgraded to version 21.2.0. This pull request just updates from 21.2.0 to 23.0.0 (the latest version), which resolves the security vulnerability. *** 1. https://github.com/benoitc/gunicorn/tree/21.2.0 2. GHSA-w3h3-4rj7-4ph4 3. https://github.com/alphagov/notifications-utils/blob/main/setup.py#L31 4. https://github.com/alphagov/notifications-api/blob/aff08653d951d6f60dec8d701ae7cf1681b78a27/requirements.in#L10

Bump gunicorn to the latest version

Rework error handling

## 99.5.1 * Bump minimum Flask version to 3.1.1 ## 99.5.0 * Update economy letter transit date to 5 days ## 99.4.1 * Fix celery beat logging so that it will log in json ## 99.4.0 * Add celery logging configuration for json logging ## 99.3.4 * `celery.NotifyTask`: don't emit early log if called synchronously ## 99.3.3 * Fix bug with non-SMS templates considering international SMS limits ## 99.3.2 * Stops counting Crown Dependency phone numbers in CSV file as international ## 99.3.1 * Bump `python-json-logger` to version 3 ## 99.3.0 * Adds `RecipientCSV.international_sms_count` * Adds `RecipientCSV.more_international_sms_than_can_send` (initialise per `RecipientCSV(remaining_international_sms_messages=123)` to enable this check) ## 99.2.0 * Add s3 multipart upload lifecycle (create, upload, complete, abort) ## 99.1.2 * Normalise unusual dashes and spacing characters in phone numbers ## 99.1.1 * Bump ruff to latest version ## 99.1.0 * Adds support for economy mail in get_min_and_max_days_in_transit, with delivery estimated between 2 and 6 days ## 99.0.0 * NOTABLE CHANGE: Celery tasks emit an "early" log message when starting, with a customisable log level to allow this to be disabled for high-volume tasks. When upgrading past this version you may want to pass the extra argument `early_log_level=logging.DEBUG` to the task decorator of your most heavily-called tasks to reduce the effect on log volume * Annotates automatic celery task log messages with `celery_task_id` ## 98.0.1 * Moves from `setup.py` to `pyproject.toml` for package configuation * Changes ruff config from `exclude` to `extend-exclude` ## 98.0.0 * Update rate multipliers for international SMS to have values for 1 April 2025 onwards. When used in notifications-api and notifications-admin this change will affect the billing of text messages the pricing shown. ## 97.0.5 * Fix mailto: URLs in Markdown links ## 97.0.4 * Fix inset text block styling *** Complete changes: alphagov/notifications-utils@97.0.3...99.5.1

Bump utils to 99.5.1

to an extent this is a "backup" of the creation-time information we usually refer to s3's native timestamp for in case some maintenance-related reason causes us to lose it, but we will probably add logic to the app further down the line to use the data from this tag for additional access enforcement. timestamp will probably not match s3's native timestamp all the time.

DocumentStore: add `created-at` tag when uploading documents

The versions we are using do not install on newer versions of Python because of the way they are packaged

including minimal test changes intending to "prove" that everything still works more-or-less the same without tagging present

These can control the mime type so we shouldn’t cache under the same key if they change.

It’s what we do elsewhere

We want to re-check files for viruses after 24 hours, in case the definitions have changed.

We were a bit light on test coverage here

Either the format is `{"success": …}` or {`error`: …}. `{"success": {"virus_free": False}}` is ambiguous so it shouldn’t be a structure we allow to be possible.

This is a bit more robust than just relying on execution order.

This prevents timing attacks, where one service could see if any other service has uploaded a document by seeing if the checks for that document run more quickly (because they have been cached).

This is more robust because it shows that the difference in. behaviour is entirely dependent on file size

The file is always marked as virus-free when antivirus is disabled. This makes local development easier. This behaviour was accidentally switched in the process of refactoring, because - the original code was hard to read - there were no tests

Before, virus free could theoretically: - return `True` - return `False` - raise In practice the way it was structured meant it could never return False. This means calling code had to account for the `raise`/`False` possibilities, even though the latter would never happen. Changing it to a method makes it clearer that it’s doing work. It also means it less unexpected when we change it to either: - return `None` (success) - raise (failed)

This is a better separation of the control flow because it groups the exception handler with the calls which can raise the exception.

Add a file check endpoint and cache results

Only the file extension can change what MIME type is detected for the file. The rest of the filename doesn’t matter. This probably won’t result in a speedup (the dict lookup from extention to MIME type is cheap) but it could result in slightly fewer cache keys being used.

The `is_csv` argument is ignored if a filename is provided. So we can reduce the space of possible hash keys by not including it

Co-authored-by: Mervi Tyczynska <CrystalPea@users.noreply.github.com>

…ilename Optimise hash key for MIME type check

quis and others added 30 commits January 24, 2025 12:30

Remove pyproject.toml

427df81

Run ruff format

3d83815

Don’t run black from Makefile

9c1da63

Merge pull request alphagov#366 from alphagov/bump-utils-94.0.1

0ab672a

Bump utils to 94.0.1

Remove pull request template

936dac2

All our apps are continously deployed now, so we don’t need to specifically call it out when making PRs against this app.

Install uv from Docker registry

8c90408

Copies https://github.com/alphagov/notifications-api/pull/4376/files Also consolidates environment variables into a single block

Merge pull request alphagov#367 from alphagov/remove-pr-template

4e10509

Remove pull request template

Merge pull request alphagov#368 from alphagov/install-uv-docker

9aec6bd

Install uv from Docker registry

Merge pull request alphagov#369 from alphagov/bump-utils-95.1.1

7b271f1

Bump utils to 95.1.1

Add test for gunicorn config

f357521

This makes sure that: - all the imports in `gunicorn_config.py` exist - it’s exporting the variables we expect it to

Merge pull request alphagov#370 from alphagov/sentry-1.45.1

f033377

Bump Sentry SDK to 1.45.1

Merge pull request alphagov#371 from alphagov/test-gunicorn-config

61483a3

Add test for gunicorn config

Merge pull request alphagov#373 from alphagov/bump-utils-97.0.3

e8e5cb6

Bump utils to 97.0.3

Bump eventlet to 0.39.1

1b36437

Merge pull request alphagov#365 from alphagov/gunicorn-23

8829622

Bump gunicorn to the latest version

Merge pull request alphagov#374 from alphagov/ris-error-handling

af97c0b

Rework error handling

Merge pull request alphagov#376 from alphagov/bump-utils-99.5.1

55239fa

Bump utils to 99.5.1

Merge pull request alphagov#377 from alphagov/ris-created-at-tag

157c5c8

DocumentStore: add `created-at` tag when uploading documents

Bump greenlet and cffi to the latest versions

5e34715

The versions we are using do not install on newer versions of Python because of the way they are packaged

DocumentStore: fallback to using s3 tags to determine expiration

0eb1837

including minimal test changes intending to "prove" that everything still works more-or-less the same without tagging present

quis and others added 23 commits September 29, 2025 10:42

Vary the hash key for different filenames and values of is_csv

adc9f81

These can control the mime type so we shouldn’t cache under the same key if they change.

Use kebab case for URL paths 🍢

1041c24

It’s what we do elsewhere

Reduce cache lifetime to 24 hours

db8b587

We want to re-check files for viruses after 24 hours, in case the definitions have changed.

Add more tests for serialisation/deserialisation

3416747

We were a bit light on test coverage here

Remove virus_free from the serialised format

2c6f65d

Either the format is `{"success": …}` or {`error`: …}. `{"success": {"virus_free": False}}` is ambiguous so it shouldn’t be a structure we allow to be possible.

Re-order methods for easier reading

d70466a

Always check for viruses when file data is changed

7e9c849

This is a bit more robust than just relying on execution order.

Hash file contents with service ID

ea51ebe

This prevents timing attacks, where one service could see if any other service has uploaded a document by seeing if the checks for that document run more quickly (because they have been cached).

Restore deleted test assertion

bf0d05c

Refactor with parameterize

9851b2c

This is more robust because it shows that the difference in. behaviour is entirely dependent on file size

Move do_virus_scan call into mimetype_serialised

f069adc

This is a better separation of the control flow because it groups the exception handler with the calls which can raise the exception.

Fixed wrongly-swapped test names

f9bae07

Merge pull request alphagov#388 from alphagov/file_check_endpoint-class

4231a4f

Add a file check endpoint and cache results

Only cache is_csv if filename not provided

2309b82

The `is_csv` argument is ignored if a filename is provided. So we can reduce the space of possible hash keys by not including it

Spelling mistake

8d7064c

Co-authored-by: Mervi Tyczynska <CrystalPea@users.noreply.github.com>

Spelling mistake

63ae6bc

Co-authored-by: Mervi Tyczynska <CrystalPea@users.noreply.github.com>

Merge pull request alphagov#389 from alphagov/hash-on-extension-not-f…

e6373b4

…ilename Optimise hash key for MIME type check

upstream merge, config abstractions, dependency fixes

5ab910a

nl dockerfile

679d2bb

workflow changes

83b7e74

pgaetani requested a review from a team October 16, 2025 09:27

pgaetani self-assigned this Oct 16, 2025

pgaetani added enhancement New feature or request upstream merge Changes from alphagov labels Oct 16, 2025

ruff formatting

02eb6d0

frankmolenaar1986 approved these changes Oct 16, 2025

View reviewed changes

frankmolenaar1986 merged commit 3328457 into main Oct 16, 2025
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge 20251016#2

Merge 20251016#2
frankmolenaar1986 merged 81 commits intomainfrom
merge/20251016

pgaetani commented Oct 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Conversation

pgaetani commented Oct 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants