Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
81 commits
Select commit Hold shift + click to select a range
5bcfdc1
Bump utils to 94.0.1
quis Jan 24, 2025
427df81
Remove pyproject.toml
quis Jan 24, 2025
3d83815
Run ruff format
quis Jan 24, 2025
9c1da63
Don’t run black from Makefile
quis Jan 24, 2025
0ab672a
Merge pull request #366 from alphagov/bump-utils-94.0.1
quis Jan 28, 2025
936dac2
Remove pull request template
quis Feb 11, 2025
8c90408
Install uv from Docker registry
quis Feb 11, 2025
4e10509
Merge pull request #367 from alphagov/remove-pr-template
quis Feb 12, 2025
9aec6bd
Merge pull request #368 from alphagov/install-uv-docker
quis Feb 12, 2025
f7519b3
Bump utils to 95.1.1
quis Feb 13, 2025
7b271f1
Merge pull request #369 from alphagov/bump-utils-95.1.1
quis Feb 25, 2025
6441b98
Bump Sentry SDK to 1.45.1
quis Mar 5, 2025
f357521
Add test for gunicorn config
quis Mar 6, 2025
f033377
Merge pull request #370 from alphagov/sentry-1.45.1
quis Mar 10, 2025
61483a3
Merge pull request #371 from alphagov/test-gunicorn-config
quis Mar 13, 2025
47f3244
Bump utils to 97.0.3
quis Mar 14, 2025
e8e5cb6
Merge pull request #373 from alphagov/bump-utils-97.0.3
quis Mar 14, 2025
c205b27
download views: rework error handling and status codes
risicle Apr 24, 2025
087d88a
DocumentStore.get_document_metadata: handle missing Expiration inform…
risicle Apr 29, 2025
1d917fd
DocumentStore: partially enforce expired document inaccessibility via…
risicle Apr 29, 2025
11ef29a
Bump gunicorn to the latest version
quis Jan 14, 2025
1b36437
Bump eventlet to 0.39.1
quis May 2, 2025
8829622
Merge pull request #365 from alphagov/gunicorn-23
quis May 6, 2025
af97c0b
Merge pull request #374 from alphagov/ris-error-handling
risicle May 7, 2025
838a046
Bump utils to 99.5.1
quis May 21, 2025
55239fa
Merge pull request #376 from alphagov/bump-utils-99.5.1
quis May 21, 2025
6737bba
DocumentStore: add created-at tag when uploading documents
risicle May 28, 2025
157c5c8
Merge pull request #377 from alphagov/ris-created-at-tag
risicle May 29, 2025
5e34715
Bump greenlet and cffi to the latest versions
quis Jun 2, 2025
0eb1837
DocumentStore: fallback to using s3 tags to determine expiration
risicle Jun 9, 2025
a56743c
DocumentStore tests: expand to cover various cases of tag/Expiration …
risicle Jun 9, 2025
d502174
Merge pull request #380 from alphagov/ris-tagging-based-expiry
risicle Jun 19, 2025
c5aef2d
chore: bump utils to 100.2.0
WesleyHindle Jul 24, 2025
795b103
Merge pull request #382 from alphagov/update-utils
WesleyHindle Jul 28, 2025
10f448a
Merge pull request #378 from alphagov/bump-greenlet-cffi
quis Aug 5, 2025
f3471b8
Update how to check for the right Python version in README
CrystalPea Sep 3, 2025
e69ec1d
Put mimetype and antivirus checks into their own method
rparke Sep 16, 2025
31d84e5
Upgrade Python version from 3.11 to 3.13
CrystalPea Sep 3, 2025
61909ac
Merge pull request #385 from alphagov/python_313_upgrade
CrystalPea Sep 26, 2025
c79febc
new endpoint and tests
rparke Sep 16, 2025
fae2943
working example of cachable function
rparke Sep 16, 2025
903acf5
Refactor into a class
quis Sep 26, 2025
fd50626
Refactor antivirus checks into class
quis Sep 26, 2025
7cc5849
Refactor virus_free and mimetype into properties
quis Sep 26, 2025
eef708c
Refactor virus_free with early return
quis Sep 26, 2025
1c9890a
Encapsulate dict introspection in the class
quis Sep 26, 2025
e4d5e4f
Make getters and setter to validate properties
quis Sep 26, 2025
32409ed
Refactor to remove duplication
quis Sep 26, 2025
0581b7e
Do checks on init
quis Sep 26, 2025
885ef3b
Remove unused classes
quis Sep 26, 2025
4924a7e
Move class into utils
quis Sep 26, 2025
1d76ae2
Remove unused code
quis Sep 26, 2025
1bcc635
Ensure file checks are always called with a hash of the contents
quis Sep 26, 2025
37ff0b6
Add caching of mime type and virus checks
quis Sep 26, 2025
924bf4a
Catch specific exceptions
quis Sep 26, 2025
342b6bb
Move classmethod up
quis Sep 26, 2025
1c8fc1d
Only raise one type of exception
quis Sep 26, 2025
adc9f81
Vary the hash key for different filenames and values of `is_csv`
quis Sep 26, 2025
1041c24
Use kebab case for URL paths 🍢
quis Sep 26, 2025
db8b587
Reduce cache lifetime to 24 hours
quis Sep 26, 2025
3416747
Add more tests for serialisation/deserialisation
quis Sep 26, 2025
2c6f65d
Remove `virus_free` from the serialised format
quis Sep 26, 2025
d70466a
Re-order methods for easier reading
quis Sep 26, 2025
7e9c849
Always check for viruses when file data is changed
quis Sep 26, 2025
ea51ebe
Hash file contents with service ID
quis Sep 29, 2025
bf0d05c
Restore deleted test assertion
quis Sep 29, 2025
9851b2c
Refactor with parameterize
quis Sep 29, 2025
b15942a
Restore original behaviour when antivirus is disabled
quis Sep 29, 2025
7345ec8
Make `virus_free` a method
quis Sep 29, 2025
f069adc
Move `do_virus_scan` call into `mimetype_serialised`
quis Sep 30, 2025
f9bae07
Fixed wrongly-swapped test names
quis Sep 30, 2025
4231a4f
Merge pull request #388 from alphagov/file_check_endpoint-class
quis Sep 30, 2025
4f841d8
Use file extension to determine cache key
quis Sep 30, 2025
2309b82
Only cache `is_csv` if filename not provided
quis Sep 30, 2025
8d7064c
Spelling mistake
quis Oct 2, 2025
63ae6bc
Spelling mistake
quis Oct 2, 2025
e6373b4
Merge pull request #389 from alphagov/hash-on-extension-not-filename
quis Oct 2, 2025
5ab910a
upstream merge, config abstractions, dependency fixes
pgaetani Oct 16, 2025
679d2bb
nl dockerfile
pgaetani Oct 16, 2025
83b7e74
workflow changes
pgaetani Oct 16, 2025
02eb6d0
ruff formatting
pgaetani Oct 16, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 0 additions & 7 deletions .github/pull_request_template.md

This file was deleted.

8 changes: 4 additions & 4 deletions .github/workflows/merge.yml
Original file line number Diff line number Diff line change
Expand Up @@ -34,11 +34,11 @@ jobs:
activate-environment: true
enable-cache: true
cache-dependency-glob: |
requirements_for_test.in
requirements_for_test.txt
requirements_nl_test.in
requirements_nl_test.txt

- name: Install application requirements (pip)
run: uv pip install -r requirements_for_test.txt
run: uv pip install -r requirements_nl_test.txt

- name: Set tag
id: set-tag
Expand Down Expand Up @@ -88,7 +88,7 @@ jobs:

- uses: docker/build-push-action@v6
with:
file: docker/Dockerfile
file: docker/Dockerfile_nl
push: true
tags: ${{ env.IMAGE }}:latest,${{ env.IMAGE }}:${{ needs.build-tag-test.outputs.tag }}
platforms: linux/amd64,linux/arm64
Expand Down
6 changes: 3 additions & 3 deletions .github/workflows/pr.yml
Original file line number Diff line number Diff line change
Expand Up @@ -31,11 +31,11 @@ jobs:
activate-environment: true
enable-cache: true
cache-dependency-glob: |
requirements_for_test.in
requirements_for_test.txt
requirements_nl_test.in
requirements_nl_test.txt

- name: Install application requirements (pip)
run: uv pip install -r requirements_for_test.txt
run: uv pip install -r requirements_nl_test.txt

- uses: astral-sh/ruff-action@v3

Expand Down
10 changes: 3 additions & 7 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# This file was automatically copied from notifications-utils@92.1.1
# This file was automatically copied from notifications-utils@100.2.0

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
Expand All @@ -9,12 +9,8 @@ repos:
- id: check-yaml
- id: debug-statements
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: 'v0.8.2'
rev: 'v0.11.4'
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
- repo: https://github.com/psf/black
rev: 24.10.0
hooks:
- id: black
name: black (python)
- id: ruff-format
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ run-flask-with-docker: ## Run flask with docker
.PHONY: test
test: ## Run all tests
ruff check .
black --check .
ruff format --check .
py.test tests/

.PHONY: test-with-docker
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

### Python version

Check the version is [runtime.txt](runtime.txt)
Check the version is the same as at the top of [Dockerfile](docker/Dockerfile)


### uv
Expand Down
7 changes: 5 additions & 2 deletions app/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,12 @@

from flask import Flask, current_app, jsonify
from gds_metrics import GDSMetrics
from notifications_utils import logging, request_helper
from notifications_utils import request_helper
from notifications_utils.clients.antivirus.antivirus_client import AntivirusClient
from notifications_utils.clients.redis.redis_client import RedisClient
from notifications_utils.eventlet import EventletTimeout
from notifications_utils.local_vars import LazyLocalGetter
from notifications_utils.logging import flask as utils_logging
from werkzeug.local import LocalProxy

from app.config import Config, configs
Expand Down Expand Up @@ -48,6 +49,7 @@


from app.download.views import download_blueprint
from app.file_checks.views import file_checks_blueprint
from app.upload.views import upload_blueprint

mimetypes.init()
Expand All @@ -63,7 +65,7 @@ def create_app():
application.config.from_object(Config)

request_helper.init_app(application)
logging.init_app(application)
utils_logging.init_app(application)

metrics.init_app(application)
redis_client.init_app(application)
Expand All @@ -73,6 +75,7 @@ def create_app():

application.register_blueprint(download_blueprint)
application.register_blueprint(upload_blueprint)
application.register_blueprint(file_checks_blueprint)

@application.errorhandler(EventletTimeout)
def eventlet_timeout(error):
Expand Down
52 changes: 51 additions & 1 deletion app/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,57 @@ class Development(Config):
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379/1")


################
### NotifyNL ###
################
NL_PREFIX = "notifynl"


class ConfigNL(Config):
NOTIFY_APP_NAME = "document-download-api"


class DevNL(ConfigNL):
NOTIFY_ENVIRONMENT = "development"

DEBUG = True
NOTIFY_REQUEST_LOG_LEVEL = "DEBUG"

DOCUMENTS_BUCKET = f"{NL_PREFIX}-{NOTIFY_ENVIRONMENT}-document-download"


class TestNL(ConfigNL):
NOTIFY_ENVIRONMENT = "test"

DEBUG = True
NOTIFY_REQUEST_LOG_LEVEL = "DEBUG"

DOCUMENTS_BUCKET = f"{NL_PREFIX}-{NOTIFY_ENVIRONMENT}-document-download"


class AccNL(ConfigNL):
NOTIFY_ENVIRONMENT = "acceptance"

DEBUG = False
NOTIFY_REQUEST_LOG_LEVEL = "INFO"

DOCUMENTS_BUCKET = f"{NL_PREFIX}-{NOTIFY_ENVIRONMENT}-document-download"


class ProdNL(ConfigNL):
NOTIFY_ENVIRONMENT = "production"

DEBUG = False
NOTIFY_REQUEST_LOG_LEVEL = "ERROR"

DOCUMENTS_BUCKET = f"{NL_PREFIX}-{NOTIFY_ENVIRONMENT}-document-download"


configs = {
"test": Test,
"development": Development,
"devnl": DevNL,
"test": Test,
"testnl": TestNL,
"acceptance": AccNL,
"production": ProdNL
}
35 changes: 16 additions & 19 deletions app/download/views.py
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ def download_document(service_id, document_id, extension=None):
"document_id": document_id,
},
)
return jsonify(error=str(e)), 400
return jsonify(error=str(e)), e.suggested_status_code

if redirect := get_redirect_url_if_user_not_authenticated(request, document):
return redirect
Expand Down Expand Up @@ -130,24 +130,21 @@ def get_document_metadata(service_id, document_id):
"document_id": document_id,
},
)
return jsonify(error=str(e)), 400

if metadata:
document = {
"direct_file_url": get_direct_file_url(
service_id=service_id,
document_id=document_id,
key=key,
mimetype=metadata["mimetype"],
),
"confirm_email": metadata["confirm_email"],
"size_in_bytes": metadata["size"],
"file_extension": current_app.config["MIME_TYPES_TO_FILE_EXTENSIONS"][metadata["mimetype"]],
"filename": metadata["filename"],
"available_until": metadata["available_until"],
}
else:
document = None
return jsonify(error=str(e)), e.suggested_status_code

document = {
"direct_file_url": get_direct_file_url(
service_id=service_id,
document_id=document_id,
key=key,
mimetype=metadata["mimetype"],
),
"confirm_email": metadata["confirm_email"],
"size_in_bytes": metadata["size"],
"file_extension": current_app.config["MIME_TYPES_TO_FILE_EXTENSIONS"][metadata["mimetype"]],
"filename": metadata["filename"],
"available_until": metadata["available_until"],
}

response = make_response({"document": document})
response.headers["X-Robots-Tag"] = "noindex, nofollow"
Expand Down
1 change: 1 addition & 0 deletions app/file_checks/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
from .errors import * # noqa import errors module to register error handlers
8 changes: 8 additions & 0 deletions app/file_checks/errors.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
from flask import jsonify

from .views import file_checks_blueprint


@file_checks_blueprint.errorhandler(413)
def request_entity_too_large(error):
return jsonify(error="Uploaded file exceeds file size limit"), 413
17 changes: 17 additions & 0 deletions app/file_checks/views.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
from flask import Blueprint, jsonify, request

from app.utils.authentication import check_auth
from app.utils.file_checks import AntivirusAndMimeTypeCheckError, UploadedFile

file_checks_blueprint = Blueprint("file_checks", __name__, url_prefix="")
file_checks_blueprint.before_request(check_auth)


@file_checks_blueprint.route("/services/<uuid:service_id>/antivirus-and-mimetype-check", methods=["POST"])
def get_mime_type_and_run_antivirus_scan(service_id):
try:
uploaded_file = UploadedFile.from_request_json(request.json, service_id=service_id)
except AntivirusAndMimeTypeCheckError as e:
return jsonify(error=e.message), e.status_code

return jsonify(mimetype=uploaded_file.mimetype), 200
107 changes: 13 additions & 94 deletions app/upload/views.py
Original file line number Diff line number Diff line change
@@ -1,109 +1,28 @@
import mimetypes
from base64 import b64decode, binascii
from io import BytesIO
from flask import Blueprint, jsonify, request

from flask import Blueprint, abort, current_app, jsonify, request
from notifications_utils.clients.antivirus.antivirus_client import AntivirusError
from notifications_utils.recipient_validation.errors import InvalidEmailError
from werkzeug.exceptions import BadRequest

from app import antivirus_client, document_store
from app.utils import get_mime_type
from app import document_store
from app.utils.authentication import check_auth
from app.utils.files import split_filename
from app.utils.file_checks import AntivirusAndMimeTypeCheckError, UploadedFile
from app.utils.urls import get_direct_file_url, get_frontend_download_url
from app.utils.validation import (
clean_and_validate_email_address,
clean_and_validate_retention_period,
validate_filename,
)

upload_blueprint = Blueprint("upload", __name__, url_prefix="")
upload_blueprint.before_request(check_auth)


def _get_upload_document_request_data(data): # noqa: C901
if "document" not in data:
raise BadRequest("No document upload")

try:
raw_content = b64decode(data["document"])
except (binascii.Error, ValueError) as e:
raise BadRequest("Document is not base64 encoded") from e

if len(raw_content) > current_app.config["MAX_DECODED_FILE_SIZE"]:
abort(413)
file_data = BytesIO(raw_content)
is_csv = data.get("is_csv", False)

if not isinstance(is_csv, bool):
raise BadRequest("Value for is_csv must be a boolean")

confirmation_email = data.get("confirmation_email", None)
if confirmation_email is not None:
try:
confirmation_email = clean_and_validate_email_address(confirmation_email)
except InvalidEmailError as e:
raise BadRequest(str(e)) from e

retention_period = data.get("retention_period", None)
if retention_period is not None:
try:
retention_period = clean_and_validate_retention_period(retention_period)
except ValueError as e:
raise BadRequest(str(e)) from e

filename = data.get("filename", None)
if filename:
try:
filename = validate_filename(filename)
except ValueError as e:
raise BadRequest(str(e)) from e

return file_data, is_csv, confirmation_email, retention_period, filename


@upload_blueprint.route("/services/<uuid:service_id>/documents", methods=["POST"])
def upload_document(service_id):
try:
file_data, is_csv, confirmation_email, retention_period, filename = _get_upload_document_request_data(
request.json
)
except BadRequest as e:
return jsonify(error=e.description), 400

if current_app.config["ANTIVIRUS_ENABLED"]:
try:
virus_free = antivirus_client.scan(file_data)
except AntivirusError:
return jsonify(error="Antivirus API error"), 503

if not virus_free:
return jsonify(error="File did not pass the virus scan"), 400

if filename:
mimetype = mimetypes.types_map[split_filename(filename, dotted=True)[1]]
else:
mimetype = get_mime_type(file_data)

# Our mimetype auto-detection sometimes resolves CSV content as text/plain, so we use
# an explicit POST body parameter `is_csv` from the caller to resolve it as text/csv
if is_csv and mimetype == "text/plain":
mimetype = "text/csv"

if mimetype not in current_app.config["MIME_TYPES_TO_FILE_EXTENSIONS"]:
allowed_file_types = ", ".join(
sorted({f"'.{x}'" for x in current_app.config["FILE_EXTENSIONS_TO_MIMETYPES"].keys()})
)
return jsonify(error=f"Unsupported file type '{mimetype}'. Supported types are: {allowed_file_types}"), 400
uploaded_file = UploadedFile.from_request_json(request.json, service_id=service_id)
except AntivirusAndMimeTypeCheckError as e:
return jsonify(error=e.message), e.status_code

document = document_store.put(
service_id,
file_data,
mimetype=mimetype,
confirmation_email=confirmation_email,
retention_period=retention_period,
filename=filename,
uploaded_file.file_data,
mimetype=uploaded_file.mimetype,
confirmation_email=uploaded_file.confirmation_email,
retention_period=uploaded_file.retention_period,
filename=uploaded_file.filename,
)

return (
Expand All @@ -115,14 +34,14 @@ def upload_document(service_id):
service_id=service_id,
document_id=document["id"],
key=document["encryption_key"],
mimetype=mimetype,
mimetype=uploaded_file.mimetype,
),
"url": get_frontend_download_url(
service_id=service_id,
document_id=document["id"],
key=document["encryption_key"],
),
"mimetype": mimetype,
"mimetype": uploaded_file.mimetype,
},
),
201,
Expand Down
Loading