update get_result() route #510

Open
dale-wahl wants to merge 17 commits into master from dataset-key-for-result

Conversation

@dale-wahl
Member

This route now requires a dataset key (or, if provided, can use the dataset object directly) to ensure the user has rights to the files.

One annoying caveat is that PixPlot HTML files have the old route hardcoded. I need to either find a way to work around that or write a migration script; new plots will be produced correctly.

To reduce db requests, we could memcache the dataset data and re-instantiate the dataset object from that data. `dataset.refresh_owners()` would still run, which makes its own db requests. This only really seems relevant for something like PixPlot, which makes many requests for various thumbs/images.
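The caching idea could be sketched roughly like this. This is a minimal illustration only: `get_cached_dataset_row`, `_dataset_cache`, and the TTL are hypothetical names, not 4CAT's actual API, and a real implementation would use memcache rather than a process-local dict.

```python
import time

# Process-local stand-in for memcache: map dataset key -> (timestamp, row).
_dataset_cache = {}
CACHE_TTL = 60  # seconds; hypothetical value

def get_cached_dataset_row(key, fetch_from_db):
    """Return the cached dataset row for `key`, calling `fetch_from_db(key)`
    only on a cache miss or after the TTL expires."""
    now = time.time()
    cached = _dataset_cache.get(key)
    if cached and now - cached[0] < CACHE_TTL:
        return cached[1]
    row = fetch_from_db(key)
    _dataset_cache[key] = (now, row)
    return row
```

With this, repeated requests for thumbnails from the same dataset would hit the db once per TTL window instead of once per request (though, as noted, `refresh_owners()` would still issue its own queries).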

@dale-wahl dale-wahl requested a review from stijn-uva July 4, 2025 12:07
@stijn-uva
Member

The implementation makes sense, but one concern I have is that this breaks existing URLs (i.e. the key is now required in the URL and old URLs won't work at all anymore). I have used direct URLs in the past to share files and now these would no longer work. So I would prefer if the key-less URLs would still work, perhaps optionally - there could be a global option "allow file downloads without login" or something.

@dale-wahl
Member Author

@stijn-uva This looks good now. You can enable the legacy links via a general global setting.

For the legacy path, we could attempt to extract the dataset key. E.g.:

    # Attempt to extract the dataset key from query_file and validate it;
    # the key of the dataset the files belong to can be extracted from the
    # file name in a predictable way.
    possible_keys = re.findall(r"[abcdef0-9]{32}", query_file)
    if possible_keys:
        # if for whatever reason there are multiple hashes in the filename,
        # the key would always be the last one
        key = possible_keys.pop()
        try:
            dataset = DataSet(key=key, db=g.db, modules=g.modules)
            # If dataset is found, delegate to the main get_result function for proper access control and file serving
            return get_result(query_file=query_file, dataset_key=key, dataset=dataset)
        except DataSetException:
            # If dataset is not found, fall back to legacy behavior
            pass

We could do that before checking the setting, and only fall back to the setting if no dataset is found. The snippet is lifted from the cleanup worker.
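The proposed ordering could be sketched as follows. Everything here is hypothetical (`resolve_legacy_request`, the `lookup_dataset` callable, and the `allow_anonymous` flag stand in for the actual route, `DataSet` lookup, and the "allow file downloads without login" setting):

```python
import re

def resolve_legacy_request(query_file, lookup_dataset, allow_anonymous):
    """Sketch of the fallback order for legacy URLs: try to recover the
    dataset key from the filename first; consult the global setting only
    if no dataset can be found."""
    possible_keys = re.findall(r"[abcdef0-9]{32}", query_file)
    if possible_keys:
        # if there are multiple hashes in the filename, the key is the last one
        dataset = lookup_dataset(possible_keys[-1])
        if dataset is not None:
            # delegate to the keyed route, with full access control
            return ("dataset", dataset)
    if allow_anonymous:
        # legacy behaviour, gated behind the global setting
        return ("legacy", query_file)
    return ("denied", None)
```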

@dale-wahl dale-wahl marked this pull request as ready for review March 4, 2026 14:45
@dale-wahl
Member Author

@stijn-uva, alright, merged the archive and get_result routes... mostly. @sal-uva does not have the dataset's actual result file (just the key and the filename of the specific archived file). We need both the result zip filename and the file path inside the zip to extract it via the get_result route (I did not like the idea of parsing paths like /dataset_results.zip/some/zipped/file, though it is a possible alternative). So I kept the archive route to instantiate the DataSet and get the file. We could refactor the new get_media_from_children function to also provide the dataset's result archive filename (and use extract_file_from_archive to avoid iterating through the dataset files until .metadata.json), and then remove the archive route if really desired.
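For reference, pulling a single named member out of a result zip (the job `extract_file_from_archive` is described as doing) can be done directly with the stdlib, without iterating over the other files. This is a generic sketch, not the actual 4CAT helper:

```python
import zipfile

def read_member_from_archive(archive_path, member_name):
    """Read one named member from a zip archive without scanning the
    rest of its contents. Raises KeyError if the member is absent."""
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member_name) as member:
            return member.read()
```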

I also extract the zipped file and stream it rather than reading it into memory. This might not matter for most social media data, but it could cause issues with larger videos.
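The streaming approach can be illustrated with a chunked generator over a zip member; the web framework would then iterate over it as the response body. A sketch with an assumed chunk size, not the PR's exact code:

```python
import zipfile

def stream_archive_member(archive_path, member_name, chunk_size=8192):
    """Yield a zipped file's contents in chunks so a large member (e.g. a
    video) is never held in memory all at once."""
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member_name) as member:
            while True:
                chunk = member.read(chunk_size)
                if not chunk:
                    return
                yield chunk
```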
