
KeyError for 'content-location' and 'link' when trying to save non-HTML #65

@Kirkman

Description


I'm trying to see if I can integrate savepagenow into my election night scraping system. The idea would be to save online results files into the Wayback Machine when my system detects the results have changed.

Most of the URLs I want to save are CSV, JSON, or XML files. However, when I try to use savepagenow to save them, I often get error tracebacks like these:

Traceback (most recent call last):
  File "/Users/xxxx/.virtualenvs/yyyyy/zzzzzzz/savepagenow/api.py", line 99, in capture
    content_location = response.headers["Content-Location"]
  File "/Users/xxxx/.virtualenvs/yyyyy/zzzzzzz/requests/structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-location'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/xxxx/.virtualenvs/yyyyy/zzzzzzz/savepagenow/api.py", line 105, in capture
    header_links = parse_header_links(response.headers["Link"])
  File "/Users/xxxx/.virtualenvs/yyyyy/zzzzzzz/requests/structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'link'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scrape-results.py", line 533, in <module>
    scrape(wks, headers, counties)
  File "scrape-results.py", line 375, in scrape
    archive_url, captured_flag = savepagenow.capture_or_cache(url, authenticate=True, user_agent="savepagenow (https://stltoday.com)")
  File "/Users/xxxx/.virtualenvs/yyyyy/zzzzzzz/savepagenow/api.py", line 148, in capture_or_cache
    capture(
  File "/Users/xxxx/.virtualenvs/yyyyy/zzzzzzz/savepagenow/api.py", line 110, in capture
    raise WaybackRuntimeError(
savepagenow.exceptions.WaybackRuntimeError: {'status_code': 200, 'headers': {'Server': 'nginx', 'Date': 'Wed, 31 Jul 2024 17:00:50 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'server-timing': 'captures_list;dur=0.433099, exclusion.robots;dur=0.030207, exclusion.robots.policy;dur=0.022052, esindex;dur=0.008905, cdx.remote;dur=7.628953, LoadShardBlock;dur=296.977015, PetaboxLoader3.datanode;dur=255.918508, load_resource;dur=8.860866, MISS', 'x-app-server': 'wwwb-app204', 'x-ts': '200', 'x-tr': '367', 'X-location': 'All', 'X-RL': '1', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()', 'Content-Encoding': 'gzip'}}
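For what it's worth, the chained KeyErrors seem to come from indexing `response.headers` directly (the traceback points at `response.headers["Content-Location"]` and `response.headers["Link"]` in api.py), so a `.get()` lookup with a fallback would avoid raising when the Wayback response omits those headers. A minimal illustration, using a plain dict to stand in for `response.headers` (requests uses a case-insensitive mapping, but the lookup semantics are the same):

```python
# Plain dict standing in for response.headers; note that neither
# Content-Location nor Link is present, matching the failing responses.
headers = {"Content-Type": "text/html", "Server": "nginx"}

# Direct indexing raises KeyError when the header is absent:
try:
    location = headers["Content-Location"]
except KeyError:
    location = None

# .get() expresses the same fallback without an exception:
location = headers.get("Content-Location")
assert location is None
```

That's just an observation from the traceback, of course; I don't know whether the missing headers are the root cause or a symptom of something else on the Wayback side.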

It's very odd. Occasionally the requests work, but most of the time they error out with this same sequence. You may be able to reproduce it with any or all of these three command-line examples:

savepagenow https://www.livevoterturnout.com/ENR/stcharlesmoenr/28/summary_28.xml
savepagenow https://extcontent.stlouisco.com/BOE/eResults/media/media.csv
savepagenow https://travisenr.blob.core.usgovcloudapi.net/prod/Current_02.json

Anyway, is this just me? Am I doing something wrong?
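In the meantime, since the captures do succeed occasionally, I've been wrapping the call in a simple retry helper on my end. A rough sketch (the helper name, `tries`, and `delay` are my own, not part of savepagenow; in my scraper I pass `savepagenow.capture_or_cache` as `capture_fn`):

```python
import time

def capture_with_retry(capture_fn, url, tries=3, delay=0.0):
    """Call capture_fn(url), retrying on any exception up to `tries` times.

    Re-raises the last exception if every attempt fails. In practice the
    exception caught here is savepagenow's WaybackRuntimeError.
    """
    last_exc = None
    for attempt in range(tries):
        try:
            return capture_fn(url)
        except Exception as exc:
            last_exc = exc
            if delay:
                time.sleep(delay)
    raise last_exc
```

It papers over the problem rather than fixing it, but it gets more of my results files archived on election night.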
