How to download raw PDF files using 2012_manifest.tsv #111

I couldn't work out the intended download process, and my company does not allow direct downloads from cloud storage. However, the `dc_slug` column in the TSV file gave me a general idea of how to find the original PDF URL.

For example, given a `dc_slug` like `456300-sept-17-23-2012-11953-13474707086771-_-pdf`, I can use the `split` function to split it at the first hyphen into a numeric ID and the rest of the name, then construct the URL from those two parts:
```python
doc_id, name = '456300-sept-17-23-2012-11953-13474707086771-_-pdf'.split('-', 1)  # split at the first hyphen only
url = f'https://s3.amazonaws.com/s3.documentcloud.org/documents/{doc_id}/{name}.pdf'
```

This way, I can directly access the PDF!
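To apply the same idea to every row of the manifest, something like the sketch below might work. It assumes the TSV has a header row with a column named `dc_slug` and that every slug follows the `<numeric id>-<name>` pattern shown above; I have only checked the pattern against the one example. The function names `pdf_url_from_slug` and `pdf_urls_from_manifest` are my own, not part of the repository.

```python
import csv

# URL prefix taken from the example above.
BASE = "https://s3.amazonaws.com/s3.documentcloud.org/documents"


def pdf_url_from_slug(dc_slug: str) -> str:
    """Build a PDF URL from a dc_slug by splitting at the first hyphen."""
    doc_id, sep, name = dc_slug.partition("-")
    # Assumption: the part before the first hyphen is a numeric document ID.
    if not sep or not doc_id.isdigit() or not name:
        raise ValueError(f"unexpected dc_slug format: {dc_slug!r}")
    return f"{BASE}/{doc_id}/{name}.pdf"


def pdf_urls_from_manifest(tsv_file):
    """Yield one URL per manifest row that has a non-empty dc_slug.

    tsv_file is any open text file (or file-like object) containing
    tab-separated data with a header row that includes `dc_slug`.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        slug = (row.get("dc_slug") or "").strip()
        if slug:
            yield pdf_url_from_slug(slug)


# Quick check with the slug from the example above.
print(pdf_url_from_slug("456300-sept-17-23-2012-11953-13474707086771-_-pdf"))
```

To run it over the real file, open `2012_manifest.tsv` with `open(path, newline="", encoding="utf-8")` and pass the file object to `pdf_urls_from_manifest`. Slugs that do not match the assumed pattern raise a `ValueError` rather than silently producing a bad URL.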
