@amywieliczka
Collaborator

@amywieliczka amywieliczka commented Feb 2, 2024

@christinklez this should address the Quartex issue we were just talking about in standup

@amywieliczka
Collaborator Author

@barbarahui this partly duplicates your empty metadata page handling. I'm going to let that land first, since it's more encompassing, and then update this afterward. OaiFetcher.check_page could return 0 instead of raising an error.
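
For illustration only, here is a rough sketch of the two behaviors being weighed for an empty page: raising an error versus returning 0. This is not the actual `OaiFetcher.check_page` code. The names `count_records`, `check_page_strict`, `check_page_lenient`, and `EmptyPageError` are all hypothetical, and the sketch assumes the page is a raw OAI-PMH XML string.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class EmptyPageError(Exception):
    """Hypothetical error type for a page that contains no records."""


def count_records(page_xml: str) -> int:
    # Count every <record> element anywhere in the response.
    root = ET.fromstring(page_xml)
    return len(root.findall(".//oai:record", OAI_NS))


def check_page_strict(page_xml: str) -> int:
    # Variant 1 (assumed): treat an empty page as a failure.
    count = count_records(page_xml)
    if count == 0:
        raise EmptyPageError("no records found in page")
    return count


def check_page_lenient(page_xml: str) -> int:
    # Variant 2 (assumed): report 0 and let the caller decide what to do.
    return count_records(page_xml)
```

With the lenient variant, the caller can still log or skip the page, so the decision about what counts as a failure lives in one place.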

@amywieliczka
Collaborator Author

Hey @barbarahui did you ever take a look at this? I think there's overlap in the work you did re: empty pages?

@barbarahui
Collaborator

@amywieliczka This is compatible with the work I did re. empty pages. The change I made to the Fetcher in that PR was to write the vernacular page to S3 if we were able to fetch one, regardless of the number of records that check_page() finds. I also added a warning to the log if the number of records is 0, where previously we had neither a warning nor an error.

Since in this PR you raise an error in the oai_fetcher rather than in Fetcher, this works just fine with what's in main for now. We should still go back at some point and do a holistic review of our approach, so that we handle pages without metadata consistently (at both the fetching and mapping stages).
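
A rough sketch of the base-class behavior described above: save the page whenever one was fetched, and only warn when the record count is 0. This is not the code in main. `SketchFetcher`, `fetch_and_save`, the key format, and the line-counting `check_page` are hypothetical, and a plain dict stands in for S3.

```python
import logging

logger = logging.getLogger("fetcher_sketch")


class SketchFetcher:
    """Hypothetical stand-in for the base Fetcher, not the real class."""

    def __init__(self, storage: dict):
        self.storage = storage  # assumption: a dict stands in for S3
        self.page_number = 0

    def check_page(self, page: str) -> int:
        # Placeholder record count: one record per non-blank line.
        return sum(1 for line in page.splitlines() if line.strip())

    def fetch_and_save(self, page):
        if page is None:
            # Nothing was fetched, so there is nothing to write.
            return None
        # Write the vernacular page regardless of how many records it holds.
        key = f"vernacular/{self.page_number}"
        self.storage[key] = page
        count = self.check_page(page)
        if count == 0:
            # Warn instead of raising, so an empty page does not stop the run.
            logger.warning("page %s contains 0 records", self.page_number)
        self.page_number += 1
        return count
```

A subclass that wants stricter behavior could override `check_page` to raise, which is roughly how a fetcher-specific error can coexist with the warning in the base class.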

3 participants