Conversation

@lthurston (Contributor)

No description provided.

@lthurston force-pushed the xml branch 2 times, most recently from 4ee0b6f to 9130b12 on July 19, 2023 at 18:00
@lthurston (Contributor, Author)

@amywieliczka, if you want to take a sneak peek at this XML fetcher, I invite you to do so. It works: I fetched collection 26935 (the reported 77k records) in about 20 seconds locally. It reported more than 110k records, though, so there might be an issue there, or there may actually be more records than reported.

I haven't written any mapping code yet, so I consider this to be a little naive, a little optimistic, but nevertheless it does what it's supposed to do. Let me know your thoughts!
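The fetcher code itself isn't shown in this thread, but the reported-vs-fetched count check behind the 77k/110k discrepancy is easy to sketch. A minimal illustration using the standard library XML parser; the element and attribute names here (`export`, `totalRecords`, `record`) are assumptions, not the real feed's schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of the kind of export the fetcher pages through.
SAMPLE_XML = """
<export totalRecords="3">
  <record><objectid>1</objectid></record>
  <record><objectid>2</objectid></record>
  <record><objectid>3</objectid></record>
</export>
"""

def count_records(xml_text: str):
    """Return (reported_total, actual_count) so a mismatch like the
    one described above can be flagged."""
    root = ET.fromstring(xml_text)
    reported = int(root.get("totalRecords", "0"))
    actual = len(root.findall("record"))
    return reported, actual

reported, actual = count_records(SAMPLE_XML)
if reported != actual:
    print(f"warning: feed reports {reported} records but contains {actual}")
```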

@aturner (Collaborator) commented Jul 19, 2023

@lthurston I think our legacy harvester code has some logic built in to leave out "metadata-only" records; the source collection has some items that don't have a digital image, just a metadata record. That may account for the count difference you're seeing.
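For context, that kind of filter is simple to express. A hedged sketch of the behavior described above; the field name is an assumption, not the legacy harvester's actual schema:

```python
def has_digital_image(record: dict) -> bool:
    # Assumed field name; the real source schema may differ.
    return bool(record.get("image_filename"))

def drop_metadata_only(records):
    """Keep only records that reference a digital image, mimicking
    the legacy-harvester behavior described in the comment above."""
    return [r for r in records if has_digital_image(r)]
```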

@lthurston (Contributor, Author)

@aturner That makes sense, thanks for the explanation. My instinct is to leave those records in our imported files in order to stay as true to the original source data as possible (despite the fact that we have to rewrite it to paginate), but I'm only too happy to be overruled.
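The pagination rewrite mentioned here could be as simple as chunking the fetched record list into fixed-size pages; a sketch under that assumption (the page size is made up):

```python
def paginate(records, page_size=100):
    """Split fetched records into fixed-size pages, preserving order
    and content so the output stays true to the source data."""
    return [records[i:i + page_size]
            for i in range(0, len(records), page_size)]
```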

@lthurston changed the title from "[WIP] Implement xml_file fetcher" to "Implement XML fetcher / PastPerfect mapper" on Aug 1, 2023


Development

Successfully merging this pull request may close these issues:

- PastPerfectXMLMapper(Mapper) -- paused
- Fetcher: XML -- paused
