Have --restart pick up IDs from the output files #38

Open
katfang wants to merge 9 commits into radiolarian:master from katfang:easier-restart

@katfang katfang commented Sep 21, 2022

PROBLEM STATEMENT

Sometimes my internet cuts out and my script quits. It's annoying to dig out the next fic id, verify it, and slap it in as a parameter to --restart. I just want to --resume.

SOLUTION

These commits will skip fic ids that are in the output files (the csv fic file and the error file) IF

  1. you pass the --resume flag, no parameter
  2. you pass in an input for the fic ids. (I'm not sure why this was differentiated between passing fic IDs directly and passing in a CSV, but sure, we can keep that.)
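
The skip behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in these commits: the helper names `load_seen_ids` and `ids_to_fetch` are hypothetical, and it assumes the fic id sits in the first column of both the csv fic file and the error file.

```python
import csv
import os


def load_seen_ids(*paths):
    """Collect the first column of each existing CSV file as a set of id strings.

    Assumption: the fic id is the first column of every output file.
    """
    seen = set()
    for path in paths:
        if not os.path.exists(path):
            continue  # nothing written yet, so nothing to skip
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                # Ignore blank rows and any header row (non-numeric first cell).
                if row and row[0].strip().isdigit():
                    seen.add(row[0].strip())
    return seen


def ids_to_fetch(input_ids, seen):
    """Yield input ids in their original order, dropping ones already handled."""
    for fic_id in input_ids:
        fic_id = str(fic_id).strip()
        if fic_id not in seen:
            yield fic_id
```

With something like this, a resumed run would build `seen` once from the fic file and the error file, then iterate over `ids_to_fetch(all_input_ids, seen)` instead of the raw input list.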

PLEASE NOTE: THIS PR REMOVES --restart ID. I had originally just used the same flag and gave it new functionality, but I decided to give it a new name.

CAVEATS

A tad controversial in that

  1. You can't switch to different output file names when continuing a run. I have no idea how big your files get, but this basically means you can't "manually batch" by quitting out and starting again at some other fic id. (I'm downloading only metadata for 90k fics and decided that I'm cool with a 90 MB file. I don't know how big it gets if you're downloading Actual Fic.)
  2. If you want to retry your errored fics by passing in the error fic id list (without renaming it) as the input, the script will do nothing on --resume, because every id in that file already counts as processed. It will behave reasonably if you don't --resume. But this is just straight up wonky anyway, because you're using the same file as input and error output, which is ... yeah, don't do that.
  3. [extra, extra pedantic, as if the above wasn't already] If you want to re-download already processed fic (into the same file) starting from some point in the fic id list, you can't. I don't know why you'd want to do this, because you'd end up with two copies of the same fic in the same file, but I guess you'd also get an updated version.

NOTES

First, I recommend looking at the first commit by itself. I just really wanted the write path to be the same whether you pass in a CSV list of ids or pass them all on the command line.

My best guess for this flag is that it's meant to kick the script when something has gone wrong in the middle and you Just Want More Fic From The List. If so, this is easier! Albeit a little more magic and therefore slightly less predictable.

If you can live with that, then this allows you to just run --resume without finding what the next fic id is! Hooray!

Hm, I could have it error out if you try to pass the error file as the input list.
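
A guard along those lines might look like the sketch below. It is only an illustration of the idea, not something in these commits: the function name `check_input_is_not_error_file` and its parameters are hypothetical.

```python
import os


def check_input_is_not_error_file(input_path, error_path):
    """Raise if the id input list and the error output resolve to the same file."""
    # realpath normalises relative paths and symlinks, and works even if
    # one of the files does not exist yet.
    if os.path.realpath(input_path) == os.path.realpath(error_path):
        raise ValueError(
            "input id list and error file are the same file: " + input_path
        )
```

Calling this once before the run starts would turn caveat 2 into an explicit error instead of a silent no-op.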

Well, let me know if there are any bits I ought to revisit.

By the way, thanks for that fix in ao3_work_ids in case there is no file! Can't give you kudos for it, so, here I am. Added that check here, too.
