FALocalRepo Plugins #1
Replies: 2 comments
I have a proof-of-concept Weasyl backend (mostly) implemented and functioning, using forks:

Enabling an alternate backend is currently done by setting an extra environment variable:

falocalrepo-server appears to be working, except that thumbnails aren't being displayed. I'm not sure why this is: the thumbnails are being downloaded properly and the database contains the correct path to them.

Some fields in the db or in the objects returned by FAAPI contain a default value, typically an empty string. Usually this is because the field corresponds to something that exists in FA but not Weasyl (such as the dedicated "Species" field in FA entries). In other cases, a field on Partial objects may be blank because that information isn't actually scrapable from Weasyl results pages (such as the "rating" field on SubmissionPartial, which isn't displayed on favorites pages).

Some notes:
Submission: user_title, category, species, gender, footer, mentions, favorite_toggle_link
Also, comments are not currently fetched because I used Weasyl's HTTP API when possible, and it doesn't return comments. We could fetch comments by scraping the web pages instead.

Lastly, I played around with the CLI and the local web server for a bit, and (barring the caveats above) things appear to be working. But this isn't battle-hardened at all, so I wouldn't use it for any data you aren't willing to lose.
So, after an order of magnitude more labor than I was anticipating (@MatteoCampinoti94 warned me but I didn’t listen), I have a proof-of-concept backend for SoFurry that passes my tests (although there are no doubt corner cases that my tests don’t cover).

SoFurry turned out to have several uniquely aggravating properties that didn’t jibe with FALocalRepo’s current model and were non-trivial to accommodate. Hopefully this means that accommodating those same properties in future backends will be much easier: I actually expect that this was the most difficult backend to implement, and that everything after it will be much easier. (I am also fully prepared to be wrong.)

The things that made SoFurry uniquely difficult:
In FA and most similar websites, galleries and favorites are displayed as a single paginated result set. In SF, galleries and favorites are separated based on submission type: there’s one endpoint for a user’s drawings, one endpoint for a user’s stories, etc.

To complicate things even further, if a submission is part of a “folder” (akin to a pool or multi-file submission), that submission will not show up on the user’s gallery page. Instead, the user’s gallery page contains a list of all “folders”, and then a list of all submissions not in a folder. (Thankfully, a submission can only be in one folder.)

To solve both of these issues, I created the notion of sub-pages. FAAPI::gallery now returns a tuple containing:
A page is an opaque value that can be passed to a subsequent call to FAAPI::gallery. The difference between “next” pages and “sub” pages is that “next” pages are guaranteed to be older than the current page and don’t need to be visited if the current page contains submissions already in the database. No such assumption is made for “sub” pages. Implementing this required changes to my fork of FALocalRepo in addition to FAAPI.
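To make the "next" versus "sub" distinction concrete, here is a minimal sketch of how a caller might walk a gallery. It is not the code from either fork: the result shape `(ids, next_page, sub_pages)`, the `crawl_gallery` helper, and the fake site data are all assumptions made for illustration.

```python
from typing import Callable, Hashable, List, Optional, Set, Tuple

# Assumed shape of one gallery call: (submission ids, next page, sub pages).
# The real FAAPI::gallery return value may differ.
Page = Optional[Hashable]
GalleryResult = Tuple[List[int], Page, List[Hashable]]


def crawl_gallery(
    fetch: Callable[[Page], GalleryResult],
    known_ids: Set[int],
    start: Page = None,
) -> List[int]:
    """Return ids that are not in known_ids, following next and sub pages."""
    new_ids: List[int] = []
    pending: List[Page] = [start]  # pages still to visit
    seen = {start}                 # never visit the same page twice
    while pending:
        page = pending.pop()
        ids, next_page, sub_pages = fetch(page)
        fresh = [i for i in ids if i not in known_ids]
        new_ids.extend(fresh)
        # Sub pages carry no ordering guarantee, so they are always visited.
        for sub in sub_pages:
            if sub not in seen:
                seen.add(sub)
                pending.append(sub)
        # A "next" page is older than this one, so it can be skipped once
        # this page contains a submission that is already known.
        reached_known = len(fresh) < len(ids)
        if next_page is not None and not reached_known and next_page not in seen:
            seen.add(next_page)
            pending.append(next_page)
    return new_ids


# Hypothetical site: a root page with two sub pages, each paginated.
FAKE_SITE = {
    None: ([], None, ["art", "stories"]),
    "art": ([5, 4], "art2", []),
    "art2": ([3, 2], None, []),
    "stories": ([9, 8], "stories2", []),
    "stories2": ([7], None, []),
}
found = crawl_gallery(FAKE_SITE.__getitem__, known_ids={4, 7})
```

In this fake data the "art2" page is never fetched, because "art" already contains a known submission, while both sub pages are visited regardless.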
FAAPI is currently hardcoded to expect a single URL root per backend, and all calls to the server use this root. However, SoFurry uses a unique subdomain per user, “{user}.sofurry.com”, as well as a common subdomain, “www.sofurry.com”. I adjusted the logic for

To make matters worse, some endpoints allow either subdomain, but have a different path or different parameters depending on the subdomain. Other endpoints enforce one or the other, and predicting which endpoints have which behavior was an exercise in frustration.

Ideally I wanted to use the user-specific subdomain whenever possible, because it would allow me to construct the call without having to know a user’s unique numerical ID. Fortunately, I was able to find an endpoint with the user-specific subdomain for every query we care about (some of these are not actually used by the site, but I was able to predict their path based on other endpoints).

Follow-up investigation: What happens when a user changes their username? Will this break?

Other follow-up investigation: What happens if a different backend does require a user ID that is different from the username in order to make queries? We may have to store such an ID in the database.

However, in order to get subsequent pages of multi-page results, I had to identify the “next” button and scrape its URL. The “page” variable for requests with multi-page results is a tuple of the submission type and this URL, allowing it to work regardless of the subdomain used. This is potentially an attack vector depending on our threat model, since it means that FAAPI is resolving a URL that was scraped from an untrusted page. I’m not personally concerned about this, but it is something to note.

Other changes that I’ve done:

FALocalRepo now checks that the backend implementation implements FAAPI_ABC nominally (no relying on duck typing here).
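For readers unfamiliar with the term, a nominal check can be sketched roughly as follows. `BackendABC` is a hypothetical stand-in for FAAPI_ABC with a single made-up method, and `check_backend` is an illustrative helper, not the check actually used in FALocalRepo.

```python
from abc import ABC, abstractmethod


class BackendABC(ABC):
    """Hypothetical stand-in for FAAPI_ABC; the real class has more methods."""

    @abstractmethod
    def gallery(self, user: str, page=None):
        ...


class NominalBackend(BackendABC):
    """Explicitly inherits from the base class, so it passes the check."""

    def gallery(self, user: str, page=None):
        return [], None, []


class DuckBackend:
    """Has a matching method but no inheritance, so it is rejected."""

    def gallery(self, user: str, page=None):
        return [], None, []


def check_backend(cls: type) -> type:
    """Accept only classes that subclass BackendABC by name."""
    if not (isinstance(cls, type) and issubclass(cls, BackendABC)):
        raise TypeError(f"{cls!r} does not inherit from BackendABC")
    return cls


check_backend(NominalBackend)  # accepted
try:
    check_backend(DuckBackend)
    duck_accepted = True
except TypeError:
    duck_accepted = False  # rejected despite having the right method
```

Because `BackendABC` defines no `__subclasshook__`, `issubclass` only succeeds for classes that inherit from it (or are registered with it explicitly), which is what makes the check nominal rather than structural.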
The idea is that this forces third-party backends to depend on FAAPI, and thus forces the backend to specify which version of FAAPI it depends on. If the backend is outdated, we’ll now get an error at install time when poetry can’t find a version of FAAPI that satisfies all dependencies, instead of things breaking at runtime.

Some of the tests in test_faapi.py are testing certain assumptions that are FA-centric (for instance, that the “next_page” response is always an int for galleries and always a string for favorites). These checks have been removed.

What hasn’t been done:

The way I’m testing these backends is kind of a hack and hasn’t been fully pushed to GitHub. Cloning the repo and trying to run tests likely won’t work. I need to come up with something better for this.

I didn’t implement FAAPI::frontpage for SoFurry because FALocalRepo doesn’t use it.

SoFurry allows you to favorite journals as well as submissions. This data is not scraped.

Like with my Weasyl implementation, certain elements of FAAPI’s data model get populated with dummy values because they don’t make sense for SoFurry or the data isn’t available on the scraped pages. These include:
Surprisingly, the rating of a submission is not displayed on the submission page itself, only on pages that include the submission in a list.
What happens next:

I think that the Weasyl and SoFurry backends combined start to paint a picture of what a good backend interface requires. In particular, some of the methods that are currently in the abstract base class probably don’t need to be (like user_agent and crawl_delay). These were included because I originally wanted it to be possible to implement the interface without having FAAPI as a dependency, but given that I’ve changed my mind on that, I’m not sure it makes sense to require backends to implement this behavior themselves.
Continuation of discussion in FALocalRepo issue #3.