WIP: Support wpull as more effective alternative to wget #84
Draft
davidfraser wants to merge 6 commits into SimonMo88:master
(cherry picked from commit bbaec63)
(cherry picked from commit 0ca5222)
but it's much better to give it one invocation with multiple URLs. This should also cope better with different themes, as it finds things itself (cherry picked from commit 0a50c3f)
(cherry picked from commit cbeac8f)
(cherry picked from commit f0ded03)
…variables (cherry picked from commit 2da6ea6)
This is a work-in-progress effort, which I'd appreciate feedback on.
In my testing of ghost-static-site-generator on an out-of-the-box Ghost setup, I found that certain URLs weren't retrieved because they were referred to using the production domain name (notably /about/).
This is tricky to correct using wget: it takes some complicated work to first identify the URLs that point at another domain and then retrieve them as well.
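To illustrate the kind of extra work this means, here is a minimal sketch of mapping a production-domain link back onto the local instance so it can be fetched. It is not code from this PR: the host `www.example.com`, the local origin `http://localhost:2368`, and the helper name `to_local_url` are all placeholders I've assumed for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Both values are hypothetical placeholders, not taken from this PR.
PRODUCTION_HOST = "www.example.com"
LOCAL_ORIGIN = "http://localhost:2368"


def to_local_url(url, production_host=PRODUCTION_HOST, local_origin=LOCAL_ORIGIN):
    """Point a production-domain URL at the local instance; leave others alone."""
    parts = urlsplit(url)
    if parts.hostname != production_host:
        # Relative links and links to other hosts are returned unchanged.
        return url
    local = urlsplit(local_origin)
    # Keep path, query and fragment; swap only scheme and host.
    return urlunsplit(
        (local.scheme, local.netloc, parts.path, parts.query, parts.fragment)
    )
```

With wget, something like this would have to run over every retrieved page to collect the missed links, followed by a second retrieval pass for them.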
My solution here is to use wpull as an alternative to wget. This is a Python reimplementation of wget focused on mirroring,
which supports plugins to adjust the process. It may not be for everybody, but it proved quite effective for this use case.
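As a rough sketch of the "one invocation with multiple URLs" idea from the commits, the argument list for a single wpull run could be assembled like this. The helper name `build_wpull_command` and the example URLs are assumptions for illustration; `--recursive` and `--page-requisites` are wget-style options that wpull also accepts, but the exact set of flags this PR passes is not shown here.

```python
def build_wpull_command(urls, extra_args=()):
    """Build an argv list for a single wpull run covering all start URLs.

    Hypothetical helper: the flags below are an assumed minimal set,
    not the ones used in this PR.
    """
    urls = list(urls)
    if not urls:
        raise ValueError("at least one start URL is required")
    # Options first, then every start URL in the same invocation.
    return ["wpull", "--recursive", "--page-requisites", *extra_args, *urls]
```

The resulting list can be handed to `subprocess.run(...)` directly, which avoids shell quoting problems and starts the crawler only once instead of once per URL.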
Note: The original wpull is not maintained any more, so I've been using the grab-site fork which is in active development and has a recent release 5.0.3. The original docs are generally still applicable.
To install wpull, I have the following commands in my Dockerfile, which enable building the package:
It would obviously be better if a package were directly available.
In testing, I found that this is a little slower than wget (2 seconds instead of less than 1 to retrieve a site), but it retrieves the site more accurately: with fewer parameters it finds more of the valid files, and there's less need to add things to the list of items to retrieve.