Conversation

@kwei-zhang (Contributor) commented Mar 21, 2025

Resolved the issue by adding retries when a 429 or 500+ status is returned.

@kwei-zhang marked this pull request as draft March 21, 2025 23:14
@kwei-zhang marked this pull request as ready for review March 21, 2025 23:14
@benoit74 (Collaborator) commented

@kwei-zhang

  • how did you test that this really solves the issue?
  • doesn't this somehow conflict with the _get_retry_adapter, which is already configured?
  • I don't think that retrying on all error codes is a good idea; many errors are not retryable at all (e.g. 404). Our problem is with 429, so I would rather retry only on 429 for now.

@kwei-zhang (Contributor, Author) commented

@kwei-zhang

  • how did you test that this really solves the issue?
  • doesn't this somehow conflict with the _get_retry_adapter, which is already configured?
  • I don't think that retrying on all error codes is a good idea; many errors are not retryable at all (e.g. 404). Our problem is with 429, so I would rather retry only on 429 for now.

Hi, thanks for the feedback!

This is the error report before adding retries:

ifixit2zim --output=output --language="zh"
...
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://zh.ifixit.com/Guide
[MainThread::2025-03-22 14:37:10,548] WARNING:Not supposed to add a redirect for a home item
[MainThread::2025-03-22 14:37:10,549] ERROR:Interrupting process due to error
Traceback (most recent call last):
  File "/Users/kaiwei/ZJN/code/kiwix/ifixit/src/ifixit2zim/scraper.py", line 362, in run
    scraper.scrape_items()
  File "/Users/kaiwei/ZJN/code/kiwix/ifixit/src/ifixit2zim/scraper_generic.py", line 173, in scrape_items
    raise FinalScrapingFailureError(
ifixit2zim.exceptions.FinalScrapingFailureError: Too many homes failed to be processed: 1
[MainThread::2025-03-22 14:37:10,550] INFO:Cleaning up

This error only occurs while fetching the list of guides (https://zh.ifixit.com/api/2.0/guides?limit=200&offset=xx). After adding retries, the scraper runs for around 5 minutes after finishing the fetching, and it works perfectly fine.


Sorry about that, I didn't notice it. I was following the design of another function:

    @backoff.on_exception(
        backoff.expo,
        requests.exceptions.RequestException,
        max_time=16,
        on_backoff=backoff_hdlr,
    )
    def get_api_content(self, path, **params):
        ...
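
As an aside, benoit74's "retry only on 429" point could be expressed with this same backoff decorator through its giveup hook. This is a minimal sketch, not the code in this PR; the base URL, function name, and body are illustrative:

    import backoff
    import requests

    API_URL = "https://zh.ifixit.com/api/2.0"  # illustrative base URL

    def giveup_unless_429(exc: Exception) -> bool:
        # Give up immediately on any HTTP error other than 429;
        # connection/read errors (no response attached) stay retryable.
        if isinstance(exc, requests.exceptions.HTTPError) and exc.response is not None:
            return exc.response.status_code != 429
        return False

    @backoff.on_exception(
        backoff.expo,
        requests.exceptions.RequestException,
        max_time=16,
        giveup=giveup_unless_429,
    )
    def get_api_content(path, **params):
        response = requests.get(f"{API_URL}{path}", params=params)
        response.raise_for_status()  # raises HTTPError on 4xx/5xx
        return response.json()

With this, a 404 fails immediately, while a 429 keeps being retried with exponential backoff until max_time is exhausted.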

Besides that, I think there might be some issue with the _get_retry_adapter function. The function looks like:

def _get_retry_adapter(
    max_retries: Optional[int] = 5,
) -> requests.adapters.BaseAdapter:  # pyright: ignore
    retries = requests.packages.urllib3.util.retry.Retry(  # pyright: ignore
        total=max_retries,  # total number of retries
        connect=max_retries,  # connection errors
        read=max_retries,  # read errors
        status=2,  # failure HTTP statuses (only those below)
        redirect=False,  # don't fail on redirections
        backoff_factor=30,  # sleep factor between retries
        status_forcelist=[
            413,
            429,
            500,
            502,
            503,
            504,
        ],  # force retry on the following codes
    )

    return requests.adapters.HTTPAdapter(max_retries=retries)  # pyright: ignore

but in the urllib3 documentation it seems that backoff_factor is in seconds and will cause the program to sleep for backoff_factor * (2 ** retry_count), which means our program will sleep for [30s, 60s, 120s, ...]. I wonder if this is a bit too long?
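
To make the arithmetic concrete, here is a back-of-the-envelope sketch (not project code) of the sleep series that formula produces; note that exact urllib3 behavior varies slightly across versions, and each individual sleep is capped (Retry.DEFAULT_BACKOFF_MAX defaults to 120 s in urllib3 2.x):

    # Sleep durations implied by backoff_factor * (2 ** retry_count)
    backoff_factor = 30
    sleeps = [backoff_factor * (2 ** n) for n in range(5)]
    print(sleeps)  # [30, 60, 120, 240, 480] seconds, before any cap applies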


It should now be fixed 😄

@benoit74 (Collaborator) commented

which means our program will sleep for [30s, 60s, 120s, ...]. I wonder if this is a bit too long?

This is in fact intentional. When a server sends a 429 error or times out on connection, retrying only a few milliseconds or seconds later has proven not to work at all. We need to let the upstream server "breathe". A full ifixit scraper run typically takes hours, so if on some occasions we need to wait 30s because the server was overwhelmed, it is fine.

I will review your changes
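
As a side note (a sketch assuming the scraper keeps using urllib3's Retry, not code from this PR): urllib3 honors a server-sent Retry-After header by default, so when the server answers a 429 with an explicit wait time, that value is used for the sleep instead of the computed exponential backoff.

    from urllib3.util.retry import Retry

    # respect_retry_after_header=True is urllib3's default, shown explicitly:
    # for 429 (and 503) responses carrying a Retry-After header, the sleep
    # uses the server-requested duration rather than the exponential value.
    retries = Retry(
        total=5,
        status_forcelist=[429],
        backoff_factor=30,
        respect_retry_after_header=True,
    )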

@benoit74 (Collaborator) left a comment

A few more changes, thank you

@benoit74 (Collaborator) commented

Also, please fix your first comment so that the PR is properly linked to the issue: https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue
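
(For instance, including a closing keyword such as "Fixes #<issue-number>" in the PR description makes GitHub link the issue and close it automatically when the PR is merged; the issue number is a placeholder here.)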

@benoit74 self-requested a review March 25, 2025 08:42
@benoit74 (Collaborator) left a comment

LGTM, please:

I will merge it right after

@kwei-zhang (Contributor, Author) commented

Thanks for the review! I have changed the PR message; can you please confirm whether it is linked properly?

@benoit74 (Collaborator) left a comment

LGTM

@benoit74 merged commit ec4cb27 into openzim:main Mar 27, 2025
5 checks passed

Successfully merging this pull request may close these issues:

Problem while scraping home page of zh website