Description
Summary
- Sonarr/Prowlarr currently see every AniBridge result as 0 days old because we stamp Torznab items with `datetime.now()` when the response is rendered.
- Parse the “Veröffentlicht bei uns” (“published on our site”) timestamp shown below each AniWorld episode player, persist it in our cache, and emit it as the RSS `pubDate` (and qBittorrent `added_on`) so downstream automation can prioritise fresh releases correctly.
Context & Findings
- Torznab feed hard-codes `now`: `_build_item` is always called with `pubdate=now` (`app/api/torznab/api.py:85-175,285-400`), so Sonarr/Prowlarr interpret every hit as brand new. There is no attempt to read AniWorld’s release metadata.
- AniWorld UI exposes the release time: episode pages render a banner under the player, such as:
  `<div style="text-align: center; color: white; font-size: 14px; padding: 12px 0; ..."> Veröffentlicht bei uns: <strong>Freitag, 29.08.2025 18:46</strong> Uhr </div>`
  (“Published on our site: Friday, 29.08.2025 18:46”.) Example: https://aniworld.to/anime/stream/one-punch-man/staffel-3/episode-1
- Library already fetches the HTML: `aniworld.models.Episode` downloads and caches the episode page when we call `get_direct_link` (`.venv/lib/python3.13/site-packages/aniworld/models.py:675-790`), so the markup is available without an extra HTTP request if we hook in before the response is discarded.
- No field to store the timestamp: our availability cache (`EpisodeAvailability.extra`, `app/db/models.py:83-106`) is empty today; we could reuse it to persist `release_at` per slug/season/episode/language. Client tasks (`app/db/models.py:107-128`) default `added_on` to `utcnow()`, which also leads to zero-age torrents in the qBittorrent shim.
- Operational risk: AniWorld’s ToS bans automated scraping. We should minimise redundant requests, respect rate limits, and document the legal risk described in `LEGAL.md` §5. Potential mitigation: piggyback on existing downloads instead of issuing an extra GET solely for the release date.
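A minimal sketch of extracting the banner from the cached episode HTML, assuming the exact German wording and the numeric `DD.MM.YYYY HH:MM` layout from the One-Punch Man example. It uses a standard-library regular expression instead of the BeautifulSoup selector suggested under Proposed Changes, purely so it runs without extra dependencies; `parse_release_banner` is a hypothetical helper name. It returns a naive local time (the timezone is still an open question), and it would not match a page that HTML-escapes the umlaut as `&ouml;`.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed banner shape: "Veröffentlicht bei uns: <strong>Freitag, 29.08.2025 18:46</strong> Uhr".
# The weekday is skipped; only the numeric date and time are captured.
_BANNER_RE = re.compile(
    r"Veröffentlicht bei uns:\s*(?:<strong>\s*)?\w+,\s*"
    r"(\d{2}\.\d{2}\.\d{4}\s+\d{2}:\d{2})"
)


def parse_release_banner(html: str) -> Optional[datetime]:
    """Return the naive local release time from an episode page, or None."""
    match = _BANNER_RE.search(html)
    if match is None:
        return None  # banner missing: caller keeps the current "now" behaviour
    # Collapse any whitespace (including newlines) between date and time.
    raw = " ".join(match.group(1).split())
    try:
        return datetime.strptime(raw, "%d.%m.%Y %H:%M")
    except ValueError:
        return None  # malformed value, e.g. an impossible date like 31.02.2025
```

Returning `None` on every failure path keeps the fallback decision in one place (the caller), which matches the "fall back to current behaviour" requirement.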
Proposed Changes
- Parse & normalise release timestamp
  - Extend our downloader/availability probe pipeline to extract the “Veröffentlicht bei uns” text from the episode HTML (likely a simple BeautifulSoup selector on the cached `Episode.html`).
  - Convert the (German) date string to a timezone-aware `datetime`. The banner includes a German day name plus local time; the sample uses a numeric date (`29.08.2025 18:46`), so stripping the day name may suffice, otherwise we may need locale-aware parsing or a manual map of German day/month names.
- Persist per-episode metadata
  - Store the parsed timestamp in `EpisodeAvailability.extra` (e.g., `{ "release_at": iso8601 }`) when we probe availability, and surface it through helper functions so both the Torznab and qBittorrent layers can reuse it without re-fetching the page.
  - Add defensive logic for when the banner is missing or malformed (fall back to the current behaviour).
- Emit age-aware feed data
  - Update Torznab `_build_item` callers to pass `pubdate=release_at` when available; keep `now` as the fallback.
  - Mirror the value in the fake torrent payloads (`ClientTask.added_on`, qBittorrent `added_on` field) so Sonarr’s activity view matches the RSS age.
- Cache invalidation & data refresh
  - Ensure the cached timestamp respects our availability TTL (re-parse when the episode page is re-fetched), and optionally expose it via the API/debug logs for observability.
- Documentation & guardrails
  - Document the new scraping behaviour, reference AniWorld’s ToS warning, and mention rate-limit/backoff expectations in the troubleshooting section. Consider adding a feature flag to disable the scraping if operators prefer the current synthetic age.
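A minimal sketch of the persistence and fallback logic, assuming `EpisodeAvailability.extra` holds a JSON-serialisable dict. `store_release_at`, `load_release_at`, `resolve_pubdate` and `resolve_added_on` are hypothetical helpers, not existing AniBridge code: they store one ISO 8601 UTC string per episode, treat anything missing, malformed or naive as absent, and fall back to the caller's `now` so behaviour is unchanged when parsing fails.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

RELEASE_KEY = "release_at"  # hypothetical key inside EpisodeAvailability.extra


def store_release_at(extra: Dict[str, Any], release_at: datetime) -> None:
    """Persist an aware release time as an ISO 8601 UTC string."""
    if release_at.tzinfo is None:
        raise ValueError("release_at must be timezone-aware")
    extra[RELEASE_KEY] = release_at.astimezone(timezone.utc).isoformat()


def load_release_at(extra: Optional[Dict[str, Any]]) -> Optional[datetime]:
    """Read the stored release time; None if missing, malformed or naive."""
    raw = (extra or {}).get(RELEASE_KEY)
    if not isinstance(raw, str):
        return None
    try:
        parsed = datetime.fromisoformat(raw)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        return None  # refuse to guess a timezone for naive values
    return parsed


def resolve_pubdate(extra: Optional[Dict[str, Any]], now: datetime) -> datetime:
    """Release time when known, otherwise the current synthetic 'now'."""
    release_at = load_release_at(extra)
    return release_at if release_at is not None else now


def resolve_added_on(extra: Optional[Dict[str, Any]], now: datetime) -> int:
    """Same value as Unix seconds, the unit qBittorrent's Web API uses for added_on."""
    return int(resolve_pubdate(extra, now).timestamp())
```

With helpers like these, Torznab callers would pass `pubdate=resolve_pubdate(extra, now)` and the qBittorrent shim would use `resolve_added_on(extra, now)`, so both layers read the same stored value.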
Testing Ideas
- Unit tests for the HTML parser that feed captured AniWorld snippets (with and without the banner) and verify we obtain the correct UTC datetime.
- Integration tests that run `probe_episode_quality` with a mocked Episode HTML and confirm `EpisodeAvailability.extra["release_at"]` is populated and reused by Torznab.
- Torznab functional test asserting that `pubDate` reflects the captured timestamp and that the resulting RSS age matches the expected value.
- qBittorrent sync test ensuring `added_on` is set to the release timestamp when available, and that the previous behaviour remains intact if parsing fails.
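A sketch of the assertion shape for the Torznab functional test. `render_item` is a hypothetical stand-in for `_build_item` (whose real signature is not shown in this issue), and `age_in_days` only approximates how a consumer such as Sonarr derives age from `pubDate`; its exact rounding is an assumption. The RSS date is produced with `email.utils.format_datetime`, which emits the RFC 2822 form RSS 2.0 expects.

```python
import unittest
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def render_item(title: str, pubdate: datetime) -> str:
    """Hypothetical stand-in for _build_item: only the fields this test needs."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # pubdate must be timezone-aware; usegmt=True requires a UTC datetime.
    ET.SubElement(item, "pubDate").text = format_datetime(
        pubdate.astimezone(timezone.utc), usegmt=True
    )
    return ET.tostring(item, encoding="unicode")


def age_in_days(item_xml: str, now: datetime) -> int:
    """Whole days between pubDate and now, roughly as an indexer client sees it."""
    published = parsedate_to_datetime(ET.fromstring(item_xml).findtext("pubDate"))
    return (now - published).days


class PubDateAgeTest(unittest.TestCase):
    def test_pubdate_reflects_release(self) -> None:
        release = datetime(2025, 8, 29, 16, 46, tzinfo=timezone.utc)
        xml = render_item("One-Punch Man S03E01", release)
        self.assertIn("<pubDate>Fri, 29 Aug 2025 16:46:00 GMT</pubDate>", xml)
        self.assertEqual(age_in_days(xml, release + timedelta(days=3, hours=1)), 3)
```

Run it with `python -m unittest` or pytest; the real test would swap `render_item` for the Torznab endpoint's rendered output.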
Open Questions
- AniWorld sometimes removes older episodes or adjusts timestamps. Do we need to re-validate the stored value each time we refresh availability, or can we assume it is immutable after the first scrape?
- How should we handle timezone/localisation? The banner appears to use Central European time; should we treat it as Europe/Berlin and convert to UTC, or surface it as-is?
- Is the release date shown per language/provider or global? If language-specific, do we need to track separate timestamps per language variant?
- Should we apply the same mechanism to other catalogues (e.g., future s.to support, see #6 “Add s.to catalogue support alongside AniWorld”) to keep behaviour consistent across sources?
- Do we need rate limiting/backoff logic around the initial HTML fetch to avoid triggering anti-bot systems, or is the existing library behaviour sufficient?