[FEATURE] Derive real release ages for synthetic torrents #7

@Zzackllack

Summary

  • Sonarr/Prowlarr currently see every AniBridge result as 0 days old because we stamp Torznab items with datetime.now() when the response is rendered.
  • Parse the “Veröffentlicht bei uns” timestamp published below each AniWorld episode player, persist it in our cache, and emit it as the RSS pubDate (and qBittorrent added_on) so downstream automation can prioritise fresh releases correctly.

Context & Findings

  • Torznab feed hard-codes now: _build_item is always called with pubdate=now (app/api/torznab/api.py:85-175, 285-400), so Sonarr/Prowlarr interpret every hit as brand new. There is no attempt to read AniWorld’s release metadata.

  • AniWorld UI exposes release time: Episode pages render a banner under the player such as:

    <div style="text-align: center; color: white; font-size: 14px; padding: 12px 0; ...">
      Veröffentlicht bei uns: <strong>Freitag, 29.08.2025 18:46</strong> Uhr
    </div>

    Example: https://aniworld.to/anime/stream/one-punch-man/staffel-3/episode-1

  • Library already fetches the HTML: aniworld.models.Episode downloads and caches the episode page when we call get_direct_link (.venv/lib/python3.13/site-packages/aniworld/models.py:675-790), so the markup is available without an extra HTTP request if we hook in before the response is discarded.

  • No field to store the timestamp: Our availability cache (EpisodeAvailability.extra, app/db/models.py:83-106) is empty today; we could reuse it to persist release_at per slug/season/episode/language. Client tasks (app/db/models.py:107-128) default added_on to utcnow(), which also leads to zero-age torrents in the qBittorrent shim.

  • Operational risk: AniWorld’s ToS bans automated scraping. We should minimise redundant requests, respect rate limits, and document the legal risk surfaced in LEGAL.md §5. Potential mitigation: piggyback on existing downloads instead of issuing an extra GET solely for the release date.
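
For illustration, the banner quoted above could be extracted from already-fetched HTML roughly as follows. This is a minimal sketch using only the standard library: `parse_release_banner` is a hypothetical helper name, and the regex is derived solely from the single sample snippet in this issue, so real pages may need a more tolerant pattern or a BeautifulSoup selector.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: the banner always looks like the sample in this issue, i.e.
# "Veröffentlicht bei uns: <strong>Freitag, 29.08.2025 18:46</strong> Uhr".
_BANNER_RE = re.compile(
    r"Veröffentlicht bei uns:\s*<strong>\s*"
    r"(?:[^\W\d_]+,\s*)?"                  # optional German weekday name
    r"(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2})"   # DD.MM.YYYY HH:MM
    r"\s*</strong>"
)


def parse_release_banner(html: str) -> Optional[datetime]:
    """Return the naive local release time, or None if the banner is absent or malformed."""
    match = _BANNER_RE.search(html)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%d.%m.%Y %H:%M")
    except ValueError:
        # The pattern matched but the digits are not a valid date/time.
        return None
```

The result is deliberately naive (no timezone attached); deciding which zone it represents is a separate step covered by the open questions.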

Proposed Changes

  1. Parse & normalise release timestamp
    • Extend our downloader/availability probe pipeline to extract the “Veröffentlicht bei uns” text from the episode HTML (likely a simple BeautifulSoup selector on the cached Episode.html).
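    • Convert the German date string to a timezone-aware datetime. The sample banner carries a weekday name followed by a numeric DD.MM.YYYY HH:MM local time, so dropping the weekday and parsing with a fixed format should be enough; locale-aware parsing or a manual map of German names is only needed if the site ever spells out months.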
  2. Persist per-episode metadata
    • Store the parsed timestamp in EpisodeAvailability.extra (e.g., { "release_at": iso8601 }) when we probe availability, and surface it through helper functions so both Torznab and qBittorrent layers can reuse it without re-fetching the page.
    • Add defensive logic when the banner is missing or malformed (fallback to current behaviour).
  3. Emit age-aware feed data
    • Update Torznab _build_item callers to pass pubdate=release_at when available; keep now as the fallback.
    • Mirror the value in the fake torrent payloads (ClientTask.added_on, qBittorrent added_on field) so Sonarr’s activity view matches the RSS age.
  4. Cache invalidation & data refresh
    • Ensure the cached timestamp respects our availability TTL (re-parse when the episode page is re-fetched) and optionally expose it via API/debug logs for observability.
  5. Documentation & guardrails
    • Document the new scraping behaviour, reference AniWorld’s ToS warning, and mention rate-limit/backoff expectations in the troubleshooting section. Consider adding a feature flag to disable the scraping if operators prefer the current synthetic age.
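
Steps 2 and 3 could be wired together along these lines. This is a sketch under stated assumptions, not the existing AniBridge code: all four function names are hypothetical, `extra` is assumed to be a plain dict holding an ISO 8601 string under `"release_at"`, and the qBittorrent `added_on` field is assumed to be an integer Unix timestamp.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Any, Mapping, Optional


def release_at_from_extra(extra: Optional[Mapping[str, Any]]) -> Optional[datetime]:
    """Read an ISO 8601 "release_at" value from the cache payload, if usable."""
    raw = (extra or {}).get("release_at")
    if not isinstance(raw, str):
        return None
    try:
        parsed = datetime.fromisoformat(raw)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Treat naive values as unusable rather than guessing a zone.
        return None
    return parsed.astimezone(timezone.utc)


def resolve_pubdate(extra: Optional[Mapping[str, Any]], now: datetime) -> datetime:
    """Prefer the stored release time; fall back to the current behaviour (now)."""
    return release_at_from_extra(extra) or now


def torznab_pubdate(dt: datetime) -> str:
    """Format an aware datetime as an RFC 2822 date string for the RSS pubDate."""
    return format_datetime(dt)


def qbit_added_on(dt: datetime) -> int:
    """Convert an aware datetime to whole seconds since the Unix epoch."""
    return int(dt.timestamp())
```

Keeping the fallback inside one resolver means the Torznab and qBittorrent layers cannot disagree about the age of the same episode.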

Testing Ideas

  • Unit tests for the HTML parser that feed captured AniWorld snippets (with and without the banner) and verify we obtain the correct UTC datetime.
  • Integration tests that run probe_episode_quality with a mocked Episode HTML and confirm EpisodeAvailability.extra["release_at"] is populated and reused by Torznab.
  • Torznab functional test asserting that pubDate reflects the captured timestamp and that the resulting RSS age matches the expected value.
  • qBittorrent sync test ensuring added_on is set to the release timestamp when available, and that the previous behaviour remains intact if parsing fails.
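
For the Torznab functional test, the expected age can be computed from the emitted pubDate with a fixed "now" so the assertion stays deterministic. The helper below is only a rough approximation of how a consumer might derive age; `rss_age_days` is a hypothetical name and this is not Sonarr's or Prowlarr's actual logic.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def rss_age_days(pub_date: str, now: datetime) -> float:
    """Return the age in days of an RSS item given its pubDate string."""
    published = parsedate_to_datetime(pub_date)  # aware when an offset is present
    return (now - published).total_seconds() / 86400


# Example with a frozen clock instead of datetime.now():
frozen_now = datetime(2025, 9, 1, 16, 46, tzinfo=timezone.utc)
age = rss_age_days("Fri, 29 Aug 2025 16:46:00 +0000", frozen_now)
```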

Open Questions

  • AniWorld sometimes removes older episodes or adjusts timestamps—do we need to validate the stored value each time we refresh availability, or assume it is immutable after the first scrape?
  • How should we handle timezone/localisation? The banner appears to use Central European time; should we treat it as Europe/Berlin and convert to UTC, or surface it as-is?
  • Is the release date shown per language/provider or global? If language-specific, do we need to track separate timestamps per language variant?
  • Should we apply the same mechanism to other catalogues (e.g., future s.to support; see "Add s.to catalogue support alongside AniWorld", #6) to keep behaviour consistent across sources?
  • Do we need rate limiting/backoff logic around the initial HTML fetch to avoid triggering anti-bot systems, or is the existing library behaviour sufficient?
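
On the timezone question, if we decide to treat the banner as Europe/Berlin, the conversion to UTC is short with the standard-library `zoneinfo` module (Python 3.9+, requires system or `tzdata` timezone data). That the banner really uses Europe/Berlin is an unverified assumption, and `berlin_local_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: AniWorld renders the banner in German local time.
BERLIN = ZoneInfo("Europe/Berlin")


def berlin_local_to_utc(naive_local: datetime) -> datetime:
    """Interpret a naive datetime as Europe/Berlin wall time and convert it to UTC."""
    return naive_local.replace(tzinfo=BERLIN).astimezone(timezone.utc)


# Summer time (CEST, UTC+2): 18:46 local becomes 16:46 UTC.
summer = berlin_local_to_utc(datetime(2025, 8, 29, 18, 46))
# Winter time (CET, UTC+1): 18:46 local becomes 17:46 UTC.
winter = berlin_local_to_utc(datetime(2025, 1, 10, 18, 46))
```

Wall times that occur twice during the autumn DST change resolve to the first occurrence by default (`fold=0`); an error of at most one hour is probably acceptable for release ages.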

Labels

documentation, enhancement
