
Fix AioStreams metadata disambiguation using watch history and year #47

Draft · Copilot wants to merge 3 commits into main from copilot/fix-wrong-metadata-issue

Conversation

Contributor

Copilot AI commented Feb 2, 2026

AioStreams provides only filenames without media IDs, causing ambiguous metadata matches when multiple shows share a title (e.g., the "Fallout" TV show vs. the "Fallout" anime).

Changes

Series continuity check - Query the user's watch history to disambiguate TV show candidates (a sketch of the rule follows this list):

  • When watching S02E08, check if user previously watched S02E07 of a candidate show
  • Match candidates by IMDb/TMDB/TVDB IDs from watch history
  • Validate episode continuity (next episode, later season, or rewatch within ±3 episodes)
  • Skip the check for first-time watches and fall back to standard matching
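
The continuity rule itself can be summarized as a small predicate. The following is a minimal sketch of that rule, not the shipped implementation; the function name and the rewatch window parameter are illustrative.

# Minimal sketch of the continuity rule (illustrative only): given the last
# watched episode of a candidate show, decide whether the new entry plausibly
# continues it.
def is_plausible_continuation(
    last_season: int,
    last_episode: int,
    new_season: int,
    new_episode: int,
    rewatch_window: int = 3,
) -> bool:
    # Direct continuation within the same season (S02E07 -> S02E08)
    if new_season == last_season and new_episode == last_episode + 1:
        return True
    # Moving on to a later season (S02E08 -> S03E01)
    if new_season > last_season:
        return True
    # Rewatch of a nearby episode, within +/-3 of the last watched one
    if new_season == last_season and abs(new_episode - last_episode) <= rewatch_window:
        return True
    return False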

Year extraction - Verify that the existing PTT year parsing flows through correctly (see the sketch after this list):

  • Year from filename prioritized in search queries: ["Fallout 2024", "Fallout"]
  • Candidate selection prefers year matches
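
As a sketch of the intended query ordering, assuming the title and year have already been parsed from the filename (e.g. via PTT); the helper name below is made up for illustration.

# Illustrative only: build search queries that try the year-qualified title
# first and fall back to the bare title.
def build_search_queries(title: str, year: int | None) -> list[str]:
    queries: list[str] = []
    if year is not None:
        queries.append(f"{title} {year}")  # e.g. "Fallout 2024"
    queries.append(title)                  # e.g. "Fallout"
    return queries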

Impact

# Before: Ambiguous title match
# "Fallout S02E08" → matches first candidate (often wrong)

# After: Context-aware selection
# User watched Fallout S02E07 (IMDb: tt12345678)
# "Fallout S02E08" → prefers candidate with tt12345678

Implementation

  • Made _select_candidate_for_entry() async and added db/user_id params (a flow skeleton follows this list)
  • Added _check_series_continuity() with SQLAlchemy query for watch history
  • Added null checks for max_season/max_episode in continuity logic
  • Updated tests for async signature, added continuity test case
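
Put together, the selection flow roughly looks like the skeleton below. Field names on entry and candidate and the fallback ordering are assumptions based on the bullets above, not the exact code.

# Rough skeleton of the updated selection flow (helper and field names are
# assumptions, not the shipped implementation).
async def _select_candidate_for_entry(db, user_id, entry, candidates):
    # 1. For episodes, prefer a candidate whose IDs line up with the user's
    #    watch history (series continuity check).
    if entry.season_number is not None and entry.episode_number is not None:
        match = await _check_series_continuity(db, user_id, entry, candidates)
        if match is not None:
            return match
    # 2. Otherwise fall back to standard matching, preferring a year match
    #    when the filename carried a year.
    if getattr(entry, "year", None) is not None:
        for candidate in candidates:
            if getattr(candidate, "year", None) == entry.year:
                return candidate
    return candidates[0] if candidates else None
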
Original prompt

This section details the original issue you should resolve.

<issue_title>[Bug] Wrong Metadata with AioStreams</issue_title>
<issue_description>AIOStreams only provides the filename but no ID. Occasionally, when I watch something (e.g. Fallout), it picks up the wrong metadata (an anime called Fallout).

We need to make this more robust. If a year is present in the filename, use it for the search.
Also, for series, do a "sanity check": if I previously watched Fallout (IMDb ID 102) S02E07 and am now watching S02E08, it is more likely the same show and not a different "Fallout".</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits February 2, 2026 05:54
…metadata matching

Co-authored-by: willtho89 <4933503+willtho89@users.noreply.github.com>
Co-authored-by: willtho89 <4933503+willtho89@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix wrong metadata handling in AIOStreams" to "Fix AioStreams metadata disambiguation using watch history and year" on Feb 2, 2026
Copilot AI requested a review from willtho89 February 2, 2026 05:59
@willtho89 willtho89 requested review from Copilot and removed request for willtho89 February 7, 2026 14:56
Contributor

Copilot AI left a comment

Pull request overview

Improves AioStreams metadata selection for ambiguous TV titles by using the user's watch history (episode continuity + IDs) and by ensuring that a year found in the filename influences both the search and the candidate choice.

Changes:

  • Made _select_candidate_for_entry() async and added db/user_id so selection can consult watch history.
  • Added _check_series_continuity() to prefer candidates that match previously watched episodes (ID-based).
  • Updated tests to call the new async selector and added a continuity test case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

  • backend/src/librarysync/jobs/aiostreams_import.py: Adds async candidate selection and DB-backed continuity logic to disambiguate same-title TV shows.
  • backend/tests/test_aiostreams_import.py: Updates tests for the async signature and adds coverage for continuity-based selection.

Comment on lines +503 to +519
# First, find all TV shows the user has watched
# Use a subquery to get the max season and episode per show
# Note: This may return max_season from one episode and max_episode from another,
# but it's acceptable for our continuity check as we're looking for a general pattern
result = await db.execute(
    select(
        MediaItem,
        func.max(EpisodeItem.season_number).label("max_season"),
        func.max(EpisodeItem.episode_number).label("max_episode"),
    )
    .join(EpisodeItem, EpisodeItem.show_media_item_id == MediaItem.id)
    .join(WatchedItem, and_(
        WatchedItem.episode_item_id == EpisodeItem.id,
        WatchedItem.user_id == user_id
    ))
    .where(MediaItem.media_type == "tv")
    .group_by(MediaItem.id)

Copilot AI Feb 7, 2026

Computing max_season and max_episode independently can produce an impossible (season, episode) pair (e.g., max season from one row and max episode from another). This can incorrectly reject true continuations (e.g., user watched S03E01 but max_episode comes from S01E22, causing the nearby/continuation checks to fail). Consider selecting the latest watched episode per show using a single ordering over (season_number DESC, episode_number DESC) (e.g., a window function / DISTINCT ON in Postgres), or a two-step aggregate (max season, then max episode constrained to that season).

Suggested change

Original:

# First, find all TV shows the user has watched
# Use a subquery to get the max season and episode per show
# Note: This may return max_season from one episode and max_episode from another,
# but it's acceptable for our continuity check as we're looking for a general pattern
result = await db.execute(
    select(
        MediaItem,
        func.max(EpisodeItem.season_number).label("max_season"),
        func.max(EpisodeItem.episode_number).label("max_episode"),
    )
    .join(EpisodeItem, EpisodeItem.show_media_item_id == MediaItem.id)
    .join(WatchedItem, and_(
        WatchedItem.episode_item_id == EpisodeItem.id,
        WatchedItem.user_id == user_id
    ))
    .where(MediaItem.media_type == "tv")
    .group_by(MediaItem.id)
Replacement:

# First, find the latest watched episode per show for this user, in a way that
# guarantees (max_season, max_episode) comes from a real watched episode.
#
# Step 1: find the max season per show that the user has watched.
max_season_per_show_subq = (
    select(
        EpisodeItem.show_media_item_id.label("show_id"),
        func.max(EpisodeItem.season_number).label("max_season"),
    )
    .join(
        WatchedItem,
        and_(
            WatchedItem.episode_item_id == EpisodeItem.id,
            WatchedItem.user_id == user_id,
        ),
    )
    .group_by(EpisodeItem.show_media_item_id)
    .subquery()
)
# Step 2: within that max season, find the max watched episode per show
# (the WatchedItem join keeps the aggregate restricted to episodes this user
# has actually watched).
max_episode_in_max_season_subq = (
    select(
        EpisodeItem.show_media_item_id.label("show_id"),
        func.max(EpisodeItem.episode_number).label("max_episode"),
    )
    .join(
        WatchedItem,
        and_(
            WatchedItem.episode_item_id == EpisodeItem.id,
            WatchedItem.user_id == user_id,
        ),
    )
    .join(
        max_season_per_show_subq,
        and_(
            max_season_per_show_subq.c.show_id == EpisodeItem.show_media_item_id,
            max_season_per_show_subq.c.max_season == EpisodeItem.season_number,
        ),
    )
    .group_by(EpisodeItem.show_media_item_id)
    .subquery()
)
# Step 3: join the aggregates back to MediaItem to get per-show latest season/episode.
result = await db.execute(
    select(
        MediaItem,
        max_season_per_show_subq.c.max_season.label("max_season"),
        max_episode_in_max_season_subq.c.max_episode.label("max_episode"),
    )
    .join(
        max_season_per_show_subq,
        max_season_per_show_subq.c.show_id == MediaItem.id,
    )
    .join(
        max_episode_in_max_season_subq,
        max_episode_in_max_season_subq.c.show_id == MediaItem.id,
    )
    .where(MediaItem.media_type == "tv")

Comment on lines +507 to +520
result = await db.execute(
    select(
        MediaItem,
        func.max(EpisodeItem.season_number).label("max_season"),
        func.max(EpisodeItem.episode_number).label("max_episode"),
    )
    .join(EpisodeItem, EpisodeItem.show_media_item_id == MediaItem.id)
    .join(WatchedItem, and_(
        WatchedItem.episode_item_id == EpisodeItem.id,
        WatchedItem.user_id == user_id
    ))
    .where(MediaItem.media_type == "tv")
    .group_by(MediaItem.id)
)

Copilot AI Feb 7, 2026

Despite computing title_key, the query fetches all TV shows a user has watched and only later filters by title in Python. For users with large histories this can be unnecessarily expensive. Consider pushing at least a coarse title filter into SQL (e.g., ilike / lower(title) equality on entry.title) so you only aggregate candidate-relevant shows, then keep the stricter _normalize_title_key check in Python if needed.
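
For illustration, a coarse SQL-side filter could look like the following; this assumes MediaItem has a title column and entry.title holds the parsed title, and it omits the joins and aggregates from the query above for brevity.

# Illustrative coarse filter; the stricter _normalize_title_key comparison can
# still run in Python over the much smaller result set.
stmt = (
    select(MediaItem)
    .where(MediaItem.media_type == "tv")
    .where(func.lower(MediaItem.title) == entry.title.lower())
)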

Comment on lines +72 to +74
selected = asyncio.run(
    aiostreams_import._select_candidate_for_entry(db, "test_user", entry, candidates)
)

Copilot AI Feb 7, 2026

Using asyncio.run() inside unittest.TestCase can break if tests are executed under an environment that already has a running event loop. Prefer unittest.IsolatedAsyncioTestCase with async def test_... methods (and await ...) for these async tests.
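
For example, the test could be reshaped roughly like this; the class name is a placeholder and db, entry, and candidates are assumed to come from the existing test setup.

import unittest

class SelectCandidateTests(unittest.IsolatedAsyncioTestCase):
    async def test_select_candidate_for_entry(self):
        # db, entry and candidates would come from the existing test fixtures.
        selected = await aiostreams_import._select_candidate_for_entry(
            db, "test_user", entry, candidates
        )
        self.assertIsNotNone(selected)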

# 1. Are TV shows
# 2. Have a matching normalized title
# 3. Have episode watches by this user
from sqlalchemy import and_, func

Copilot AI Feb 7, 2026

Minor cleanup: importing and_/func inside the function makes dependencies less discoverable; prefer module-level imports for consistency.

Comment on lines +536 to +537
if entry.season_number is not None:
    is_continuation = (

Copilot AI Feb 7, 2026

Minor cleanup: the if entry.season_number is not None: check is redundant because the function already returns early when season_number is None.

