5 changes: 5 additions & 0 deletions .github/skills/add-disc-fixture/SKILL.md
@@ -70,6 +70,11 @@ If counts don't match → proceed to step 4 (debug).
See [debugging guide](./references/debug-analysis.md) for systematic
investigation of episode/special count mismatches.

**Important**: When fixing mismatches, prefer structural signals over
numeric thresholds. Study chapter durations, IG menu structure, and
navigation data across multiple fixtures — don't just look at the one
that broke. The debugging guide's "How to Fix" section has details.

### 5. Extract ICS Menu Data

Find the menu clip (usually a short m2ts with IG streams, often clip 00003
41 changes: 41 additions & 0 deletions .github/skills/add-disc-fixture/references/debug-analysis.md
@@ -121,3 +121,44 @@ for h in sorted(ig_raw, key=lambda x: (x.page_id, x.button_id)):
- **Multi-feature playlists** with register-based chapter selection
(SET reg2 before JumpTitle) are supported, but only when `imm_op2=True`
(immediate value). Register-indirect chapter indices are not resolved.

## How to Fix Mismatches — Structural Signals, Not Thresholds

When analysis returns wrong counts, resist the urge to add a numeric
threshold or ratio guard that fixes only the disc in front of you. A
threshold that "just happens to work" on today's fixtures will break on
the next disc.

### The right process

1. **Dump data across fixtures**: compare the failing disc against
   fixtures that work. Key data to examine:
   - Chapter durations (look for repeating OP/body/ED cycles)
   - IG menu buttons per page (episode pages ~5 buttons, scene grids ~10)
   - IG chapter marks (JumpTitle + reg2 patterns)
   - Segment labels, play item structure, title counts

2. **Find a structural signal**: something the disc data says about
   itself. Ask: "What makes the working discs structurally different
   from the failing disc?"

3. **Require positive evidence**: the code should ask "does the data
   say this IS an episode compilation?" not "does the data say this is
   NOT a movie?". With positive detection, a disc that lacks the signal
   is simply left unsplit, so a missing signal cannot cause a false split.

4. **Validate across ALL fixtures**: run the new logic against every
   fixture, not just the one that broke.
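
Step 4 can be sketched as a small harness that runs a candidate detector over per-fixture chapter durations and lists every fixture where the split decision is wrong. This is a minimal illustration only: the fixture names, the durations, and the `looks_like_episodes` detector are hypothetical placeholders, not data or code from the repository.

```python
from typing import Callable

# Hypothetical per-fixture data: chapter durations in seconds plus the
# expected decision (True = playlist should be split into episodes).
FIXTURES: dict[str, tuple[list[float], bool]] = {
    "episodes_a": ([90, 600, 620, 90, 30] * 4, True),
    "movie_b": ([300.0] * 24, False),
    "ova_c": ([95, 1200, 1250, 98], False),
}


def looks_like_episodes(durs: list[float]) -> bool:
    """Toy positive-evidence check (an assumption for this sketch):
    at least two OP-length chapters that are each followed by a long body."""
    hits = sum(1 for a, b in zip(durs, durs[1:]) if 45 <= a <= 160 and b > 180)
    return hits >= 2


def validate(detector: Callable[[list[float]], bool]) -> list[str]:
    """Return the names of fixtures where the detector disagrees with expectations."""
    return [
        name
        for name, (durs, expected) in FIXTURES.items()
        if detector(durs) != expected
    ]


# The structural check agrees with every fixture; a chapter-count
# threshold wrongly splits the movie.
print(validate(looks_like_episodes))
print(validate(lambda durs: len(durs) > 7))
```

Running every candidate rule through one loop like this makes a regression on an older fixture visible immediately, which is the point of validating against all fixtures.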

### Anti-patterns to avoid

- `if count <= N: return []` — arbitrary threshold, will break
- `if ratio > X: return []` — same problem
- Lowering/raising an existing threshold to accommodate one more disc
- Any fix that only looks at the failing disc without comparing others

### Example: Chapter-split detection

- Bad (threshold): `if chapters_per_episode > 7: don't split`
- Good (structural): detect a repeating OP/body/ED chapter cycle via
  `_detect_episode_periodicity()` and split only when there is positive
  evidence of episode structure.
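
The contrast can be sketched as two guard functions. The function names and signatures below are hypothetical; only the `len(ig_chapter_marks) >= 2` condition and the "periodicity is not `None`" condition mirror what the diff to `bdpl/analyze/ordering.py` shows.

```python
from __future__ import annotations


def should_split_threshold(chapters_per_episode: int) -> bool:
    """Anti-pattern: a negative guard built on an assumed numeric range."""
    return chapters_per_episode <= 7


def should_split_structural(
    ig_chapter_marks: list[int] | None,
    periodicity: tuple[int, int, float] | None,
) -> bool:
    """Split only when at least one structural signal is actually present."""
    has_ig_confirmation = ig_chapter_marks is not None and len(ig_chapter_marks) >= 2
    return has_ig_confirmation or periodicity is not None


# No signal at all: the structural guard leaves the playlist unsplit.
print(should_split_structural(None, None))
# IG marks alone are enough.
print(should_split_structural([0, 5, 10, 15], None))
```

The threshold guard answers "is this outside the range I assumed?", while the structural guard answers "did the disc say it contains episodes?", which is the question the guide asks for.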
27 changes: 26 additions & 1 deletion AGENTS.md
@@ -175,9 +175,34 @@ Output includes: `schema_version`, `disc`, `playlists`, `episodes`, `special_fea
- Special feature detection is in `_detect_special_features()` — uses IG JumpTitle buttons pointing to non-episode playlists
- `JumpTitle(N)` in HDMV commands is **1-based** — convert to 0-based index title with `N - 1`
- Chapter-split features: when a button sets `reg2` before `JumpTitle`, it selects a chapter within the target playlist (multi-feature playlists)
- Playlist classifications are heuristic-based; new disc patterns may need new rules
- Segment keys use quantization (default ±250ms) to handle tiny timing variances
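
Two of the notes above lend themselves to a short illustration. The helpers below are hypothetical and are not taken from the codebase. In particular, the source only says segment keys are quantized with a default of ±250ms; snapping to the nearest multiple of a step is an assumed scheme used here to show the idea.

```python
def jump_title_to_index(n: int) -> int:
    """Convert the 1-based operand of JumpTitle(N) to a 0-based title index."""
    if n < 1:
        raise ValueError(f"JumpTitle operand must be >= 1, got {n}")
    return n - 1


def quantize_ms(value_ms: float, step_ms: int = 250) -> int:
    """Snap a timestamp to the nearest multiple of step_ms.

    Assumed scheme for illustration; the real segment-key quantization
    may bucket values differently.
    """
    return int(round(value_ms / step_ms)) * step_ms


# JumpTitle(3) refers to the third title, i.e. index 2.
print(jump_title_to_index(3))
# Two timings that differ by a small jitter land on the same key.
print(quantize_ms(1_000_120), quantize_ms(999_900))
```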

### Fixing Analysis Mismatches — Structural Signals over Thresholds

When a new disc produces wrong episode or special counts, **do not** add numeric
thresholds or ratio guards. Instead:

1. **Study the data**: dump chapter durations, IG menu buttons, segment labels,
   and MovieObject navigation across the failing disc AND existing fixtures that
   work correctly. Look for structural patterns that differentiate the two cases.
2. **Identify a structural signal**: something the disc data tells you about its
   own content type (e.g. repeating OP/body/ED chapter cycle for episodes,
   IG button-per-page counts matching chapters-per-episode, title-hint references
   in navigation commands).
3. **Require positive evidence**: the code should ask "does the data say this IS
   X?" rather than "does the data say this is NOT X?". Negative guards based on
   thresholds (like `max_chapters_per_episode = 7`) are brittle and will break on
   the next disc that doesn't match the assumed range.
4. **Combine signals**: when one signal isn't sufficient alone, combine multiple
   independent signals (e.g. IG marks + chapter periodicity + button-per-page).
   Each corroborating signal lowers the confidence required of the others, but
   at least one must be present.

Examples of structural signals already in use:
- **Chapter periodicity** (`_detect_episode_periodicity`): detects repeating
  OP (~90 s) / body / ED (~90 s) / preview (~30 s) cycle in chapter durations
- **IG chapter marks**: JumpTitle (JT) + reg2 buttons directly encode episode boundaries
- **Digital archive multi-signal**: item count + title hint + no-audio streams
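
Combining signals can be sketched as collecting whichever independent signals are present and requiring at least one before acting. The `Signals` container and `positive_evidence` helper are hypothetical names introduced for this sketch; the three signals themselves are the ones named in the list above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Hypothetical container for independent structural signals."""

    ig_chapter_marks: list[int] | None = None
    periodicity: tuple[int, int, float] | None = None
    buttons_per_page: int | None = None
    chapters_per_episode: int | None = None


def positive_evidence(s: Signals) -> list[str]:
    """Return the names of the signals that positively indicate episodes."""
    found: list[str] = []
    if s.ig_chapter_marks is not None and len(s.ig_chapter_marks) >= 2:
        found.append("ig_chapter_marks")
    if s.periodicity is not None:
        found.append("chapter_periodicity")
    if s.buttons_per_page is not None and s.buttons_per_page == s.chapters_per_episode:
        found.append("buttons_per_page")
    return found


evidence = positive_evidence(
    Signals(ig_chapter_marks=[0, 5, 10, 15], periodicity=(5, 4, 1.0))
)
# Split only when the list is non-empty; more entries mean more corroboration.
print(evidence)
```

Returning the list of signal names, rather than a bare boolean, also makes it easy to log why a disc was or was not split when a new fixture misbehaves.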

## Copyright & Fixture Guidelines
- **NEVER commit copyrighted media content** (m2ts video/audio streams, full disc images, cover art, subtitle tracks, etc.) to the repository.
- **Test fixtures** in `tests/fixtures/` contain only small structural metadata files (MPLS, CLPI, index.bdmv, MovieObject.bdmv, ICS segments) — these are binary headers/indexes, not audiovisual content.
115 changes: 95 additions & 20 deletions bdpl/analyze/ordering.py
@@ -104,6 +104,77 @@ def _episodes_from_play_all(
return episodes


def _chapter_durations_s(playlist: Playlist) -> list[float]:
    """Return chapter durations in seconds for a playlist."""
    ch_times = [ticks_to_ms(ch.timestamp) for ch in playlist.chapters]
    total_ms = playlist.duration_ms
    durs: list[float] = []
    for i in range(len(ch_times)):
        end = ch_times[i + 1] if i + 1 < len(ch_times) else total_ms
        durs.append((end - ch_times[i]) / 1000)
    return durs
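
A worked example may help when reading the helper above. The stand-ins below are assumptions made so the snippet runs on its own: `Chapter` and `Playlist` are minimal hypothetical dataclasses rather than the project's models, `ticks_to_ms` is assumed to convert 45 kHz ticks to milliseconds, and chapter timestamps are assumed to be measured from the start of the playlist.

```python
from dataclasses import dataclass


def ticks_to_ms(ticks: int) -> float:
    """Assumed conversion: 45 kHz ticks to milliseconds."""
    return ticks / 45


@dataclass
class Chapter:
    timestamp: int  # in ticks (stand-in for the real chapter model)


@dataclass
class Playlist:
    chapters: list[Chapter]
    duration_ms: float


def chapter_durations_s(playlist: Playlist) -> list[float]:
    """Same logic as the helper above: each chapter ends where the next
    one starts, and the last chapter ends at the playlist duration."""
    starts = [ticks_to_ms(ch.timestamp) for ch in playlist.chapters]
    ends = starts[1:] + [playlist.duration_ms]
    return [(end - start) / 1000 for start, end in zip(starts, ends)]


# Chapters at 0 s, 90 s and 690 s in a 780 s playlist.
pl = Playlist(
    chapters=[Chapter(0), Chapter(90 * 45_000), Chapter(690 * 45_000)],
    duration_ms=780_000,
)
print(chapter_durations_s(pl))  # [90.0, 600.0, 90.0]
```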


# Anime episode chapter structure ranges (in seconds)
_OP_MIN_S, _OP_MAX_S = 45, 160 # opening theme
_BODY_MIN_S_CH = 180 # body segment (scene)
_ED_MIN_S, _ED_MAX_S = 45, 160 # ending theme


def _detect_episode_periodicity(
    ch_durs_s: list[float],
) -> tuple[int, int, float] | None:
    """Detect repeating episode structure in chapter durations.

    Anime episode compilations embed a fixed structure per episode:
    OP (~90 s) → Body segments → ED (~90 s) [→ Preview (~30 s)]. This
    creates a periodic pattern visible in the chapter durations.

    Tries periods 4–7 (chapters per episode). For each candidate period,
    partitions chapters into groups and checks whether each group matches
    the expected structure (OP-length first chapter, at least one long body
    chapter, ED-length chapter near the end).

    Returns ``(period, n_episodes, confidence)`` for the best match, where
    *confidence* is the fraction of groups that match. Returns ``None``
    when no period achieves ≥ 75 % match with ≥ 2 groups.
    """
    n = len(ch_durs_s)
    best: tuple[int, int, float] | None = None

    for period in range(4, 8):
        # Allow total chapters to be within ±1 of period × n_groups
        for n_groups in range(2, n // period + 2):
            total_expected = n_groups * period
            if abs(total_expected - n) > 1:
                continue

            groups_matched = 0
            for g in range(n_groups):
                start = g * period
                end = min(start + period, n)
                group = ch_durs_s[start:end]
                if len(group) < 3:
                    continue

                op_ok = _OP_MIN_S <= group[0] <= _OP_MAX_S
                body_ok = any(d > _BODY_MIN_S_CH for d in group[1:-1])
                ed_ok = (_ED_MIN_S <= group[-1] <= _ED_MAX_S) or (
                    len(group) >= 3 and _ED_MIN_S <= group[-2] <= _ED_MAX_S
                )

                if op_ok and body_ok and ed_ok:
                    groups_matched += 1

            if n_groups >= 2:
                score = groups_matched / n_groups
                if score >= 0.75:
                    if best is None or score > best[2] or (score == best[2] and n_groups > best[1]):
                        best = (period, n_groups, score)

    return best
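
To see the detector on concrete numbers, here is a condensed restatement of the function above so the example is self-contained. The name `detect_periodicity` and the synthetic chapter durations are made up for illustration; the ranges, periods, and the 0.75 cutoff are copied from the code above.

```python
from __future__ import annotations

OP_MIN_S, OP_MAX_S = 45, 160
BODY_MIN_S = 180
ED_MIN_S, ED_MAX_S = 45, 160


def detect_periodicity(durs: list[float]) -> tuple[int, int, float] | None:
    """Condensed copy of the periodicity check, for demonstration only."""
    n = len(durs)
    best: tuple[int, int, float] | None = None
    for period in range(4, 8):
        for n_groups in range(2, n // period + 2):
            if abs(n_groups * period - n) > 1:
                continue
            matched = 0
            for g in range(n_groups):
                group = durs[g * period : g * period + period]
                if len(group) < 3:
                    continue
                op_ok = OP_MIN_S <= group[0] <= OP_MAX_S
                body_ok = any(d > BODY_MIN_S for d in group[1:-1])
                # ED may be the last chapter, or second to last before a preview.
                ed_ok = any(ED_MIN_S <= d <= ED_MAX_S for d in group[-2:])
                matched += op_ok and body_ok and ed_ok
            score = matched / n_groups
            if score >= 0.75 and (
                best is None or (score, n_groups) > (best[2], best[1])
            ):
                best = (period, n_groups, score)
    return best


# Four synthetic episodes: OP, two body chapters, ED, preview.
four_episodes = [90, 600, 620, 90, 30] * 4
# A synthetic movie: 24 scene chapters of 5 minutes each.
movie = [300.0] * 24

print(detect_periodicity(four_episodes))  # (5, 4, 1.0)
print(detect_periodicity(movie))          # None
```

For the episode data only period 5 lines the groups up with the OP at the start of each group. Periods 4 and 7 each match a single group and fall below the 0.75 cutoff, and period 6 has no group count within one chapter of 20. The movie never produces an OP-length first chapter, so no period matches.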


def _episodes_from_chapters(
playlist: Playlist,
ig_chapter_marks: list[int] | None = None,
@@ -113,44 +184,48 @@ def _episodes_from_chapters(
     Used when a playlist contains one (or few) very long play item(s) with
     multiple episodes encoded back-to-back, distinguishable only by chapters.

-    When *ig_chapter_marks* are provided (from IG menu buttons), they serve as
-    structural confirmation that the playlist contains multiple episodes.
-    Without such evidence, a minimum of 3 estimated episodes is required -
-    an ``est_count`` of 2 is ambiguous (could be a single ~50 min movie).
+    **Decision to split** requires positive structural evidence from at least
+    one of two signals:
+
+    1. **IG chapter marks** - buttons in the disc menu directly encode episode
+       start chapters (e.g. reg2 = [0, 5, 10, 15]). Definitive.
+    2. **Chapter periodicity** - chapter durations show a repeating
+       OP / body / ED cycle characteristic of anime episode compilations.

-    Heuristic: group consecutive chapters into blocks whose total duration
-    falls within episode range (10–45 min). When a running block exceeds the
-    expected episode length, start a new episode at the chapter boundary.
+    Without either signal the playlist is assumed to be a single movie or OVA
+    and is *not* split, regardless of total duration.
+
+    Splitting uses a greedy algorithm that groups consecutive chapters into
+    blocks whose total duration approaches the target episode length.
     """
     if not playlist.chapters or len(playlist.chapters) < 4:
         return []

-    # Only consider chapters on the main play item (item_ref=0 typically)
-    # Build list of (chapter_index, start_time_ms)
-    main_item = playlist.play_items[0]
-    ticks_to_ms(main_item.in_time)
-
     ch_times: list[float] = []
     for ch in playlist.chapters:
-        ch_ms = ticks_to_ms(ch.timestamp)
-        ch_times.append(ch_ms)
+        ch_times.append(ticks_to_ms(ch.timestamp))

-    # Compute total playlist duration
     total_dur_ms = playlist.duration_ms
     # Estimate episode count from total duration
     # Typical anime episode: 22–26 min; try to find the best fit
     est_ep_dur_ms = 25 * 60 * 1000 # 25 minutes as starting estimate
     est_count = max(1, round(total_dur_ms / est_ep_dur_ms))

     if est_count <= 1:
         return [] # Not worth splitting

-    # IG chapter marks provide structural evidence of multiple episodes.
-    # Without such evidence, require est_count >= 3 because est_count == 2
-    # (~50 min total) is ambiguous - could be a single movie.
+    # --- Require positive structural evidence before splitting ---
     has_ig_confirmation = ig_chapter_marks is not None and len(ig_chapter_marks) >= 2
-    if est_count <= 2 and not has_ig_confirmation:
-        return []
+    if not has_ig_confirmation:
+        ch_durs = _chapter_durations_s(playlist)
+        periodicity = _detect_episode_periodicity(ch_durs)
+        if periodicity is None:
+            return [] # No structural evidence of episodes
+        # Use the detected episode count from periodicity when it differs
+        # from the duration-based estimate.
+        _, periodic_count, _ = periodicity
+        if abs(periodic_count - est_count) <= 1:
+            est_count = periodic_count

     # Target duration per episode
     target_dur_ms = total_dur_ms / est_count
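
The rest of the splitting code is collapsed in this view, so the following is only a sketch of the two ideas the docstring describes: a duration-based episode estimate and a greedy grouping of chapters toward a target length. `estimate_count` mirrors the `est_count` expression shown above; `greedy_blocks` is a hypothetical implementation of "groups consecutive chapters into blocks whose total duration approaches the target episode length" and may differ from the real one.

```python
def estimate_count(total_ms: float, ep_ms: int = 25 * 60 * 1000) -> int:
    """Duration-based episode estimate, as in the code above."""
    return max(1, round(total_ms / ep_ms))


def greedy_blocks(durs_ms: list[float], est_count: int) -> list[list[int]]:
    """Hypothetical greedy grouping: close the current block when stopping
    lands at least as close to the target as taking the next chapter too."""
    target = sum(durs_ms) / est_count
    blocks: list[list[int]] = [[]]
    acc = 0.0
    for i, d in enumerate(durs_ms):
        closer_without = abs(acc - target) <= abs(acc + d - target)
        if blocks[-1] and len(blocks) < est_count and closer_without:
            blocks.append([])
            acc = 0.0
        blocks[-1].append(i)
        acc += d
    return blocks


# Duration alone is misleading: a 120 min movie "looks like" 5 episodes
# and a 44 min OVA "looks like" 2, which is why a structural signal is
# required before splitting.
print(estimate_count(120 * 60_000))  # 5
print(estimate_count(44 * 60_000))   # 2

# Four synthetic episodes (durations in ms) split cleanly at the OP chapters.
durs = [d * 1000 for d in [90, 600, 620, 90, 30] * 4]
blocks = greedy_blocks(durs, estimate_count(sum(durs)))
print([b[0] for b in blocks])  # [0, 5, 10, 15]
```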
24 changes: 24 additions & 0 deletions tests/conftest.py
@@ -268,6 +268,30 @@ def disc19_analysis(disc19_path):
    return _analyze_fixture(disc19_path)


@pytest.fixture(scope="session")
def disc20_path() -> Path:
    """Return path to bundled disc20 fixture."""
    return _fixture_path("disc20")


@pytest.fixture(scope="session")
def disc20_analysis(disc20_path):
    """Run and cache full analysis for the bundled disc20 fixture."""
    return _analyze_fixture(disc20_path)


@pytest.fixture(scope="session")
def disc21_path() -> Path:
    """Return path to bundled disc21 fixture."""
    return _fixture_path("disc21")


@pytest.fixture(scope="session")
def disc21_analysis(disc21_path):
    """Run and cache full analysis for the bundled disc21 fixture."""
    return _analyze_fixture(disc21_path)


@pytest.fixture
def cli_runner() -> Callable[..., subprocess.CompletedProcess[str]]:
    """Return helper to invoke `python -m bdpl.cli` consistently in tests."""
Binary file added tests/fixtures/disc20/CLIPINF/00000.clpi
Binary file not shown.
Binary file added tests/fixtures/disc20/CLIPINF/00001.clpi
Binary file not shown.
Binary file added tests/fixtures/disc20/CLIPINF/00002.clpi
Binary file not shown.
Binary file added tests/fixtures/disc20/CLIPINF/00003.clpi
Binary file not shown.
Binary file added tests/fixtures/disc20/CLIPINF/00004.clpi
Binary file not shown.
Binary file added tests/fixtures/disc20/CLIPINF/00005.clpi
Binary file not shown.
Binary file added tests/fixtures/disc20/CLIPINF/00006.clpi
Binary file not shown.
Binary file added tests/fixtures/disc20/CLIPINF/00007.clpi
Binary file not shown.
Binary file added tests/fixtures/disc20/CLIPINF/00008.clpi
Binary file not shown.
6 changes: 6 additions & 0 deletions tests/fixtures/disc20/META/DL/bdmt_eng.xml
@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<disclib>
<di:discinfo xmlns:di="urn:BDA:bdmv;discinfo">
<di:title><di:name>TEST DISC 20</di:name></di:title>
</di:discinfo>
</disclib>
Binary file added tests/fixtures/disc20/MovieObject.bdmv
Binary file not shown.
Binary file added tests/fixtures/disc20/PLAYLIST/00000.mpls
Binary file not shown.
Binary file added tests/fixtures/disc20/PLAYLIST/00001.mpls
Binary file not shown.
Binary file added tests/fixtures/disc20/PLAYLIST/00002.mpls
Binary file not shown.
Binary file added tests/fixtures/disc20/PLAYLIST/00003.mpls
Binary file not shown.
Binary file added tests/fixtures/disc20/ics_menu.bin
Binary file not shown.
Binary file added tests/fixtures/disc20/index.bdmv
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00000.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00001.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00002.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00003.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00004.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00005.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00006.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00007.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00008.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00009.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00010.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00011.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00012.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00013.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00014.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00015.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00016.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00017.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00018.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00019.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00020.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00021.clpi
Binary file not shown.
Binary file added tests/fixtures/disc21/CLIPINF/00022.clpi
Binary file not shown.
6 changes: 6 additions & 0 deletions tests/fixtures/disc21/META/DL/bdmt_eng.xml
@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<disclib>
<di:discinfo xmlns:di="urn:BDA:bdmv;discinfo">
<di:title><di:name>TEST DISC 21</di:name></di:title>
</di:discinfo>
</disclib>
Binary file added tests/fixtures/disc21/MovieObject.bdmv
Binary file not shown.
Binary file added tests/fixtures/disc21/PLAYLIST/00000.mpls
Binary file not shown.
Binary file added tests/fixtures/disc21/PLAYLIST/00001.mpls
Binary file not shown.
Binary file added tests/fixtures/disc21/PLAYLIST/00002.mpls
Binary file not shown.
Binary file added tests/fixtures/disc21/PLAYLIST/00003.mpls
Binary file not shown.
Binary file added tests/fixtures/disc21/ics_menu.bin
Binary file not shown.
Binary file added tests/fixtures/disc21/index.bdmv
Binary file not shown.
44 changes: 44 additions & 0 deletions tests/test_disc20_scan.py
@@ -0,0 +1,44 @@
"""Tests for disc20 fixture — single compilation movie with scene chapters."""

from __future__ import annotations

import pytest

from bdpl.model import DiscAnalysis

pytestmark = pytest.mark.integration


class TestDisc20Episodes:
    def test_episode_count(self, disc20_analysis: DiscAnalysis) -> None:
        assert len(disc20_analysis.episodes) == 1

    def test_episode_playlist(self, disc20_analysis: DiscAnalysis) -> None:
        assert disc20_analysis.episodes[0].playlist == "00002.mpls"

    def test_episode_duration_is_movie_length(self, disc20_analysis: DiscAnalysis) -> None:
        dur_min = disc20_analysis.episodes[0].duration_ms / 60000
        assert 100 < dur_min < 140, f"Movie duration {dur_min:.1f}min out of range"

    def test_not_chapter_split(self, disc20_analysis: DiscAnalysis) -> None:
        """Movie with 41 scene chapters must NOT be split into episodes."""
        assert len(disc20_analysis.episodes) == 1
        assert disc20_analysis.episodes[0].confidence == 1.0


class TestDisc20Specials:
    def test_special_feature_count(self, disc20_analysis: DiscAnalysis) -> None:
        assert len(disc20_analysis.special_features) == 1

    def test_special_category(self, disc20_analysis: DiscAnalysis) -> None:
        sf = disc20_analysis.special_features[0]
        assert sf.category == "extra"
        assert sf.playlist == "00003.mpls"

    def test_special_visible(self, disc20_analysis: DiscAnalysis) -> None:
        assert disc20_analysis.special_features[0].menu_visible


class TestDisc20Metadata:
    def test_disc_title(self, disc20_analysis: DiscAnalysis) -> None:
        assert disc20_analysis.disc_title == "TEST DISC 20"
39 changes: 39 additions & 0 deletions tests/test_disc21_scan.py
@@ -0,0 +1,39 @@
"""Integration tests for the disc21 fixture — special disc with OVA + digital archive."""

from __future__ import annotations

import pytest

from bdpl.model import DiscAnalysis

pytestmark = pytest.mark.integration


class TestDisc21Episodes:
    def test_episode_count(self, disc21_analysis: DiscAnalysis) -> None:
        assert len(disc21_analysis.episodes) == 1

    def test_episode_playlist(self, disc21_analysis: DiscAnalysis) -> None:
        assert disc21_analysis.episodes[0].playlist == "00002.mpls"

    def test_episode_duration(self, disc21_analysis: DiscAnalysis) -> None:
        dur_min = disc21_analysis.episodes[0].duration_ms / 60000
        assert 44.0 < dur_min < 44.2, f"OVA duration {dur_min:.2f}min, expected ~44:03"


class TestDisc21Specials:
    def test_special_feature_count(self, disc21_analysis: DiscAnalysis) -> None:
        assert len(disc21_analysis.special_features) == 1

    def test_digital_archive(self, disc21_analysis: DiscAnalysis) -> None:
        sf = disc21_analysis.special_features[0]
        assert sf.category == "digital_archive"
        assert sf.playlist == "00003.mpls"

    def test_digital_archive_visible(self, disc21_analysis: DiscAnalysis) -> None:
        assert disc21_analysis.special_features[0].menu_visible


class TestDisc21Metadata:
    def test_disc_title(self, disc21_analysis: DiscAnalysis) -> None:
        assert disc21_analysis.disc_title == "TEST DISC 21"