Propagate MARC 082 $2 (DDC source) to frontend for all networks; centralize DDC extraction #180

Open
qdgaraertgaer wants to merge 5 commits into UB-Mannheim:master from qdgaraertgaer:feature/ddc-tooltips

Conversation

@qdgaraertgaer
Contributor

@qdgaraertgaer qdgaraertgaer commented Oct 8, 2025

/!\ Caution: modifications made with AI /!\

Summary

  • Move DDC extraction into the shared backend mapping so every Verbund exposes DDC entries with optional source metadata (MARC 082 subfield $2).
  • Ensure frontend/backends remain backward compatible by supporting both plain notation strings and enriched objects { notation, source }.
  • Key motivations: show DDC provenance (e.g., "22", "22sdnb") as hover/tooltips for all networks.
  • Add support for MARC 655 form/genre headings (second indicator 7) that explicitly reference the GND vocabulary, identified by subfield $2 having the value "gnd-content". These 655 entries are now eligible for inclusion in the Schlagwörter pipeline where appropriate.

What this PR changes

  • lib.php
    • performMapping(): after cleanUp(), injects DDC entries extracted from the MARC XML (extractDdcEntriesFromXml).
    • cleanUp(): updated to accept and normalize ddc entries that are either strings or arrays with keys 'notation' and optional 'source'.
    • Added extractDdcEntriesFromXml($xml): extracts all 082 subfield $a values and optional $2 source, dedupes by notation|source.
    • Updated the shared sw mapping handling to accept MARC 655 entries with indicator 7 only when they include subfield[@code="2"]="gnd-content", ensuring we pick up GND form headings explicitly flagged as GND-sourced.
  • alma-sru.php
    • Rely on lib.php's shared extractor and mapping flow; added local mapping to include 655 _7/gnd-content where relevant.
  • dnb.php
    • Overrode sw.mainPart to include 655 _7/gnd-content entries alongside 689 and other 6xx sources.

Behavioral notes

  • DDC entries in JSON are now either:
    • "727.709" (string) — legacy/simple form, or
    • { "notation": "727.709", "source": "22" } — enriched form including MARC $2
  • Schlagwörter (subject) extraction now includes GND form/genre headings from MARC 655 ind2=7 only when subfield $2 equals "gnd-content". This avoids accidental inclusion of 655 entries from other vocabularies and keeps the mapping conservative and predictable.
  • Frontend rendering code (rendering.js) must handle both shapes for DDC; Schlagwörter rendering/cleanup logic remains the same but will now see additional keys for some records.
  • Duplicate notation+source entries are deduplicated.
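
One way the frontend could cope with both DDC shapes is to normalize each entry before rendering. This is a minimal sketch, not the code in rendering.js; `normalizeDdcEntry`, `ddcTooltip`, and the tooltip wording are hypothetical names chosen for illustration.

```javascript
// Normalize a DDC entry (legacy string or enriched object) to a single shape.
function normalizeDdcEntry(entry) {
  if (typeof entry === 'string') {
    return { notation: entry, source: null };
  }
  if (entry && typeof entry.notation === 'string') {
    return { notation: entry.notation, source: entry.source || null };
  }
  return null; // unknown shape: the caller can skip it
}

// Build the hover text; returns an empty string when no $2 value is available.
function ddcTooltip(entry) {
  const normalized = normalizeDdcEntry(entry);
  return normalized && normalized.source
    ? 'MARC 082 $2: ' + normalized.source
    : '';
}
```

With this in place, `ddcTooltip("727.709")` yields an empty string, while `ddcTooltip({ notation: "727.709", source: "22" })` yields a tooltip carrying the source.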

Testing / How to verify

  1. Server endpoints (examples — adjust host/port as needed):
  2. Schlagwörter / MARC 655 verification:
  3. Open UI:
    • Hard-refresh / clear cache then load: /isbn/suche.html?isbn=9783775740913 (or other test ISBNs)
    • Hover DDC links in each verbund row; where available, tooltip should show the MARC 082 $2 value.
  4. Unit testing suggestion:
    • Add a small PHP test that feeds a sample MARC XML containing multiple 082$a/$2 combinations plus a 655 ind2=7 with $2="gnd-content" through performMapping() and asserts:
      • outputMap['ddc'] contains enriched objects for entries with source $2
      • deduplication of identical notation|source pairs
      • 655 entries with other $2 values are ignored by the 655->Schlagwörter pipeline

Backward-compatibility & risks

  • Code intentionally preserves legacy string DDC values; no breaking changes expected for consumers that only expect strings.
  • The gnd-content restriction on 655 prevents broad inclusion of non-GND form headings — if your local data uses a different $2 value (e.g. gnd), consider broadening the condition to subfield[@code="2"]="gnd" or subfield[@code="2"]="gnd-content".
  • Potential risk: any custom code that assumed ddc is always an array of strings should be updated to handle objects. Adding unit tests will help catch such assumptions.
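
The PR applies the 655 restriction via XPath in the PHP mapping. To make the accept/reject rule and the suggested broadening concrete, here is an illustrative JavaScript predicate over an already-parsed field. The field shape and the name `isAcceptedFormHeading` are assumptions for this sketch and do not appear in the PR.

```javascript
// Hypothetical parsed 655 field (not from the PR):
//   { ind2: '7', subfields: [{ code: 'a', value: '...' }, { code: '2', value: 'gnd-content' }] }
function isAcceptedFormHeading(field, acceptedSources = ['gnd-content']) {
  // Only second indicator 7 ("source specified in $2") is considered at all.
  if (field.ind2 !== '7') return false;
  // Accept the field only if some $2 subfield names an allowed vocabulary.
  return field.subfields.some(
    (sf) => sf.code === '2' && acceptedSources.includes(sf.value)
  );
}
```

Calling `isAcceptedFormHeading(field, ['gnd', 'gnd-content'])` corresponds to the broadened condition mentioned above; the default argument keeps the conservative behaviour.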

Notes on verification performed

  • Ran PHP syntax checks:
    • php -l isbn/lib.php — No syntax errors
    • php -l isbn/dnb.php — No syntax errors
    • php -l isbn/alma-sru.php — No syntax errors
  • Reviewed XPath logic for the gnd-content clause in lib.php, dnb.php, and alma-sru.php to confirm it is conservative and will only accept 655 _7 entries explicitly flagged as GND form headings.

Next steps

  • Add unit tests for extractDdcEntriesFromXml and performMapping behaviour.
  • Optionally standardize the ddc output to always use objects (notation + optional source) and bump clients accordingly if desired.
  • If you want broader 655 acceptance for local datasets, I can broaden the XPath to accept both gnd and gnd-content or provide a configuration toggle per-verbund.

@zuphilip
Member

Sorry @qdgaraertgaer for not getting to this earlier. I was probably put off by the warning that the content was created with the help of AI. 🤖

If I understand it correctly you have three different things here:

  1. Add the content of subfield 2 for field 082 (source metadata) as hover text for DDC notations (if available)
  2. Add Formschlagwörter from field 655
  3. Nits in markdown

Isn't point 1 most of the time only the edition of the DDC, e.g. the 23rd or 22nd edition? Why is this information valuable to have?

I am not sure we want Formschlagwörter, or rather whether we want to mix them with the other Schlagwörter. AFAIK these are now mostly assigned during regular cataloguing, with no subject specialist involved. But I need to consider this further...

Do you have more examples that show the differences? A handful of ISBNs is enough for testing.

@qdgaraertgaer
Contributor Author

Dear @zuphilip no problem at all!

Those changes have a lot to do with our internal rules.

Regarding the content of subfield 2 for field 082, I need to check whether the data has been saved properly. For example, I have seen books where DDC 296 is saved with 23sdnb, while we have to save it with 23 (because we may only use 23sdnb for round numbers such as 290). Seeing this information on hover means that I don't have to open the book in Alma.

As for the Formschlagwörter from field 655, I use them (for example) to make sure that Aufsatzsammlung is present, as our formal catalogers do not assign it, but the subject specialists do (the former only use the "enge Liste" (narrow list) according to RDA DACH).

As for some examples, you could use:

  • 9781032695631
  • 9783161593253

If you need more examples I'll start collecting them.

Thank you in advance

@qdgaraertgaer
Contributor Author

Dear @zuphilip

Maybe let's not merge this version. The goal would be for it to look like this (at least for swisscovery):

image

What do you think?
