fix: harden venue dedup with normalized name matching and unified creation path #146
Merged
…ation path
- Unify venue creation: VenueService::get_or_create_venue() now delegates to Venue_Taxonomy::find_or_create_venue() instead of doing its own weak term_exists() lookup. Single venue creation path across the system.
- Add normalized name matching to find_or_create_venue(): new step in the matching cascade strips punctuation, dashes, apostrophes, case, and articles before comparing. Catches variants like 'Saturn - Birmingham' vs 'Saturn Birmingham', 'Reggie\'s Rock Club' vs 'Reggies Rock Club', 'RADIO/EAST' vs 'Radio East'.
- Add resolveVenueTerm() to EventDuplicateStrategy with same cascading lookup (exact → slug → normalized), so Strategy 2 (venue + date + fuzzy title) can match even when incoming venue name differs in punctuation.
- Apply same normalized venue lookup to EventUpsert::findEventByVenueDateAndFuzzyTitle() for consistency.
- Improve advisory lock reliability: retry up to 3 times with increasing timeouts (5s, 10s, 15s = 30s total) before proceeding unlocked. Log level upgraded from debug to warning on failure.
Summary
Prevents future venue and event duplicates by hardening the dedup architecture at three levels.
Problem
Audit found 15 duplicate venue groups and 325 duplicate event pairs caused by:
- VenueService::get_or_create_venue() did simple term_exists() (no address matching, no normalization), while Venue_Taxonomy::find_or_create_venue() had the full cascade
Changes
1. Unified Venue Creation (VenueService.php)
get_or_create_venue() now delegates entirely to Venue_Taxonomy::find_or_create_venue(). Removes the duplicate code path with its weaker matching. One creation path, everywhere.
2. Normalized Name Matching (Venue_Taxonomy.php)
New step in the find_or_create_venue() matching cascade.
Normalization strips: HTML entities, case, articles, punctuation, dashes, apostrophes. Minimum 3-char match to avoid false positives.
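A minimal sketch of what such a normalization step might look like, for illustration only. The function names `normalize_venue_name()` and `venue_names_match()` are hypothetical and do not come from this PR. Stripping only a leading English article, and reading "minimum 3-char match" as a length check on the normalized key, are both assumptions.

```php
<?php
// Hypothetical helper (not the PR's code): reduce a venue name to a comparison key.
function normalize_venue_name(string $name): string
{
    // Decode HTML entities such as &amp; or &#039; first.
    $name = html_entity_decode($name, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Fold case.
    $name = mb_strtolower($name, 'UTF-8');
    // Drop apostrophes outright so "reggie's" becomes "reggies".
    $name = preg_replace('/[\'\x{2019}`]/u', '', $name);
    // Turn every other run of punctuation, dashes, slashes or spaces into one space.
    $name = trim(preg_replace('/[^\p{L}\p{N}]+/u', ' ', $name));
    // Assumption: only a leading English article is removed.
    return preg_replace('/^(?:the|a|an) /u', '', $name);
}

// Hypothetical comparison: equal normalized keys, each at least 3 characters long.
function venue_names_match(string $a, string $b): bool
{
    $na = normalize_venue_name($a);
    $nb = normalize_venue_name($b);
    if (mb_strlen($na, 'UTF-8') < 3 || mb_strlen($nb, 'UTF-8') < 3) {
        return false; // too short to match safely
    }
    return $na === $nb;
}
```

Under these assumptions, the variants named in the commit message ('Saturn - Birmingham' vs 'Saturn Birmingham', 'RADIO/EAST' vs 'Radio East') reduce to the same key.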
3. Venue Resolution in Dedup Strategy (EventDuplicateStrategy.php)
New resolveVenueTerm() helper with cascading lookup: exact name → slug → normalized name. Applied to Strategy 2 so event dedup works even when venue names differ in punctuation between sources.
Same fix applied to EventUpsert::findEventByVenueDateAndFuzzyTitle().
4. Advisory Lock Retry (EventUpsert.php)
Retry up to 3 times with increasing timeouts (5s, 10s, 15s = 30s total). Log level upgraded from debug to warning on failure.
Testing
homeboy test data-machine-events)
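For the advisory lock retry in change 4, the escalating-timeout loop could be sketched roughly as follows. This is an illustration, not the code in EventUpsert.php. `acquire_lock_with_retry()` and both callbacks are hypothetical names, and emitting a single warning after the final failed attempt is an assumption about where the log call sits.

```php
<?php
/**
 * Hypothetical sketch: try to take a lock with increasing timeouts.
 *
 * @param callable $tryLock  Receives a timeout in seconds; returns true once the lock is held.
 * @param callable $warn     Receives a warning message string.
 * @param int[]    $timeouts Per-attempt timeouts in seconds (5 + 10 + 15 = 30 by default).
 */
function acquire_lock_with_retry(callable $tryLock, callable $warn, array $timeouts = [5, 10, 15]): bool
{
    foreach ($timeouts as $seconds) {
        if ($tryLock($seconds)) {
            return true; // lock acquired, stop retrying
        }
    }
    // Every attempt timed out: warn, and let the caller proceed unlocked.
    $warn(sprintf(
        'Advisory lock not acquired after %d attempts (%ds total); proceeding unlocked',
        count($timeouts),
        array_sum($timeouts)
    ));
    return false;
}
```

In practice `$tryLock` might wrap something like MySQL's `GET_LOCK(name, timeout)`, though the PR text does not name the lock mechanism, so that is also an assumption.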