
fix: harden venue dedup with normalized name matching and unified creation path #146

Merged
chubes4 merged 1 commit into main from fix/venue-dedup-hardening
Mar 23, 2026

@chubes4 (Member) commented Mar 23, 2026

Summary

Prevents future venue and event duplicates by hardening the dedup architecture at three levels.

Problem

Audit found 15 duplicate venue groups and 325 duplicate event pairs caused by:

  1. Dual venue creation paths: VenueService::get_or_create_venue() did a simple term_exists() check (no address matching, no normalization), while Venue_Taxonomy::find_or_create_venue() had the full matching cascade.
  2. Venue name variants slipping through: "Saturn - Birmingham" vs "Saturn Birmingham" and "Reggie's Rock Club" vs "Reggies Rock Club" created separate terms.
  3. Event dedup failing on unresolved venues: Strategy 2 (venue + date + fuzzy title) used an exact name lookup, so an incoming venue name with different punctuation could not resolve to the existing term.
  4. Advisory lock timeouts: a single 10-second attempt meant that, once it timed out, concurrent batch jobs proceeded unlocked and created duplicates.

Changes

1. Unified Venue Creation (VenueService.php)

get_or_create_venue() now delegates entirely to Venue_Taxonomy::find_or_create_venue(). Removes the duplicate code path with its weaker matching. One creation path, everywhere.
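
A minimal sketch of the single-path pattern, written in Python for illustration rather than the plugin's PHP. The in-memory term store and the reduced lookup inside find_or_create_venue are hypothetical. The point is only that the service method keeps its public name but holds no matching logic of its own.

```python
from dataclasses import dataclass, field


@dataclass
class VenueTaxonomy:
    """Hypothetical stand-in for Venue_Taxonomy: owns the only matching logic."""
    terms: dict[str, int] = field(default_factory=dict)  # lookup key -> term id
    next_id: int = 1

    def find_or_create_venue(self, name: str) -> int:
        # The full cascade would live here; reduced to a case-insensitive
        # lookup to keep the sketch short.
        key = name.strip().lower()
        if key not in self.terms:
            self.terms[key] = self.next_id
            self.next_id += 1
        return self.terms[key]


@dataclass
class VenueService:
    """Hypothetical stand-in for VenueService: performs no lookup itself."""
    taxonomy: VenueTaxonomy

    def get_or_create_venue(self, name: str) -> int:
        # Delegate entirely, so every caller gets the same matching rules.
        return self.taxonomy.find_or_create_venue(name)
```

Because the service carries no matching code, any improvement to the cascade applies to both entry points at once.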

2. Normalized Name Matching (Venue_Taxonomy.php)

New step in find_or_create_venue() matching cascade:

  • Address match → Exact name → "The" prefix toggle → Normalized name match → Create new

Normalization decodes HTML entities, lowercases, and strips articles, punctuation, dashes, and apostrophes. A minimum 3-character match guards against false positives.
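
A minimal Python sketch of the cascade and the normalization step, for illustration only. The real code is PHP in Venue_Taxonomy.php and is not shown here. This sketch makes the following assumptions:

* Venues live in an in-memory list.
* The address match is a case-insensitive string comparison.
* "Articles" means the standalone words the/a/an.
* The 3-character minimum applies to the normalized string.

```python
import html
import re
from dataclasses import dataclass

ARTICLES = re.compile(r"\b(?:the|a|an)\b")  # assumed article set
MIN_NORMALIZED_LEN = 3  # the PR's minimum match length, assumed to apply post-normalization


def normalize_venue_name(name: str) -> str:
    """Decode entities, lowercase, drop articles, then keep only letters/digits."""
    text = ARTICLES.sub(" ", html.unescape(name).lower())
    # Dropping every non-alphanumeric char removes punctuation, dashes,
    # apostrophes and spaces, so "RADIO/EAST" and "Radio East" converge.
    return "".join(ch for ch in text if ch.isalnum())


@dataclass
class Venue:
    term_id: int
    name: str
    address: str = ""


def find_or_create_venue(venues: list[Venue], name: str, address: str = "") -> Venue:
    # 1. Address match (assumed: case-insensitive exact comparison).
    if address:
        for v in venues:
            if v.address and v.address.strip().lower() == address.strip().lower():
                return v
    # 2. Exact name match.
    for v in venues:
        if v.name == name:
            return v
    # 3. "The" prefix toggle: "The Earl" <-> "Earl".
    toggled = name[4:] if name.lower().startswith("the ") else f"The {name}"
    for v in venues:
        if v.name == toggled:
            return v
    # 4. Normalized name match, skipped when the normalized name is too short.
    key = normalize_venue_name(name)
    if len(key) >= MIN_NORMALIZED_LEN:
        for v in venues:
            if normalize_venue_name(v.name) == key:
                return v
    # 5. Nothing matched: create a new term.
    venue = Venue(term_id=len(venues) + 1, name=name, address=address)
    venues.append(venue)
    return venue
```

The cheap exact checks run first. The normalized comparison only runs when they fail, and it still refuses to match names that collapse to fewer than three characters.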

3. Venue Resolution in Dedup Strategy (EventDuplicateStrategy.php)

New resolveVenueTerm() helper with cascading lookup: exact name → slug → normalized name. Applied to Strategy 2 so event dedup works even when venue names differ in punctuation between sources.

Same fix applied to EventUpsert::findEventByVenueDateAndFuzzyTitle().
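
A Python sketch of how a cascading venue resolution feeds a venue + date + fuzzy-title check. It is illustrative only and not the plugin's PHP. Several pieces are stand-ins:

* The Term and Event types, the slugify helper (a rough stand-in for WordPress slugs) and the function names are hypothetical.
* difflib's SequenceMatcher is swapped in as the fuzzy title metric, because the PR does not say which comparison the plugin uses.
* The 0.8 threshold is an invented value.

```python
import difflib
import html
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

ARTICLES = re.compile(r"\b(?:the|a|an)\b")
TITLE_SIMILARITY = 0.8  # hypothetical threshold; not stated in the PR


def normalize(name: str) -> str:
    text = ARTICLES.sub(" ", html.unescape(name).lower())
    return "".join(ch for ch in text if ch.isalnum())


def slugify(name: str) -> str:
    # Rough approximation of a WordPress-style slug (assumption).
    return re.sub(r"[^a-z0-9]+", "-", html.unescape(name).lower()).strip("-")


@dataclass
class Term:
    term_id: int
    name: str
    slug: str


@dataclass
class Event:
    title: str
    venue_id: int
    day: date


def resolve_venue_term(terms: list[Term], name: str) -> Optional[Term]:
    """Exact name -> slug -> normalized name, in the order the PR describes."""
    for t in terms:
        if t.name == name:
            return t
    slug = slugify(name)
    for t in terms:
        if t.slug == slug:
            return t
    key = normalize(name)
    if len(key) >= 3:
        for t in terms:
            if normalize(t.name) == key:
                return t
    return None


def find_duplicate(events: list[Event], terms: list[Term], title: str,
                   venue_name: str, day: date) -> Optional[Event]:
    """Strategy 2 sketch: same venue, same date, similar title."""
    venue = resolve_venue_term(terms, venue_name)
    if venue is None:
        return None  # with exact-name lookup only, punctuation variants stopped here
    for e in events:
        if e.venue_id == venue.term_id and e.day == day:
            ratio = difflib.SequenceMatcher(None, e.title.lower(), title.lower()).ratio()
            if ratio >= TITLE_SIMILARITY:
                return e
    return None
```

Resolving the venue first is what lets the title comparison run at all. If the venue does not resolve, the strategy has nothing to compare against.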

4. Advisory Lock Retry (EventUpsert.php)

Lock acquisition now retries up to 3 times with increasing timeouts (5s, 10s, 15s; 30s total) before proceeding unlocked. The log level on failure is upgraded from debug to warning.
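
A minimal Python sketch of the retry schedule, assuming a lock primitive that blocks for up to a given number of seconds and reports success, in the style of MySQL's GET_LOCK(name, timeout). The try_lock callable, the logger name and the message text are hypothetical. The plugin's PHP may differ.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("event_upsert")  # hypothetical logger name

RETRY_TIMEOUTS = (5, 10, 15)  # seconds per attempt, 30s total, as in the PR


def acquire_with_retry(
    try_lock: Callable[[int], bool],
    timeouts: Sequence[int] = RETRY_TIMEOUTS,
    warn: Callable[[str], None] = logger.warning,
) -> bool:
    """Try the lock once per timeout; return False if every attempt times out.

    try_lock(timeout) is assumed to block for up to `timeout` seconds and
    return True when the lock was obtained.
    """
    for timeout in timeouts:
        if try_lock(timeout):
            return True
    # Every attempt timed out: warn loudly, and let the caller proceed unlocked.
    warn(f"advisory lock not acquired after {len(timeouts)} attempts "
         f"({sum(timeouts)}s total); proceeding unlocked")
    return False
```

The function returns False instead of raising, which matches the described fallback of proceeding unlocked after the final attempt. The warning-level log is what keeps that fallback visible.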

Testing

  • All existing tests pass (homeboy test data-machine-events)
  • Lint clean (no new issues)

…ation path

- Unify venue creation: VenueService::get_or_create_venue() now delegates
  to Venue_Taxonomy::find_or_create_venue() instead of doing its own
  weak term_exists() lookup. Single venue creation path across the system.

- Add normalized name matching to find_or_create_venue(): new step in the
  matching cascade strips punctuation, dashes, apostrophes, case, and
  articles before comparing. Catches variants like 'Saturn - Birmingham'
  vs 'Saturn Birmingham', 'Reggie\'s Rock Club' vs 'Reggies Rock Club',
  'RADIO/EAST' vs 'Radio East'.

- Add resolveVenueTerm() to EventDuplicateStrategy with same cascading
  lookup (exact → slug → normalized), so Strategy 2 (venue + date + fuzzy
  title) can match even when incoming venue name differs in punctuation.

- Apply same normalized venue lookup to EventUpsert::findEventByVenueDateAndFuzzyTitle()
  for consistency.

- Improve advisory lock reliability: retry up to 3 times with increasing
  timeouts (5s, 10s, 15s = 30s total) before proceeding unlocked. Log
  level upgraded from debug to warning on failure.

@homeboy-ci bot (Contributor) commented Mar 23, 2026

Homeboy Results — data-machine-events


Failure Digest

Lint Failure Digest

Test Failure Digest

Audit Failure Digest

Autofixability classification

  • Overall: auto_fixable
  • Autofix enabled: yes
  • Autofix attempted this run: no
  • Auto-fixable failed commands:
    • lint
    • test
  • Failed commands with available automated fixes:
    • lint
    • test

Machine-readable artifacts

  • homeboy-lint-summary.json
  • homeboy-test-failures.json
  • homeboy-audit-summary.json
  • homeboy-autofixability.json

⚡ Scope: changed files only

audit (changed files only)

  • Drift increased: no

lint (changed files only)

test (changed files only)

Tooling versions
  • Homeboy CLI: homeboy 0.85.3+93e8a13
  • Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
  • Extension revision: unknown
  • Action: Extra-Chill/homeboy-action@v2

Homeboy Action v1

@chubes4 chubes4 merged commit 75ce980 into main Mar 23, 2026
1 check failed
