borisolver · Dec 8, 2025
diff --git a/docs/physical_contact_email_inference.md b/docs/physical_contact_email_inference.md
@@ -0,0 +1,69 @@
+# Physical Location Contact Email Inference (Proposal)
+
+This document outlines an approach for inferring contact emails for physical locations (e.g., campus buildings) using only free sources: OpenStreetMap (OSM) and other public data. It complements the existing digital-contact inference pipeline.
+
+## Data sources (free)
+- **OpenStreetMap (primary)**
+  - Overpass API (read-only) to query objects around a coordinate or polygon.
+  - OSM tags often include `contact:email`, `email`, `operator`, `owner`, `brand`, and website URLs.
+  - OSM `addr:*` fields and `name`/`operator` labels help derive domains.
+- **Institutional websites (fallback)**
+  - Fetch public homepages linked from OSM tags (e.g., `website`) and parse `mailto:` links or `Contact` pages with a small HTML scraper.
+- **Public campus directories (optional)**
+  - Many universities expose JSON/CSV directory endpoints or accessible HTML that can be scraped politely for role-based emails (e.g., `facilities@ucla.edu`).
+
+## Inference pipeline
+1. **Locate the feature**
+   - Reverse-geocode report coordinates with OSM Nominatim to get campus/building names and OSM IDs.
+   - Run an Overpass query for features within a small radius (e.g., 150–250 m) that carry expected facility tags: `amenity=*` (e.g., `university`, `school`, `hospital`), `building=*`, `office=*`, `public_transport=*`, `shop=*`, `tourism=*`, etc. Overpass does not sort results by distance, so order them client-side.
+   - Prefer features whose `name`/`operator`/`brand` strings overlap with the report text (e.g., "UCLA Law" matches `name~"UCLA"` and `name~"Law"`).
+
+2. **Direct email extraction (OSM tags)**
+   - If any candidate has `contact:email` or `email`, collect them immediately.
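+   - If missing, crawl the `website`, `contact:website`, or `operator:website` URLs for emails. `brand:wikipedia` holds a Wikipedia article reference (e.g., `en:<Article>`) rather than a URL, but that article can lead to the brand's official site.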
+
+3. **Domain inference (for campuses/businesses)**
+   - Derive a canonical domain from OSM tags:
+     - `website` or `operator:website` (strip path and protocol).
+     - If absent, construct from operator name via a ruleset (e.g., `University of California, Los Angeles` → `ucla.edu`; `City of Santa Monica` → `santamonica.gov`).
+   - Generate role-based emails using campus-aware heuristics:
+     - Default set: `info@<domain>`, `support@<domain>`, `contact@<domain>`, `help@<domain>`.
+     - Facilities/custodial: `facilities@<domain>`, `maintenance@<domain>`, `custodian@<domain>`, `grounds@<domain>`.
+     - Safety/security: `security@<domain>`, `police@<domain>`, `publicsafety@<domain>`.
+     - Accessibility: `ada@<domain>` or `accessibility@<domain>`.
+   - If the OSM feature name includes a subdivision (e.g., `School of Law`), add its short form to the local part of custodial contacts: `law@<domain>`, `custodian-law@<domain>`, `facilities-law@<domain>`.
+
+4. **Hierarchy-aware expansion**
+   - Walk up the place hierarchy from the OSM reverse-geocode result:
+     - Specific feature (building) → campus (e.g., `UCLA`) → operator/owner (e.g., `University of California` or `City of Los Angeles`).
+   - For each level, apply domain inference and generate role-based candidates. This yields multiple responsible parties (e.g., building facilities, campus facilities, city public works).
+
+5. **Scoring and de-duplication**
+   - Score candidates by evidence source:
+     1. OSM-tagged emails (exact) — highest.
+     2. Emails scraped from linked websites.
+     3. Role-based heuristics on feature domain.
+     4. Role-based heuristics on parent domain.
+   - Normalize casing, remove duplicates, and keep the top 3–5 distinct addresses for notification.
+
+6. **Safety and quality controls**
+   - Validate email syntax (RFC 5322 regex) and filter obvious placeholders (e.g., `test@`, `example@`).
+   - Enforce an allowlist of institutional domain suffixes (`.edu`, `.gov`, `.ca.gov`, `.org`) when inferring domains from place names.
+   - Rate-limit Overpass/HTTP calls and cache results per `(lat, lon)` and feature name to respect service limits.
+
+## Example (UCLA Law School report)
+- Reverse-geocode → `UCLA School of Law` building, operator `University of California, Los Angeles`, website `https://law.ucla.edu`.
+- OSM lacks direct email.
+- Domain inference yields `law.ucla.edu` (building) and `ucla.edu` (campus/owner).
+- Generated candidates:
+  - `contact@law.ucla.edu`, `facilities@law.ucla.edu`, `custodian-law@ucla.edu`.
+  - Campus-level: `facilities@ucla.edu`, `security@ucla.edu`, `accessibility@ucla.edu`.
+- Score by specificity; keep top 4–5 unique addresses for the notification batch.
+
+## Implementation notes
+- Add an `osm-contact-inference` module that:
+  - Accepts `(lat, lon, report_text)` and returns ranked emails with provenance.
+  - Uses an Overpass client (e.g., `requests` + Overpass QL) with a small template query.
+  - Integrates with existing analysis pipeline to populate `inferred_contact_emails` for physical reports, tagging source (`osm_tag`, `website_scrape`, `heuristic_feature`, `heuristic_parent`).
+- Provide unit tests with canned Overpass responses to keep them deterministic.
+- Keep the system behind a config flag so it can be switched off if Overpass or website rate limits become a problem.
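Step 2 (direct email extraction) plus the website fallback could look roughly like this. It is a sketch under assumptions: `valuesForKeys` and `extractMailtos` are hypothetical helpers, tag values are split on `;` (OSM's usual multi-value separator), and the `mailto:` regex only handles plain `href` attributes, not obfuscated or percent-encoded addresses.

```go
package main

import (
	"regexp"
	"sort"
	"strings"
)

// Tag keys in the order the proposal lists them.
var (
	emailTagKeys   = []string{"contact:email", "email"}
	websiteTagKeys = []string{"website", "contact:website", "operator:website"}
)

// valuesForKeys (hypothetical) collects non-empty values for keys, splitting
// on ';', which OSM commonly uses to hold several values in one tag.
func valuesForKeys(tags map[string]string, keys []string) []string {
	var out []string
	for _, k := range keys {
		for _, v := range strings.Split(tags[k], ";") {
			if v = strings.TrimSpace(v); v != "" {
				out = append(out, v)
			}
		}
	}
	return out
}

// mailtoRe captures the address in href="mailto:..." up to a quote, '?'
// (e.g. ?subject=...), or whitespace.
var mailtoRe = regexp.MustCompile(`(?i)href\s*=\s*["']mailto:([^"'?\s]+)`)

// extractMailtos (hypothetical) returns the distinct, lower-cased mailto:
// addresses found in an HTML page, sorted alphabetically.
func extractMailtos(html string) []string {
	seen := map[string]bool{}
	for _, m := range mailtoRe.FindAllStringSubmatch(html, -1) {
		seen[strings.ToLower(m[1])] = true
	}
	out := make([]string, 0, len(seen))
	for addr := range seen {
		out = append(out, addr)
	}
	sort.Strings(out)
	return out
}
```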
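A sketch of step 3 (domain inference and role-based candidates). `domainFromWebsite`, `inferDomain`, `roleCandidates`, and the `operatorDomains` table are hypothetical names; the two table entries and the role lists are copied from the proposal, while stripping a leading `www.` and defaulting a missing scheme to `https://` are added assumptions.

```go
package main

import (
	"net/url"
	"strings"
)

// operatorDomains is a hypothetical hand-maintained ruleset for operators
// without a website tag; both entries are the proposal's own examples.
var operatorDomains = map[string]string{
	"university of california, los angeles": "ucla.edu",
	"city of santa monica":                  "santamonica.gov",
}

// domainFromWebsite strips scheme, path, port, and a leading "www." from an
// OSM website value, e.g. "https://law.ucla.edu/about" -> "law.ucla.edu".
func domainFromWebsite(raw string) (string, bool) {
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // assumption: OSM values sometimes omit the scheme
	}
	u, err := url.Parse(raw)
	if err != nil || u.Hostname() == "" {
		return "", false
	}
	return strings.TrimPrefix(strings.ToLower(u.Hostname()), "www."), true
}

// inferDomain prefers the website tag and falls back to the operator ruleset.
func inferDomain(website, operator string) (string, bool) {
	if d, ok := domainFromWebsite(website); ok {
		return d, true
	}
	d, ok := operatorDomains[strings.ToLower(strings.TrimSpace(operator))]
	return d, ok
}

// roleGroups mirrors the proposal's default, facilities, safety, and accessibility roles.
var roleGroups = [][]string{
	{"info", "support", "contact", "help"},
	{"facilities", "maintenance", "custodian", "grounds"},
	{"security", "police", "publicsafety"},
	{"ada", "accessibility"},
}

// roleCandidates expands domain into role addresses; a non-empty subdivision
// (e.g. "law") adds "<sub>@", "custodian-<sub>@", and "facilities-<sub>@".
func roleCandidates(domain, subdivision string) []string {
	var out []string
	for _, group := range roleGroups {
		for _, role := range group {
			out = append(out, role+"@"+domain)
		}
	}
	if sub := strings.ToLower(subdivision); sub != "" {
		for _, prefix := range []string{"", "custodian-", "facilities-"} {
			out = append(out, prefix+sub+"@"+domain)
		}
	}
	return out
}
```

With the UCLA example, `inferDomain("https://law.ucla.edu", "")` yields `law.ucla.edu`, and `roleCandidates("ucla.edu", "law")` ends with `law@ucla.edu`, `custodian-law@ucla.edu`, and `facilities-law@ucla.edu`.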
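Step 4 (hierarchy-aware expansion) reduces to a loop over the levels returned by reverse geocoding. In this sketch the `placeLevel` and `levelCandidate` types are hypothetical; levels are assumed to arrive most specific first, a domain already seen at a lower level is skipped (campus and owner often share one), and only the first level is labelled `heuristic_feature`, matching the provenance tags in the implementation notes.

```go
package main

// placeLevel is one rung of the place hierarchy (hypothetical type).
type placeLevel struct {
	Kind   string // e.g. "building", "campus", "operator"
	Domain string
}

// levelCandidate is a generated address with the level and provenance label.
type levelCandidate struct {
	Address string
	Level   string
	Source  string // "heuristic_feature" for index 0, else "heuristic_parent"
}

// expandHierarchy generates role addresses for every level, most specific
// first, skipping empty or already-seen domains.
func expandHierarchy(levels []placeLevel, roles []string) []levelCandidate {
	var out []levelCandidate
	seenDomain := map[string]bool{}
	for i, lvl := range levels {
		if lvl.Domain == "" || seenDomain[lvl.Domain] {
			continue
		}
		seenDomain[lvl.Domain] = true
		src := "heuristic_parent"
		if i == 0 {
			src = "heuristic_feature"
		}
		for _, r := range roles {
			out = append(out, levelCandidate{Address: r + "@" + lvl.Domain, Level: lvl.Kind, Source: src})
		}
	}
	return out
}
```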
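Step 5 (scoring and de-duplication) as a sketch. The source labels come from the implementation notes, but the numeric weights in `sourceScore` are assumptions; only their relative order follows the proposal. Addresses are lower-cased before comparison, and when the same address arrives from two sources the higher-ranked source is kept. `rankAndDedupe` is a hypothetical name.

```go
package main

import (
	"sort"
	"strings"
)

// sourceScore uses illustrative weights; only the ordering is from the proposal.
var sourceScore = map[string]int{
	"osm_tag":           4,
	"website_scrape":    3,
	"heuristic_feature": 2,
	"heuristic_parent":  1,
}

type scoredEmail struct {
	Address string
	Source  string
}

// rankAndDedupe normalizes casing, keeps the best source per address, sorts
// by score (stable, so input order breaks ties), and truncates to limit.
func rankAndDedupe(in []scoredEmail, limit int) []scoredEmail {
	index := map[string]int{} // normalized address -> position in out
	var out []scoredEmail
	for _, c := range in {
		addr := strings.ToLower(strings.TrimSpace(c.Address))
		if i, ok := index[addr]; ok {
			if sourceScore[c.Source] > sourceScore[out[i].Source] {
				out[i].Source = c.Source
			}
			continue
		}
		index[addr] = len(out)
		out = append(out, scoredEmail{Address: addr, Source: c.Source})
	}
	sort.SliceStable(out, func(i, j int) bool {
		return sourceScore[out[i].Source] > sourceScore[out[j].Source]
	})
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}
```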
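For step 6, a sketch of the syntax, placeholder, and allowlist checks. It swaps the RFC 5322 regex for Go's `net/mail.ParseAddress`, which parses a single RFC 5322 address and avoids maintaining a large regex; rejecting inputs that carry a display name, and the name `acceptEmail`, are assumptions. `.ca.gov` needs no separate entry because it already ends in `.gov`.

```go
package main

import (
	"net/mail"
	"strings"
)

// placeholderLocals lists the obvious non-addresses named in the proposal.
var placeholderLocals = map[string]bool{"test": true, "example": true}

// allowedSuffixes applies only to domains inferred from place names.
var allowedSuffixes = []string{".edu", ".gov", ".org"}

// acceptEmail (hypothetical) reports whether addr is a bare RFC 5322 address
// (no display name), is not a placeholder, and, when inferredFromName is
// true, ends in an allowed suffix.
func acceptEmail(addr string, inferredFromName bool) bool {
	parsed, err := mail.ParseAddress(addr)
	if err != nil || parsed.Name != "" || parsed.Address != addr {
		return false
	}
	at := strings.LastIndex(addr, "@") // present: ParseAddress requires an '@'
	local, domain := strings.ToLower(addr[:at]), strings.ToLower(addr[at+1:])
	if placeholderLocals[local] {
		return false
	}
	if !inferredFromName {
		return true
	}
	for _, s := range allowedSuffixes {
		if strings.HasSuffix(domain, s) {
			return true
		}
	}
	return false
}
```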
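The last bullet of step 6 (rate limiting and caching) could be handled by a small wrapper like the one below. Rounding coordinates to four decimals for the cache key, enforcing a fixed minimum gap between upstream calls, and the name `lookupCache` are all assumptions. Holding the mutex while sleeping and fetching deliberately serialises upstream traffic, which keeps the sketch simple at the cost of concurrency.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lookupCache (hypothetical) memoises results per rounded (lat, lon) and
// feature name, and spaces upstream calls at least minGap apart.
type lookupCache struct {
	mu       sync.Mutex
	entries  map[string][]string
	minGap   time.Duration
	lastCall time.Time
}

func newLookupCache(minGap time.Duration) *lookupCache {
	return &lookupCache{entries: map[string][]string{}, minGap: minGap}
}

// cacheKey rounds coordinates to 4 decimals (about 11 m of latitude) so that
// nearby reports of the same building share an entry.
func cacheKey(lat, lon float64, feature string) string {
	return fmt.Sprintf("%.4f,%.4f|%s", lat, lon, feature)
}

// Get returns the cached value or calls fetch, sleeping first if the previous
// upstream call happened less than minGap ago. Errors are not cached.
func (c *lookupCache) Get(lat, lon float64, feature string, fetch func() ([]string, error)) ([]string, error) {
	key := cacheKey(lat, lon, feature)
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.entries[key]; ok {
		return v, nil
	}
	if wait := c.minGap - time.Since(c.lastCall); wait > 0 {
		time.Sleep(wait)
	}
	c.lastCall = time.Now()
	v, err := fetch()
	if err != nil {
		return nil, err
	}
	c.entries[key] = v
	return v, nil
}
```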
diff --git a/email-service/config/config.go b/email-service/config/config.go
@@ -24,6 +24,9 @@ type Config struct {
 	OptOutURL    string
 	PollInterval string
 	HTTPPort     string
+
+	// Brand dashboard configuration
+	BrandDashboardURL string
 }
 
 // Load loads configuration from environment variables and flags
@@ -46,6 +49,7 @@ func Load() *Config {
 	cfg.OptOutURL = getEnv("OPT_OUT_URL", "http://localhost:8080/opt-out")
 	cfg.PollInterval = getEnv("POLL_INTERVAL", "10s")
 	cfg.HTTPPort = getEnv("HTTP_PORT", "8080")
+	cfg.BrandDashboardURL = getEnv("BRAND_DASHBOARD_URL", "https://dashboard.cleanapp.io/brand")
 
 	return cfg
 }

diff --git a/email-service/email/email_sender.go b/email-service/email/email_sender.go
@@ -154,8 +154,16 @@ func (e *EmailSender) sendOneEmailWithAnalysis(recipient string, reportImage, ma
 
 	// Create subject with analysis title
 	subject := "CleanApp Report"
+	isDigital := analysis != nil && analysis.Classification == "digital"
+	if isDigital {
+		subject = "CleanApp alert: major new issue reported for your brand"
+	}
 	if analysis.Title != "" {
-		subject = fmt.Sprintf("CleanApp Report: %s", analysis.Title)
+		if isDigital {
+			subject = fmt.Sprintf("CleanApp alert: major new issue — %s", analysis.Title)
+		} else {
+			subject = fmt.Sprintf("CleanApp Report: %s", analysis.Title)
+		}
 	}
 
 	to := mail.NewEmail(recipient, recipient)
@@ -279,43 +287,63 @@ func (e *EmailSender) getEmailHtml(recipient string, hasReport, hasMap bool) str
 
 // getEmailTextWithAnalysis returns the plain text content for emails with analysis data
 func (e *EmailSender) getEmailTextWithAnalysis(recipient string, analysis *models.ReportAnalysis, hasReport, hasMap bool) string {
-	var content string
+	if analysis.Classification == "digital" {
+		digitalSubject := "CleanApp alert: major new issue reported for your brand"
+		preheader := "Someone just submitted a brand-related digital report."
 
-	attachments := ""
-	if hasReport || hasMap {
-		attachments = "\nThis email contains:\n"
+		heroReport := ""
 		if hasReport {
-			attachments += "- The report image\n"
+			heroReport = "\n- Hero: photo of report included."
 		}
+
+		heroLocation := ""
 		if hasMap {
-			attachments += "- A map showing the location\n"
+			heroLocation = "\n- Hero: photo of location included."
 		}
-		attachments += "- AI analysis results\n"
-	}
-	if analysis.Classification == "digital" {
-		content = fmt.Sprintf(`Hello,
 
-You have received a new CleanApp digital issue report with analysis.
+		return fmt.Sprintf(`%s
+Preheader: %s
 
-REPORT ANALYSIS:
-Title: %s
-Description: %s
-Type: Digital Issue
+Someone just submitted a new digital report mentioning your brand.
+CleanApp AI analyzed this issue to highlight potential legal and risk ranges connected to your brand presence.%s%s
+
+AI analysis summary:
+- Title: %s
+- Description: %s
+- Type: Digital Issue
+
+Open the Brand Dashboard to see the AI rationale, mapped areas, and supporting media:
 %s
-Note: This is a digital issue report. Physical metrics (litter/hazard probability) are not applicable.
 
 To unsubscribe from these emails, please visit: %s?email=%s
 You can also reply to this email with "UNSUBSCRIBE" in the subject line.
 
 Best regards,
 The CleanApp Team`,
+			digitalSubject,
+			preheader,
+			heroReport,
+			heroLocation,
 			analysis.Title,
 			analysis.Description,
-			attachments,
+			e.config.BrandDashboardURL,
 			e.config.OptOutURL,
 			recipient)
-	} else {
-		content = fmt.Sprintf(`Hello,
+	}
+
+	attachments := ""
+	if hasReport || hasMap {
+		attachments = "\nThis email contains:\n"
+		if hasReport {
+			attachments += "- The report image\n"
+		}
+		if hasMap {
+			attachments += "- A map showing the location\n"
+		}
+		attachments += "- AI analysis results\n"
+	}
+
+	return fmt.Sprintf(`Hello,
 
 You have received a new CleanApp report with analysis.
 
@@ -334,29 +362,116 @@ You can also reply to this email with "UNSUBSCRIBE" in the subject line.
 
 Best regards,
 The CleanApp Team`,
+		analysis.Title,
+		analysis.Description,
+		analysis.LitterProbability*100,
+		analysis.HazardProbability*100,
+		analysis.SeverityLevel,
+		attachments,
+		e.config.OptOutURL,
+		recipient)
+}
+
+// getEmailHtmlWithAnalysis returns the HTML content for emails with analysis data
+func (e *EmailSender) getEmailHtmlWithAnalysis(recipient string, analysis *models.ReportAnalysis, hasReport, hasMap bool) string {
+	isDigital := analysis.Classification == "digital"
+
+	if isDigital {
+		subjectLine := "CleanApp alert: major new issue reported for your brand"
+		preheader := "Someone just submitted a brand-related digital report. Review the AI analysis and risk ranges."
+
+		reportHero := ""
+		if hasReport {
+			reportHero = fmt.Sprintf(`
+            <div class="hero-card">
+                <div class="hero-label">Photo of report</div>
+                <img src="cid:%s" alt="Report Image" />
+            </div>`, reportImgCid)
+		}
+
+		locationHero := ""
+		if hasMap {
+			locationHero = fmt.Sprintf(`
+            <div class="hero-card">
+                <div class="hero-label">Photo of location</div>
+                <img src="cid:%s" alt="Location Map" />
+            </div>`, mapImgCid)
+		}
+
+		heroImages := ""
+		if reportHero != "" || locationHero != "" {
+			heroImages = fmt.Sprintf(`
+        <div class="hero-grid">%s%s
+        </div>`, reportHero, locationHero)
+		}
+
+		return fmt.Sprintf(`<!DOCTYPE html>
+<html>
+<head>
+    <meta charset="utf-8">
+    <title>%s</title>
+    <style>
+        body { font-family: Arial, sans-serif; line-height: 1.6; color: #1f2937; background: #f7f7f8; margin: 0; padding: 0; }
+        .preheader { display: none; visibility: hidden; opacity: 0; height: 0; width: 0; overflow: hidden; }
+        .container { max-width: 720px; margin: 0 auto; padding: 24px; background: #ffffff; }
+        .hero { background: linear-gradient(135deg, #0f766e, #14b8a6); color: #ffffff; padding: 28px; border-radius: 14px; box-shadow: 0 10px 30px rgba(0,0,0,0.12); }
+        .eyebrow { text-transform: uppercase; letter-spacing: 0.08em; font-weight: 700; font-size: 12px; margin: 0 0 6px 0; opacity: 0.85; }
+        h1 { margin: 0 0 10px 0; font-size: 26px; }
+        .subhead { margin: 0 0 12px 0; font-size: 16px; opacity: 0.95; }
+        .lede { margin: 0 0 18px 0; font-size: 15px; }
+        .cta { display: inline-block; background: #ffffff; color: #0f172a; padding: 12px 18px; border-radius: 10px; text-decoration: none; font-weight: 700; box-shadow: 0 8px 20px rgba(0,0,0,0.12); }
+        .card { margin-top: 24px; padding: 18px; border: 1px solid #e5e7eb; border-radius: 12px; background: #f8fafc; }
+        .card h3 { margin-top: 0; color: #0f172a; }
+        .card p { margin: 6px 0; }
+        .card .note { margin-top: 12px; color: #475569; }
+        .hero-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); gap: 16px; margin-top: 18px; }
+        .hero-card { background: #0b766c0d; border: 1px solid #d1fae5; border-radius: 12px; padding: 12px; text-align: center; }
+        .hero-label { font-weight: 700; color: #0f766e; margin-bottom: 10px; }
+        .hero-card img { max-width: 100%%; border-radius: 10px; }
+        .footer { margin-top: 24px; font-size: 13px; color: #6b7280; text-align: left; }
+        .footer a { color: #0ea5e9; text-decoration: none; }
+    </style>
+</head>
+<body>
+    <div class="preheader">%s</div>
+    <div class="container">
+        <div class="hero">
+            <p class="eyebrow">CleanApp alert</p>
+            <h1>Major new issue reported for your brand</h1>
+            <p class="subhead">Someone just submitted a brand-related digital report.</p>
+            <p class="lede">CleanApp AI analyzed this issue to highlight potential legal and risk ranges connected to your brand presence.</p>
+            <a class="cta" href="%s">Open brand dashboard</a>
+        </div>
+
+        <div class="card">
+            <h3>AI analysis summary</h3>
+            <p><strong>Title:</strong> %s</p>
+            <p><strong>Description:</strong> %s</p>
+            <p><strong>Type:</strong> Digital Issue</p>
+            <p class="note">Review the dashboard to see the AI rationale, mapped legal/risk ranges, and supporting media.</p>
+        </div>%s
+
+        <div class="footer">
+            <p>To unsubscribe from these emails, please <a href="%s?email=%s">click here</a>.</p>
+        </div>
+    </div>
+</body>
+</html>`,
+			subjectLine,
+			preheader,
+			e.config.BrandDashboardURL,
 			analysis.Title,
 			analysis.Description,
-			analysis.LitterProbability*100,
-			analysis.HazardProbability*100,
-			analysis.SeverityLevel,
-			attachments,
+			heroImages,
 			e.config.OptOutURL,
 			recipient)
 	}
 
-	return content
-}
-
-// getEmailHtmlWithAnalysis returns the HTML content for emails with analysis data
-func (e *EmailSender) getEmailHtmlWithAnalysis(recipient string, analysis *models.ReportAnalysis, hasReport, hasMap bool) string {
 	// Calculate gauge colors based on values
 	litterColor := e.getGaugeColor(analysis.LitterProbability)
 	hazardColor := e.getGaugeColor(analysis.HazardProbability)
 	severityColor := e.getSeverityGaugeColor(analysis.SeverityLevel)
 
-	// Determine if this is a digital report
-	isDigital := analysis.Classification == "digital"
-
 	imagesSection := ""
 	if hasReport {
 		imagesSection += fmt.Sprintf(`
@@ -403,21 +518,21 @@ func (e *EmailSender) getEmailHtmlWithAnalysis(recipient string, analysis *model
         <h2>CleanApp Report Analysis</h2>
         <p>A new report has been analyzed and requires your attention.</p>
     </div>
-    
+
     <div class="analysis-section">
         <h3>Report Details</h3>
         <p><strong>Title:</strong> %s</p>
         <p><strong>Description:</strong> %s</p>
         <p><strong>Type:</strong> %s</p>
     </div>
-    
+
     %s
-    
+
     <div class="images">%s
     </div>
-    
+
     <p><em>Best regards,<br>The CleanApp Team</em></p>
-    
+
     <div style="margin-top: 30px; padding-top: 20px; border-top: 1px solid #eee; font-size: 0.9em; color: #666;">
         <p>To unsubscribe from these emails, please <a href="%s?email=%s" style="color: #007bff; text-decoration: none;">click here</a></p>
     </div>