69 changes: 69 additions & 0 deletions docs/physical_contact_email_inference.md
@@ -0,0 +1,69 @@
# Physical Location Contact Email Inference (Proposal)

This document proposes a no-cost approach to inferring contact emails for physical locations (e.g., campus buildings) from OpenStreetMap (OSM) and other public data. It complements the existing digital-contact inference pipeline.

## Data sources (free)
- **OpenStreetMap (primary)**
  - Overpass API (read-only) to query objects around a coordinate or within a polygon.
  - OSM tags often include `contact:email`, `email`, `operator`, `owner`, `brand`, and website URLs.
  - OSM `addr:*` fields and `name`/`operator` labels help derive domains.
- **Institutional websites (fallback)**
  - Fetch public homepages returned by OSM (e.g., `website` tag) and parse `mailto:` links or `Contact` pages with a small HTML scraper.
- **Public campus directories (optional)**
  - Many universities expose JSON/CSV directory endpoints or accessible HTML that can be scraped politely for role-based emails (e.g., `facilities@ucla.edu`).

## Inference pipeline
1. **Locate the feature**
   - Reverse-geocode the report coordinates with OSM Nominatim to get campus/building names and OSM IDs.
   - Run an Overpass query for features within a small radius (e.g., 150–250 m) that match expected facility types: `amenity=*` (including `university`, `school`, and `hospital`), `building=*`, `office=*`, `public_transport=*`, `shop=*`, `tourism=*`, etc. Order the results by distance from the report point after retrieval.
   - Prefer features whose `name`/`operator`/`brand` strings overlap with the report text (e.g., "UCLA Law" matches `name~"UCLA"` and `name~"Law"`).
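
The feature lookup above can be sketched without any network access. This is a minimal illustration, not the proposed module: the function names, the 200 m default radius, and the token-overlap measure are all assumptions made for the example. It builds an Overpass QL query string, computes great-circle distance for client-side ordering, and counts shared tokens between the report text and a feature's labels.

```python
import math
import re

# Tag keys taken from the facility types listed in this step.
FEATURE_KEYS = ["amenity", "building", "office", "shop", "tourism", "public_transport"]


def build_overpass_query(lat: float, lon: float, radius_m: int = 200) -> str:
    """Return an Overpass QL query for tagged features near a point."""
    around = f"(around:{radius_m},{lat},{lon})"
    selectors = "\n".join(f'  nwr["{key}"]{around};' for key in FEATURE_KEYS)
    # "out tags center" returns tags plus one representative coordinate per feature.
    return f"[out:json][timeout:25];\n(\n{selectors}\n);\nout tags center;"


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres, used to order results client-side."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000 * math.asin(math.sqrt(a))


def name_overlap(report_text: str, tags: dict) -> int:
    """Count tokens shared by the report text and the name/operator/brand tags."""
    report_tokens = set(re.findall(r"[a-z0-9]+", report_text.lower()))
    label = " ".join(tags.get(key, "") for key in ("name", "operator", "brand"))
    label_tokens = set(re.findall(r"[a-z0-9]+", label.lower()))
    return len(report_tokens & label_tokens)
```

Candidates could then be sorted by overlap (descending) and distance (ascending); how to weight the two is left open here.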

2. **Direct email extraction (OSM tags)**
   - If any candidate has `contact:email` or `email`, collect those addresses immediately.
   - If both are missing, crawl the URLs in `website`, `contact:website`, or `operator:website` for emails. A `brand:wikipedia` value names a Wikipedia article rather than a URL, so resolve it to a page before crawling.
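
A rough sketch of both extraction paths, using only the standard library. The helper names are hypothetical, and splitting tag values on `;` assumes the usual OSM convention for multiple values in one tag. Fetching the page is left out; the parser only handles HTML that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit

EMAIL_TAG_KEYS = ("contact:email", "email")


def emails_from_tags(tags: dict) -> list:
    """Collect addresses from OSM email tags, lowercased and de-duplicated."""
    found = []
    for key in EMAIL_TAG_KEYS:
        # Assumption: several addresses in one tag are separated by ';'.
        for part in tags.get(key, "").split(";"):
            address = part.strip().lower()
            if address and address not in found:
                found.append(address)
    return found


class MailtoParser(HTMLParser):
    """Collect addresses from <a href="mailto:..."> links in a page."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # urlsplit drops any "?subject=..." query from the address.
            address = unquote(urlsplit(href).path).strip().lower()
            if address and address not in self.emails:
                self.emails.append(address)
```

Usage: create a `MailtoParser`, call `feed(html_text)`, then read `.emails`.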

3. **Domain inference (for campuses/businesses)**
   - Derive a canonical domain from OSM tags:
     - `website` or `operator:website` (strip path and protocol).
     - If absent, construct it from the operator name via a ruleset (e.g., `University of California, Los Angeles` → `ucla.edu`; `City of Santa Monica` → `santamonica.gov`).
   - Generate role-based emails using campus-aware heuristics:
     - Default set: `info@<domain>`, `support@<domain>`, `contact@<domain>`, `help@<domain>`.
     - Facilities/custodial: `facilities@<domain>`, `maintenance@<domain>`, `custodian@<domain>`, `grounds@<domain>`.
     - Safety/security: `security@<domain>`, `police@<domain>`, `publicsafety@<domain>`.
     - Accessibility: `ada@<domain>` or `accessibility@<domain>`.
   - If the OSM feature name includes a subdivision (e.g., `School of Law`), add a short form of it when forming custodial contacts: `law@<domain>`, `custodian-law@<domain>`, `facilities-law@<domain>`.
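
One way the domain and role-based steps might look in code. The role lists and the two operator mappings come straight from the text above; the function names and the idea of passing the subdivision as an already-shortened string (e.g., `"Law"`) are assumptions for this sketch.

```python
from urllib.parse import urlsplit

# Role-based local parts, grouped as in the heuristics above.
ROLE_LOCAL_PARTS = {
    "default": ["info", "support", "contact", "help"],
    "facilities": ["facilities", "maintenance", "custodian", "grounds"],
    "safety": ["security", "police", "publicsafety"],
    "accessibility": ["ada", "accessibility"],
}

# Ruleset fallback; only the two examples from the proposal are listed.
OPERATOR_DOMAINS = {
    "University of California, Los Angeles": "ucla.edu",
    "City of Santa Monica": "santamonica.gov",
}


def domain_from_url(url: str) -> str:
    """Strip protocol, path, and a leading 'www.' from a website tag value."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat a bare host as the network location
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def role_candidates(domain: str, subdivision: str = "") -> list:
    """Generate role-based addresses, plus subdivision variants if given."""
    candidates = [
        f"{local}@{domain}"
        for group in ROLE_LOCAL_PARTS.values()
        for local in group
    ]
    if subdivision:
        slug = subdivision.lower()
        candidates += [
            f"{slug}@{domain}",
            f"custodian-{slug}@{domain}",
            f"facilities-{slug}@{domain}",
        ]
    return candidates
```

None of the generated addresses are known to exist; they are guesses that the later scoring step ranks below tagged or scraped emails.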

4. **Hierarchy-aware expansion**
   - Walk up the place hierarchy from the OSM reverse-geocode result:
     - Specific feature (building) → campus (e.g., `UCLA`) → operator/owner (e.g., `University of California` or `City of Los Angeles`).
   - For each level, apply domain inference and generate role-based candidates. This yields multiple responsible parties (e.g., building facilities, campus facilities, city public works).
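
As an illustration of the walk, the sketch below takes an ordered list of `(level, domain)` pairs, most specific first, and tags each generated address with where it came from. The `Candidate` type, the function name, and the two-item default role list are hypothetical; the source labels reuse the `heuristic_feature` and `heuristic_parent` tags named in the implementation notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    email: str
    source: str  # "heuristic_feature" or "heuristic_parent"
    level: str   # e.g. "building", "campus", "operator"


def expand_hierarchy(levels, local_parts=("facilities", "security")):
    """Generate role-based candidates for each hierarchy level.

    `levels` is ordered from most specific to least specific; a level whose
    domain could not be inferred (None or empty) is skipped.
    """
    candidates = []
    for index, (level_name, domain) in enumerate(levels):
        if not domain:
            continue
        # Only the first (most specific) level counts as the feature itself.
        source = "heuristic_feature" if index == 0 else "heuristic_parent"
        for local in local_parts:
            candidates.append(Candidate(f"{local}@{domain}", source, level_name))
    return candidates
```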

5. **Scoring and de-duplication**
   - Score candidates by evidence source, highest first:
     1. OSM-tagged emails (exact).
     2. Emails scraped from linked websites.
     3. Role-based heuristics on feature domain.
     4. Role-based heuristics on parent domain.
   - Remove duplicates, normalize casing, and keep the top 3–5 distinct addresses for notification.
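
The ranking can be expressed as a small pure function. This is a sketch under the assumption that candidates arrive as `(email, source)` pairs using the four source labels from the implementation notes; the function name and the default limit of 5 are illustrative.

```python
# Lower rank means stronger evidence, following the ordering above.
SOURCE_RANK = {
    "osm_tag": 0,
    "website_scrape": 1,
    "heuristic_feature": 2,
    "heuristic_parent": 3,
}


def rank_emails(candidates, limit=5):
    """De-duplicate case-insensitively and return the best `limit` addresses."""
    best = {}
    for email, source in candidates:
        key = email.strip().lower()
        rank = SOURCE_RANK[source]
        # Keep the strongest evidence seen for each address.
        if key not in best or rank < best[key]:
            best[key] = rank
    # sorted() is stable, so ties keep first-seen order.
    ordered = sorted(best.items(), key=lambda item: item[1])
    return [email for email, _ in ordered[:limit]]
```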

6. **Safety and quality controls**
   - Validate email syntax (RFC 5322 regex) and filter obvious placeholders (e.g., `test@`, `example@`).
   - Enforce an allowlist of domain suffixes (`.edu`, `.gov`, `.ca.gov`, `.org`) when inferring from place names.
   - Rate-limit Overpass/HTTP calls and cache results per `(lat, lon)` and feature name to respect service limits.
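
A minimal sketch of the validation and cache-key pieces, with rate limiting left out. Note the assumptions: the pattern is a deliberately simplified approximation, not a full RFC 5322 grammar; the placeholder set holds only the two examples from the text; and rounding coordinates to 4 decimal places for the cache key is an arbitrary choice for illustration.

```python
import re

# Simplified address pattern (assumption: good enough for filtering, not RFC-complete).
EMAIL_RE = re.compile(
    r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$"
)
PLACEHOLDER_LOCAL_PARTS = {"test", "example"}
ALLOWED_SUFFIXES = (".edu", ".gov", ".ca.gov", ".org")


def is_usable(email: str, inferred_from_name: bool = False) -> bool:
    """Check syntax, reject placeholders, and apply the suffix allowlist.

    The allowlist only applies to addresses whose domain was guessed from a
    place name, matching the rule above.
    """
    if not EMAIL_RE.match(email):
        return False
    local, domain = email.lower().rsplit("@", 1)
    if local in PLACEHOLDER_LOCAL_PARTS:
        return False
    if inferred_from_name and not domain.endswith(ALLOWED_SUFFIXES):
        return False
    return True


def cache_key(lat: float, lon: float, feature_name: str, precision: int = 4):
    """Build a cache key so nearby repeat lookups share one cached result."""
    return (round(lat, precision), round(lon, precision), feature_name.strip().lower())
```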

## Example (UCLA Law School report)
- Reverse-geocode → `UCLA School of Law` building, operator `University of California, Los Angeles`, website `https://law.ucla.edu`.
- OSM lacks a direct email.
- Domain inference yields `law.ucla.edu` (building) and `ucla.edu` (campus/owner).
- Generated candidates:
  - Building-level: `contact@law.ucla.edu`, `facilities@law.ucla.edu`, `custodian-law@ucla.edu`.
  - Campus-level: `facilities@ucla.edu`, `security@ucla.edu`, `accessibility@ucla.edu`.
- Score by specificity; keep the top 3–5 unique addresses for the notification batch.

## Implementation notes
- Add an `osm-contact-inference` module that:
  - Accepts `(lat, lon, report_text)` and returns ranked emails with provenance.
  - Uses an Overpass client (e.g., `requests` + Overpass QL) with a small template query.
  - Integrates with the existing analysis pipeline to populate `inferred_contact_emails` for physical reports, tagging source (`osm_tag`, `website_scrape`, `heuristic_feature`, `heuristic_parent`).
- Provide unit tests with canned Overpass responses to keep them deterministic.
- Keep the system toggleable via a config flag so it can be switched off if upstream rate limits are hit.
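
To make the testing note concrete, here is a hedged sketch of how the module entry point could take its Overpass client as a parameter, so a unit test can pass a canned response instead of calling the network. Everything here is hypothetical: `InferredEmail`, `infer_contact_emails`, the `fetch_overpass` callable, and the `enabled` argument standing in for the config flag. Only the tag-extraction stage is shown, and `report_text` is accepted to match the proposed signature but not yet used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferredEmail:
    email: str
    source: str  # provenance label, e.g. "osm_tag"


def infer_contact_emails(lat, lon, report_text, fetch_overpass, enabled=True):
    """Return emails with provenance for a physical report.

    `fetch_overpass(lat, lon)` must return parsed Overpass JSON (a dict with
    an "elements" list). Injecting it keeps tests deterministic.
    """
    if not enabled:  # stand-in for the config flag
        return []
    response = fetch_overpass(lat, lon)
    results, seen = [], set()
    for element in response.get("elements", []):
        tags = element.get("tags", {})
        for key in ("contact:email", "email"):
            value = tags.get(key, "").strip().lower()
            if value and value not in seen:
                seen.add(value)
                results.append(InferredEmail(value, "osm_tag"))
    return results


# A canned response of the kind a unit test might use.
CANNED_RESPONSE = {
    "elements": [
        {"type": "way", "id": 1,
         "tags": {"name": "UCLA School of Law", "email": "Info@law.ucla.edu"}},
        {"type": "node", "id": 2, "tags": {"amenity": "bench"}},
    ]
}
```

A test would call `infer_contact_emails(34.07, -118.44, "UCLA Law", lambda lat, lon: CANNED_RESPONSE)` and compare the result against a fixed list.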
4 changes: 4 additions & 0 deletions email-service/config/config.go
@@ -24,6 +24,9 @@ type Config struct {
OptOutURL string
PollInterval string
HTTPPort string

// Brand dashboard configuration
BrandDashboardURL string
}

// Load loads configuration from environment variables and flags
@@ -46,6 +49,7 @@ func Load() *Config {
cfg.OptOutURL = getEnv("OPT_OUT_URL", "http://localhost:8080/opt-out")
cfg.PollInterval = getEnv("POLL_INTERVAL", "10s")
cfg.HTTPPort = getEnv("HTTP_PORT", "8080")
cfg.BrandDashboardURL = getEnv("BRAND_DASHBOARD_URL", "https://dashboard.cleanapp.io/brand")

return cfg
}
189 changes: 152 additions & 37 deletions email-service/email/email_sender.go
@@ -154,8 +154,16 @@ func (e *EmailSender) sendOneEmailWithAnalysis(recipient string, reportImage, ma

// Create subject with analysis title
subject := "CleanApp Report"
isDigital := analysis != nil && analysis.Classification == "digital"
if isDigital {
subject = "CleanApp alert: major new issue reported for your brand"
}
if analysis != nil && analysis.Title != "" {
subject = fmt.Sprintf("CleanApp Report: %s", analysis.Title)
if isDigital {
subject = fmt.Sprintf("CleanApp alert: major new issue — %s", analysis.Title)
} else {
subject = fmt.Sprintf("CleanApp Report: %s", analysis.Title)
}
}

to := mail.NewEmail(recipient, recipient)
@@ -279,43 +287,63 @@ func (e *EmailSender) getEmailHtml(recipient string, hasReport, hasMap bool) str

// getEmailTextWithAnalysis returns the plain text content for emails with analysis data
func (e *EmailSender) getEmailTextWithAnalysis(recipient string, analysis *models.ReportAnalysis, hasReport, hasMap bool) string {
var content string
if analysis.Classification == "digital" {
digitalSubject := "CleanApp alert: major new issue reported for your brand"
preheader := "Someone just submitted a brand-related digital report with photos."

attachments := ""
if hasReport || hasMap {
attachments = "\nThis email contains:\n"
heroReport := ""
if hasReport {
attachments += "- The report image\n"
heroReport = "\n- Hero: photo of report included."
}

heroLocation := ""
if hasMap {
attachments += "- A map showing the location\n"
heroLocation = "\n- Hero: photo of location included."
}
attachments += "- AI analysis results\n"
}
if analysis.Classification == "digital" {
content = fmt.Sprintf(`Hello,

You have received a new CleanApp digital issue report with analysis.
return fmt.Sprintf(`%s
Preheader: %s

REPORT ANALYSIS:
Title: %s
Description: %s
Type: Digital Issue
Someone just submitted a new digital report mentioning your brand.
CleanApp AI analyzed this issue to highlight potential legal and risk ranges connected to your brand presence.%s%s

AI analysis summary:
- Title: %s
- Description: %s
- Type: Digital Issue

Open the Brand Dashboard to see the AI rationale, mapped areas, and supporting media:
%s
Note: This is a digital issue report. Physical metrics (litter/hazard probability) are not applicable.

To unsubscribe from these emails, please visit: %s?email=%s
You can also reply to this email with "UNSUBSCRIBE" in the subject line.

Best regards,
The CleanApp Team`,
digitalSubject,
preheader,
heroReport,
heroLocation,
analysis.Title,
analysis.Description,
attachments,
e.config.BrandDashboardURL,
e.config.OptOutURL,
recipient)
} else {
content = fmt.Sprintf(`Hello,
}

attachments := ""
if hasReport || hasMap {
attachments = "\nThis email contains:\n"
if hasReport {
attachments += "- The report image\n"
}
if hasMap {
attachments += "- A map showing the location\n"
}
attachments += "- AI analysis results\n"
}

return fmt.Sprintf(`Hello,

You have received a new CleanApp report with analysis.

@@ -334,29 +362,116 @@ You can also reply to this email with "UNSUBSCRIBE" in the subject line.

Best regards,
The CleanApp Team`,
analysis.Title,
analysis.Description,
analysis.LitterProbability*100,
analysis.HazardProbability*100,
analysis.SeverityLevel,
attachments,
e.config.OptOutURL,
recipient)
}

// getEmailHtmlWithAnalysis returns the HTML content for emails with analysis data
func (e *EmailSender) getEmailHtmlWithAnalysis(recipient string, analysis *models.ReportAnalysis, hasReport, hasMap bool) string {
isDigital := analysis.Classification == "digital"

if isDigital {
subjectLine := "CleanApp alert: major new issue reported for your brand"
preheader := "Someone just submitted a brand-related digital report. Review the AI analysis and risk ranges."

reportHero := ""
if hasReport {
reportHero = fmt.Sprintf(`
<div class="hero-card">
<div class="hero-label">Photo of report</div>
<img src="cid:%s" alt="Report Image" />
</div>`, reportImgCid)
}

locationHero := ""
if hasMap {
locationHero = fmt.Sprintf(`
<div class="hero-card">
<div class="hero-label">Photo of location</div>
<img src="cid:%s" alt="Location Map" />
</div>`, mapImgCid)
}

heroImages := ""
if reportHero != "" || locationHero != "" {
heroImages = fmt.Sprintf(`
<div class="hero-grid">%s%s
</div>`, reportHero, locationHero)
}

return fmt.Sprintf(`<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>%s</title>
<style>
body { font-family: Arial, sans-serif; line-height: 1.6; color: #1f2937; background: #f7f7f8; margin: 0; padding: 0; }
.preheader { display: none; visibility: hidden; opacity: 0; height: 0; width: 0; overflow: hidden; }
.container { max-width: 720px; margin: 0 auto; padding: 24px; background: #ffffff; }
.hero { background: linear-gradient(135deg, #0f766e, #14b8a6); color: #ffffff; padding: 28px; border-radius: 14px; box-shadow: 0 10px 30px rgba(0,0,0,0.12); }
.eyebrow { text-transform: uppercase; letter-spacing: 0.08em; font-weight: 700; font-size: 12px; margin: 0 0 6px 0; opacity: 0.85; }
h1 { margin: 0 0 10px 0; font-size: 26px; }
.subhead { margin: 0 0 12px 0; font-size: 16px; opacity: 0.95; }
.lede { margin: 0 0 18px 0; font-size: 15px; }
.cta { display: inline-block; background: #ffffff; color: #0f172a; padding: 12px 18px; border-radius: 10px; text-decoration: none; font-weight: 700; box-shadow: 0 8px 20px rgba(0,0,0,0.12); }
.card { margin-top: 24px; padding: 18px; border: 1px solid #e5e7eb; border-radius: 12px; background: #f8fafc; }
.card h3 { margin-top: 0; color: #0f172a; }
.card p { margin: 6px 0; }
.card .note { margin-top: 12px; color: #475569; }
.hero-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); gap: 16px; margin-top: 18px; }
.hero-card { background: #0b766c0d; border: 1px solid #d1fae5; border-radius: 12px; padding: 12px; text-align: center; }
.hero-label { font-weight: 700; color: #0f766e; margin-bottom: 10px; }
.hero-card img { max-width: 100%%; border-radius: 10px; }
.footer { margin-top: 24px; font-size: 13px; color: #6b7280; text-align: left; }
.footer a { color: #0ea5e9; text-decoration: none; }
</style>
</head>
<body>
<div class="preheader">%s</div>
<div class="container">
<div class="hero">
<p class="eyebrow">CleanApp alert</p>
<h1>Major new issue reported for your brand</h1>
<p class="subhead">Someone just submitted a brand-related digital report.</p>
<p class="lede">CleanApp AI analyzed this issue to highlight potential legal and risk ranges connected to your brand presence.</p>
<a class="cta" href="%s">Open brand dashboard</a>
</div>

<div class="card">
<h3>AI analysis summary</h3>
<p><strong>Title:</strong> %s</p>
<p><strong>Description:</strong> %s</p>
<p><strong>Type:</strong> Digital Issue</p>
<p class="note">Review the dashboard to see the AI rationale, mapped legal/risk ranges, and supporting media.</p>
</div>%s

<div class="footer">
<p>To unsubscribe from these emails, please <a href="%s?email=%s">click here</a>.</p>
</div>
</div>
</body>
</html>`,
subjectLine,
preheader,
e.config.BrandDashboardURL,
analysis.Title,
analysis.Description,
analysis.LitterProbability*100,
analysis.HazardProbability*100,
analysis.SeverityLevel,
attachments,
heroImages,
e.config.OptOutURL,
recipient)
}

return content
}

// getEmailHtmlWithAnalysis returns the HTML content for emails with analysis data
func (e *EmailSender) getEmailHtmlWithAnalysis(recipient string, analysis *models.ReportAnalysis, hasReport, hasMap bool) string {
// Calculate gauge colors based on values
litterColor := e.getGaugeColor(analysis.LitterProbability)
hazardColor := e.getGaugeColor(analysis.HazardProbability)
severityColor := e.getSeverityGaugeColor(analysis.SeverityLevel)

// Determine if this is a digital report
isDigital := analysis.Classification == "digital"

imagesSection := ""
if hasReport {
imagesSection += fmt.Sprintf(`
@@ -403,21 +518,21 @@ func (e *EmailSender) getEmailHtmlWithAnalysis(recipient string, analysis *model
<h2>CleanApp Report Analysis</h2>
<p>A new report has been analyzed and requires your attention.</p>
</div>

<div class="analysis-section">
<h3>Report Details</h3>
<p><strong>Title:</strong> %s</p>
<p><strong>Description:</strong> %s</p>
<p><strong>Type:</strong> %s</p>
</div>

%s

<div class="images">%s
</div>

<p><em>Best regards,<br>The CleanApp Team</em></p>

<div style="margin-top: 30px; padding-top: 20px; border-top: 1px solid #eee; font-size: 0.9em; color: #666;">
<p>To unsubscribe from these emails, please <a href="%s?email=%s" style="color: #007bff; text-decoration: none;">click here</a></p>
</div>