
feat: Competitor-Aware Content Differentiation (#37) #54

Open
Weegy wants to merge 15 commits into dev from feature/competitor-differentiation

Weegy commented Mar 16, 2026

Multi-source Content Differentiation

Implements competitor content crawling, similarity analysis, and LLM-assisted content differentiation, plus alerts, a REST API, and monitoring/retention tooling:

Infrastructure

  • 6 new database tables (competitor sources, content items, fingerprints, analyses, alerts, events)
  • Idempotent migrations with ULID primary keys
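
A ULID encodes a 48-bit millisecond timestamp and 80 random bits as 26 Crockford base32 characters. That is why later commits size `space_id` as a 26-character string. The PR's PHP key generation is not shown, so the following is only a Python sketch of the encoding. `new_ulid` is a hypothetical helper name.

```python
import os
import time
from typing import Optional

# Crockford base32 alphabet used by ULIDs (omits I, L, O, U)
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-char ULID: 48-bit ms timestamp followed by 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    if not 0 <= timestamp_ms < 2 ** 48:
        raise ValueError("timestamp must fit in 48 bits")
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness           # 128-bit integer
    # 26 base32 characters hold 130 bits; the top 2 bits are always zero
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

The timestamp fills the first ten characters, so ULIDs sort lexicographically by creation time. Ties within one millisecond fall back to the random bits. As a result, primary-key inserts stay roughly append-only.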

Crawler Infrastructure

  • CrawlerService orchestrates RSS, sitemap, scrape, and API crawlers
  • Pluggable crawler classes with queue integration
  • CrawlCompetitorSourceJob with automatic retries and stale-check
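
One plausible shape for the pluggable crawler setup is a registry keyed by source type, with a stale check before each crawl. The Python sketch below is entirely hypothetical. `CrawlerRegistry`, `Source`, the one-hour `min_interval`, and the in-process retry loop are all assumptions. In the PR, retries belong to the queued `CrawlCompetitorSourceJob`, and its actual retry policy is not described.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class Source:
    id: str
    type: str  # e.g. "rss", "sitemap", "scrape", "api"
    url: str
    last_crawled_at: Optional[datetime] = None


# A crawler takes a source and returns the URLs of the items it discovered
Crawler = Callable[[Source], List[str]]


class CrawlerRegistry:
    """Maps a source type to a pluggable crawler (hypothetical stand-in for CrawlerService)."""

    def __init__(self) -> None:
        self._crawlers: Dict[str, Crawler] = {}

    def register(self, source_type: str, crawler: Crawler) -> None:
        self._crawlers[source_type] = crawler

    def crawl(
        self,
        source: Source,
        now: datetime,
        min_interval: timedelta = timedelta(hours=1),
        max_attempts: int = 3,
    ) -> Optional[List[str]]:
        # Stale check: skip sources crawled more recently than min_interval
        if source.last_crawled_at and now - source.last_crawled_at < min_interval:
            return None
        crawler = self._crawlers.get(source.type)
        if crawler is None:
            raise ValueError(f"no crawler registered for type {source.type!r}")
        last_error: Optional[Exception] = None
        # Retry loop; a queue worker would normally own retries and backoff
        for _ in range(max_attempts):
            try:
                items = crawler(source)
            except Exception as exc:  # real code would catch narrower errors
                last_error = exc
                continue
            source.last_crawled_at = now
            return items
        raise RuntimeError(f"crawl failed after {max_attempts} attempts") from last_error
```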

Fingerprinting & Similarity Analysis

  • ContentFingerprintService — TF-IDF vectorization of content
  • SimilarityCalculator — cosine similarity between fingerprints
  • SimilarContentFinder — identifies top-N similar competitor items
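
The fingerprint, similarity, and top-N chain can be illustrated in a few lines. This Python sketch is an assumption-laden stand-in for `ContentFingerprintService`, `SimilarityCalculator`, and `SimilarContentFinder`. It uses a smoothed IDF, ln((1 + n) / (1 + df)) + 1. It also maps `n=5` and `threshold=0.25` to the `COMPETITOR_MAX_ANALYZE` and `COMPETITOR_SIMILARITY_THRESHOLD` values listed under Configuration, which is a guess about how those settings are used.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tfidf_vectors(docs: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Term frequency times smoothed inverse document frequency, per document."""
    n = len(docs)
    df: Counter = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    vectors: Dict[str, Vector] = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors[doc_id] = {
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(
    target_id: str, vectors: Dict[str, Vector], n: int = 5, threshold: float = 0.25
) -> List[Tuple[str, float]]:
    """Rank the other documents by cosine similarity and keep those >= threshold."""
    target = vectors[target_id]
    scored = [
        (doc_id, cosine(target, vec))
        for doc_id, vec in vectors.items()
        if doc_id != target_id
    ]
    scored = [item for item in scored if item[1] >= threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]
```

Cosine similarity ignores vector magnitude. A short brief and a long competitor article with the same term distribution therefore still score close to 1.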

Differentiation Analysis Engine

  • LLM-powered angle/gap/recommendation extraction
  • DifferentiationResult value object for structured output
  • CompetitorAnalysisStage pipeline integration for automatic enrichment
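
The commit log lists the value object's fields: similarityScore, differentiationScore, angles[], gaps[], and recommendations[]. A minimal immutable Python equivalent might look like the sketch below. The [0, 1] range check, the `from_llm_json` helper, and the JSON shape it parses are all assumptions, since the PR does not show the prompt or the response format.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class DifferentiationResult:
    """Immutable analysis output; field names mirror the commit log in snake_case."""

    similarity_score: float
    differentiation_score: float
    angles: Tuple[str, ...] = ()
    gaps: Tuple[str, ...] = ()
    recommendations: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Assumption: both scores are normalised to [0, 1]
        for name in ("similarity_score", "differentiation_score"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    @classmethod
    def from_llm_json(
        cls, similarity: float, differentiation: float, raw: str
    ) -> "DifferentiationResult":
        """Build a result from a hypothetical LLM reply like {"angles": [...], ...}."""
        data: Dict[str, Any] = json.loads(raw)
        return cls(
            similarity_score=similarity,
            differentiation_score=differentiation,
            angles=tuple(str(a) for a in data.get("angles", [])),
            gaps=tuple(str(g) for g in data.get("gaps", [])),
            recommendations=tuple(str(r) for r in data.get("recommendations", [])),
        )

    def to_dict(self) -> Dict[str, Any]:
        """camelCase keys, matching the names listed in the commit log."""
        return {
            "similarityScore": self.similarity_score,
            "differentiationScore": self.differentiation_score,
            "angles": list(self.angles),
            "gaps": list(self.gaps),
            "recommendations": list(self.recommendations),
        }
```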

Alert System

  • CompetitorAlertService with configurable rules
  • Alert types: new_content, keyword, high_similarity
  • Email and Slack webhook notifications
  • Generic HTTP webhook channel for custom integrations
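
Rule matching for the three alert types can be sketched as a small dispatch on `type`. Everything here is hypothetical: `AlertRule`, `ContentEvent`, the 0.8 default threshold, and the case-insensitive substring matching for keywords. The only external fact relied on is that Slack incoming webhooks accept a JSON body with a `text` field.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlertRule:
    type: str  # "new_content" | "keyword" | "high_similarity"
    keywords: Tuple[str, ...] = ()
    similarity_threshold: float = 0.8  # hypothetical default


@dataclass
class ContentEvent:
    title: str
    body: str
    is_new: bool
    similarity: Optional[float] = None  # score against our own content, if computed


def matches(rule: AlertRule, event: ContentEvent) -> bool:
    if rule.type == "new_content":
        return event.is_new
    if rule.type == "keyword":
        text = f"{event.title} {event.body}".lower()
        return any(keyword.lower() in text for keyword in rule.keywords)
    if rule.type == "high_similarity":
        return event.similarity is not None and event.similarity >= rule.similarity_threshold
    raise ValueError(f"unknown alert type: {rule.type}")


def slack_payload(rule: AlertRule, event: ContentEvent) -> dict:
    # Slack incoming webhooks accept a JSON body with a "text" field
    return {"text": f"[{rule.type}] {event.title}"}


def evaluate(rules: List[AlertRule], event: ContentEvent) -> List[dict]:
    """Return one notification payload per matching rule, in rule order."""
    return [slack_payload(rule, event) for rule in rules if matches(rule, event)]
```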

Knowledge Graph Integration

  • Competitor nodes and similarity edges in the knowledge graph

REST API

  • CompetitorSourceController — full CRUD for sources
  • CompetitorController — content listing, crawl trigger, alerts
  • DifferentiationController — analysis browsing and summary
  • Form requests with comprehensive validation
  • JSON:API-style responses
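
The controllers follow JSON:API conventions, and the shape of that envelope can be sketched in Python. A resource object carries `type`, `id`, and `attributes`. The document has a top-level `data` member, and `errors` objects use `source.pointer` to name the invalid attribute. The resource type `competitor-sources` and the `meta.total` field are illustrative only. The PR's exact response bodies are not shown, and "JSON:API-style" suggests they may not follow the spec to the letter.

```python
from typing import Any, Dict, List, Optional


def resource(type_: str, id_: str, attributes: Dict[str, Any]) -> Dict[str, Any]:
    """A JSON:API resource object: type + id + attributes."""
    return {"type": type_, "id": id_, "attributes": attributes}


def document(data: Any, meta: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Top-level document; `data` is one resource or a list of them."""
    doc: Dict[str, Any] = {"data": data}
    if meta:
        doc["meta"] = meta
    return doc


def validation_errors(field_errors: Dict[str, List[str]]) -> Dict[str, Any]:
    """JSON:API error objects, one per message, each pointing at its attribute."""
    return {
        "errors": [
            {
                "status": "422",
                "title": "Validation failed",
                "detail": message,
                "source": {"pointer": f"/data/attributes/{field}"},
            }
            for field, messages in field_errors.items()
            for message in messages
        ]
    }
```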

Security Hardening

  • URL whitelist validation on competitor sources
  • Rate limiting: 500 req/day per source
  • Permission gating: manage-competitors role required
  • Internal-only CORS configuration
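
A per-source cap of 500 requests per day maps naturally onto a fixed-window counter keyed by source and UTC day. The Python class below is a hypothetical illustration. Laravel's built-in limiter, used by `throttle:5,1` (5 requests per minute) on the crawl trigger route, may align its windows differently, so the two can diverge at window edges.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` hits per key within each fixed time window."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        # (key, window index) -> hits; a real deployment would use a shared cache with TTLs
        self._hits: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        bucket = (key, int(now // self.window))
        if self._hits[bucket] >= self.limit:
            return False
        self._hits[bucket] += 1
        return True


# 500 requests per UTC day per source (Unix-time days start at UTC midnight)
per_source = FixedWindowLimiter(limit=500, window_seconds=86_400)
# Roughly the crawl trigger's throttle:5,1 (5 requests per 1 minute), for comparison
crawl_trigger = FixedWindowLimiter(limit=5, window_seconds=60)
```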

Monitoring & Retention

  • CrawlerHealthMonitor detects stale/high-error sources
  • RetentionPolicyService prunes old data on schedule
  • Scheduler: hourly health checks, weekly retention (Sun 02:00)
  • OpenAPI 3.1 spec included
  • Blog post documenting the feature
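
Both the health check and the retention job reduce to simple date arithmetic. In this Python sketch, the retention periods mirror the `*_RETENTION_DAYS` values listed under Configuration. The 48-hour staleness window and the 50% error-rate ceiling in `source_health` are invented placeholders, because the PR does not state the monitor's thresholds.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Mirrors the *_RETENTION_DAYS values listed under Configuration
RETENTION_DAYS = {"content_items": 90, "analyses": 180, "alert_events": 30}


def retention_cutoffs(now: datetime, days: Dict[str, int] = RETENTION_DAYS) -> Dict[str, datetime]:
    """Rows created before their table's cutoff are eligible for pruning."""
    return {table: now - timedelta(days=d) for table, d in days.items()}


def source_health(
    last_success: Optional[datetime],
    errors: int,
    attempts: int,
    now: datetime,
    stale_after: timedelta = timedelta(hours=48),  # hypothetical threshold
    max_error_rate: float = 0.5,                    # hypothetical threshold
) -> List[str]:
    """Return the problems a health check would flag for one source."""
    issues = []
    if last_success is None or now - last_success > stale_after:
        issues.append("stale")
    if attempts and errors / attempts > max_error_rate:
        issues.append("high_error_rate")
    return issues
```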

Configuration

New environment variables:

COMPETITOR_ANALYSIS_ENABLED=true
COMPETITOR_SIMILARITY_THRESHOLD=0.25
COMPETITOR_MAX_ANALYZE=5
COMPETITOR_AUTO_ENRICH_BRIEFS=true
COMPETITOR_CONTENT_RETENTION_DAYS=90
COMPETITOR_ANALYSIS_RETENTION_DAYS=180
COMPETITOR_ALERT_EVENT_RETENTION_DAYS=30
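
A typed loader for these variables might look like the following Python sketch. It assumes that the values listed above are also the defaults and that booleans accept common truthy spellings. The PR maps these variables into `config/numen.php`, which this sketch does not reproduce.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _env_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CompetitorConfig:
    enabled: bool
    similarity_threshold: float
    max_analyze: int
    auto_enrich_briefs: bool
    content_retention_days: int
    analysis_retention_days: int
    alert_event_retention_days: int


def load_config(env: Mapping[str, str] = os.environ) -> CompetitorConfig:
    """Read the competitor variables; fallbacks assume the listed values are defaults."""
    cfg = CompetitorConfig(
        enabled=_env_bool(env.get("COMPETITOR_ANALYSIS_ENABLED", "true")),
        similarity_threshold=float(env.get("COMPETITOR_SIMILARITY_THRESHOLD", "0.25")),
        max_analyze=int(env.get("COMPETITOR_MAX_ANALYZE", "5")),
        auto_enrich_briefs=_env_bool(env.get("COMPETITOR_AUTO_ENRICH_BRIEFS", "true")),
        content_retention_days=int(env.get("COMPETITOR_CONTENT_RETENTION_DAYS", "90")),
        analysis_retention_days=int(env.get("COMPETITOR_ANALYSIS_RETENTION_DAYS", "180")),
        alert_event_retention_days=int(env.get("COMPETITOR_ALERT_EVENT_RETENTION_DAYS", "30")),
    )
    # A cosine-style threshold only makes sense within [0, 1]
    if not 0.0 <= cfg.similarity_threshold <= 1.0:
        raise ValueError("COMPETITOR_SIMILARITY_THRESHOLD must be between 0 and 1")
    return cfg
```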

Closes #37

numen-bot added 15 commits March 16, 2026 06:16
- DifferentiationResult value object (similarityScore, differentiationScore, angles[], gaps[], recommendations[])
- DifferentiationAnalysisService: analyze() + enrichBrief()
  - Calculates similarity/differentiation scores via SimilarityCalculator
  - LLM-powered angle/gap/recommendation generation via LLMManager
  - Stores results in differentiation_analyses table
  - Brief enrichment injects competitor context into brief requirements
- AnalyzeContentDifferentiationJob on 'competitor' queue
- Unit tests: DifferentiationResultTest, DifferentiationAnalysisServiceTest, AnalyzeContentDifferentiationJobTest
- Add CompetitorAnalysisStage implementing PipelineStageContract
  - Stage type: 'competitor_analysis'
  - Runs after brief creation, before content generation
  - Calls DifferentiationAnalysisService::enrichBrief() to add
    competitor context (angles, gaps, recommendations) to brief metadata
  - Configurable per-stage: enabled, similarity_threshold, max_competitors
  - Skips gracefully when disabled or no similar competitors found
  - Updates PipelineRun context with competitor_analysis data

- Update config/numen.php with 'competitor_analysis' section:
  - enabled (COMPETITOR_ANALYSIS_ENABLED)
  - similarity_threshold (COMPETITOR_SIMILARITY_THRESHOLD)
  - max_competitors_to_analyze (COMPETITOR_MAX_ANALYZE)
  - auto_enrich_briefs (COMPETITOR_AUTO_ENRICH_BRIEFS)

- Register CompetitorAnalysisStage in AppServiceProvider via
  HookRegistry::registerPipelineStageClass() — works with existing
  PipelineExecutor + PluginStageJob infrastructure

- Integration tests (tests/Feature/Competitor/CompetitorAnalysisStageTest.php):
  - Stage type/label/schema contract
  - Skips when stage config disabled
  - Skips when global config disabled
  - Skips gracefully with no competitors in DB
  - Enriches brief when similar competitors exist
  - Updates run context with competitor data
  - Enriches brief requirements array
  - Stage registered in HookRegistry
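
This commit describes the stage's control flow. It skips when disabled at either the stage or the global level, and skips gracefully when nothing similar is found. Otherwise it enriches the brief and the run context. That flow can be sketched in Python as follows. `Brief`, `PipelineRun`, the `find_similar` callback, and the requirement wording are hypothetical; the real stage delegates to `DifferentiationAnalysisService::enrichBrief()`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Brief:
    requirements: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PipelineRun:
    context: Dict[str, Any] = field(default_factory=dict)


# (competitor title, similarity score)
Match = Tuple[str, float]
# Hypothetical finder signature: (brief, threshold, limit) -> matches
Finder = Callable[[Brief, float, int], List[Match]]


def run_competitor_stage(
    brief: Brief,
    run: PipelineRun,
    stage_config: Dict[str, Any],
    globally_enabled: bool,
    find_similar: Finder,
) -> str:
    """Runs between brief creation and content generation; returns a status string."""
    if not globally_enabled or not stage_config.get("enabled", True):
        return "skipped: disabled"
    threshold = float(stage_config.get("similarity_threshold", 0.25))
    limit = int(stage_config.get("max_competitors", 5))
    matches = find_similar(brief, threshold, limit)[:limit]
    if not matches:
        return "skipped: no similar competitors"
    # Enrich the brief so the generation stage sees what not to repeat
    brief.metadata["competitor_analysis"] = [
        {"title": title, "similarity": score} for title, score in matches
    ]
    for title, _ in matches:
        brief.requirements.append(f"Differentiate from competitor piece: {title}")
    run.context["competitor_analysis"] = {"competitor_count": len(matches)}
    return "completed"
```
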
…ob, email/Slack/webhook channels (chunk 6/10)
…r, competitor nodes + similarity edges (chunk 7/10)
…rceManager, DifferentiationScoreWidget, trend chart (chunk 9/10)
- Add space_id (string 26, indexed) to migration
- Add space_id to CompetitorContentItem $fillable and space() BelongsTo relation
- Update CompetitorContentItemFactory to include space_id via Space::factory()
- Fix ContentFingerprintFactory to use keyword => score format (not plain array)
- Fix ContentFingerprintService::fingerprint() to handle ContentBrief, using
  target_keywords as primary topics for accurate similarity matching
- Fix SimilarityCalculator::buildKeywordVector() to handle both numeric-indexed
  and associative keyword arrays
- Add RefreshDatabase to CrawlerServiceTest (needed for DB-backed dedup test)
- Add tests/bootstrap.php to fix APP_BASE_PATH for git worktree + symlinked vendor
- Set APP_BASE_PATH in phpunit.xml so Application::inferBasePath() resolves correctly
  (vendor/ is symlinked to main repo; without APP_BASE_PATH, migrations load from
  wrong directory)
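
The `buildKeywordVector()` fix handles two input shapes: a plain list of keywords and a `keyword => score` map. If a list were read as a map, its numeric indices would become the terms. Below is a Python sketch of the normalisation. It assumes list entries receive a uniform weight of 1.0, since the PR does not say which weight it uses. In PHP 8.1+, `array_is_list()` is one way to tell the two shapes apart.

```python
from typing import Dict, List, Mapping, Union

# Keywords arrive either as a plain list or as term => score pairs
Keywords = Union[List[str], Mapping[str, float]]


def build_keyword_vector(keywords: Keywords) -> Dict[str, float]:
    """Normalise both keyword formats into a term -> weight mapping."""
    if isinstance(keywords, Mapping):
        # Associative form: keep the stored scores
        return {str(term): float(score) for term, score in keywords.items()}
    # List form (assumed): every keyword gets weight 1.0; duplicates collapse to one entry
    return {str(term): 1.0 for term in keywords}
```
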
- Add space_id column to migration (string 26, ULID-compatible)
- Add space_id to CompetitorContentItem fillable + space() BelongsTo
- Fix ContentFingerprintService: handle ContentBrief model, use
  firstOrCreate to preserve seeded fingerprints in tests
- Fix SimilarityCalculator: handle indexed keyword arrays (list format)
  alongside associative (term => score) format
- Add APP_BASE_PATH to phpunit.xml so RefreshDatabase finds correct
  migrations in the git worktree
…r differentiation

- IDOR: Add space ownership checks to CompetitorController (crawl, alerts, destroyAlert),
  CompetitorSourceController (index, store, show, update, destroy),
  DifferentiationController (index, show, summary), and GraphQL mutations
  (TriggerCompetitorCrawl, DeleteCompetitorSource, DeleteCompetitorAlert, UpdateCompetitorSource)
- SSRF: Add ExternalUrl rule to url/feed_url in StoreCompetitorSourceRequest and
  UpdateCompetitorSourceRequest; add ExternalUrl rule to slack_webhook/webhook_url
  in StoreCompetitorAlertRequest
- Space scoping: Verify space_id access on all collection endpoints
- Rate limiting: Add throttle:5,1 middleware to crawl trigger route
- Quota: Enforce max 50 competitor sources per space in store()
- Pre-existing: Remove duplicate match arm and duplicate extractFromBrief() method
  in ContentFingerprintService (caused phpstan errors)
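
The ownership and quota checks in this commit amount to two guards that run before any write. Apart from the limit of 50 sources per space, this Python sketch is hypothetical. The exception names and the `User.space_ids` membership model are assumptions about how space access is represented.

```python
from dataclasses import dataclass
from typing import FrozenSet

MAX_SOURCES_PER_SPACE = 50  # quota stated in the commit above


class Forbidden(Exception):
    """Caller has no access to the space (would map to HTTP 403 or 404)."""


class QuotaExceeded(Exception):
    """Space already holds the maximum number of competitor sources."""


@dataclass(frozen=True)
class User:
    id: str
    space_ids: FrozenSet[str]


def authorize_space(user: User, space_id: str) -> None:
    # IDOR guard: never trust a space_id or record id from the request alone
    if space_id not in user.space_ids:
        raise Forbidden(f"user {user.id} cannot access space {space_id}")


def authorize_store(user: User, space_id: str, existing_sources: int) -> None:
    authorize_space(user, space_id)
    if existing_sources >= MAX_SOURCES_PER_SPACE:
        raise QuotaExceeded(f"space {space_id} already has {existing_sources} sources")
```

For `show`, `update`, and `destroy`, the same `authorize_space` check would run against the loaded record's own `space_id` rather than an id supplied by the client.
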
Weegy force-pushed the feature/competitor-differentiation branch from 0eed4fc to de44e9a on March 16, 2026 06:17
Weegy added the enhancement (New feature or request) label on Mar 18, 2026