
feat: Client-side metadata contribution system for blocked URLs #679

@hellno

Summary

Enable users to contribute URL metadata when server-side fetching fails (for example, when the request is blocked by Cloudflare or other bot protection). This creates a crowdsourced metadata cache that improves over time.

Problem

Current server-side metadata fetching (Trek → Microlink → Neynar) fails for ~30% of URLs due to:

  • Cloudflare bot protection
  • Rate limiting
  • Datacenter IP blocking

Result: Many URLs show empty/broken previews.

Proposed Solution

When the server-side fetch fails, the server signals the client to attempt the fetch through a CORS proxy (allorigins.win), parse the response with Trek WASM (the same parser the server uses), and contribute the resulting metadata back to a persistent Supabase cache.

Server fails → Client tries allorigins.win → Parse with Trek WASM → POST to /api/embeds/metadata/contribute → Cached for all users
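The client half of this flow could look roughly like the sketch below. It is a minimal illustration, not the planned implementation: `contributeFromClient`, `proxyUrl`, `FetchLike`, and the `parseHtml` callback (a stand-in for the Trek WASM parser, whose real API is not shown in this issue) are all hypothetical names, and the request body shape is assumed. The proxy URL uses the `raw` endpoint of allorigins.win.

```typescript
// Hypothetical sketch of the client-side fallback; none of these names come from the codebase.
interface UrlMetadata {
  title?: string;
  description?: string;
  image?: string;
  favicon?: string;
}

// Minimal fetch shape so the function can be tested without a network.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; text(): Promise<string> }>;

interface ClientFetchDeps {
  fetchFn: FetchLike; // pass the global fetch in the browser
  // Stand-in for Trek WASM; the real parser API is an assumption here.
  parseHtml: (html: string, url: string) => UrlMetadata | Promise<UrlMetadata>;
}

// Build the allorigins.win "raw" URL, which returns the proxied body unchanged.
function proxyUrl(target: string): string {
  return `https://api.allorigins.win/raw?url=${encodeURIComponent(target)}`;
}

async function contributeFromClient(
  target: string,
  deps: ClientFetchDeps,
): Promise<UrlMetadata | null> {
  const page = await deps.fetchFn(proxyUrl(target));
  if (!page.ok) return null; // the proxy failed too, nothing to contribute

  const metadata = await deps.parseHtml(await page.text(), target);
  // Do not contribute empty results.
  if (!metadata.title && !metadata.description && !metadata.image) return null;

  const res = await deps.fetchFn("/api/embeds/metadata/contribute", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: target, source: "client-allorigins", ...metadata }),
  });
  return res.ok ? metadata : null;
}
```

Injecting `fetchFn` and `parseHtml` keeps the flow testable and leaves room to swap in another proxy later.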

Key Design Decisions

  • Parser: Trek WASM on client (same as server, consistent results)
  • Storage: New url_metadata Supabase table with RLS
  • Source tracking: Track where metadata came from (server-trek, client-allorigins, client-manual)
  • Auth: Require login for contributions (existing Supabase auth)
  • Rate limiting: Skip for now, can add later
  • Manual fallback UI: Not in v1, but DB schema supports it

Database Schema

CREATE TABLE url_metadata (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  url TEXT NOT NULL,
  url_hash TEXT NOT NULL, -- SHA256 for indexing
  title TEXT,
  description TEXT,
  image TEXT,
  favicon TEXT,
  source TEXT NOT NULL, -- 'server-trek', 'client-allorigins', 'client-manual'
  contributed_by UUID REFERENCES auth.users(id),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  status TEXT DEFAULT 'active',
  metadata JSONB, -- Future extensibility
  UNIQUE(url_hash)
);
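For illustration, the `url_hash` value could be derived on the server as sketched below. The normalization step is an assumption (the issue does not say whether URLs are normalized before hashing), and `normalizeUrl` and `urlHash` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

// Assumption: URLs are normalized with the WHATWG URL parser before hashing,
// so "HTTPS://Example.com" and "https://example.com/" map to the same row.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = ""; // fragments are never sent to the server, so drop them
  return u.href;
}

// Hex-encoded SHA-256 digest (64 characters), suitable for the url_hash column.
function urlHash(raw: string): string {
  return createHash("sha256").update(normalizeUrl(raw)).digest("hex");
}
```

Hashing the normalized form means the `UNIQUE(url_hash)` constraint also deduplicates trivially different spellings of the same URL.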

API Changes

GET /api/embeds/metadata - Returns { needsClientFetch: true } when the server-side fetch fails

POST /api/embeds/metadata/contribute (NEW) - Accepts user-contributed metadata
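As a sketch of what the contribute endpoint might accept, the validator below checks a request body before it is written to the cache. The field names mirror the `url_metadata` columns, but the exact payload shape, the `parseContribution` name, and the rule that clients may not claim the `server-trek` source are assumptions.

```typescript
type MetadataSource = "server-trek" | "client-allorigins" | "client-manual";

interface Contribution {
  url: string;
  source: MetadataSource;
  title?: string;
  description?: string;
  image?: string;
  favicon?: string;
}

// Assumption: only client-originated sources are valid on this endpoint.
const CLIENT_SOURCES: readonly MetadataSource[] = ["client-allorigins", "client-manual"];

// Returns a cleaned contribution, or null if the body should be rejected.
function parseContribution(body: unknown): Contribution | null {
  if (typeof body !== "object" || body === null) return null;
  const b = body as Record<string, unknown>;

  if (typeof b.url !== "string") return null;
  let parsed: URL;
  try {
    parsed = new URL(b.url);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;

  if (!CLIENT_SOURCES.includes(b.source as MetadataSource)) return null;

  // Keep only non-empty strings; everything else becomes undefined.
  const text = (v: unknown): string | undefined =>
    typeof v === "string" && v.trim() !== "" ? v.trim() : undefined;

  const contribution: Contribution = {
    url: parsed.href,
    source: b.source as MetadataSource,
    title: text(b.title),
    description: text(b.description),
    image: text(b.image),
    favicon: text(b.favicon),
  };
  // Reject contributions that carry no usable preview data.
  if (!contribution.title && !contribution.description && !contribution.image) return null;
  return contribution;
}
```

The route handler would still need to verify the Supabase session before calling a validator like this, since contributions require login.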

Implementation Phases

  1. Database: Create url_metadata table + RLS policies
  2. Server: Check contributed cache, return needsClientFetch signal
  3. Client: Trek WASM loading, allorigins fetch, contribute endpoint call
  4. Future: Manual contribution UI, voting, moderation
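Phase 2 could be structured along these lines. This is a sketch under assumptions: the issue does not specify whether the contributed cache is consulted before or after the server-side fetch, so checking the cache first is a guess, and `resolveMetadata`, `readCache`, and `serverFetch` are hypothetical names for the Supabase lookup and the existing Trek → Microlink → Neynar chain.

```typescript
interface Metadata {
  title?: string;
  description?: string;
  image?: string;
  favicon?: string;
}

// Both lookups resolve to null when they have nothing for the URL.
type Lookup = (url: string) => Promise<Metadata | null>;

type MetadataResult = { metadata: Metadata } | { needsClientFetch: true };

async function resolveMetadata(
  url: string,
  readCache: Lookup, // hypothetical: reads the url_metadata table
  serverFetch: Lookup, // hypothetical: wraps the existing server-side chain
): Promise<MetadataResult> {
  // 1. Serve a previously stored or contributed entry if one exists.
  const cached = await readCache(url);
  if (cached) return { metadata: cached };

  // 2. Try the server-side fetch; treat a thrown error like a miss.
  const fetched = await serverFetch(url).catch(() => null);
  if (fetched) return { metadata: fetched };

  // 3. Nothing worked: ask the client to try.
  return { needsClientFetch: true };
}
```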

Full PRD

See docs/prd-client-metadata-contribution.md for complete technical design.

Open Questions

  • WASM bundle size (~1.8MB) - lazy load only when needed?
  • Fallback CORS proxies if allorigins.win fails?
  • Contribution conflict resolution (currently: last write wins)
  • Cache expiration policy?

Related

  • Current metadata route: app/api/embeds/metadata/route.ts
  • Trek integration added in recent commits

Labels: enhancement
