
Production hardening: 6 improvements from live usage #1

@shyftai

Context

All six issues were discovered during real campaign operations (Shyft workspace, 8,690 contacts across 3 lists, 6 Instantly campaigns, Serper.dev enrichment, Supabase reply polling). Environment: GTM:OS v1.4.0 (434422d), solo mode, AUTO execution.


1. Scripts: Enforce absolute paths in all scripts

Problem: Scripts use relative paths (workspaces/shyft/lists/...). When run from a subagent or different working directory, output files write to the wrong location or vanish silently. We lost 8,690 enriched contacts because background agents ran from a different CWD.

Fix: All scripts should resolve paths relative to __file__ or REPO_ROOT, never assume CWD. Add to RULES-GLOBAL.md as a script standard:

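from pathlib import Path
SCRIPT_DIR = Path(__file__).resolve().parent  # directory containing this script
REPO_ROOT = SCRIPT_DIR.parent  # assumes scripts sit one level below the repo root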
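
One way to apply this standard consistently is a pair of small helpers, sketched below. `repo_root_from` and `repo_path` are hypothetical names, not existing GTM:OS functions, and the sketch assumes each script lives one directory below the repository root.

```python
from pathlib import Path


def repo_root_from(script_file: str, levels_up: int = 1) -> Path:
    """Resolve the repo root from a script's own location, ignoring the CWD.

    levels_up counts directories above the script's directory
    (1 matches SCRIPT_DIR.parent for a script in <repo>/scripts/).
    """
    return Path(script_file).resolve().parents[levels_up]


def repo_path(root: Path, relative: str) -> Path:
    """Join a repo-relative path onto the root; reject absolute inputs."""
    candidate = Path(relative)
    if candidate.is_absolute():
        raise ValueError(f"expected a repo-relative path, got {relative!r}")
    return root / candidate


# Typical use inside a script (the file name is illustrative only):
#   REPO_ROOT = repo_root_from(__file__)
#   out_file = repo_path(REPO_ROOT, "workspaces/shyft/lists/example.csv")
```

Rejecting absolute inputs is a deliberate choice here: `Path.__truediv__` silently discards the left-hand side when the right-hand side is absolute, which would reintroduce the "wrong location" failure.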

2. Enrichment: Write-through to cache/enrichments/

Problem: Enrichment scripts write only to lists/. If the session drops or output path is wrong, all work is lost. The new scrape cache system (v1.4.0) exists but enrichment scripts don't use it yet.

Fix: Every enrichment run should dual-write: the output CSV in lists/ plus a backup copy in cache/enrichments/. Update the enrichment waterfall reference and script templates to include cache writes, and log each run in SCRAPE-JOURNAL.md.
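
A minimal sketch of the dual-write, assuming enrichment output is a list of dicts written as CSV. `write_csv` and `dual_write` are hypothetical helper names; the caller is expected to pass absolute paths for the lists/ and cache/enrichments/ targets.

```python
import csv
from pathlib import Path


def write_csv(path: Path, rows: list[dict], fieldnames: list[str]) -> None:
    """Write rows to path, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def dual_write(rows: list[dict], fieldnames: list[str],
               primary: Path, backup: Path) -> None:
    """Write the backup copy first, then the primary output.

    Writing the cache copy first means a bad primary path raises
    only after the work is already safe on disk.
    """
    write_csv(backup, rows, fieldnames)
    write_csv(primary, rows, fieldnames)
```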


3. Enrichment: Incremental checkpointing

Problem: Enrichment of 4,000+ contacts takes 20+ minutes. If it fails at contact 3,500, you restart from 0. Credits are wasted, time is lost.

Fix: Write results to disk every N contacts (e.g., every 100). The resume logic already exists (scripts skip contacts with filled last_name), but the file is only written once at the end. Move the file write into the loop. This aligns with the scrape cache rule "write after each page/batch."
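
The loop change could look roughly like the sketch below. It assumes contacts are dicts, that `enrich` is whatever callable performs the lookup for one contact (hypothetical here), and that a filled `last_name` marks a contact as already done, as described above. The function names are illustrative.

```python
import csv
from pathlib import Path
from typing import Callable


def _flush(contacts: list[dict], out_path: Path, fieldnames: list[str]) -> None:
    """Write all contacts to a temp file, then swap it into place."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = out_path.with_suffix(out_path.suffix + ".tmp")
    with tmp_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(contacts)
    tmp_path.replace(out_path)  # avoids leaving a half-written output file


def enrich_with_checkpoints(contacts: list[dict],
                            enrich: Callable[[dict], dict],
                            out_path: Path,
                            fieldnames: list[str],
                            every: int = 100) -> int:
    """Enrich contacts in place, checkpointing to disk every `every` enrichments."""
    done = 0
    for contact in contacts:
        if contact.get("last_name"):  # resume rule: already enriched, skip
            continue
        contact.update(enrich(contact))
        done += 1
        if done % every == 0:
            _flush(contacts, out_path, fieldnames)
    _flush(contacts, out_path, fieldnames)  # final write covers the remainder
    return done
```

If `enrich` raises partway through, the file on disk still holds everything up to the last checkpoint, so a rerun that loads that file resumes from there instead of from 0.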


4. API safety: Document destructive API endpoints

Problem: Instantly DELETE /api/v2/leads with delete_list flag wipes ALL leads in a campaign, not just the one specified. We lost 444 leads in a live campaign. The API docs don't warn about this. There is no safe single-lead suppression endpoint.

Fix:

  • Add a ## Dangerous endpoints section to api-reference.md
  • Flag any endpoint where blast radius exceeds expectation
  • GTM:OS should hard-gate (even in AUTO mode) before calling any endpoint on the dangerous list
  • For Instantly specifically: document that suppression must be done via campaign settings or lead status change, never via DELETE
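
The hard gate could be sketched as a check that runs before any outbound API call. Everything here except the `DELETE /api/v2/leads` entry is an assumption: the registry shape, the `ConfirmationRequired` exception, and the `confirmed` flag (which would have to come from an explicit operator approval, never from AUTO mode itself) are illustrative only.

```python
# Registry of endpoints whose blast radius exceeds expectation.
# The single entry below is the one reported in this issue.
DANGEROUS_ENDPOINTS = {
    ("DELETE", "/api/v2/leads"),
}


class ConfirmationRequired(Exception):
    """Raised when a dangerous endpoint is called without operator approval."""


def check_gate(method: str, path: str, confirmed: bool = False) -> None:
    """Raise unless the call is safe or has been explicitly confirmed."""
    key = (method.upper(), path.rstrip("/"))
    if key in DANGEROUS_ENDPOINTS and not confirmed:
        raise ConfirmationRequired(
            f"{key[0]} {key[1]} is on the dangerous list; operator approval required"
        )
```

Calling `check_gate("DELETE", "/api/v2/leads")` raises, while a read such as `check_gate("GET", "/api/v2/leads")` passes through untouched.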

5. Reply polling: Built-in edge function template

Problem: Reply polling (Instantly → Supabase) is a common need but requires building a custom edge function each time. We built one for FOUNDER:OS campaigns, and it should be reusable.

Fix: Ship a reference poll-replies edge function in _template/supabase/functions/. Include:

  • Multi-campaign polling
  • Classification: positive, negative, OOO (with multilingual return date parsing), unsubscribe, redirect, wrong person
  • Dedup by instantly_email_id
  • Slack webhook notification for positive replies
  • Maps to reply_queue table with ai_intent enum
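
The dedup and mapping steps are simple enough to sketch. The real edge function would run on Supabase rather than in Python; Python is used here only to show the logic. Apart from `out_of_office`, the `ai_intent` values below are guesses at the enum spelling, and the row shape is an assumption about `reply_queue`.

```python
# Assumed ai_intent enum values; only "out_of_office" is confirmed by this issue.
INTENTS = {
    "positive", "negative", "out_of_office",
    "unsubscribe", "redirect", "wrong_person",
}


def new_replies(fetched: list[dict], seen_ids: set[str]) -> list[dict]:
    """Return replies not seen before, recording their instantly_email_id."""
    fresh = []
    for reply in fetched:
        email_id = reply["instantly_email_id"]
        if email_id in seen_ids:
            continue
        seen_ids.add(email_id)
        fresh.append(reply)
    return fresh


def to_queue_row(reply: dict, intent: str) -> dict:
    """Map a classified reply onto a reply_queue-style row."""
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    return {
        "instantly_email_id": reply["instantly_email_id"],
        "campaign_id": reply.get("campaign_id"),
        "ai_intent": intent,
    }


def needs_slack_alert(row: dict) -> bool:
    """Only positive replies trigger the Slack webhook."""
    return row["ai_intent"] == "positive"
```

In practice a unique constraint on `instantly_email_id` in the table would be the more robust dedup; the in-memory set just keeps one polling pass from inserting the same reply twice across campaigns.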

6. OOO re-touch: Automated follow-up after return date

Problem: OOO replies are detected and return dates parsed, but there's no automated follow-up. The operator must manually track return dates and write re-touch messages.

Fix: Add /gtm:re-engage --ooo (or extend existing /gtm:re-engage) that:

  1. Queries reply_queue for ai_intent = 'out_of_office' with ai_analysis->>'ooo_return_date' in the past
  2. Drafts a personalized re-touch referencing their absence without assuming too much
  3. Queues for approval (hard gate — outbound reply)
  4. On approval, sends via Instantly reply API

This closes the loop on OOO handling end-to-end.
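
Step 1 of the flow above amounts to a date filter. The sketch below applies it to rows already fetched from `reply_queue`, assuming `ai_analysis` arrives as a dict and `ooo_return_date` is stored as an ISO `YYYY-MM-DD` string (the issue does not state the stored format). `due_for_retouch` is a hypothetical name.

```python
from datetime import date


def due_for_retouch(rows: list[dict], today: date) -> list[dict]:
    """Return OOO replies whose parsed return date is strictly before today."""
    due = []
    for row in rows:
        if row.get("ai_intent") != "out_of_office":
            continue
        raw = (row.get("ai_analysis") or {}).get("ooo_return_date")
        if not raw:
            continue  # OOO detected but no return date was parsed
        try:
            return_date = date.fromisoformat(raw)
        except (TypeError, ValueError):
            continue  # unparseable date: leave for manual review
        if return_date < today:
            due.append(row)
    return due
```

Rows with a missing or unparseable return date are skipped rather than guessed at, so they never reach the drafting step without an operator looking at them.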


Labels: enhancement
