A small, demo-ready prototype that mirrors an operator-first ecommerce analytics product: ingest messy ad and order data, normalize it, compute business metrics, and surface insights plus an optional LLM agent for natural-language Q&A.
- Loads mock data: Synthetic Shopify-style orders and Meta/TikTok-style ad CSVs with intentional schema and naming messiness.
- Normalizes and models: Raw CSVs are mapped into canonical campaign, order, and attribution tables so platform-specific fields (e.g. `adset_name` vs `adgroup_name`) are unified.
- Computes metrics: Spend, platform-attributed ROAS, blended ROAS, CAC, conversion rate, AOV, gross margin, margin-adjusted ROAS, and spend/revenue trends at campaign and channel level.
- Surfaces discrepancies: Highlights where platform-reported attribution overstates true business impact.
- Operator insights: Deterministic rules recommend which campaign to scale, which to pause, and where attribution is misleading.
- LLM agent (optional): Answers questions like "Which campaign is underperforming?" or "Where is attribution misleading?" using structured tools (no raw CSV access). Works in mock mode without an API key.
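The column-unification step above can be sketched as a simple rename onto a canonical schema. This is illustrative only: the mapping dictionaries and `to_canonical` helper are hypothetical, and the real transforms live in `src/normalization/transformers.py`.

```python
import pandas as pd

# Hypothetical platform-specific -> canonical column maps (not the actual ones)
META_COLUMNS = {"adset_name": "ad_group", "campaign_name": "campaign", "spend": "spend"}
TIKTOK_COLUMNS = {"adgroup_name": "ad_group", "campaign_name": "campaign", "cost": "spend"}

def to_canonical(df: pd.DataFrame, mapping: dict[str, str]) -> pd.DataFrame:
    """Rename platform columns to the canonical schema, keeping only mapped fields."""
    out = df.rename(columns=mapping)
    return out[list(dict.fromkeys(mapping.values()))]

meta = pd.DataFrame({"adset_name": ["Prospecting"], "campaign_name": ["Hero Serum"], "spend": [120.0]})
tiktok = pd.DataFrame({"adgroup_name": ["Broad US"], "campaign_name": ["Creator Hook"], "cost": [95.0]})

# Both platforms now share one schema and can be concatenated safely
ads = pd.concat([to_canonical(meta, META_COLUMNS), to_canonical(tiktok, TIKTOK_COLUMNS)], ignore_index=True)
print(ads.columns.tolist())  # ['ad_group', 'campaign', 'spend']
```

Once every source is in the canonical shape, downstream metrics code never needs to know which platform a row came from.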
```bash
# Create venv and install
make install
# or: python3 -m venv .venv && .venv/bin/pip install -r requirements.txt

# Generate mock data (if not already present)
make data
# or: .venv/bin/python -c "from src.data_generation.generate_mock_data import generate_mock_data; generate_mock_data()"

# Run the Streamlit app
make run
# or: .venv/bin/streamlit run app.py
```

Open http://localhost:8501. Use the Q&A tab to ask questions; try the suggested prompts to walk through the demo story.
- Mock data: `data/raw/` — synthetic `shopify_orders.csv`, `meta_ads.csv`, `tiktok_ads.csv`, `blended_attribution.csv` with a 30-day narrative (one campaign to scale, one to pause, one with inflated attribution, rising spend with weak outcome).
- Ingestion: `src/ingestion/loaders.py` — loads CSVs, ensures mock data exists, delegates to normalization.
- Normalization: `src/normalization/schemas.py`, `transformers.py` — canonical schemas and raw-to-canonical transforms (dates, campaign/product names, column mapping).
- Metrics: `src/metrics/engine.py` — campaign/channel/summary KPIs and daily trends.
- Insights: `src/insights/rules.py` — scale/pause/watch and attribution-inflated rules; revenue-drop explainer.
- Agent: `src/agent/tools.py`, `llm_agent.py` — structured tools (e.g. `get_campaign_metrics`, `get_attribution_discrepancies`); optional OpenAI tool-calling or mock keyword routing.
- UI: `app.py` + `src/ui/components.py` — Streamlit tabs: Overview, Campaigns, Attribution & insights, Trends, Q&A.
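To make the "structured tools, no raw CSV access" idea concrete, here is a minimal sketch of what a discrepancy tool could look like. The function body, threshold, and return shape are assumptions for illustration; the actual signatures are in `src/agent/tools.py`.

```python
def get_attribution_discrepancies(campaign_metrics: list[dict], threshold: float = 1.5) -> list[dict]:
    """Flag campaigns whose platform-reported ROAS exceeds blended ROAS by `threshold`x.

    The agent only ever sees this structured output, never the raw CSVs.
    """
    flagged = []
    for m in campaign_metrics:
        if m["blended_roas"] > 0 and m["platform_roas"] / m["blended_roas"] >= threshold:
            flagged.append({
                "campaign": m["campaign"],
                "platform_roas": m["platform_roas"],
                "blended_roas": m["blended_roas"],
                "inflation_ratio": round(m["platform_roas"] / m["blended_roas"], 2),
            })
    return flagged

# Illustrative numbers only
metrics = [
    {"campaign": "Meta | Retargeting | Cart Return", "platform_roas": 6.0, "blended_roas": 1.8},
    {"campaign": "Meta | Prospecting | Hero Serum", "platform_roas": 3.2, "blended_roas": 2.9},
]
flags = get_attribution_discrepancies(metrics)
print(flags)  # flags only the retargeting campaign
```

Returning small, typed dictionaries like this keeps the LLM's inputs auditable and cheap, whether it runs in mock mode or with real tool calling.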
The synthetic data is designed so that:
- Meta | Prospecting | Hero Serum is the true winner (strong blended ROAS and margin-adjusted performance) — scale.
- TikTok | Broad US | Creator Hook wastes spend (high impressions, weak blended ROAS) — pause.
- Meta | Retargeting | Cart Return looks great on platform ROAS but blended ROAS is much lower — attribution is misleading.
- TikTok overall shows rising spend while blended revenue flattens or drops — channel efficiency declining.
The prototype therefore demonstrates why correct modeling and blended/margin-adjusted metrics matter, compared with naive platform dashboards that take platform-reported attribution at face value.
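The metric definitions behind this story can be stated in a few lines. This is a rough sketch under assumed definitions (the real implementations are in `src/metrics/engine.py`): margin-adjusted ROAS credits only the gross-margin portion of revenue against spend.

```python
def roas(revenue: float, spend: float) -> float:
    """Return on ad spend: revenue generated per dollar of spend."""
    return revenue / spend if spend else 0.0

def margin_adjusted_roas(revenue: float, gross_margin: float, spend: float) -> float:
    """Only the margin portion of revenue counts against spend."""
    return roas(revenue * gross_margin, spend)

def cac(spend: float, new_customers: int) -> float:
    """Customer acquisition cost: spend per new customer."""
    return spend / new_customers if new_customers else 0.0

# Illustrative numbers: $1,000 spend, $2,600 blended revenue, 65% gross margin
spend, blended_revenue, margin = 1_000.0, 2_600.0, 0.65
print(roas(blended_revenue, spend))                              # 2.6
print(round(margin_adjusted_roas(blended_revenue, margin, spend), 2))  # 1.69
print(cac(spend, 40))                                            # 25.0
```

A campaign with blended ROAS of 2.6 may still be marginal once margin is applied, which is exactly the gap the demo narrative exploits.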
- Which campaign is underperforming and why?
- Which campaign should we scale next?
- Where is attribution misleading us?
- Why did revenue drop last week even though clicks increased?
- Which channel is increasing spend without improving business outcomes?
- What does margin-adjusted performance say versus platform ROAS?
- Which campaign has the healthiest blended ROAS and CAC combination?
- Give me a summary of operator insights and recommendations.
- Mock mode (default): No API key required; the agent uses keyword routing and the same tools to produce answers.
- OpenAI-compatible LLM: Copy `.env.example` to `.env` and set `OPENAI_API_KEY` (and optionally `OPENAI_BASE_URL`, `OPENAI_MODEL`). The agent will use tool calling when the key is present.
| Current (mock) | Replace later with |
|---|---|
| CSV files | Shopify Orders API, Meta/TikTok Ads API |
| Blended attribution CSV | Real attribution model / warehouse |
| Static insight rules | Tuned thresholds, more signals |
| Optional LLM | Always-on copilot, more tools |
```bash
make test
# or: .venv/bin/python -m pytest tests/ -v
```

MIT.