A small, demo-ready prototype that mirrors an operator-first ecommerce analytics product: ingest messy ad and order data, normalize it, compute business metrics, and surface insights plus an optional LLM agent for natural-language Q&A.
- Loads mock data: Synthetic Shopify-style orders and Meta/TikTok-style ad CSVs with intentional schema and naming messiness.
- Normalizes and models: Raw CSVs are mapped into canonical campaign, order, and attribution tables so platform-specific fields (e.g. `adset_name` vs `adgroup_name`) are unified.
- Computes metrics: Spend, platform-attributed ROAS, blended ROAS, CAC, conversion rate, AOV, gross margin, margin-adjusted ROAS, and spend/revenue trends at campaign and channel level.
- Surfaces discrepancies: Highlights where platform-reported attribution overstates true business impact.
- Operator insights: Deterministic rules recommend which campaign to scale, which to pause, and where attribution is misleading.
- LLM agent (optional): Answers questions like "Which campaign is underperforming?" or "Where is attribution misleading?" using structured tools (no raw CSV access). Works in mock mode without an API key.
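The column-unification step above can be sketched as a simple rename onto a canonical schema. This is illustrative only: the mapping dictionaries and `to_canonical` helper are hypothetical, and the real transforms live in `src/normalization/transformers.py`.

```python
import pandas as pd

# Hypothetical platform-specific -> canonical column maps (not the actual ones)
META_COLUMNS = {"adset_name": "ad_group", "campaign_name": "campaign", "spend": "spend"}
TIKTOK_COLUMNS = {"adgroup_name": "ad_group", "campaign_name": "campaign", "cost": "spend"}

def to_canonical(df: pd.DataFrame, mapping: dict[str, str]) -> pd.DataFrame:
    """Rename platform columns to the canonical schema, keeping only mapped fields."""
    out = df.rename(columns=mapping)
    return out[list(dict.fromkeys(mapping.values()))]

meta = pd.DataFrame({"adset_name": ["Prospecting"], "campaign_name": ["Hero Serum"], "spend": [120.0]})
tiktok = pd.DataFrame({"adgroup_name": ["Broad US"], "campaign_name": ["Creator Hook"], "cost": [95.0]})

# Both platforms now share one schema and can be concatenated safely
ads = pd.concat([to_canonical(meta, META_COLUMNS), to_canonical(tiktok, TIKTOK_COLUMNS)], ignore_index=True)
print(ads.columns.tolist())  # ['ad_group', 'campaign', 'spend']
```

Once every source is in the canonical shape, downstream metrics code never needs to know which platform a row came from.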
```bash
# Create venv and install
make install
# or: python3 -m venv .venv && .venv/bin/pip install -r requirements.txt

# Generate mock data (if not already present)
make data
# or: .venv/bin/python -c "from src.data_generation.generate_mock_data import generate_mock_data; generate_mock_data()"

# Run the Streamlit app
make run
# or: .venv/bin/streamlit run app.py
```

Open http://localhost:8501. Use the Q&A tab to ask questions; try the suggested prompts to walk through the demo story.
- Mock data: `data/raw/` — synthetic `shopify_orders.csv`, `meta_ads.csv`, `tiktok_ads.csv`, `blended_attribution.csv` with a 30-day narrative (one campaign to scale, one to pause, one with inflated attribution, rising spend with weak outcome).
- Ingestion: `src/ingestion/loaders.py` — loads CSVs, ensures mock data exists, delegates to normalization.
- Normalization: `src/normalization/schemas.py`, `transformers.py` — canonical schemas and raw-to-canonical transforms (dates, campaign/product names, column mapping).
- Metrics: `src/metrics/engine.py` — campaign/channel/summary KPIs and daily trends.
- Insights: `src/insights/rules.py` — scale/pause/watch and attribution-inflated rules; revenue-drop explainer.
- Agent: `src/agent/tools.py`, `llm_agent.py` — structured tools (e.g. `get_campaign_metrics`, `get_attribution_discrepancies`); optional OpenAI tool-calling or mock keyword routing.
- UI: `app.py` + `src/ui/components.py` — Streamlit tabs: Overview, Campaigns, Attribution & insights, Trends, Q&A.
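To make the "structured tools, no raw CSV access" idea concrete, here is a minimal sketch of what a discrepancy tool could look like. The function body, threshold, and return shape are assumptions for illustration; the actual signatures are in `src/agent/tools.py`.

```python
def get_attribution_discrepancies(campaign_metrics: list[dict], threshold: float = 1.5) -> list[dict]:
    """Flag campaigns whose platform-reported ROAS exceeds blended ROAS by `threshold`x.

    The agent only ever sees this structured output, never the raw CSVs.
    """
    flagged = []
    for m in campaign_metrics:
        if m["blended_roas"] > 0 and m["platform_roas"] / m["blended_roas"] >= threshold:
            flagged.append({
                "campaign": m["campaign"],
                "platform_roas": m["platform_roas"],
                "blended_roas": m["blended_roas"],
                "inflation_ratio": round(m["platform_roas"] / m["blended_roas"], 2),
            })
    return flagged

# Illustrative numbers only
metrics = [
    {"campaign": "Meta | Retargeting | Cart Return", "platform_roas": 6.0, "blended_roas": 1.8},
    {"campaign": "Meta | Prospecting | Hero Serum", "platform_roas": 3.2, "blended_roas": 2.9},
]
flags = get_attribution_discrepancies(metrics)
print(flags)  # flags only the retargeting campaign
```

Returning small, typed dictionaries like this keeps the LLM's inputs auditable and cheap, whether it runs in mock mode or with real tool calling.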
The synthetic data is designed so that:
- Meta | Prospecting | Hero Serum is the true winner (strong blended ROAS and margin-adjusted performance) — scale.
- TikTok | Broad US | Creator Hook wastes spend (high impressions, weak blended ROAS) — pause.
- Meta | Retargeting | Cart Return looks great on platform ROAS but blended ROAS is much lower — attribution is misleading.
- TikTok overall shows rising spend while blended revenue flattens or drops — channel efficiency declining.
The prototype therefore demonstrates why correct modeling and blended/margin-adjusted metrics matter, compared with naive platform dashboards that take platform-reported attribution at face value.
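The metric definitions behind this story can be stated in a few lines. This is a rough sketch under assumed definitions (the real implementations are in `src/metrics/engine.py`): margin-adjusted ROAS credits only the gross-margin portion of revenue against spend.

```python
def roas(revenue: float, spend: float) -> float:
    """Return on ad spend: revenue generated per dollar of spend."""
    return revenue / spend if spend else 0.0

def margin_adjusted_roas(revenue: float, gross_margin: float, spend: float) -> float:
    """Only the margin portion of revenue counts against spend."""
    return roas(revenue * gross_margin, spend)

def cac(spend: float, new_customers: int) -> float:
    """Customer acquisition cost: spend per new customer."""
    return spend / new_customers if new_customers else 0.0

# Illustrative numbers: $1,000 spend, $2,600 blended revenue, 65% gross margin
spend, blended_revenue, margin = 1_000.0, 2_600.0, 0.65
print(roas(blended_revenue, spend))                              # 2.6
print(round(margin_adjusted_roas(blended_revenue, margin, spend), 2))  # 1.69
print(cac(spend, 40))                                            # 25.0
```

A campaign with blended ROAS of 2.6 may still be marginal once margin is applied, which is exactly the gap the demo narrative exploits.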
- Which campaign is underperforming and why?
- Which campaign should we scale next?
- Where is attribution misleading us?
- Why did revenue drop last week even though clicks increased?
- Which channel is increasing spend without improving business outcomes?
- What does margin-adjusted performance say versus platform ROAS?
- Which campaign has the healthiest blended ROAS and CAC combination?
- Give me a summary of operator insights and recommendations.
- Mock mode (default): No API key required; the agent uses keyword routing and the same tools to produce answers.
- OpenAI-compatible LLM: Copy `.env.example` to `.env` and set `OPENAI_API_KEY` (and optionally `OPENAI_BASE_URL`, `OPENAI_MODEL`). The agent will use tool calling when the key is present.
| Current (mock) | Replace later with |
|---|---|
| CSV files | Shopify Orders API, Meta/TikTok Ads API |
| Blended attribution CSV | Real attribution model / warehouse |
| Static insight rules | Tuned thresholds, more signals |
| Optional LLM | Always-on copilot, more tools |
```bash
make test
# or: .venv/bin/python -m pytest tests/ -v
```

MIT.