Production-ready time-series forecasting for retail-style data. Who it’s for: ML/data teams and planners who want a transparent, configurable baseline they can run locally or deploy as a service.
FastAPI backend + lightweight web UI + configurable XGBoost training (global or per-group), with holiday features and what-if price/promo simulations.
- Architecture
- Highlights
- Quickstart
- Configuration
- Training
- Forecasting
- REST API
- Data Schema
- Performance Tips
- Deploy
- Developing
- Troubleshooting & FAQ
- License
boost_sales/
├─ api/
│  ├─ server.py      # FastAPI app factory & routes (UI + REST)
│  ├─ schemas.py     # Pydantic models
│  ├─ service.py     # Forecast/train service glue
│  └─ core/
│     └─ horizons.py # Horizon parsing helpers
├─ pipeline/
│  ├─ train.py       # train_global / train_per_group
│  └─ ...            # feature engineering, model IO
├─ templates/        # Jinja2 HTML (forecast.html, training.html, base.html)
├─ static/
│  ├─ main.css
│  └─ main.js        # UI logic + friendly error handling
├─ config.py         # AppConfig & training knobs
└─ data/
   └─ sales.csv      # (optional) demo CSV or generated synthetic data
- FastAPI serves both the UI and REST API.
- Jinja2 templates for the Forecast and Training pages.
- XGBoost models per horizon (global or per-group).
- Holiday features via country/subdivision.
- Future controls: price/promo plans for scenario testing.
- Configurable: global or per-group models; lags/rolling windows; holiday effects; future price/promo controls.
- Fast: XGBoost `hist` (CPU) & `gpu_hist` (GPU) support; global modeling for scale.
- Reproducible: typed configs (Pydantic), versionable settings, deterministic seeds.
- Usable: simple web UI for explore/train/forecast, and a typed REST API.
- Practical validation: time-based splits with validation windows and auto-suggested `early_stopping_rounds` (derived from `n_estimators` and window size).
- Smart defaults: holiday region, validation windows, and early stopping suggestions.
- Python 3.9+ (x64 build required; x86 is not supported by many ML wheels)
- pip, venv (recommended)
- (Optional) CUDA-enabled XGBoost for GPU
Bash (macOS/Linux):
git clone https://github.com/dmsfiris/boost-sales.git
cd boost-sales
python -m venv .venv && source .venv/bin/activate
pip install -U pip
pip install -e ".[webui]"
PowerShell (Windows):
git clone https://github.com/dmsfiris/boost-sales.git
cd boost-sales
python -m venv .venv; .\.venv\Scripts\Activate.ps1
python -m pip install -U pip
pip install -e ".[webui]"
If you don’t need the web UI, you can install without extras:
pip install -e .
Start the web UI + REST API using the CLI.
Bash (macOS/Linux):
boost-sales serve-web --host 0.0.0.0 --port 8000 --reload
# Fallback if the command isn't on PATH:
python -m boost_sales.cli serve-web --host 0.0.0.0 --port 8000 --reload
PowerShell (Windows):
boost-sales serve-web --host 0.0.0.0 --port 8000 --reload
# Fallback:
python -m boost_sales.cli serve-web --host 0.0.0.0 --port 8000 --reload
- UI: http://localhost:8000
- API docs: http://localhost:8000/docs
- Defaults: the models directory is `./models` (auto-created). If `./data/sales.csv` exists, it is used automatically; otherwise you can upload a CSV from the UI.
If you already have data, you can provide it in two ways:
- Point the app to your CSV (no transformation):
  - Set an environment variable before starting the server:
    export SF_DATA_CSV=./data/sales.csv # PowerShell: $env:SF_DATA_CSV=".\\data\\sales.csv"
  - CSV must already match the expected schema:
    date, store_id, item_id, price, promo, sales
- Upload a CSV in the UI (Forecast page) or call the `/forecast/csv` endpoint with `multipart/form-data`.
If your CSV does not match the expected schema, use the generator to prepare it.
Dates are clipped to today by default (no future rows). Add --allow-future to override.
# From a flat panel CSV (clean/validate to expected columns; does NOT synthesize)
python -m boost_sales.data.generate from-flat \
--flat-csv ./data/my_flat_panel.csv \
--out ./data/sales.csv \
--parse-dates
# From transactions (aggregate POS/e-commerce lines to a daily panel; does NOT synthesize)
python -m boost_sales.data.generate from-transactions \
--tx-csv ./data/transactions.csv \
--out ./data/sales.csv \
--sales-as sum \
--price-strategy weighted_avg \
--promo-strategy column \
--parse-dates
If you don’t have real data yet, you can generate a synthetic dataset:
# Synthetic dataset (creates data from scratch)
python -m boost_sales.data.generate synthetic \
--out ./data/sales.csv \
--stores 3 \
--items 50 \
--days 365 \
--start 2025-01-01 \
--seed 42
Tip: Install in editable mode (`pip install -e .`) so `-m boost_sales.data.generate` works in your venv. See all flags with: `python -m boost_sales.data.generate -h`.
Use the generated (or demo) CSV and the default configuration:
# Train (from the Training page), then request a simple forecast via API:
curl -X POST http://localhost:8000/forecast \
-H "Content-Type: application/json" \
-d '{
"scope":"single",
"store_id":"S01",
"item_id":"I01",
"horizons":"1-7",
"use_server_csv":true,
"unit_type":"integer",
"decimal_places":0
}'
Main config lives in `boost_sales/config.py` (Pydantic models).
- Paths: CSV path & models directory.
- Columns: rename date/store/item/sales/price/promo if your dataset differs.
- Calendar features: year/quarter/month/day/dow/weekend.
- Lag & rolling: lags (1,7,14,28) and rolling stats (7,28).
- Future controls: price/promo futures and price ratio.
- Output formatting: integer vs float and decimal places.
- Training knobs: `n_estimators`, `max_depth`, `learning_rate`, `tree_method`, regularization, etc.
- Validation: `valid_tail_days` or `valid_cutoff_date` (time-based split).
- Early Stopping: `early_stopping_rounds`; the UI can auto-suggest a value based on your window and estimator cap.
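To make the lag and rolling settings in the list above concrete, here is a minimal pure-Python sketch for a single `(store_id, item_id)` series. It is not the project's feature pipeline: the function name, the feature names (`lag_k`, `roll_mean_w`), and the choice to include the current day in the rolling window are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence


def lag_and_rolling_features(
    sales: Sequence[float],
    lags: Sequence[int] = (1, 7, 14, 28),
    windows: Sequence[int] = (7, 28),
) -> List[Dict[str, Optional[float]]]:
    """Build lag and rolling-mean features for one daily series.

    `sales` must be ordered by date with one value per day.
    Features are None where there is not enough history.
    """
    rows: List[Dict[str, Optional[float]]] = []
    for t in range(len(sales)):
        row: Dict[str, Optional[float]] = {}
        for k in lags:
            # Value observed k days before day t.
            row[f"lag_{k}"] = sales[t - k] if t - k >= 0 else None
        for w in windows:
            # Mean over the w most recent days, including day t (assumption).
            if t + 1 >= w:
                row[f"roll_mean_{w}"] = mean(sales[t - w + 1 : t + 1])
            else:
                row[f"roll_mean_{w}"] = None
        rows.append(row)
    return rows


features = lag_and_rolling_features(list(range(1, 31)))
```

With the default lags, the first 28 days of each series have at least one missing feature, which is one reason short histories are harder to model.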
| Setting | Default | Notes |
|---|---|---|
| `SF_HOL_COUNTRY` | `US` | Country-level holidays enabled |
| `SF_HOL_SUBDIV` | (none) | Add e.g. `US-CA` for state holidays |
| `SF_MODELS_DIR` | (required) | Where models are saved/loaded |
| `SF_DATA_CSV` | (optional) | Server-side CSV path |
| Horizons | `1-7` | 7 daily horizons |
| Unit type | `integer` | Rounding applied |
| Validation window | ~10–20% tail | Choose recent tail or cutoff date |
Environment overrides are read on startup.
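The `1-7` horizons default in the table above expands to seven daily horizons. The sketch below shows one way such a spec could be parsed. The project's own helper lives in `api/core/horizons.py` and may behave differently; the function name and the support for comma-separated mixes such as `1,3,5-6` are assumptions.

```python
from typing import List


def parse_horizons(spec: str) -> List[int]:
    """Expand a horizon spec such as "1-7" or "1,3,5-6" into sorted integers."""
    horizons = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. "1-7" covers 1 through 7.
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            horizons.update(range(low, high + 1))
        else:
            horizons.add(int(part))
    if not horizons or min(horizons) < 1:
        raise ValueError("horizons must be positive integers")
    return sorted(horizons)


default_horizons = parse_horizons("1-7")
```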
From the Training page you can:
- Choose mode: `global` (fastest) or `per_group` (by pair/item/store).
- Wipe outputs (optional).
- Set horizons, holiday region, and XGBoost params.
- Choose validation via cutoff date or last N days.
- Use Auto-suggest to derive `early_stopping_rounds` from `n_estimators` & your validation window.
Tip: Prefer a reasonably large `n_estimators` with early stopping. Let training stop when validation RMSE plateaus rather than guessing a small cap.
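The two validation options on the Training page (cutoff date or last N days) both amount to a time-based split. The sketch below illustrates the idea on plain dictionaries. It is not the project's training code, and it assumes that the cutoff date itself belongs to the validation set and that a cutoff takes precedence when both options are given.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def time_based_split(
    rows: List[Row],
    valid_tail_days: Optional[int] = None,
    valid_cutoff_date: Optional[date] = None,
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (train, valid) by date, never by random sampling."""
    if valid_cutoff_date is None:
        if valid_tail_days is None:
            raise ValueError("set valid_tail_days or valid_cutoff_date")
        # Last N days: the cutoff is N-1 days before the newest date.
        newest = max(r["date"] for r in rows)
        valid_cutoff_date = newest - timedelta(days=valid_tail_days - 1)
    train = [r for r in rows if r["date"] < valid_cutoff_date]
    valid = [r for r in rows if r["date"] >= valid_cutoff_date]
    return train, valid


demo_rows = [{"date": date(2025, 1, 1) + timedelta(days=i)} for i in range(10)]
train_rows, valid_rows = time_based_split(demo_rows, valid_tail_days=3)
```

Keeping the validation rows strictly later than the training rows is what makes early stopping on validation RMSE meaningful for forecasting.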
Use the Forecast page to generate predictions and simulate plans.
- Data source: server CSV (default) or upload your own.
- Models directory: pick which trained models to load.
- Paging: control page size and navigation.
- Units & decimals: render as integers or floats.
- `single`: one `(store_id, item_id)` at its latest date.
- `latest_per_pair`: latest date for each pair across the dataset.
- `latest_per_store`: latest date for each item within a store.
- `latest_per_item`: latest date across stores for a given item.
- `last_n_days`: all rows within the last N days.
- `since_date`: all rows on/after a date.
- `at_date`: rows exactly at a date.
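As a rough illustration of the scopes listed above, the following sketch selects base rows from a list of dictionaries. It is an interpretation rather than the service's implementation: the parameter names `n_days` and `on_date` are hypothetical, `last_n_days` is measured from the newest date in the data, and `latest_per_item` is read as "latest row per store for the given item".

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

Row = Dict[str, object]


def _latest_per_pair(rows: List[Row]) -> List[Row]:
    """Keep only the newest row for each (store_id, item_id) pair."""
    latest: Dict[tuple, Row] = {}
    for row in rows:
        key = (row["store_id"], row["item_id"])
        if key not in latest or row["date"] > latest[key]["date"]:
            latest[key] = row
    return list(latest.values())


def select_base_rows(
    rows: List[Row],
    scope: str,
    store_id: Optional[str] = None,
    item_id: Optional[str] = None,
    n_days: Optional[int] = None,   # hypothetical name for the N in last_n_days
    on_date: Optional[date] = None,  # hypothetical name for since_date/at_date
) -> List[Row]:
    if scope == "single":
        if store_id is None or item_id is None:
            raise ValueError("scope 'single' requires store_id and item_id")
        pair = [r for r in rows
                if r["store_id"] == store_id and r["item_id"] == item_id]
        return _latest_per_pair(pair)
    if scope == "latest_per_pair":
        return _latest_per_pair(rows)
    if scope == "latest_per_store":
        return _latest_per_pair([r for r in rows if r["store_id"] == store_id])
    if scope == "latest_per_item":
        return _latest_per_pair([r for r in rows if r["item_id"] == item_id])
    if scope == "last_n_days":
        newest = max(r["date"] for r in rows)
        return [r for r in rows if r["date"] > newest - timedelta(days=n_days)]
    if scope == "since_date":
        return [r for r in rows if r["date"] >= on_date]
    if scope == "at_date":
        return [r for r in rows if r["date"] == on_date]
    raise ValueError(f"unknown scope: {scope}")
```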
- Price: set a scalar (e.g., `0.9` or `90%`) or a CSV (`12.5,12.7,12.9`).
- Promo: set a scalar (`0`, `0.5`, `1`, `50%`) or CSV per horizon.
- If both are left blank, the model assumes no change (safe default).
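The accepted text forms for price and promo plans (scalar, percentage, or comma-separated values) can be normalized along the following lines. This sketch only parses the text. How the server interprets the numbers (for example, price ratio versus absolute price) is not covered here, and the function name and the rule that a CSV must supply one value per horizon are assumptions.

```python
from typing import List, Optional


def parse_future_plan(text: Optional[str], n_horizons: int) -> Optional[List[float]]:
    """Turn "0.9", "90%", or "12.5,12.7,12.9" into one float per horizon.

    Returns None for blank input, meaning "no change".
    """
    if text is None or not text.strip():
        return None

    def to_number(token: str) -> float:
        token = token.strip()
        if token.endswith("%"):
            # "90%" is read as the fraction 0.9.
            return float(token[:-1]) / 100.0
        return float(token)

    values = [to_number(token) for token in text.split(",")]
    if len(values) == 1:
        # A scalar applies to every horizon.
        return values * n_horizons
    if len(values) != n_horizons:
        raise ValueError(
            f"expected 1 or {n_horizons} values, got {len(values)}"
        )
    return values


promo_plan = parse_future_plan("0,0,0,0.5,0,0,0", n_horizons=7)
```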
- `unit_type`: `integer` or `float`.
- `decimal_places`: only applies to `float` output.
All endpoints live under the same app as the UI.
- Swagger UI: `GET /docs`
- ReDoc: `GET /redoc`
Request (application/json)
Optional body field: `models_dir` (string), which overrides the models directory used for loading models.
Response (application/json)
{
"predictions": [
{
"store_id": "S01",
"item_id": "I01",
"base_date": "2024-05-31",
"target_date": "2024-06-01",
"horizon": 1,
"sales": 42
}
],
"page": { "page": 1, "page_size": 100, "total": 7 }
}
For uploaded CSVs (or to avoid JSON).
Fields
- `file` (optional; required if `use_demo_csv=false`)
- `scope`, `store_id`, `item_id`, `horizons`, `use_demo_csv`
- `price_future`, `promo_future`, `unit_type`, `decimal_places`
- `page`, `page_size`
- `models_dir` (optional)
Response: same schema as JSON variant.
Fields (multipart form)
- Mode/Scope: `mode` (`global`|`per_group`), `train_scope` (`pair`|`item`|`store`)
- Data: `use_demo_csv` (bool), `file` (csv)
- Paths/Horizons/Holidays: `models_dir`, `horizons`, `hol_country`, `hol_subdiv`, `wipe`
- XGBoost: `nthread`, `n_estimators`, `max_depth`, `learning_rate`, `tree_method`, `subsample`, `colsample_bytree`, `min_child_weight`, `gamma`, `reg_alpha`, `reg_lambda`, `max_bin`, `random_state`, `required_feature_notna`
- Validation: `valid_cutoff_date`, `valid_tail_days`, `early_stopping_rounds`, `verbose_eval`, `enforce_single_thread_env`
Response
{
"ok": true,
"trained": { "mode": "global", "groups": 1 },
"models_dir": "models",
"horizons": [1,2,3,4,5,6,7],
"holidays": { "country": "US", "subdiv": null },
"seconds": 12.34,
"note": "Training finished."
}
Heuristic to recommend `early_stopping_rounds` given `n_estimators` and your validation window size.
Fields
- `n_estimators` (int, required)
- `valid_tail_days` (int, optional)
- `valid_cutoff_date` (str, optional; assumes 28 days if provided without tail)
- `cap_min` (int, default 20)
- `cap_max` (int, default 120)
Response
{
"ok": true,
"suggestion": 36,
"percent": 0.12,
"n_estimators": 300,
"used_tail_days": 28,
"assumed_default": false,
"cap": { "min": 20, "max": 120 },
"note": "Based on n_estimators and validation window ≈ 28 days."
}
{
"detail": [
{"loc":["body","store_id"],"msg":"Field required","type":"value_error.missing"}
]
}
CSV must include these columns (rename via `config.py` if needed):
- `date` (YYYY-MM-DD or ISO date)
- `store_id` (string)
- `item_id` (string)
- `sales` (float/int)
- `price` (float)
- `promo` (0/1/0.5)
- Cadence: daily rows per `(store_id, item_id)`; non-daily data should be resampled to daily.
- Timezone: dates treated as naive local dates; convert to consistent local or UTC before ingest.
- Missing data: rows with missing `sales` are excluded from training; `price`/`promo` missing values are imputed with hold-forward where appropriate.
- Outliers: spikes/dips are not automatically clipped; handle in your preprocessing if needed.
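Before pointing the server at a CSV, it can help to check it against the schema described above. The following is a small standard-library sketch, not part of `boost_sales`: the function name and message wording are illustrative, and it assumes the default column names (no renaming via `config.py`).

```python
import csv
import io
from datetime import date
from typing import List

REQUIRED_COLUMNS = ("date", "store_id", "item_id", "sales", "price", "promo")


def validate_sales_csv(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems: List[str] = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            date.fromisoformat(row["date"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: bad date {row['date']!r}")
        for col in ("sales", "price", "promo"):
            value = row[col]
            if value is None or value == "":
                continue  # blanks are allowed; see the missing-data note
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: {col} is not numeric")
    return problems


sample = (
    "date,store_id,item_id,price,promo,sales\n"
    "2025-01-01,S01,I01,12.5,0,42\n"
    "2025-01-02,S01,I01,12.5,0.5,\n"
)
sample_problems = validate_sales_csv(sample)
```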
- Start with global mode; switch to per-group for tricky segments only.
- Prefer `tree_method="hist"` on CPU. Use `gpu_hist` if available (e.g., `max_bin=256`).
- Set a large `n_estimators` with early stopping.
- Validation: use a recent window (e.g., last 10–20% of dates).
- Holidays: enable the appropriate country; optionally add subdivision (e.g., `US-CA`) for state holidays.
- Reproducibility: set `random_state`; enforce single thread (`nthread=1`) only if you need bit-for-bit parity.
Any ASGI host works. Minimal examples:
export SF_MODELS_DIR=/opt/models
export SF_DATA_CSV=/opt/data/sales.csv
uvicorn boost_sales.api.server:app --host 0.0.0.0 --port 8000
# or: gunicorn -k uvicorn.workers.UvicornWorker boost_sales.api.server:app
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install -U pip && pip install -e .
ENV SF_MODELS_DIR=/models SF_DATA_CSV=/data/sales.csv
EXPOSE 8000
CMD ["uvicorn", "boost_sales.api.server:app", "--host", "0.0.0.0", "--port", "8000"]
services:
  sales-forecast:
    build: .
    ports: ["8000:8000"]
    environment:
      SF_MODELS_DIR: /models
      SF_DATA_CSV: /data/sales.csv
      SF_HOL_COUNTRY: US
    volumes:
      - ./models:/models
      - ./data:/data
Run behind a reverse proxy / gateway with authentication if exposed outside a trusted network. Configure CORS appropriately for the UI domain.
- Code style: black/ruff (optional).
- Tests: add under `tests/` (PyTest).
- Static assets: edit `static/main.js` and `static/main.css`.
- Templates: `templates/forecast.html` & `templates/training.html`.
Q: Why do I get “Field required” on /forecast?
A: Ensure you provided the required params for your scope. For single, both store_id and item_id are required. The UI performs quick validation; the API enforces it too.
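For API clients, a validation error body of the shape shown in the REST API section can be turned into short messages along these lines. This is a sketch and not the logic in `static/main.js`; the function name is illustrative, and it assumes `detail` is either a list of error objects or a plain string.

```python
from typing import Any, Dict, List


def friendly_errors(payload: Dict[str, Any]) -> List[str]:
    """Convert an error payload with a "detail" key into readable messages."""
    detail = payload.get("detail", [])
    if isinstance(detail, str):
        return [detail]
    messages: List[str] = []
    for item in detail:
        # "loc" looks like ["body", "store_id"]; drop the leading "body".
        location = [str(part) for part in item.get("loc", []) if part != "body"]
        field = ".".join(location)
        message = item.get("msg", "Invalid value")
        messages.append(f"{field}: {message}" if field else message)
    return messages


example = {
    "detail": [
        {"loc": ["body", "store_id"], "msg": "Field required",
         "type": "value_error.missing"}
    ]
}
example_messages = friendly_errors(example)
```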
Q: Should I upload a CSV on the Forecast page?
A: Optional. If your server already has a configured CSV (demo or production), keep “Use server CSV” checked. Upload when you want to forecast against a different dataset ad-hoc.
Q: How do holidays help?
A: Holiday features often improve accuracy by capturing demand shifts (spikes and dips). Choose the matching country; optionally add a subdivision (e.g., US-CA) for state/province holidays. Leave subdivision blank to include national holidays only.
Q: Why 36 for early_stopping_rounds?
A: With valid_tail_days=28 we suggest ~12% of n_estimators (clamped). For n_estimators=300, that’s 36—balanced patience without over-waiting.
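The arithmetic in this answer can be reproduced with a few lines. The sketch below fixes the percentage at the 12% quoted for a 28-day window and clamps with the documented `cap_min`/`cap_max` defaults. The real heuristic also depends on the validation window size, which is not modeled here, and the function name is an assumption.

```python
def suggest_early_stopping_rounds(
    n_estimators: int,
    percent: float = 0.12,  # value quoted for valid_tail_days=28
    cap_min: int = 20,
    cap_max: int = 120,
) -> int:
    """Suggest patience as a clamped fraction of n_estimators."""
    if n_estimators < 1:
        raise ValueError("n_estimators must be positive")
    raw = round(n_estimators * percent)
    # Clamp so small models still get some patience and large ones do not wait too long.
    return max(cap_min, min(cap_max, raw))


suggestion = suggest_early_stopping_rounds(300)
```

Under these assumptions, `n_estimators=100` would be raised to the floor of 20 and `n_estimators=2000` would be capped at 120.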
Q: Global vs per-group training?
A: Global is faster and generalizes across entities; per-group can capture idiosyncrasies but is slower and can overfit small groups.
MIT — see LICENSE for details. Contributions welcome!
Example request body for `POST /forecast` with price and promo plans:
{
  "scope": "single",
  "store_id": "S01",
  "item_id": "I01",
  "horizons": "1-7",
  "use_server_csv": true,
  "price_future": "0.9",
  "promo_future": "0,0,0,0.5,0,0,0",
  "unit_type": "integer",
  "decimal_places": 0,
  "page": 1,
  "page_size": 100
}
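The JSON request body above can also be sent from Python using only the standard library. The sketch below builds the request without sending it, so it runs without a server. The helper name is illustrative, and the base URL assumes the local defaults from the Quickstart.

```python
import json
import urllib.request
from typing import Any, Dict


def build_forecast_request(base_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST /forecast request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/forecast",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "scope": "single",
    "store_id": "S01",
    "item_id": "I01",
    "horizons": "1-7",
    "use_server_csv": True,
    "price_future": "0.9",
    "promo_future": "0,0,0,0.5,0,0,0",
    "unit_type": "integer",
    "decimal_places": 0,
    "page": 1,
    "page_size": 100,
}
request = build_forecast_request("http://localhost:8000", payload)

# To send it (requires a running server with trained models):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     for row in result["predictions"]:
#         print(row["target_date"], row["horizon"], row["sales"])
```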