
Build AI-first AutoML platform: CSV → best model + insights#1

Draft
Copilot wants to merge 2 commits into main from copilot/add-best-model-insights-feature

Copilot AI commented Mar 22, 2026

Replaces the manual model-selection Flask app with a fully automated ML pipeline. Users upload a CSV and receive the best trained model, evaluation metrics, feature importances, and a plain-English explanation — no configuration required beyond selecting the target column.

Backend — backend/ (FastAPI)

  • data_processor/profiler.py — dataset profiling, auto-detects problem type (classification vs regression) from target column cardinality/dtype, generates preprocessing suggestions
  • data_processor/cleaner.py — null placeholder normalisation, deduplication, high-missing/constant column drops; ColumnTransformer pipeline (median imputation + StandardScaler for numeric; mode imputation + OneHotEncoder for categorical)
  • model_trainer/trainer.py — trains 5 models (LogisticRegression/Ridge, RandomForest, GradientBoosting, XGBoost, LightGBM); runs Optuna tuning (20 trials) on the winner
  • evaluator/evaluator.py — accuracy/F1/precision/recall for classification; RMSE/MAE/R² for regression; sorted comparison table
  • explainer/explainer.py — feature importance via feature_importances_, coef magnitude, or SHAP fallback; template-based plain-English explanation
  • api/routes.py — POST /api/upload, POST /api/run-pipeline/{job_id}, GET /api/results/{job_id}, GET /api/download-model/{job_id}
  • app.py — FastAPI + CORS; serves React static build from frontend/build/
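The profiler's problem-type detection can be illustrated with a minimal sketch. The cutoff `MAX_CLASSES` and the function name `detect_problem_type` are hypothetical, and the sketch assumes the target values have already been parsed into Python numbers (as a pandas read would do); raw CSV strings would all be classified as labels. The actual profiler.py rule may differ.

```python
from numbers import Number
from typing import Any, Sequence

# Hypothetical cutoff: integer-valued targets with at most this many distinct
# values are treated as class labels. The real profiler.py may use another rule.
MAX_CLASSES = 20


def detect_problem_type(target: Sequence[Any], max_classes: int = MAX_CLASSES) -> str:
    """Return "classification" or "regression" for a target column."""
    values = [v for v in target if v is not None]
    if not values:
        raise ValueError("target column has no non-null values")
    # Strings, booleans and other non-numeric labels can only be classes.
    if any(isinstance(v, bool) or not isinstance(v, Number) for v in values):
        return "classification"
    # Low-cardinality integer values (e.g. 0/1 flags, 1..5 ratings) look like class codes.
    integer_valued = all(float(v).is_integer() for v in values)
    if integer_valued and len(set(values)) <= max_classes:
        return "classification"
    # Anything else numeric (fractional or high-cardinality) is treated as continuous.
    return "regression"
```

A cardinality cutoff is a heuristic: an integer target such as "number of purchases" with few distinct values would be misread as classification, which is presumably why the config panel lets the user confirm the detected type.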
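The row- and column-level cleaning steps listed for cleaner.py can be sketched with the standard library alone. The placeholder set, the 50% missing-value threshold and all function names below are hypothetical; the ColumnTransformer stage (imputation, scaling, one-hot encoding) is not reproduced here.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]

# Hypothetical settings; the real cleaner.py may use different values.
NULL_PLACEHOLDERS = {"", "na", "n/a", "nan", "null", "none", "?", "-"}
MAX_MISSING_FRACTION = 0.5


def normalise_nulls(rows: List[Row]) -> List[Row]:
    """Map placeholder strings such as 'N/A' or '?' to None."""
    return [
        {k: None if isinstance(v, str) and v.strip().lower() in NULL_PLACEHOLDERS else v
         for k, v in row.items()}
        for row in rows
    ]


def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Keep only the first occurrence of each exact duplicate row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # column order does not affect equality
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_uninformative_columns(rows: List[Row], target: str) -> List[Row]:
    """Drop mostly-missing or constant feature columns; the target is always kept."""
    if not rows:
        return rows
    keep = []
    for col in rows[0]:
        if col == target:
            continue
        values = [row[col] for row in rows]
        missing = sum(v is None for v in values) / len(values)
        distinct = {v for v in values if v is not None}
        if missing <= MAX_MISSING_FRACTION and len(distinct) > 1:
            keep.append(col)
    keep.append(target)
    return [{col: row[col] for col in keep} for row in rows]


def clean(rows: List[Row], target: str) -> List[Row]:
    """Normalise nulls first so that 'N/A' and '' rows de-duplicate correctly."""
    return drop_uninformative_columns(drop_duplicates(normalise_nulls(rows)), target)
```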
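The trainer's "score every candidate, then tune only the winner" control flow can be sketched as below. This is not Optuna: a plain random search over a discrete grid stands in for Optuna's sampler, and `select_and_tune`, the search-space format and the `evaluate` callback are all hypothetical. The callback is assumed to return a score where higher is better (e.g. negated RMSE for regression).

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def select_and_tune(
    candidates: Dict[str, Params],
    search_spaces: Dict[str, Dict[str, List[float]]],
    evaluate: Callable[[str, Params], float],
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[str, Params, float, Dict[str, float]]:
    """Pick the best default model, then random-search its hyperparameters."""
    # 1. Score every candidate with its default hyperparameters.
    baseline = {name: evaluate(name, params) for name, params in candidates.items()}
    best_name = max(baseline, key=baseline.get)
    best_params, best_score = dict(candidates[best_name]), baseline[best_name]

    # 2. Tune only the winner (random search as a stand-in for Optuna).
    rng = random.Random(seed)
    space = search_spaces.get(best_name, {})
    for _ in range(n_trials if space else 0):
        trial = {**best_params, **{k: rng.choice(v) for k, v in space.items()}}
        score = evaluate(best_name, trial)
        if score > best_score:  # keep the defaults unless a trial strictly improves
            best_params, best_score = trial, score
    return best_name, best_params, best_score, baseline
```

Tuning only the winner keeps the cost at roughly five fits plus `n_trials` fits, at the risk that a runner-up would have overtaken the winner after its own tuning.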
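The evaluator's metrics and sorted comparison table can be written out in plain Python to make the definitions explicit. The key names `precision_weighted`, `recall_weighted`, `rmse`, `mae` and `r2` are hypothetical (only `accuracy` and `f1_weighted` appear in the example response), and "weighted" is assumed to mean weighting each class by its share of true labels, as scikit-learn's `average="weighted"` does.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def classification_scores(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)
    predicted = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = hits[label] / predicted[label] if predicted[label] else 0.0
        r = hits[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n  # each class weighted by its share of true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": sum(hits.values()) / n, "precision_weighted": precision,
            "recall_weighted": recall, "f1_weighted": f1}


def regression_scores(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE and the coefficient of determination R²."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return {"rmse": math.sqrt(ss_res / n),
            "mae": sum(abs(e) for e in errors) / n,
            # R² is undefined for a constant target; this sketch reports 0.0 then.
            "r2": 1 - ss_res / ss_tot if ss_tot else 0.0}


def comparison_table(results: Dict[str, Dict[str, float]], metric: str,
                     higher_is_better: bool = True) -> List[dict]:
    """One row per model, best first by the chosen metric."""
    rows = [{"model": name, **scores} for name, scores in results.items()]
    return sorted(rows, key=lambda row: row[metric], reverse=higher_is_better)
```

Note that support-weighted recall always equals accuracy, so it adds no information beyond the accuracy badge.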
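The explainer's fallback order (`feature_importances_`, then coefficient magnitude, then a model-agnostic method) and its template-based text can be sketched by duck-typing on those attributes. The `fallback` hook stands in for a SHAP call and is hypothetical, as are the function names and the template wording.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

Importance = Dict[str, Any]


def feature_importances(model: Any, feature_names: Sequence[str],
                        fallback: Optional[Callable[[Any], Sequence[float]]] = None) -> List[Importance]:
    """Return [{"feature", "importance"}] pairs, most important first."""
    if hasattr(model, "feature_importances_"):
        # Tree ensembles (RandomForest, GradientBoosting, XGBoost, LightGBM).
        scores = [float(s) for s in model.feature_importances_]
    elif hasattr(model, "coef_"):
        # Linear models: use coefficient magnitude. Multiclass models expose one
        # coefficient row per class, so average |coef| across the rows.
        coef = model.coef_
        rows = coef if hasattr(coef[0], "__len__") else [coef]
        scores = [sum(abs(float(row[i])) for row in rows) / len(rows)
                  for i in range(len(feature_names))]
    elif fallback is not None:
        # Hypothetical hook for a model-agnostic method such as SHAP.
        scores = [float(s) for s in fallback(model)]
    else:
        raise ValueError("model exposes no importances and no fallback was given")
    pairs = [{"feature": f, "importance": s} for f, s in zip(feature_names, scores)]
    return sorted(pairs, key=lambda d: d["importance"], reverse=True)


def explain(n_rows: int, problem_type: str, best_model: str, metric: str,
            score: float, importances: List[Importance], top_k: int = 3) -> str:
    """Fill a fixed English template with the pipeline's headline numbers."""
    top = ", ".join(item["feature"] for item in importances[:top_k])
    return (f"Your dataset contains {n_rows:,} rows and was treated as a {problem_type} problem. "
            f"The best model, {best_model}, achieved {metric} = {score:.2f}. "
            f"The most influential features were: {top}.")
```

Coefficient magnitudes are only comparable across features because the numeric columns are standardised upstream; on unscaled data they would mostly reflect each feature's units.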

Frontend — frontend/ (React + TypeScript)

  • Upload zone (drag-and-drop / click)
  • Pipeline config panel: target column selector, Optuna tune toggle, auto-generated preprocessing plan preview
  • Results dashboard: metric badges, plain-English explanation card, model comparison table, feature importance horizontal bar chart (Recharts), trained model .pkl download

Example pipeline flow

```
# Upload
POST /api/upload  →  { job_id, columns, detected_problem_type, preprocessing_suggestions }

# Run (tune: false skips Optuna for speed)
POST /api/run-pipeline/{job_id}  body: { target_col: "label", tune: true }
→ {
    best_model: "GradientBoosting",
    best_model_scores: { accuracy: 0.91, f1_weighted: 0.90, ... },
    comparison_table: [ { model, accuracy, f1_weighted, ... }, ... ],
    feature_importances: [ { feature, importance }, ... ],
    explanation: "Your dataset contains 5 000 rows... The best model achieved..."
  }
```
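The upload → run → results sequence implies a small amount of per-job state. The sketch below models it with a hypothetical in-memory `JobStore` rather than FastAPI, so it can run on its own; the pipeline is injected as a callback, and the real upload response's `detected_problem_type` and `preprocessing_suggestions` fields are omitted. The actual routes.py may store jobs differently.

```python
import csv
import io
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Pipeline = Callable[[List[Dict[str, str]], str, bool], Dict[str, Any]]


@dataclass
class Job:
    rows: List[Dict[str, str]]
    columns: List[str]
    result: Optional[Dict[str, Any]] = None


class JobStore:
    """Hypothetical in-memory stand-in for the state behind the /api routes."""

    def __init__(self, pipeline: Pipeline):
        self._jobs: Dict[str, Job] = {}
        self._pipeline = pipeline

    def upload(self, csv_text: str) -> Dict[str, Any]:
        # POST /api/upload: parse the CSV and return a job_id plus column names.
        reader = csv.DictReader(io.StringIO(csv_text))
        rows = list(reader)
        columns = list(reader.fieldnames or [])
        if not columns or not rows:
            raise ValueError("CSV must have a header row and at least one data row")
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job(rows=rows, columns=columns)
        return {"job_id": job_id, "columns": columns}

    def run_pipeline(self, job_id: str, target_col: str, tune: bool = True) -> Dict[str, Any]:
        # POST /api/run-pipeline/{job_id}: validate the body, then run the pipeline.
        job = self._jobs[job_id]  # an unknown job_id raises KeyError (a 404 in HTTP terms)
        if target_col not in job.columns:
            raise ValueError(f"unknown target column: {target_col!r}")
        job.result = self._pipeline(job.rows, target_col, tune)
        return job.result

    def results(self, job_id: str) -> Dict[str, Any]:
        # GET /api/results/{job_id}: only available once the pipeline has run.
        job = self._jobs[job_id]
        if job.result is None:
            raise LookupError("pipeline has not been run for this job")
        return job.result
```

Validating `target_col` against the uploaded header before training lets the API reject a bad request immediately instead of failing partway through the pipeline.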


Copilot AI changed the title from "[WIP] Add feature to upload CSV and generate best model insights" to "Build AI-first AutoML platform: CSV → best model + insights" on Mar 22, 2026.
Copilot AI requested a review from vishalxtyagi on March 22, 2026 at 20:45.