ML Pathways v1.0

An interactive web-based platform for learning and experimenting with foundational machine learning algorithms. Inspired by Andrew Ng's Machine Learning course, ML Pathways provides a hands-on environment where users can explore ML problems, interact with an AI agent, generate code, and execute experiments safely.

Features

Core Functionality

9 ML Problem Types: Linear regression, logistic regression, neural networks, clustering, and more
Interactive AI Assistant: Chat with GPT-4, Claude, or Gemini for guidance, EDA, and Q&A
Automated Code Generation: Generate production-ready Python code for ML tasks
Sandboxed Execution: Safe code execution using E2B Code Interpreter
Sample Datasets: Curated datasets for each problem type
Custom Dataset Upload: Bring your own CSV files
Automated EDA: Instant exploratory data analysis with statistics and insights
Interactive Visualizations: Charts and graphs using Plotly.js and Recharts
Experiment Tracking: Save and manage your ML experiments

Technology Stack

Layer	Technology
Frontend	Next.js 15, React 19, TypeScript
UI Components	Shadcn UI, Tailwind CSS
Backend	Next.js API Routes
Database	Neon Serverless Postgres
ORM	Drizzle ORM
Authentication	BetterAuth (ready to configure)
AI Providers	OpenAI GPT-4, Anthropic Claude, Google Gemini
Code Execution	E2B Code Interpreter
File Storage	Cloudflare R2 (ready to configure)
Charts	Plotly.js, Recharts
Monitoring	Sentry (ready to configure)
Deployment	Vercel

Getting Started

Prerequisites

Node.js 18.18+ (the minimum supported by Next.js 15) and npm
PostgreSQL database (Neon recommended)
At least one AI provider API key (OpenAI, Anthropic, or Google)
E2B API key for code execution (optional but recommended)

Installation

Clone the repository

git clone https://github.com/yourusername/ml-pathways.git
cd ml-pathways

Install dependencies

npm install

Set up environment variables

Copy .env.example to .env and fill in your values:

cp .env.example .env

Environment variables (DATABASE_URL and at least one AI provider key are required; the rest are optional):

# Database
DATABASE_URL=your_neon_postgres_connection_string

# AI Provider (choose one or more)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key

# Set your preferred provider: claude (default), openai, or gemini
AI_PROVIDER=claude

# Code Execution (optional)
E2B_API_KEY=your_e2b_key

# Authentication (optional)
BETTER_AUTH_SECRET=your_secret_key
BETTER_AUTH_URL=http://localhost:3000

# File Storage (optional)
CLOUDFLARE_R2_ACCOUNT_ID=
CLOUDFLARE_R2_ACCESS_KEY_ID=
CLOUDFLARE_R2_SECRET_ACCESS_KEY=
CLOUDFLARE_R2_BUCKET_NAME=

# Monitoring (optional)
SENTRY_DSN=
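
Before starting the server, it can help to confirm that the key for the selected provider is actually set. The following is a minimal sketch, not part of the repository: the function name `missing_settings` is hypothetical, and the mapping from provider name to key name is inferred from the variable names listed above.

```python
import os

# Assumed mapping from AI_PROVIDER values to the key each one needs,
# inferred from the variable names shown in the environment block.
PROVIDER_KEYS = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def missing_settings(env):
    """Return the names of required variables that are unset or empty."""
    missing = []
    if not env.get("DATABASE_URL"):
        missing.append("DATABASE_URL")
    provider = env.get("AI_PROVIDER") or "claude"  # claude is the documented default
    key_name = PROVIDER_KEYS.get(provider)
    if key_name is None:
        missing.append("AI_PROVIDER (unknown value: %s)" % provider)
    elif not env.get(key_name):
        missing.append(key_name)
    return missing


if __name__ == "__main__":
    problems = missing_settings(os.environ)
    print("OK" if not problems else "Missing: " + ", ".join(problems))
```

The script reads the process environment, so export the variables (or load .env with a tool of your choice) before running it.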

Set up the database

Push the schema to your database:

npm run db:push

Start the development server

npm run dev

Open http://localhost:3000 in your browser.

Project Structure

ml-pathways/
├── src/
│   ├── app/                      # Next.js app directory
│   │   ├── api/                  # API routes
│   │   │   ├── chat/            # AI chat endpoint
│   │   │   ├── generate-code/   # Code generation endpoint
│   │   │   └── execute/         # Code execution endpoint
│   │   ├── dashboard/           # User dashboard
│   │   ├── problems/            # ML problems listing
│   │   ├── datasets/            # Dataset management
│   │   ├── experiments/         # Experiment tracking
│   │   └── workspace/           # Experiment workspace
│   ├── components/              # React components
│   │   ├── ui/                  # Shadcn UI components
│   │   └── layout/              # Layout components
│   ├── lib/
│   │   ├── ai/                  # AI provider integrations
│   │   ├── eda/                 # Data analysis utilities
│   │   ├── constants/           # ML problem definitions
│   │   └── sample-data/         # Sample datasets
│   └── db/
│       ├── schema.ts            # Database schema
│       └── index.ts             # Database client
├── drizzle/                     # Database migrations
├── public/                      # Static assets
└── package.json

Supported ML Problems

Beginner Level

Linear Regression (Single Variable) - Predict housing prices by size
Linear Regression (Multiple Variables) - Multi-feature price prediction
Logistic Regression - Binary classification for university admissions
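
To give a flavor of the first beginner problem, here is a minimal sketch of single-variable linear regression using the closed-form least-squares solution. It is illustrative only and is not the code the platform generates; the sizes and prices are made-up numbers chosen to lie exactly on a line, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data generated from price = 100 * size + 50000
sizes = [1000, 1500, 2000, 2500]
prices = [150000, 200000, 250000, 300000]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 1800 + intercept  # predicted price for an 1800-unit house
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

A gradient-descent version, as taught in the course that inspired the project, would converge to the same line on this data; the closed form is used here only to keep the sketch short.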

Intermediate Level

Regularized Regression - Reduce overfitting with L1/L2 regularization
Polynomial Regression - Model nonlinear relationships
Multi-class Classification - Handwritten digit recognition
K-Means Clustering - Customer segmentation

Advanced Level

Neural Networks - Deep learning for image classification
Principal Component Analysis (PCA) - Dimensionality reduction

User Workflow

Choose an ML Problem - Browse problems by difficulty or category
Select a Dataset - Use sample data or upload your own CSV
Explore with AI - Chat with the AI assistant about your data
Automated EDA - Get instant insights and visualizations
Generate Code - AI creates Python code for your experiment
Execute Safely - Run code in a sandboxed environment
Visualize Results - Interactive charts and performance metrics
Iterate & Learn - Refine your approach with AI guidance
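
The Automated EDA step in the workflow above can be pictured with a short sketch that computes basic statistics for the numeric columns of a CSV file. This is an assumption about the kind of summary produced, not the platform's actual EDA code; `summarize_csv` and the sample data are hypothetical.

```python
import csv
import io
import statistics


def summarize_csv(text):
    """Return count, mean, min, and max for each fully numeric column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {}
    if not rows:
        return summary
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip columns with missing or non-numeric cells
        summary[column] = {
            "count": len(values),
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Made-up sample: two numeric columns and one text column
sample = "size,price,city\n1000,150000,A\n2000,250000,B\n"
report = summarize_csv(sample)
for name, stats in report.items():
    print(name, stats)
```

Text columns such as `city` are skipped here; a fuller analysis would also report value counts and missing-data rates for them.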

API Routes

POST /api/chat

Chat with the AI assistant.

Request:

{
  "messages": [
    { "role": "user", "content": "Explain linear regression" }
  ],
  "problemType": "linear_regression_single",
  "context": "Optional additional context"
}

Response:

{
  "message": "AI response...",
  "provider": "claude"
}
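
A client for this endpoint can be sketched with the Python standard library. The example assumes the development server from the setup steps is running at http://localhost:3000 and that the route accepts unauthenticated JSON requests; neither `build_chat_request` nor `ask` exists in the repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # dev server address from the setup steps


def build_chat_request(question, problem_type, context=None):
    """Build a POST request for /api/chat with the documented JSON body."""
    body = {
        "messages": [{"role": "user", "content": question}],
        "problemType": problem_type,
    }
    if context is not None:
        body["context"] = context  # optional field, omitted when not given
    return urllib.request.Request(
        BASE_URL + "/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, problem_type):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(question, problem_type)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]
```

Calling `ask("Explain linear regression", "linear_regression_single")` against a running server should return the text of the `message` field shown in the response above.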

POST /api/generate-code

Generate Python code for an ML task.

Request:

{
  "problemType": "linear_regression_single",
  "task": "Train a linear regression model on housing data",
  "datasetInfo": {
    "columns": ["size", "price"],
    "rowCount": 100
  }
}

Response:

{
  "code": "import pandas as pd...",
  "explanation": "Code explanation",
  "provider": "claude"
}

POST /api/execute

Execute Python code in a sandbox.

Request:

{
  "code": "print('Hello, ML!')",
  "datasetUrl": "https://example.com/data.csv"
}

Response:

{
  "status": "success",
  "output": "Hello, ML!",
  "charts": [],
  "logs": []
}
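
The generate-code and execute routes are naturally used together: request code for a task, then run it in the sandbox. The sketch below chains the two calls using the request and response fields documented above. It is a hypothetical client, not code from the repository; the helper names, the default base URL, and the treatment of any status other than "success" as a failure are all assumptions.

```python
import json
import urllib.request


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def generate_and_run(task, problem_type, dataset_info, dataset_url,
                     base_url="http://localhost:3000", post=post_json):
    """Generate code for a task, then execute it in the sandbox."""
    generated = post(base_url + "/api/generate-code", {
        "problemType": problem_type,
        "task": task,
        "datasetInfo": dataset_info,
    })
    result = post(base_url + "/api/execute", {
        "code": generated["code"],
        "datasetUrl": dataset_url,
    })
    if result["status"] != "success":
        # Only "success" appears in the documented response; treating any
        # other status as a failure is an assumption.
        raise RuntimeError("Execution failed: %r" % (result,))
    return result["output"]
```

The `post` parameter exists so the flow can be exercised with a stub instead of a live server, which is useful when no E2B key is configured.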

Database Schema

Key tables:

users - User accounts
datasets - Uploaded and sample datasets
experiments - ML experiments
executions - Code execution records
chat_messages - Conversation history
eda_results - Exploratory data analysis results
sample_datasets - Pre-loaded sample datasets

Development

Available Scripts

npm run dev          # Start development server
npm run build        # Build for production
npm run start        # Start production server
npm run lint         # Run ESLint
npm run db:generate  # Generate migrations
npm run db:push      # Push schema to database
npm run db:studio    # Open Drizzle Studio

Adding a New ML Problem

Add the problem type to the enum in src/db/schema.ts
Define the problem in src/lib/constants/ml-problems.ts
Create a sample dataset in src/lib/sample-data/
Add problem-specific context in src/lib/ai/prompts.ts

Deployment

Deploy to Vercel

Push your code to GitHub
Connect your repository to Vercel
Add environment variables in Vercel dashboard
Deploy

Database Setup (Neon)

Create a Neon account at neon.tech
Create a new project
Copy the connection string
Set DATABASE_URL to that connection string in your environment variables
Run npm run db:push to create tables

E2B Setup

Create an account at e2b.dev
Get your API key
Set E2B_API_KEY to that key in your environment variables

Security Features

Sandboxed code execution isolates user-submitted code from the application server
API rate limiting (ready to configure)
Input validation on all endpoints
Secure file upload handling
Environment variable protection

Future Enhancements

Additional ML algorithms (SVM, Decision Trees, Random Forests)
Real-time collaboration features
Community dataset sharing
Experiment leaderboards
Advanced visualization options
Jupyter notebook export
Model deployment capabilities
Mobile app version

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs.

License

MIT License - see LICENSE file for details

Acknowledgments

Inspired by Andrew Ng's Machine Learning course
Built with Next.js, Shadcn UI, and modern ML tools
Powered by OpenAI, Anthropic, and Google AI

Support

For issues and feature requests, please use the GitHub issue tracker.

ML Pathways - Learn machine learning by doing, guided by AI.
