A self-sustaining marketplace that automates the collection of domain-specific data from the web, transforms it into high-quality training datasets, trains specialized AI models using QLoRA, and monetizes through dataset sales, API queries, and custom training services.
## Features

- Data Collection Pipeline: Automated scraping, extraction, AI-powered formatting, and deduplication
- Marketplace: Browse and purchase curated training datasets
- Model Training: QLoRA-based training on free infrastructure (Google Colab)
- Inference API: Query specialized AI models via a REST API
- Custom Training: Upload your own data and train custom models
- Payment Integration: Stripe for dataset purchases and credit management
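The README does not pin down how the pipeline's deduplication works (the ML stack lists sentence-transformers for semantic similarity), so the helper below is only a minimal stdlib sketch of the idea: exact duplicates are caught by content hash, near-duplicates by a pairwise similarity ratio. The function name and threshold are illustrative, not taken from this project.

```python
import hashlib
from difflib import SequenceMatcher

def dedupe(records: list[str], threshold: float = 0.9) -> list[str]:
    """Drop exact and near-duplicate text records.

    Exact duplicates are detected with a normalized content hash;
    near-duplicates with a character-level similarity ratio. A real
    pipeline could swap the ratio for sentence-transformers embeddings.
    """
    seen_hashes: set[str] = set()
    kept: list[str] = []
    for text in records:
        digest = hashlib.sha256(text.strip().lower().encode()).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of an earlier record
        if any(SequenceMatcher(None, text, k).ratio() >= threshold for k in kept):
            continue  # near-duplicate of a kept record
        seen_hashes.add(digest)
        kept.append(text)
    return kept
```

In an embedding-based variant, the `SequenceMatcher` check would be replaced by cosine similarity over sentence embeddings, which catches paraphrases that character comparison misses.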
## Tech Stack

### Backend

- FastAPI: Modern Python web framework
- SQLAlchemy: ORM for PostgreSQL
- Celery: Distributed task queue
- Redis: Caching and message broker
- PostgreSQL: Database with the pgvector extension
- HuggingFace: Model hosting and inference
- Stripe: Payment processing

### Frontend

- React: UI library
- TypeScript: Type-safe JavaScript
- Vite: Build tool
- Zustand: State management
- Axios: HTTP client

### ML Pipeline

- BeautifulSoup: HTML parsing
- PyPDF2: PDF extraction
- OpenAI/Claude: AI-powered formatting
- sentence-transformers: Semantic similarity
- Transformers: Model training with QLoRA
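For context on the Transformers entry, a typical QLoRA setup combines 4-bit quantization (bitsandbytes) with low-rank adapters (peft). The snippet below is a hedged sketch of that pattern; the base model name and all hyperparameters are illustrative assumptions, not values taken from this project, and running it requires a GPU and a model download.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization -- the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # illustrative base model, not this project's choice
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters -- the "LoRA" in QLoRA; r and alpha are example values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

Only the adapter weights are trained, which is what makes training feasible on free-tier hardware such as Colab GPUs.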
## Getting Started

### Prerequisites

- Docker and Docker Compose
- Python 3.11+
- Node.js 20+
### Quick Start

```bash
git clone <repository-url>
cd ai-training-platform
cp .env.example .env
# Edit .env with your API keys
docker-compose up -d
```

This will start:

- PostgreSQL (port 5432)
- Redis (port 6379)
- Backend API (port 8000)
- Celery worker
- Frontend (port 3000)

Run the database migrations:

```bash
docker-compose exec backend alembic upgrade head
```

The services are then available at:

- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
## Local Development

### Backend

```bash
cd backend

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements/dev.txt

# Run migrations
alembic upgrade head

# Start the development server
uvicorn api.main:app --reload

# Run tests
pytest

# Run the Celery worker
celery -A workers.celery_app worker --loglevel=info
```

### Frontend

```bash
cd frontend

# Install dependencies
npm install

# Start the development server
npm run dev

# Run tests
npm test

# Build for production
npm run build
```

## Project Structure

```
ai-training-platform/
├── backend/
│   ├── api/                 # FastAPI application
│   │   ├── routers/         # API endpoints
│   │   ├── models/          # Pydantic schemas
│   │   ├── services/        # Business logic
│   │   ├── db/              # Database models and migrations
│   │   ├── core/            # Security utilities
│   │   └── utils/           # Helper functions
│   ├── ml_pipeline/         # Data collection and processing
│   │   ├── scraper/         # Web scraping
│   │   ├── extractor/       # Content extraction
│   │   ├── formatter/       # AI formatting
│   │   ├── trainer/         # Model training
│   │   └── deployer/        # HuggingFace deployment
│   ├── workers/             # Celery tasks
│   ├── tests/               # Test suite
│   └── requirements/        # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── components/      # React components
│   │   ├── pages/           # Page components
│   │   ├── services/        # API clients
│   │   ├── store/           # Zustand stores
│   │   ├── types/           # TypeScript types
│   │   └── utils/           # Utility functions
│   └── public/              # Static assets
├── data/                    # Data storage (gitignored)
├── docker-compose.yml       # Docker services
└── .env.example             # Environment template
```
## API Documentation

Once the backend is running, visit http://localhost:8000/docs for interactive API documentation (Swagger UI).
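As a sketch of what a programmatic call might look like, the snippet below builds a request for a hypothetical inference endpoint. The route `/api/v1/inference`, the payload fields, and the bearer-token auth are all assumptions; check the Swagger UI for the schema your deployment actually exposes.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"

def build_inference_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build a POST request for the (hypothetical) inference endpoint."""
    payload = json.dumps({"model_id": model_id, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/v1/inference",  # assumed route; verify against /docs
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

# To actually send it (requires the stack from `docker-compose up -d`):
# with urllib.request.urlopen(build_inference_request("my-model", "Hello", "API_TOKEN")) as resp:
#     print(json.load(resp))
```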
## Testing

### Backend

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=api --cov=ml_pipeline

# Run a specific test file
pytest tests/unit/test_auth.py

# Run property-based tests
pytest tests/property/
```

### Frontend

```bash
# Run all tests
npm test

# Run with coverage
npm test -- --coverage

# Run a specific test file
npm test -- src/components/Auth.test.tsx
```

## Deployment

See DEPLOYMENT.md for detailed deployment instructions for:
- AWS Lambda (Serverless)
- AWS ECS (Containers)
- Traditional VPS
## Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Support

For questions and support, please open an issue on GitHub.