webdpro-in/finxan-train

AI Training Platform

A self-sustaining marketplace that collects domain-specific data from the web, transforms it into high-quality training datasets, and trains specialized AI models using QLoRA. Revenue comes from dataset sales, API queries, and custom training services.

Features

  • Data Collection Pipeline: Automated scraping, extraction, AI-powered formatting, and deduplication
  • Marketplace: Browse and purchase curated training datasets
  • Model Training: QLoRA-based training on free infrastructure (Google Colab)
  • Inference API: Query specialized AI models via REST API
  • Custom Training: Upload your own data and train custom models
  • Payment Integration: Stripe for dataset purchases and credit management

Tech Stack

Backend

  • FastAPI: Modern Python web framework
  • SQLAlchemy: ORM for PostgreSQL
  • Celery: Distributed task queue
  • Redis: Caching and message broker
  • PostgreSQL: Database with pgvector extension
  • HuggingFace: Model hosting and inference
  • Stripe: Payment processing

Frontend

  • React: UI library
  • TypeScript: Type-safe JavaScript
  • Vite: Build tool
  • Zustand: State management
  • Axios: HTTP client

ML Pipeline

  • BeautifulSoup: HTML parsing
  • PyPDF2: PDF extraction
  • OpenAI/Claude: AI-powered formatting
  • sentence-transformers: Semantic similarity
  • Transformers: Model training with QLoRA
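The deduplication step can be pictured as a greedy filter over embedding vectors. The sketch below is an illustration only, not the project's implementation: it assumes each document has already been embedded (for example with sentence-transformers), and the function names and the 0.9 threshold are hypothetical. It uses plain Python so it runs without any ML dependencies.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(embeddings: Sequence[Sequence[float]], threshold: float = 0.9) -> list[int]:
    """Return the indices of items to keep, greedily dropping near-duplicates.

    An item is dropped when its similarity to any already-kept item is at
    or above `threshold` (a hypothetical default, not a project setting).
    """
    kept: list[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine_similarity(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real sentence embeddings.
    vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    print(deduplicate(vectors))  # the second vector is a near-duplicate of the first
```

The greedy pass compares each item against every kept item, so it is quadratic in the worst case. That is acceptable for small batches; larger corpora would likely need an approximate nearest-neighbour index instead.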

Quick Start

Prerequisites

  • Docker and Docker Compose
  • Python 3.11+
  • Node.js 20+

1. Clone the repository

git clone <repository-url> ai-training-platform
cd ai-training-platform

2. Set up environment variables

cp .env.example .env
# Edit .env with your API keys
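A missing key in `.env` tends to surface later as a confusing runtime error, so it can help to validate the environment at startup. The following is a minimal sketch of that idea. The variable names (`DATABASE_URL`, `REDIS_URL`, `STRIPE_SECRET_KEY`) and the `Settings` class are assumptions made for illustration; `.env.example` is the authoritative list.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    stripe_secret_key: str


# Hypothetical variable names; check .env.example for the real ones.
REQUIRED_VARS = ("DATABASE_URL", "REDIS_URL", "STRIPE_SECRET_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from `env`, raising if a required variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        stripe_secret_key=env["STRIPE_SECRET_KEY"],
    )
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.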

3. Start services with Docker Compose

docker-compose up -d

This will start:

  • PostgreSQL (port 5432)
  • Redis (port 6379)
  • Backend API (port 8000)
  • Celery worker
  • Frontend (port 3000)
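To confirm that the containers came up, one option is to probe the listed ports from the host. The script below is a small sketch of that check. It assumes the services are published on `localhost` at the ports listed above; the Celery worker is omitted because it exposes no port.

```python
import socket

# Ports taken from the service list above; the host is an assumption.
SERVICES = {
    "PostgreSQL": 5432,
    "Redis": 6379,
    "Backend API": 8000,
    "Frontend": 3000,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:12} {'up' if up else 'DOWN'}")
```

An open port only shows that something is listening; it does not prove the service is healthy.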

4. Run database migrations

docker-compose exec backend alembic upgrade head

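5. Access the application

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000
  • API docs: http://localhost:8000/docs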

Development Setup

Backend Development

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements/dev.txt

# Run migrations
alembic upgrade head

# Start development server
uvicorn api.main:app --reload

# Run tests
pytest

# Run Celery worker
celery -A workers.celery_app worker --loglevel=info

Frontend Development

cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Run tests
npm test

# Build for production
npm run build

Project Structure

ai-training-platform/
├── backend/
│   ├── api/                    # FastAPI application
│   │   ├── routers/           # API endpoints
│   │   ├── models/            # Pydantic schemas
│   │   ├── services/          # Business logic
│   │   ├── db/                # Database models and migrations
│   │   ├── core/              # Security utilities
│   │   └── utils/             # Helper functions
│   ├── ml_pipeline/           # Data collection and processing
│   │   ├── scraper/           # Web scraping
│   │   ├── extractor/         # Content extraction
│   │   ├── formatter/         # AI formatting
│   │   ├── trainer/           # Model training
│   │   └── deployer/          # HuggingFace deployment
│   ├── workers/               # Celery tasks
│   ├── tests/                 # Test suite
│   └── requirements/          # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── components/        # React components
│   │   ├── pages/             # Page components
│   │   ├── services/          # API clients
│   │   ├── store/             # Zustand stores
│   │   ├── types/             # TypeScript types
│   │   └── utils/             # Utility functions
│   └── public/                # Static assets
├── data/                      # Data storage (gitignored)
├── docker-compose.yml         # Docker services
└── .env.example               # Environment template

API Documentation

Once the backend is running, visit http://localhost:8000/docs for interactive API documentation (Swagger UI).
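FastAPI also serves the machine-readable schema behind Swagger UI, by default at `/openapi.json`. The sketch below lists the available routes from that schema. It assumes the backend keeps FastAPI's default schema path and runs on port 8000; the helper names are illustrative.

```python
import json
from urllib.request import urlopen

# FastAPI's default schema path; host and port follow the Quick Start.
OPENAPI_URL = "http://localhost:8000/openapi.json"

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes, key=lambda route: (route[1], route[0]))


def fetch_spec(url: str = OPENAPI_URL) -> dict:
    """Download and parse the OpenAPI document (the backend must be running)."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    for method, path in list_routes(fetch_spec()):
        print(f"{method:7} {path}")
```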

Testing

Backend Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=api --cov=ml_pipeline

# Run specific test file
pytest tests/unit/test_auth.py

# Run property-based tests
pytest tests/property/
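Property-based tests assert an invariant over many generated inputs rather than a few hand-picked cases. The sketch below shows the idea using only the standard library. The project's own suite may well use a dedicated library such as Hypothesis (an assumption; the source does not say), and `normalize_whitespace` is a hypothetical helper, not a function from this repository.

```python
import random
import string


def normalize_whitespace(text: str) -> str:
    """Hypothetical helper: collapse whitespace runs to single spaces and trim the ends."""
    return " ".join(text.split())


def random_text(rng: random.Random, max_len: int = 40) -> str:
    """Generate a random string of letters mixed with spaces, tabs, and newlines."""
    alphabet = string.ascii_letters + " \t\n"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))


def test_normalize_is_idempotent() -> None:
    rng = random.Random(0)  # fixed seed keeps any failure reproducible
    for _ in range(500):
        once = normalize_whitespace(random_text(rng))
        # Property 1: normalizing twice changes nothing.
        assert normalize_whitespace(once) == once
        # Property 2: no run of two spaces survives.
        assert "  " not in once
```

pytest collects any function named `test_*`, so a file like this would run under the same `pytest tests/property/` command.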

Frontend Tests

# Run all tests
npm test

# Run with coverage
npm test -- --coverage

# Run specific test file
npm test -- src/components/Auth.test.tsx

Deployment

See DEPLOYMENT.md for detailed deployment instructions for:

  • AWS Lambda (Serverless)
  • AWS ECS (Containers)
  • Traditional VPS

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions and support, please open an issue on GitHub.
