A complete web application for training text classification models entirely in the browser using TensorFlow.js. No server required - everything runs client-side!
- Complete ML Pipeline: Upload dataset → Preprocess → Build model → Train → Test → Export
- Browser-Only Training: All training happens in your browser using TensorFlow.js
- Real-Time Visualization: Live training progress with loss/accuracy charts
- Custom Tokenizer: Built-in text tokenization and preprocessing
- Neural Network: Configurable embedding + dense layer architecture
- Model Export: Download trained models for use in other applications
- Responsive Design: Works on desktop and mobile devices
- Node.js 18+
- Modern browser with JavaScript enabled
- Clone the repository and install dependencies:

```bash
cd frontend
npm install
```

- Start the development server:

```bash
npm run dev
```

- Open your browser and navigate to `http://localhost:5173`
- Supports CSV and JSON formats
- Required columns: `text` and `label`
- Example CSV:

```csv
text,label
"Hello, how are you?",greeting
"What's the weather like?",weather
"Thank you!",gratitude
```

The tokenizer converts text to numerical sequences:
- Text Preprocessing: Converts to lowercase, removes punctuation
- Vocabulary Building: Creates word-to-number mappings
- Sequence Creation: Converts text to integer sequences
- Padding: Ensures uniform sequence length
Key Parameters:
- Vocabulary Size: Number of unique words to keep (1000-10000)
- Max Sequence Length: Padding/truncation length (10-200)
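Padding and truncation can be sketched in a few lines. This is an illustrative snippet, not the app's actual implementation; the function name and pad value of `0` are assumptions:

```javascript
// Sketch: pad or truncate a token sequence to maxLength (illustrative only)
function padSequence(sequence, maxLength, padValue = 0) {
  if (sequence.length >= maxLength) {
    return sequence.slice(0, maxLength); // truncate long sequences
  }
  // pad short sequences at the end with the pad value
  return sequence.concat(Array(maxLength - sequence.length).fill(padValue));
}

console.log(padSequence([4, 8, 15], 5));             // [4, 8, 15, 0, 0]
console.log(padSequence([4, 8, 15, 16, 23, 42], 5)); // [4, 8, 15, 16, 23]
```

Every sequence fed to the model ends up exactly `maxLength` tokens long, which is what lets them be batched into a single tensor.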
Neural network with:
- Embedding Layer: Converts word indices to dense vectors
- Global Average Pooling: Averages embeddings across sequence
- Dense Layer: Hidden layer with ReLU activation
- Dropout: 30% dropout for regularization
- Output Layer: Softmax for multi-class classification
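The global average pooling step can be illustrated in plain JavaScript: it collapses a variable-length sequence of embedding vectors into one fixed-size vector by averaging each dimension. This is a conceptual sketch of the operation, not code from the app:

```javascript
// Sketch: global average pooling over a sequence of embedding vectors.
// Input: seqLen x embDim array; output: a single embDim-length vector.
function globalAveragePool(embeddings) {
  const embDim = embeddings[0].length;
  const pooled = new Array(embDim).fill(0);
  for (const vector of embeddings) {
    for (let i = 0; i < embDim; i++) pooled[i] += vector[i];
  }
  return pooled.map(sum => sum / embeddings.length);
}

console.log(globalAveragePool([[1, 2], [3, 4]])); // [2, 3]
```

Because the output size depends only on the embedding dimension, the dense layers after it work regardless of input text length.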
Hyperparameters:
- Embedding Dimensions: Size of word vectors (8-128)
- Hidden Units: Dense layer neurons (16-256)
- Learning Rate: How fast the model learns (0.0001-0.01)
- Batch Size: Samples processed together (8-128)
- Epochs: Complete passes through dataset (1-100)
- Validation Split: Fraction for validation (0.1-0.4)
Real-time monitoring of:
- Training/validation loss
- Training/validation accuracy
- Training time
- Test model on custom text inputs
- View prediction confidence scores
- See probability distribution across all classes
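The confidence scores come from the softmax output layer, which turns raw logits into a probability distribution over classes. A minimal sketch of that computation (the numerically stable max-subtraction form):

```javascript
// Sketch: softmax converts output-layer logits into class probabilities;
// the largest entry is the prediction's confidence score.
function softmax(logits) {
  const maxLogit = Math.max(...logits);          // subtract max for stability
  const exps = logits.map(x => Math.exp(x - maxLogit));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

const probs = softmax([2.0, 1.0, 0.1]);
console.log(probs); // probabilities summing to ~1, largest first here
```

The per-class probabilities shown in the UI are exactly this distribution; the predicted class is the index of the maximum.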
Download:
- Complete Package: Model + tokenizer + training history
- Model Only: TensorFlow.js format (.json + .bin files)
- Tokenizer Config: For text preprocessing
- Training History: Performance metrics
Custom JavaScript tokenizer that:
- Builds vocabulary from training data
- Maps words to integer indices
- Handles out-of-vocabulary words
- Pads sequences to uniform length
```javascript
// Example tokenizer usage
const tokenizer = new TextTokenizer(1000, 50); // vocabSize, maxLength
tokenizer.fitOnTexts(texts);
const sequences = tokenizer.textsToSequences(texts);
const padded = tokenizer.padSequences(sequences);
```

Input (text sequences) → Embedding → Global Avg Pool → Dense → Dropout → Softmax → Predictions
Memory Usage: ~1-10MB depending on vocabulary size and model complexity
- Chrome: Full support
- Firefox: Full support
- Safari: Full support
- Edge: Full support
- Mobile: Supported but may be slower
- Small (< 100 samples): Use smaller embedding dimensions (8-16)
- Medium (100-1000 samples): Default settings work well
- Large (1000+ samples): Can use larger models (32+ dimensions)
- Models automatically dispose tensors to prevent memory leaks
- Larger vocabularies use more memory
- Training is done in batches to manage memory usage
- Batch Size: Larger batches = faster training but more memory
- Model Complexity: More parameters = slower training
- Dataset Size: More samples = longer training time
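Most of the model's parameters usually sit in the embedding table, which is why vocabulary size dominates memory use and training speed. A rough parameter count for the embedding + dense architecture described above (the formula is a standard estimate, not measured from the app):

```javascript
// Sketch: approximate trainable-parameter count for the
// embedding -> pooling -> dense -> softmax architecture.
function countParams({ vocabSize, embeddingDim, hiddenUnits, numClasses }) {
  const embedding = vocabSize * embeddingDim;             // lookup table
  const hidden = embeddingDim * hiddenUnits + hiddenUnits; // weights + biases
  const output = hiddenUnits * numClasses + numClasses;    // weights + biases
  return embedding + hidden + output;
}

console.log(countParams({ vocabSize: 1000, embeddingDim: 16, hiddenUnits: 32, numClasses: 3 }));
// 16000 + 544 + 99 = 16643
```

Doubling the vocabulary roughly doubles the total here, while doubling the hidden units barely moves it.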
Typical training times:
- 100 samples, 20 epochs: ~30 seconds
- 1000 samples, 20 epochs: ~2-5 minutes
- 10000 samples, 20 epochs: ~10-30 minutes
- Balanced Classes: Try to have similar numbers of examples per class
- Clean Text: Remove excessive punctuation, normalize text
- Sufficient Data: Aim for 50+ examples per class minimum
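Class balance is easy to check before training. A minimal sketch (illustrative helper, not part of the app):

```javascript
// Sketch: count examples per class to spot imbalance before training
function classCounts(labels) {
  const counts = {};
  for (const label of labels) counts[label] = (counts[label] || 0) + 1;
  return counts;
}

console.log(classCounts(['greeting', 'weather', 'greeting']));
// { greeting: 2, weather: 1 }
```

If one class dwarfs the others, consider collecting more examples for the minority classes before training.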
- Start Small: Begin with default parameters and adjust based on performance
- Monitor Overfitting: Watch for validation accuracy plateau
- Experiment: Try different embedding dimensions and hidden units
- Vocabulary Size: Balance between coverage and memory usage
- Sequence Length: Set based on your typical text length
- Out-of-Vocab: Monitor OOV tokens; a high rate may indicate you need a larger vocabulary
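The OOV rate mentioned above can be computed directly from tokenized text. A minimal sketch, assuming the vocabulary is available as a list of known words:

```javascript
// Sketch: fraction of tokens that fall outside the vocabulary;
// a high rate suggests increasing the vocabulary size.
function oovRate(tokens, vocabulary) {
  const vocab = new Set(vocabulary);
  const oov = tokens.filter(t => !vocab.has(t)).length;
  return oov / tokens.length;
}

console.log(oovRate(['hello', 'quasar', 'world', 'zyzzyva'], ['hello', 'world'])); // 0.5
```

As a rule of thumb, if more than a few percent of tokens are OOV, the model is losing signal that a larger vocabulary could capture.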
Deploy to any static hosting service:

```bash
npm run build
# Upload the dist/ folder to your hosting provider
```

Use exported models in other applications:
```javascript
// Load the trained model
const model = await tf.loadLayersModel('/path/to/model.json');

// Load the tokenizer config
const tokenizer = await fetch('/path/to/tokenizer.json').then(r => r.json());

// Make predictions; the input must already be tokenized, padded,
// and converted to a tensor (predict is synchronous)
const prediction = model.predict(preprocessedText);
```

```
frontend/
├── src/
│   ├── components/          # React components
│   │   ├── DatasetUpload.tsx
│   │   ├── Tokenizer.tsx    # Core tokenization logic
│   │   ├── ModelBuilder.tsx # TensorFlow.js model creation
│   │   ├── Training.tsx     # Training loop and visualization
│   │   ├── Prediction.tsx   # Model inference
│   │   └── ModelExport.tsx  # Export functionality
│   ├── types.ts             # TypeScript interfaces
│   ├── App.tsx              # Main application
│   └── App.css              # Styling
├── package.json
└── vite.config.ts
```
- Frontend: React 18 + TypeScript + Vite
- ML: TensorFlow.js
- Visualization: Recharts
- File Handling: react-dropzone
- Styling: Custom CSS with modern design
- Customer Support: Classify support tickets by category
- Content Moderation: Detect spam or inappropriate content
- Sentiment Analysis: Classify text sentiment (positive/negative/neutral)
- Intent Recognition: Understand user intentions in chatbots
- Document Classification: Categorize documents by type or topic
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- TensorFlow.js team for making ML in the browser possible
- React team for the excellent framework
- The open-source community for inspiration and tools