A deep learning model for recognizing traditional medicinal plants, powering the DrGreen mobile application. Built with MobileNetV2 and optimized for accurate multi-class plant classification.
Green is a machine learning model designed to identify four traditional medicinal plants commonly used in herbal medicine. The model uses transfer learning with MobileNetV2 as the base architecture, combined with Focal Loss to handle class imbalance and extensive data augmentation to improve generalization.
The model can identify the following medicinal plants:
- Artemisia (Artemisia annua) - Known for antimalarial properties
- Carica (Carica papaya) - Papaya, used for digestive health
- Goyavier (Psidium guajava) - Guava, traditional remedy for various ailments
- Kinkeliba (Combretum micranthum) - West African medicinal plant

Model Specifications:
- Base Model: MobileNetV2 (pre-trained on ImageNet, frozen)
- Input Size: 224x224x3 RGB images
- Loss Function: Focal Loss (γ=2.0, α=0.25) with label smoothing (0.15)
- Optimizer: Adam with Cosine Decay learning rate schedule
- Regularization:
  - Dropout (60% and 30%)
  - L2 regularization (0.02)
  - Batch Normalization
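
The cosine decay schedule named above can be sketched in plain Python. This is a minimal illustration of the standard cosine-annealing formula, not the notebook's own code. The initial learning rate (0.0005), batch size (16), epoch count (100), and training-set size (931) come from this README. The `decay_steps` value and the `alpha` floor of 0 are assumptions, since the README does not state them.

```python
import math

def cosine_decay_lr(step, initial_lr=0.0005, decay_steps=5900, alpha=0.0):
    """Learning rate at `step` under cosine decay.

    `alpha` is the final learning rate expressed as a fraction of `initial_lr`.
    """
    step = min(step, decay_steps)  # hold the final value once decay is complete
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))
    return initial_lr * ((1.0 - alpha) * cosine + alpha)

# Assumed schedule length: one decay over the full 100-epoch budget.
steps_per_epoch = math.ceil(931 / 16)   # 59 batches per epoch
total_steps = 100 * steps_per_epoch     # 5900 optimizer steps

for epoch in (0, 25, 50, 75, 100):
    lr = cosine_decay_lr(epoch * steps_per_epoch, decay_steps=total_steps)
    print(f"epoch {epoch:3d}: lr = {lr:.6f}")
```

The rate starts at the configured value, passes through half of it at the midpoint, and reaches zero at the end of the assumed schedule. If early stopping ends training sooner, the rate never reaches that floor.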
Input (224x224x3)
↓
MobileNetV2 (frozen, ImageNet weights)
↓
Global Average Pooling
↓
Dropout (0.6)
↓
Dense (64 units, ReLU, L2 reg)
↓
Batch Normalization
↓
Dropout (0.3)
↓
Dense (4 units, Softmax, L2 reg)
↓
Output (4 classes)
Model Parameters:
- Total parameters: 2,340,484
- Trainable parameters: 82,372
- Non-trainable parameters: 2,258,112
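
The parameter counts above can be cross-checked by hand from the layer sizes in the architecture diagram. The sketch below assumes the standard Keras MobileNetV2 with width multiplier 1.0 and no classification top, which has 2,257,984 parameters and a 1,280-channel output. Neither figure is stated in this README.

```python
# Assumed properties of the frozen base (Keras MobileNetV2, width multiplier 1.0, no top).
BASE_PARAMS = 2_257_984   # every base parameter is non-trainable because the base is frozen
FEATURES = 1280           # channels left after global average pooling

# Classification head, taken from the architecture diagram.
dense_64 = FEATURES * 64 + 64   # weights + biases
bn_trainable = 2 * 64           # gamma and beta
bn_non_trainable = 2 * 64       # moving mean and moving variance
dense_4 = 64 * 4 + 4            # weights + biases

trainable = dense_64 + bn_trainable + dense_4
non_trainable = BASE_PARAMS + bn_non_trainable
total = trainable + non_trainable

print(trainable, non_trainable, total)  # 82372 2258112 2340484
```

Dropout and pooling layers contribute no parameters, so the head accounts for all 82,372 trainable weights.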

Dataset:
- Total Images: 1,164
- Train/Validation Split: 80/20 (stratified)
- Training Images: 931
- Validation Images: 233
| Class | Total Images | Train | Validation | Percentage |
|---|---|---|---|---|
| Artemisia | 275 | 220 | 55 | 23.6% |
| Carica | 356 | 285 | 71 | 30.6% |
| Goyavier | 241 | 193 | 48 | 20.7% |
| Kinkeliba | 292 | 233 | 59 | 25.1% |
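
The per-class counts in the table can be reproduced with a simple largest-remainder allocation, and the same training counts can feed a class-weighting scheme. The following is a sketch only. The allocation rule is one scheme that happens to match the table, not necessarily how sklearn splits internally. The "balanced" weight formula (`n_samples / (n_classes * class_count)`) is an assumption, since this README does not say how the class weights are computed.

```python
import math

counts = {"Artemisia": 275, "Carica": 356, "Goyavier": 241, "Kinkeliba": 292}
VAL_PERCENT = 20  # 80/20 split from this README

total = sum(counts.values())
n_val = math.ceil(total * VAL_PERCENT / 100)  # 233 validation images

# Give each class the floor of its 20% share, then hand the leftover
# images to the classes with the largest fractional remainders.
val = {name: n * VAL_PERCENT // 100 for name, n in counts.items()}
remainder = {name: n * VAL_PERCENT % 100 for name, n in counts.items()}
leftover = n_val - sum(val.values())
for name in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
    val[name] += 1
train = {name: counts[name] - val[name] for name in counts}

# Assumed "balanced" heuristic: rarer classes receive proportionally larger weights.
n_train = sum(train.values())
class_weight = {name: n_train / (len(train) * n) for name, n in train.items()}

for name in counts:
    print(f"{name:10s} train={train[name]:3d} val={val[name]:2d} weight={class_weight[name]:.3f}")
```

Under this heuristic Carica, the largest class, gets the smallest weight and Goyavier, the smallest class, gets the largest.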
To improve model robustness, the following augmentation techniques are applied during training:
- Random horizontal and vertical flips
- Random rotation (±30°)
- Random zoom (±20%)
- Random brightness adjustment (±20%)
- Random contrast adjustment (±20%)
- Random translation (±15%)
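
To make the list above concrete, here is a minimal NumPy sketch of three of the six transformations: flips, brightness, and contrast. It is not the notebook's pipeline, which presumably uses TensorFlow/Keras preprocessing layers such as `RandomFlip`, `RandomBrightness`, and `RandomContrast`. The `augment` function, the [0, 1] pixel range, and the fixed seed are assumptions made for illustration. Rotation, zoom, and translation are omitted because they require image resampling.

```python
import numpy as np

def augment(image, rng, max_delta=0.2):
    """Randomly flip an HxWx3 float image in [0, 1] and jitter its brightness and contrast."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]   # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :, :]   # vertical flip

    # Brightness: shift all pixels by up to +/-20% of the value range.
    image = image + rng.uniform(-max_delta, max_delta)

    # Contrast: scale each pixel's distance from the per-channel mean by a factor in [0.8, 1.2].
    mean = image.mean(axis=(0, 1), keepdims=True)
    image = (image - mean) * rng.uniform(1.0 - max_delta, 1.0 + max_delta) + mean

    return np.clip(image, 0.0, 1.0)

rng = np.random.default_rng(0)        # fixed seed so the example is repeatable
sample = rng.random((224, 224, 3))    # stand-in for a real plant photo
augmented = augment(sample, rng)
print(augmented.shape, float(augmented.min()), float(augmented.max()))
```

Augmentation of this kind should only run on training batches. Validation images are left untouched so that the reported metrics stay comparable between epochs.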

Performance:
- Validation Accuracy: 69.10%
- Top-2 Accuracy: 88.41%
- Training Approach: Transfer learning with frozen base

Key Features:
- No Class Collapse: Predictions are well-distributed across all classes
- Stratified Splitting: Ensures all classes are properly represented in both training and validation sets
- Class Weighting: Addresses class imbalance during training
- Focal Loss: Focuses on hard-to-classify examples
- Early Stopping: Prevents overfitting with patience of 15 epochs
- Best Model Checkpoint: Automatically saves the best performing model
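
The Focal Loss entry above can be illustrated with a small NumPy sketch using the values from this README (γ=2.0, α=0.25, label smoothing 0.15). This is one common formulation, assumed here for illustration. Implementations differ in how they apply α and how they combine smoothing with the focal term, and the notebook may do either differently.

```python
import numpy as np

def focal_loss(y_true, y_prob, gamma=2.0, alpha=0.25, label_smoothing=0.15, eps=1e-7):
    """Mean categorical focal loss for one-hot targets and softmax probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    num_classes = y_true.shape[-1]

    # Label smoothing: move a little target mass from the true class to all classes.
    y_smooth = y_true * (1.0 - label_smoothing) + label_smoothing / num_classes

    cross_entropy = -y_smooth * np.log(y_prob)
    # Focal term: down-weight classes the model already predicts confidently.
    weight = alpha * (1.0 - y_prob) ** gamma
    return float(np.mean(np.sum(weight * cross_entropy, axis=-1)))

target = [[1, 0, 0, 0]]                    # true class: artemisia
easy = [[0.90, 0.05, 0.03, 0.02]]          # confident and correct
hard = [[0.10, 0.60, 0.20, 0.10]]          # confident and wrong

print("easy example:", focal_loss(target, easy))
print("hard example:", focal_loss(target, hard))
```

The misclassified example produces a clearly larger loss than the confidently correct one, which is the behaviour the "hard-to-classify" description refers to. Setting `gamma=0`, `alpha=1`, and `label_smoothing=0` reduces the function to ordinary cross-entropy.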
# Python 3.9 or higher (tensorflow 2.19 does not support Python 3.8)
pip install "tensorflow>=2.19.0"
pip install numpy
pip install matplotlib
pip install seaborn
pip install pandas
pip install scikit-learn
pip install pillow
pip install gdown

- Clone the repository

git clone https://github.com/armelyara/Green.git
cd Green

- Open in Google Colab

Click the "Open in Colab" badge in the notebook or visit:
https://colab.research.google.com/github/armelyara/Green/blob/main/green-v2.ipynb

- Enable GPU (Recommended)
  - In Colab: Runtime → Change runtime type → GPU
  - Training is much faster with GPU acceleration
- Run the notebook
  - The dataset will be automatically downloaded from Google Drive
  - Training will begin automatically after dataset preparation

Key hyperparameters are defined in the CONFIG dictionary:
CONFIG = {
'img_height': 224,
'img_width': 224,
'batch_size': 16,
'epochs': 100,
'initial_lr': 0.0005,
'validation_split': 0.2,
'dropout_rate': 0.6,
'num_classes': 4,
'focal_gamma': 2.0,
'focal_alpha': 0.25,
'label_smoothing': 0.15,
}

The notebook follows this workflow:
- Data Loading: Downloads and extracts the plant image dataset
- Stratified Split: Creates balanced train/validation sets using sklearn
- Data Pipeline: Sets up TensorFlow data pipeline with augmentation
- Model Building: Constructs MobileNetV2-based architecture
- Training: Trains with Focal Loss, class weights, and callbacks
- Evaluation: Generates confusion matrix and classification report
- Model Saving: Saves best model checkpoint

Training Callbacks:
- Early Stopping: Stops training if validation accuracy doesn't improve for 15 epochs
- Model Checkpoint: Saves the best model based on validation accuracy
- CSV Logger: Records training metrics to CSV file
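
The early-stopping and checkpoint behaviour described above can be sketched without any framework. The function below is an illustration of the logic, assuming "improvement" means a strictly higher validation accuracy. The accuracy curve is hypothetical and is not taken from the training log.

```python
def simulate_early_stopping(val_accuracies, patience=15):
    """Return (best_epoch, last_epoch_run) for a sequence of per-epoch validation accuracies."""
    best_acc = float("-inf")
    best_epoch = -1
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            # This is the point where a best-model checkpoint would be written.
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop training.
            return best_epoch, epoch
    return best_epoch, len(val_accuracies) - 1

# Hypothetical curve: accuracy climbs for 10 epochs, then plateaus below its peak.
history = [0.40 + 0.03 * epoch for epoch in range(10)] + [0.60] * 40
print(simulate_early_stopping(history))  # (9, 24)
```

With a patience of 15, training stops 15 epochs after the last improvement, and the checkpoint from the best epoch is the one that is kept.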
The trained model produces:
- Model File: models/best_model_v7.keras - Best performing model
- Training Log: models/training_log_v7.csv - Epoch-by-epoch metrics
- Visualizations:
  - Training/validation accuracy curves
  - Training/validation loss curves
  - Top-2 accuracy curves
  - Confusion matrix
  - Per-class performance metrics
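
Top-2 accuracy and the confusion matrix listed above can both be computed directly from the model's softmax outputs. The sketch below uses NumPy and a tiny made-up batch of four predictions. The helper names and the sample probabilities are hypothetical and are not taken from the notebook.

```python
import numpy as np

def top_k_accuracy(y_true, y_prob, k=2):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    top_k = np.argsort(y_prob, axis=1)[:, -k:]       # indices of the k largest scores per row
    hits = (top_k == y_true[:, None]).any(axis=1)
    return float(hits.mean())

def confusion_matrix(y_true, y_pred, num_classes=4):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true_idx, pred_idx in zip(y_true, y_pred):
        matrix[true_idx, pred_idx] += 1
    return matrix

# Hypothetical softmax outputs (class order: artemisia, carica, goyavier, kinkeliba).
probs = np.array([
    [0.70, 0.15, 0.10, 0.05],
    [0.30, 0.40, 0.20, 0.10],
    [0.10, 0.20, 0.30, 0.40],
    [0.40, 0.30, 0.20, 0.10],
])
labels = np.array([0, 0, 2, 3])

print("top-1:", top_k_accuracy(labels, probs, k=1))   # 0.25
print("top-2:", top_k_accuracy(labels, probs, k=2))   # 0.75
print(confusion_matrix(labels, probs.argmax(axis=1)))
```

Assuming the reported percentages were computed on all 233 validation images, they correspond to 161 correct top-1 predictions (69.10%) and 206 correct top-2 predictions (88.41%).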
import tensorflow as tf
from tensorflow.keras.preprocessing import image
import numpy as np
# Load the model
model = tf.keras.models.load_model('models/best_model_v7.keras')
# Prepare an image
img_path = 'path/to/plant/image.jpg'
img = image.load_img(img_path, target_size=(224, 224))
img_array = image.img_to_array(img)
img_array = tf.keras.applications.mobilenet_v2.preprocess_input(img_array)
img_array = np.expand_dims(img_array, axis=0)
# Make prediction
predictions = model.predict(img_array)
class_names = ['artemisia', 'carica', 'goyavier', 'kinkeliba']
predicted_class = class_names[np.argmax(predictions[0])]
confidence = np.max(predictions[0]) * 100
print(f"Predicted: {predicted_class} ({confidence:.2f}% confidence)")

This model is designed to be integrated into the DrGreen mobile application for real-time plant recognition. The lightweight MobileNetV2 architecture ensures efficient inference on mobile devices.
Deployment Options:
- TensorFlow Lite for on-device inference
- TensorFlow Serving for cloud-based API
- Export to ONNX for cross-platform compatibility
Green/
├── green-v2.ipynb # Main training notebook
├── models/ # Saved models and logs
│ ├── best_model_v7.keras # Best model checkpoint
│ └── training_log_v7.csv # Training metrics
├── README.md # This file
└── LICENSE # Project license
The current version (V2) includes critical improvements over the initial version:
- Stratified Split: Uses sklearn's train_test_split with stratification to ensure all classes are represented in validation
- Focal Loss: Addresses class imbalance by focusing on hard examples
- Balanced Predictions: No class collapse - predictions are well-distributed across all 4 classes
- Improved Regularization: Multiple dropout layers and L2 regularization prevent overfitting
- Class Weighting: Dynamic class weights during training to handle dataset imbalance
Potential enhancements for future versions:
- Expand dataset with more plant species
- Implement data collection pipeline for continuous learning
- Add explainability features (Grad-CAM visualization)
- Fine-tune base model layers for improved accuracy
- Implement ensemble methods
- Add confidence thresholding for uncertain predictions
- Support for plant part recognition (leaf, flower, stem)
- Multi-label classification for mixed plant images
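
As an illustration of the confidence-thresholding idea in the list above, the sketch below rejects a prediction when the top softmax probability is too low. The function name, the "uncertain" label, and the 0.6 threshold are all hypothetical. A real threshold would need to be tuned on validation data.

```python
CLASS_NAMES = ['artemisia', 'carica', 'goyavier', 'kinkeliba']

def classify_with_threshold(probabilities, class_names=CLASS_NAMES, threshold=0.6):
    """Return (label, confidence); the label is 'uncertain' when confidence is below the threshold."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = probabilities[best]
    if confidence < threshold:
        return "uncertain", confidence
    return class_names[best], confidence

print(classify_with_threshold([0.82, 0.10, 0.05, 0.03]))  # ('artemisia', 0.82)
print(classify_with_threshold([0.45, 0.30, 0.15, 0.10]))  # ('uncertain', 0.45)
```

In a mobile setting, an "uncertain" result could prompt the user to retake the photo instead of showing a low-confidence plant name.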
If you use this model in your research or application, please cite:
Green: Traditional Plant Recognition Model
Repository: https://github.com/armelyara/Green
Model: MobileNetV2 + Focal Loss for Traditional Plant Classification
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Acknowledgments:
- Pre-trained MobileNetV2 weights from ImageNet
- TensorFlow and Keras teams for the excellent deep learning framework
- Google Colab for providing free GPU resources
- Contributors to the plant image dataset
For questions, issues, or collaboration opportunities related to the Green model or DrGreen application, please send an email to armelyara@thedayinfo.com or open an issue on this repository.
Note: This model is intended for educational and research purposes. For medical or health-related decisions, always consult qualified healthcare professionals.