A computer vision application that detects flooded vs non-flooded scenes in images using a fine-tuned Vision Transformer (ViT).
The model is trained using LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning and includes an interactive Gradio interface for real-time inference.
- Vision Transformer Backbone: Uses `google/vit-base-patch16-224-in21k`.
- Parameter-Efficient Training: Fine-tunes ~0.6% of model parameters using LoRA.
- Automated Dataset Handling: Downloads the Louisiana Flood 2016 dataset automatically via `kagglehub`.
- Interactive Inference: Upload images and get predictions via a Gradio UI.
- Deployment Ready: Compatible with Hugging Face Spaces.
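The "~0.6% of parameters" claim can be sketched with PEFT's LoRA wrapper. The rank, alpha, and target modules match the values listed under the model details; `lora_dropout`, `modules_to_save`, and the helper names are assumptions for illustration:

```python
import torch
from torch import nn


def trainable_fraction(model: nn.Module) -> float:
    """Fraction of parameters that will receive gradient updates."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total


def build_lora_vit():
    # Heavy imports kept local so trainable_fraction stays usable offline.
    from transformers import ViTForImageClassification
    from peft import LoraConfig, get_peft_model

    base = ViTForImageClassification.from_pretrained(
        "google/vit-base-patch16-224-in21k",
        num_labels=2,  # flooded vs. non-flooded
    )
    # r/alpha/target modules from this README; dropout is an assumed value.
    config = LoraConfig(
        r=16,
        lora_alpha=16,
        target_modules=["query", "value"],
        lora_dropout=0.1,
        modules_to_save=["classifier"],  # train the new 2-class head fully
    )
    return get_peft_model(base, config)


if __name__ == "__main__":
    model = build_lora_vit()
    print(f"trainable: {trainable_fraction(model):.2%}")  # on the order of 0.6%
```

With rank-16 adapters on only the query and value projections, roughly 0.6 M of ViT-Base's ~86 M parameters end up trainable, which is where the ~0.6% figure comes from.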
- Model: Vision Transformer (ViT-Base)
- Fine-Tuning: LoRA (PEFT)
- UI: Gradio
- Frameworks: PyTorch, Transformers, PEFT
- Dataset: Louisiana Flood 2016 (Kaggle)
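The automatic dataset download can be sketched with `kagglehub`. The dataset handle below is a placeholder, not the real Kaggle slug, and `list_images` is a hypothetical helper for walking the downloaded folder:

```python
import pathlib

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(root):
    """Recursively collect image files under root, sorted for reproducibility."""
    root = pathlib.Path(root)
    return sorted(p for p in root.rglob("*") if p.suffix.lower() in IMAGE_EXTS)


if __name__ == "__main__":
    import kagglehub

    # "owner/louisiana-flood-2016" is a placeholder handle; the script uses
    # the dataset's actual Kaggle slug.
    path = kagglehub.dataset_download("owner/louisiana-flood-2016")
    print(f"{len(list_images(path))} images under {path}")
```

`kagglehub.dataset_download` caches the files locally and returns the path to the cached copy, so repeated runs do not re-download the dataset.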
```bash
git clone https://github.com/arman1o1/flood-detection-ViT.git
cd flood-detection-ViT
pip install -r requirements.txt
```

Run the main script:

```bash
python flood_detection_vit.py
```
1. Model Check
   - Looks for trained LoRA adapters in `./flood_detection_vit_lora`
2. Training (if needed)
   - Downloads the dataset automatically
   - Fine-tunes the ViT model for 3 epochs
   - Saves the LoRA adapters locally
3. Inference
   - Launches a Gradio web interface (local or public link)
   - Upload images to classify flood vs non-flood scenes
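The inference step can be sketched with a minimal Gradio interface. The label names and their order are assumptions (the script defines the actual class mapping), and `launch_demo` is a hypothetical wrapper around a model and processor prepared elsewhere:

```python
def to_label_dict(probs, labels=("non-flooded", "flooded")):
    """Map class probabilities to the {label: confidence} dict gr.Label expects.

    The label order here is an assumption; the script's own id-to-label
    mapping is authoritative.
    """
    return {label: float(p) for label, p in zip(labels, probs)}


def launch_demo(model, processor):
    import gradio as gr
    import torch

    def predict(image):
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        probs = logits.softmax(dim=-1)[0].tolist()
        return to_label_dict(probs)

    demo = gr.Interface(
        fn=predict,
        inputs=gr.Image(type="pil"),
        outputs=gr.Label(num_classes=2),
        title="Flood Detection (ViT + LoRA)",
    )
    demo.launch(share=True)  # share=True also prints a temporary public link
```

Passing `share=True` is what produces the public link mentioned above; omitting it keeps the interface local-only.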
- Base Model: `google/vit-base-patch16-224-in21k`
- Task: Binary Image Classification
- LoRA Configuration:
  - Rank: 16
  - Alpha: 16
  - Target Modules: Query / Value
- Execution: GPU recommended, CPU supported
- Caching: Trained adapters reused on subsequent runs
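The adapter-caching behavior can be sketched as a check-then-load: PEFT writes an `adapter_config.json` when adapters are saved, so its presence signals a usable cache. `load_or_train` and the training fallback are hypothetical names for illustration:

```python
import os

ADAPTER_DIR = "./flood_detection_vit_lora"


def adapters_cached(adapter_dir):
    """True if a previous run saved LoRA adapters here.

    PEFT's save_pretrained writes adapter_config.json alongside the
    adapter weights, so that file is a reliable cache marker.
    """
    return os.path.isfile(os.path.join(adapter_dir, "adapter_config.json"))


def load_or_train(base_model):
    from peft import PeftModel

    if adapters_cached(ADAPTER_DIR):
        # Reuse the adapters saved by a previous run; no retraining needed.
        return PeftModel.from_pretrained(base_model, ADAPTER_DIR)
    # Otherwise the script would fall back to training (not shown here).
    raise NotImplementedError("training path would run here")
```

This is why only the first run pays the training cost: later runs find the saved adapters and go straight to inference.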
- First run may take time due to dataset download and training
- GPU significantly speeds up training
- Intended for research and experimentation, not production deployment
This project is licensed under the MIT License.
