- Multi-Modal AI Models: Integration of cutting-edge models including Qwen3-VL, Falcon, EarthMind, and RemoteSAM
- Advanced Pipelines: Specialized pipelines for remote sensing tasks combining multiple AI models
- Visual Question Answering: Natural language querying capabilities for satellite imagery
- Image Classification: Automated classification of satellite imagery as SAR or optical
- FastAPI Backend: Production-ready REST API for model serving
- Comprehensive Evaluation: Built-in metrics and testing frameworks for model performance assessment
```
LG-SAM/
├── models/            # Model wrappers and implementations
├── pipelines/         # Task-specific pipeline combinations
├── vqa/               # Visual Question Answering modules
├── utils/             # Utility functions and metrics
├── app.py             # FastAPI application
├── api.py             # API endpoints
└── requirements.txt   # Dependencies
```
The framework follows a modular architecture where individual AI models are wrapped in standardized interfaces and combined into sophisticated pipelines for complex computer vision tasks.
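As an illustration, the standardized-interface idea could be sketched as follows. The `ModelWrapper` and `Pipeline` class names and the `refine` signature here are illustrative assumptions, not the framework's actual API:

```python
from abc import ABC, abstractmethod
from typing import Any


class ModelWrapper(ABC):
    """Hypothetical standardized interface that each wrapped model follows."""

    def __init__(self, device: str = "cuda"):
        self.device = device

    @abstractmethod
    def refine(self, image: Any, prompt: str, prior: list[dict]) -> list[dict]:
        """Run the model; `prior` holds the previous stage's results."""


class Pipeline:
    """Chains wrappers so later stages can refine earlier outputs."""

    def __init__(self, stages: list[ModelWrapper]):
        self.stages = stages

    def process_image(self, image: Any, text_prompt: str) -> list[dict]:
        results: list[dict] = []
        for stage in self.stages:
            results = stage.refine(image, text_prompt, results)
        return results
```

Under this scheme a grounding model and a segmentation model could be composed as `Pipeline([grounder, segmenter])` without either one knowing the other's internals.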
```bash
# Run the setup script
chmod +x setup.sh
./setup.sh
```

The setup script will:

- Create a virtual environment using `uv`
- Install all dependencies
- Clone and install SAM3
- Extract model checkpoints
- Start the FastAPI server

```bash
uv run uvicorn app:app --host 0.0.0.0 --port 8001
```

The API will be available at http://localhost:8001.
The API endpoints are:

- `POST /classify-image/` — Classifies the uploaded satellite image as SAR or OPTICAL.
- `POST /caption-query/` — Generates a descriptive caption for the uploaded image.
- `POST /binary-query/` — Answers yes/no (binary) questions about the uploaded image.
- `POST /semantic-query/` — Answers questions about scene features.
- `POST /numeric-query/` — Returns numerical answers such as counts or quantities from the image.
- `POST /grounding-query/` — Generates bounding boxes and grounding results based on the prompt and image content.
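As a rough client-side sketch, a query endpoint could be called with only the standard library. The `file` and `question` form-field names below are assumptions about the upload schema, not confirmed parameter names:

```python
import json
import mimetypes
import urllib.request
import uuid

API_URL = "http://localhost:8001"  # matches the uvicorn command above


def encode_multipart(field: str, filename: str, payload: bytes, extra: dict):
    """Build a multipart/form-data body: one file part plus extra form fields."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in extra.items():  # plain form fields, e.g. the question text
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    parts.append(  # the file part, with its guessed content type
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
         f"Content-Type: {ctype}\r\n\r\n").encode()
    )
    parts.append(payload)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def binary_query(image_path: str, question: str) -> dict:
    """POST an image and a yes/no question to the (assumed) /binary-query/ schema."""
    with open(image_path, "rb") as f:
        body, content_type = encode_multipart(
            "file", image_path, f.read(), {"question": question}
        )
    req = urllib.request.Request(
        f"{API_URL}/binary-query/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Since the backend is FastAPI, the exact request schema for each endpoint can be checked in the auto-generated interactive docs at http://localhost:8001/docs while the server is running.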
```python
from pipelines.remotesam_sam3 import RemoteSAMSAM3Pipeline

# Initialize pipeline
pipeline = RemoteSAMSAM3Pipeline(device="cuda")

# Process image with text prompt
results = pipeline.process_image(
    image="satellite.jpg",
    text_prompt="locate the airport runway",
)

print(f"Found {len(results)} objects")
for result in results:
    print(f"Score: {result['score']:.3f}")
    print(f"Bounding box: {result['oriented_bbox']}")
```

```bash
# Test pipelines
python test_pipeline.py \
    --annotations_dir data/vrsbench/annotations \
    --images_dir data/vrsbench/images \
    --num_gpus 4 \
    --num_workers_per_gpu 2 \
    --num_images 100 \
    --batch_size 4 \
    --viz_dir results
```
```bash
# VQA evaluation
python vqa/object_count.py annotations.json results.json data/vrsbench/images
```

Built with ❤️ for advancing India's space technology capabilities