Advanced Object Detection / Grounding

Open-Vocabulary

Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community. AAAI'2025. [Paper | Code]
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning. ECCV'2024. [Paper | Code]
LLaMA-Unidetector: A LLaMA-Based Universal Framework for Open-Vocabulary Object Detection in Remote Sensing Imagery. TGRS'2025. [Paper | Code]
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition. arXiv'2025. [Paper | Code]
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images. ICCV'2025. [Paper | Code]
Cross-Modal Enhancement and Benchmark for UAV-based Open-Vocabulary Object Detection. arXiv'2025. [Paper]
FASE: Feature-Aligned Scene Encoding for Open-Vocabulary Object Detection in Remote Sensing. CIKM'2025. [Paper]
Cross-View Open-Vocabulary Object Detection in Aerial Imagery. arXiv'2025. [Paper]
RSVG-ZeroOV: Exploring a Training-Free Framework for Zero-Shot Open-Vocabulary Visual Grounding in Remote Sensing Images. AAAI'2026. [Paper | Code]

RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts. arXiv'2024. [Paper | Code]
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding. arXiv'2024. [Paper | Code]
Falcon: A Remote Sensing Vision-Language Foundation Model. arXiv'2025. [Paper | Code]
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models. arXiv'2025. [Paper | Code]
RemoteSAM: Towards Segment Anything for Earth Observation. ACMMM'2025. [Paper | Code]
GeoMag: A Vision-Language Model for Pixel-level Fine-Grained Remote Sensing Image Parsing. arXiv'2025. [Paper]