DJLin0219/vision-language-safety-eval

SafeVLM: Fine-Tuning and Evaluation Framework for Vision-Language Models

This repository provides a sample implementation inspired by my prior internship work on vision-language model (VLM) benchmarking and fine-tuning for safety-critical perception tasks (e.g., traffic light and scene understanding).
All data and code are synthetic and intended solely for research demonstration purposes.


πŸ” Overview

This project presents a reproducible framework for fine-tuning, evaluating, and benchmarking modern Vision-Language Models (VLMs) on perception-aligned tasks.
The design focuses on robustness under visual uncertainty and cross-modal consistency, mirroring the core ideas behind large-scale safety-perception evaluation pipelines.
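One concrete way to quantify the cross-modal consistency mentioned above is agreement across paraphrased prompts: a model that answers the same question about the same image differently depending on wording is not consistent. The sketch below is illustrative only; the function name and pairwise-agreement definition are assumptions for demonstration, not the repository's actual API.

```python
def paraphrase_consistency(answers: list[str]) -> float:
    """Fraction of answer pairs that agree exactly; 1.0 means fully consistent.

    `answers` holds the model's responses to paraphrases of one question
    about a single image.
    """
    if len(answers) < 2:
        return 1.0  # a single answer is trivially self-consistent
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    agree = sum(a == b for a, b in pairs)
    return agree / len(pairs)

# Two of three pairwise comparisons agree -> score of 1/3.
print(paraphrase_consistency(["green", "green", "red"]))  # 0.3333333333333333
```

Exact string match is the simplest possible agreement criterion; a real pipeline would typically compare normalized or embedded answers instead.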

Key Features

  • 🧠 VLM Integration Layer — unified interface for models such as LLaVA-NeXT, PaliGemma, Fuyu, and InternVideo2.
  • 🎯 Fine-tuning Pipeline — modular PyTorch pipeline for supervised or instruction-tuned adaptation on multimodal data.
  • 📈 Evaluation Metrics — supports visual-textual accuracy, consistency, and reasoning-alignment metrics.
  • 🧩 Prompt and Response Benchmarking — evaluates LLM reasoning coherence under degraded perception (e.g., low-light or occluded scenes).
  • ⚡ Fully Open and Synthetic — no private datasets or internal assets are included.
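To make the "unified interface" and "degraded perception" ideas concrete, here is a minimal sketch of how a backend-agnostic evaluation loop could look. All names (`VLMBackend`, `BrightnessStubModel`, `evaluate`) and the brightness heuristic are hypothetical stand-ins for demonstration; they are not the repository's actual classes, and no real model is loaded.

```python
from dataclasses import dataclass
from typing import Protocol


class VLMBackend(Protocol):
    """Assumed unified interface: any backend maps (image, prompt) -> answer."""

    def generate(self, image: list[list[int]], prompt: str) -> str: ...


@dataclass
class BrightnessStubModel:
    """Stand-in backend: answers from the mean pixel brightness of a tiny
    synthetic grayscale image, so the example runs without any model weights."""

    low_light_threshold: float = 64.0

    def generate(self, image: list[list[int]], prompt: str) -> str:
        pixels = [p for row in image for p in row]
        mean = sum(pixels) / len(pixels)
        if mean < self.low_light_threshold:
            return "uncertain: low-light scene"
        return "green light visible"


def evaluate(model: VLMBackend, scenes: dict[str, list[list[int]]],
             prompt: str) -> dict[str, str]:
    """Run one prompt over every scene, mimicking degraded-perception benchmarking."""
    return {name: model.generate(img, prompt) for name, img in scenes.items()}


scenes = {
    "daylight": [[200, 210], [190, 205]],  # bright synthetic scene
    "night": [[20, 15], [30, 10]],         # degraded (low-light) scene
}
results = evaluate(BrightnessStubModel(), scenes, "What is the traffic light state?")
print(results["daylight"])  # green light visible
print(results["night"])     # uncertain: low-light scene
```

Because `evaluate` only depends on the `VLMBackend` protocol, a real LLaVA-NeXT or PaliGemma wrapper could be swapped in without changing the benchmarking loop.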

🧰 Repository Structure
