A comparative study examining the performance differences between human visual perception and Vision Transformer (ViT) models in image recognition and classification tasks.
This repository contains the implementation and analysis code for comparing human cognitive abilities with Vision Transformer neural networks. The project explores how humans and state-of-the-art computer vision models perform on various visual recognition tasks, providing insights into the strengths and limitations of both biological and artificial vision systems.
- Performance Comparison: Quantitative analysis of human vs. ViT performance across different image classification tasks
- Robustness Analysis: Evaluation of both systems under various image distortions and challenging conditions
- Cognitive Insights: Understanding the fundamental differences in how humans and transformers process visual information
- Benchmark Development: Creating standardized evaluation protocols for human-AI comparison studies
- Controlled experimental setup for human subject testing
- Standardized evaluation protocols for ViT models
- Statistical analysis tools for performance comparison
- Comprehensive data collection and processing pipeline
- Pre-trained Vision Transformer models (ViT-Base, ViT-Large)
- Fine-tuning capabilities for domain-specific tasks
- Support for various ViT architectures and configurations
- Integration with popular deep learning frameworks
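As an illustration of the framework integration listed above, loading and querying a pre-trained ViT through Hugging Face `transformers` might look like the following. This is a minimal sketch, not the repo's own `ViTEvaluator` API; the checkpoint name is an assumption, and `ViTImageProcessor` needs a reasonably recent `transformers` release:

```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Assumed ImageNet-1k checkpoint; the repo's ViTEvaluator presumably wraps a loader like this.
model_name = "google/vit-base-patch16-224"
processor = ViTImageProcessor.from_pretrained(model_name)
model = ViTForImageClassification.from_pretrained(model_name).eval()

# Classify a single image
image = Image.open("example.jpg").convert("RGB")  # any RGB test image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```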
- Psychophysical experiment design
- Response time measurement (see the timing sketch after this list)
- Accuracy assessment protocols
- User interface for human testing sessions
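To make the response-time measurement concrete, a single timed trial could be structured as below. This is a hedged sketch: `present` and `collect_response` stand in for the real display and input handlers of the experiment UI, which are not shown here:

```python
import time

def run_trial(stimulus, present, collect_response, time_limit=2.0):
    """Show one stimulus, then record the response and reaction time.

    `present` and `collect_response` are placeholders for the actual
    display/input handlers of the experiment interface.
    """
    present(stimulus)                    # stimulus onset
    onset = time.perf_counter()
    response = collect_response(timeout=time_limit)
    rt = time.perf_counter() - onset     # reaction time in seconds
    return {"response": response,
            "rt": rt,
            "timed_out": response is None or rt > time_limit}
```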
The framework supports evaluation on multiple standard datasets:
- ImageNet: Large-scale image classification
- CIFAR-10/100: Small-scale natural image classification
- Custom Datasets: Domain-specific evaluation sets
- Distorted Images: Robustness testing with various image corruptions
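For the distorted-image condition, corruptions can be produced with standard torchvision transforms. A minimal sketch, assuming the usual PIL-image input (the repo may ship its own corruption pipeline with calibrated severities):

```python
import torch
from torchvision import transforms

def add_gaussian_noise(img, std=0.1):
    """Add pixel-wise Gaussian noise to a [0, 1] image tensor."""
    return (img + torch.randn_like(img) * std).clamp(0.0, 1.0)

# Illustrative corruption pipeline; kernel size, sigma, and std are example severities.
corrupt = transforms.Compose([
    transforms.ToTensor(),                              # PIL image -> [0, 1] tensor
    transforms.GaussianBlur(kernel_size=9, sigma=2.0),  # optical blur
    add_gaussian_noise,                                 # Compose accepts any callable
])
```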
```
Python >= 3.8
PyTorch >= 1.9.0
torchvision >= 0.10.0
transformers >= 4.0.0
numpy >= 1.21.0
matplotlib >= 3.3.0
scipy >= 1.7.0
pandas >= 1.3.0
```

```bash
# Clone the repository
git clone https://github.com/mlacarrasco/human_versus_vit.git
cd human_versus_vit

# Install dependencies
pip install -r requirements.txt

# Download pre-trained models (optional)
python scripts/download_models.py
```

```python
from src.models import ViTEvaluator
from src.datasets import load_dataset
# Initialize evaluator
evaluator = ViTEvaluator(model_name='vit_base_patch16_224')
# Load dataset
dataset = load_dataset('imagenet', split='validation')
# Run evaluation
results = evaluator.evaluate(dataset)
print(f"ViT Accuracy: {results['accuracy']:.2f}%")# Launch human study interface
python human_study/run_experiment.py --config configs/human_study.yamlfrom src.analysis import compare_performance
# Load results from both human and ViT evaluations
human_results = load_human_results('data/human_results.json')
vit_results = load_vit_results('data/vit_results.json')
# Generate comparison
comparison = compare_performance(human_results, vit_results)
comparison.plot_results()
```

- Participant Recruitment: Controlled selection criteria for human subjects
- Task Design: Standardized image classification tasks with time constraints
- Data Collection: Systematic recording of responses and reaction times
- Quality Control: Validation protocols to ensure data reliability
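As an example of what such a quality-control filter implies, anticipatory or timed-out trials can be dropped before analysis. The thresholds below are illustrative, not the study's actual criteria:

```python
import pandas as pd

def filter_trials(trials: pd.DataFrame, time_limit: float = 2.0) -> pd.DataFrame:
    """Drop trials with implausible reaction times.

    Responses under ~200 ms are likely anticipatory; responses over the
    time limit missed the response window. Thresholds are illustrative.
    """
    valid = trials[(trials["rt"] >= 0.2) & (trials["rt"] <= time_limit)]
    return valid.reset_index(drop=True)
```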
- Model Selection: Systematic evaluation across different ViT architectures
- Preprocessing: Standardized image preprocessing pipeline (sketched after this list)
- Inference: Controlled evaluation environment with consistent parameters
- Performance Metrics: Comprehensive accuracy and efficiency measurements
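The standardized preprocessing step for a 224x224 ViT usually follows the conventional ImageNet evaluation recipe. A sketch with torchvision; the exact resize and normalization constants used by the repo's pipeline may differ (some ViT checkpoints normalize with mean/std 0.5 instead):

```python
from torchvision import transforms

# Conventional ImageNet-style evaluation preprocessing for a ViT-Base/16 at 224x224.
preprocess = transforms.Compose([
    transforms.Resize(256),               # shorter side -> 256 px
    transforms.CenterCrop(224),           # 224x224 center crop
    transforms.ToTensor(),                # PIL -> [0, 1] tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
```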
```
results/
├── human_studies/
│   ├── raw_data/
│   ├── processed/
│   └── analysis/
├── vit_evaluation/
│   ├── model_outputs/
│   ├── performance_metrics/
│   └── ablation_studies/
└── comparative_analysis/
    ├── statistical_tests/
    ├── visualizations/
    └── reports/
```
- Overall Accuracy: Comparative analysis across different image categories
- Response Time: Speed comparison between human cognition and model inference
- Robustness: Performance degradation under various image distortions
- Category-specific Performance: Detailed analysis of performance across object categories
- Significance Testing: Statistical validation of performance differences (see the sketch after this list)
- Correlation Analysis: Relationship between human and ViT performance patterns
- Error Analysis: Systematic examination of failure modes in both systems
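A minimal sketch of the significance and correlation steps, assuming per-category accuracies for both systems are available as aligned arrays (`scipy` is already in the requirements; the numbers below are made-up placeholders):

```python
import numpy as np
from scipy import stats

# Hypothetical per-category accuracies (one value per object category).
human_acc = np.array([0.92, 0.85, 0.78, 0.88, 0.71])
vit_acc   = np.array([0.95, 0.80, 0.83, 0.90, 0.65])

# Paired t-test: do the two systems differ systematically across categories?
t_stat, p_value = stats.ttest_rel(human_acc, vit_acc)
print(f"paired t = {t_stat:.3f}, p = {p_value:.3f}")

# Correlation: do both systems find the same categories hard?
r, p_corr = stats.pearsonr(human_acc, vit_acc)
print(f"Pearson r = {r:.3f}, p = {p_corr:.3f}")
```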
```yaml
# config/vit_config.yaml
model:
  name: "vit_base_patch16_224"
  pretrained: true
  num_classes: 1000

evaluation:
  batch_size: 32
  num_workers: 4
  device: "cuda"
```
```yaml
# config/human_study.yaml
experiment:
  num_participants: 50
  trials_per_participant: 200
  time_limit: 2.0  # seconds

display:
  image_size: [224, 224]
  presentation_time: 1.0
  inter_trial_interval: 0.5
```

We welcome contributions to improve the experimental framework and analysis tools. Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Vision Transformer implementation based on the original paper by Dosovitskiy et al.
- Human psychophysical experiment design inspired by cognitive psychology literature
- Statistical analysis methods adapted from comparative psychology research
- Special thanks to all human participants in the study
Miguel Carrasco
- Email: [contact information]
- Website: https://mlacarrasco.github.io/
- LinkedIn: [profile link]
For questions about the research methodology or experimental design, please open an issue or contact the author directly.