A system for analyzing squash match videos, automatically detecting game phases (primarily rally and rest periods), and generating structured data for further analysis. The system combines SAM2 for player segmentation, YOLO-Pose for pose estimation, and a custom game state detection algorithm.
SquashPhaseDetector/
├── backend/ # FastAPI backend
│ ├── routers/ # API endpoints (video.py, segmentation.py, etc.)
│ ├── models/ # ML model integration (sam2_model.py, etc.)
│ ├── utils/ # Utility functions (video.py, etc.)
│ └── app.py # Main FastAPI application
├── frontend/
│ ├── src/
│ │ ├── components/ # React components (video/, segmentation/, etc.)
│ │ ├── services/ # API service functions (api/video.ts, api/segmentation.ts)
│ │ ├── pages/ # Top-level page components (VideoDetailPage.tsx)
│ │ ├── utils/ # Frontend utility functions (rleUtils.ts)
│ │ └── ...
│ └── Dockerfile
├── data/
│ ├── uploads/ # Directory for uploaded videos and extracted frames
│ │ └── [uuid]/
│ │ ├── frames/ # Extracted .jpg frames
│ │ ├── segmentation/ # Segmentation results in numpy arrays
│ │ │ ├── results/ # Results of segmentation masks for each frame
│ │ │ │ ├── 1/ # Player 1
│ │ │ │ ├── 2/ # Player 2
│ │ ├── pose/ # Pose detection results in numpy arrays
│ │ │ ├── results/ # { "frame_name": ..., "keypoints_data": ..., "keypoints_conf": ..., "bounding_box": ... }
│ │ │ │ ├── 1/ # Player 1
│ │ │ │ ├── 2/ # Player 2
│ │ ├── metadata.json # Video metadata
│ │ ├── pose.json # Frame indices of pose detection results for the frontend
│ │ ├── segmentation.json # Segmentation marker inputs and frame indices for the frontend
│ │ └── mainview_timestamp.csv
│ └── exports/
├── docker-compose.yml
├── README.md
└── ...
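The per-frame pose records under `pose/results/` can be persisted as numpy files. A minimal sketch of writing and reading one record, assuming the field names from the layout above; the file naming, array shapes (COCO's 17-keypoint convention used by Ultralytics pose models), and bounding-box format are assumptions:

```python
import os
import tempfile

import numpy as np

# One per-frame pose record, mirroring the fields listed in the tree above.
record = {
    "frame_name": "frame_000123.jpg",
    "keypoints_data": np.zeros((17, 2), dtype=np.float32),  # (x, y) per keypoint
    "keypoints_conf": np.zeros((17,), dtype=np.float32),    # confidence per keypoint
    "bounding_box": np.array([50.0, 40.0, 210.0, 390.0], dtype=np.float32),  # x1, y1, x2, y2 (assumed)
}

path = os.path.join(tempfile.mkdtemp(), "frame_000123.npy")
np.save(path, record, allow_pickle=True)          # dicts require allow_pickle
loaded = np.load(path, allow_pickle=True).item()  # .item() unwraps the 0-d object array
```

Storing each record as a pickled dict keeps the arrays and metadata together in a single file per frame, at the cost of requiring `allow_pickle=True` on load.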
- Docker and Docker Compose
- NVIDIA GPU with CUDA support (for optimal performance)
- NVIDIA Container Toolkit installed
- SAM2 model weights (download instructions below)
- Clone the repository
- Start the containers:
```bash
docker-compose up --build
```
- Access the API at http://localhost:8000
- API documentation is available at http://localhost:8000/docs
- Upload a squash match video
- Find frames showing the main view
- Mark players in specific frames
- Generate player masks using SAM2
- Apply YOLO-Pose to detect body landmarks
The project uses Meta AI's Segment Anything Model 2 (SAM2) for high-quality player segmentation. SAM2 is a powerful foundation model for image segmentation that can be guided with prompts (points, boxes, or masks).
- After uploading a video, navigate to the segmentation tab
- Select a frame and mark positive points on the players (click on the player)
- Add negative points if needed (click on background or other elements)
- Process segmentation - SAM2 will generate masks for both players
- Masks are stored and linked to player tracking throughout the video
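The positive/negative clicks map directly onto SAM2's point-prompt format: (x, y) pixel coordinates with a parallel label array. A sketch of the prompt arrays (coordinates are illustrative), with the prediction call itself shown as a comment because it needs the `sam2` package, a downloaded checkpoint, and a GPU:

```python
import numpy as np

# Point prompts as SAM2's image predictor expects them:
# label 1 = positive click (on the player), label 0 = negative click (background).
point_coords = np.array([[420.0, 310.0], [155.0, 80.0]], dtype=np.float32)
point_labels = np.array([1, 0], dtype=np.int32)

# The prediction step (paths are assumptions about your install):
#
#   from sam2.build_sam import build_sam2
#   from sam2.sam2_image_predictor import SAM2ImagePredictor
#   predictor = SAM2ImagePredictor(build_sam2(config_path, checkpoint_path))
#   predictor.set_image(frame_rgb)                  # HxWx3 uint8 RGB frame
#   masks, scores, _ = predictor.predict(
#       point_coords=point_coords,
#       point_labels=point_labels,
#       multimask_output=False,
#   )
```

One such prompt set is supplied per player, yielding one mask per player for the marked frame.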
The project uses Ultralytics YOLO for pose detection, which provides accurate and efficient estimation of player poses throughout the match. The pose results integrate with the segmentation masks to improve player tracking and analysis.
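A common post-processing step on such pose output is dropping low-confidence keypoints before analysis. A minimal sketch, assuming the 17-keypoint COCO layout used by Ultralytics pose models and an illustrative 0.5 threshold; the detector call is commented out because it requires the `ultralytics` package and model weights:

```python
import numpy as np

# Dummy detections standing in for one player's keypoints in a 640px-wide frame.
keypoints_xy = np.random.rand(17, 2) * 640
keypoints_conf = np.random.rand(17)

# Keep only joints the model is reasonably confident about (threshold is an assumption).
visible = keypoints_xy[keypoints_conf > 0.5]

# Producing the arrays with Ultralytics would look like:
#   from ultralytics import YOLO
#   model = YOLO("yolo11n-pose.pt")                  # model choice is an assumption
#   result = model(frame)[0]
#   keypoints_xy = result.keypoints.xy[0].cpu().numpy()
#   keypoints_conf = result.keypoints.conf[0].cpu().numpy()
```

Filtering by confidence avoids feeding occluded or off-screen joints into downstream game-state detection.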