A computer vision application that detects objects through your camera and provides spatial audio alerts based on where each object is located in the frame (left, center, or right).
- Real-time object detection using YOLOv8 (80 COCO classes)
- Spatial audio alerts — spoken announcements like "person on the left"
- Configurable cooldown — avoids repeating the same alert too often
- Visual overlay — bounding boxes, labels, and a left/center/right zone guide
- Distance estimation — rough near/mid/far classification based on box size
# 1. Install dependencies
pip install -r requirements.txt
# 2. Run the app
python main.py

| Key | Action |
|---|---|
| `q` | Quit |
| `m` | Mute / unmute audio alerts |
| `+` / `-` | Increase / decrease detection confidence threshold |
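
The key bindings listed above could be handled along these lines. This is only a sketch: `AppState`, `handle_key`, and the step size `CONFIDENCE_STEP` are hypothetical names and values chosen for illustration, not taken from `main.py`.

```python
from dataclasses import dataclass

# Assumption: the README does not say how much "+" and "-" change the threshold.
CONFIDENCE_STEP = 0.05


@dataclass
class AppState:
    muted: bool = False
    confidence: float = 0.45  # documented default of --confidence
    running: bool = True


def handle_key(state: AppState, key: str) -> AppState:
    """Apply a single keypress to the state and return the same object."""
    if key == "q":
        state.running = False
    elif key == "m":
        state.muted = not state.muted
    elif key == "+":
        # Clamp to 1.0 and round to avoid accumulating float noise.
        state.confidence = round(min(1.0, state.confidence + CONFIDENCE_STEP), 2)
    elif key == "-":
        state.confidence = round(max(0.0, state.confidence - CONFIDENCE_STEP), 2)
    return state  # any other key leaves the state unchanged
```

In an OpenCV display loop, the key would typically come from something like `chr(cv2.waitKey(1) & 0xFF)`; keeping the handler free of OpenCV calls makes it easy to test on its own.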
Edit config.py or pass CLI flags:
python main.py --confidence 0.5 --cooldown 4 --camera 0

| Flag | Default | Description |
|---|---|---|
| `--camera` | 0 | Camera device index |
| `--confidence` | 0.45 | Minimum detection confidence |
| `--cooldown` | 3 | Seconds before re-announcing the same object+zone |
| `--no-audio` | false | Start with audio muted |
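
For reference, the flags and defaults above map naturally onto `argparse`. The sketch below is an assumption about how `config.py` might parse them, not a copy of it; the function name `parse_args` and the description string are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the documented CLI flags; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(
        description="Object detection with spatial audio alerts"
    )
    parser.add_argument("--camera", type=int, default=0,
                        help="Camera device index")
    parser.add_argument("--confidence", type=float, default=0.45,
                        help="Minimum detection confidence")
    parser.add_argument("--cooldown", type=float, default=3,
                        help="Seconds before re-announcing the same object+zone")
    # store_true gives the documented default of false.
    parser.add_argument("--no-audio", action="store_true",
                        help="Start with audio muted")
    return parser.parse_args(argv)
```

Note that `argparse` exposes `--no-audio` as the attribute `no_audio` on the returned namespace.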
bet320/
├── main.py # Entry point — capture loop & display
├── detector.py # YOLOv8 object detection wrapper
├── audio_alert.py # Text-to-speech spatial alert engine
├── config.py # Default configuration & CLI arg parsing
├── requirements.txt
└── README.md
- Python 3.9+
- Webcam / camera
- macOS (uses the `say` command for TTS; easily adaptable to other operating systems via `pyttsx3`)
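
As a rough illustration of the macOS dependency, a non-blocking call to `say` might look like the following. The helper names `build_say_command` and `speak` are hypothetical, and the real `audio_alert.py` may be organised differently.

```python
import shutil
import subprocess
import sys


def build_say_command(text: str) -> list[str]:
    """Return the argument list for the macOS `say` command."""
    return ["say", text]


def speak(text: str) -> bool:
    """Start speaking without blocking; return False if `say` is unavailable."""
    if sys.platform == "darwin" and shutil.which("say"):
        # Popen returns immediately, so the capture loop is not stalled.
        subprocess.Popen(build_say_command(text))
        return True
    return False
```

On other platforms, the `subprocess` call could be swapped for `pyttsx3` (its `init()`, `say()`, and `runAndWait()` calls), which is the adaptation the requirement above refers to.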
- Each frame is divided into three vertical zones: Left (0–33%), Center (33–66%), Right (66–100%).
- YOLOv8 detects objects and returns bounding boxes.
- The center-x of each bounding box determines which zone the object is in.
- The relative height of the bounding box estimates distance (near / mid / far).
- An audio alert is spoken in a background thread, e.g. "car, center, near".
- A cooldown timer prevents the same object+zone from being repeated too quickly.
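
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the distance cut-offs `NEAR_RATIO` and `MID_RATIO` are invented for the example (the README only says the classification is rough), and the names `zone_for`, `distance_for`, `Cooldown`, and `describe` are hypothetical rather than taken from the project's modules.

```python
import time

# Assumed cut-offs for box height as a fraction of frame height.
NEAR_RATIO = 0.5
MID_RATIO = 0.2


def zone_for(center_x: float, frame_width: float) -> str:
    """Map a box's center-x to one of three equal vertical zones."""
    ratio = center_x / frame_width
    if ratio < 1 / 3:
        return "left"
    if ratio < 2 / 3:
        return "center"
    return "right"


def distance_for(box_height: float, frame_height: float) -> str:
    """Taller boxes are treated as closer objects."""
    ratio = box_height / frame_height
    if ratio >= NEAR_RATIO:
        return "near"
    if ratio >= MID_RATIO:
        return "mid"
    return "far"


class Cooldown:
    """Allow each (label, zone) pair to be announced once per `seconds`."""

    def __init__(self, seconds: float = 3.0, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self._last = {}  # (label, zone) -> time of last announcement

    def ready(self, label: str, zone: str) -> bool:
        now = self.clock()
        last = self._last.get((label, zone))
        if last is not None and now - last < self.seconds:
            return False
        self._last[(label, zone)] = now
        return True


def describe(label: str, box, frame_size) -> str:
    """Build the alert text from an (x1, y1, x2, y2) box and (width, height)."""
    x1, y1, x2, y2 = box
    width, height = frame_size
    zone = zone_for((x1 + x2) / 2, width)
    distance = distance_for(y2 - y1, height)
    return f"{label}, {zone}, {distance}"
```

For a 640x480 frame, `describe("car", (280, 100, 360, 400), (640, 480))` yields `"car, center, near"`. An alert would be spoken only when `Cooldown.ready(label, zone)` returns `True`, and the speech call itself could be handed to a `threading.Thread(..., daemon=True)` so the capture loop keeps running.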