Real-time bird detection for your backyard, porch, or feeder, powered by YOLOv11 and local AI.
An OpenClaw skill that turns any Mac with a webcam into a live bird detection station. YOLO identifies birds in real-time with bounding boxes, Moondream VLM identifies the species, and you watch it all from your phone, tablet, or TV via a simple web link.
Everything runs locally on your machine. No cloud. No subscriptions. No data leaves your network.
Real detection: bird in flight captured at 64% confidence with green bounding box, HUD overlay showing bird count and fps. Forestville, CA.
Second detection: bird at 53% confidence against the treeline. Outdoor deck setup, Sonoma County.
You point a camera at your bird feeder. Bird Watcher does the rest:
- Detects birds in real-time using YOLOv11: green bounding boxes appear the instant a bird enters the frame, even mid-flight
- Identifies species when close enough: when a detected bird is large enough in frame (50+ pixels), the system crops the region and asks a local VLM (Moondream) for species identification. Distant or fast fly-by birds get labeled "Bird" with a confidence percentage. Species ID works best with birds perched nearby (feeders, railings, branches within ~15 feet of the camera).
- Streams live video to any device on your network: phone, tablet, laptop, or AirPlay to your TV
- Saves every detection: both the original frame and the annotated version with bounding boxes, timestamped
- Logs to wildlife census: if the OpenClaw `wildlife-census` skill is installed, sightings are recorded automatically
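The size gate in the species-identification step can be sketched as follows. This is an illustration, not the project's actual code: the function name `species_crop`, the 20-pixel padding, and the choice to measure "size" as the shorter side of the bounding box are all assumptions. Only the 50-pixel threshold comes from the description above.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

MIN_BIRD_SIZE = 50  # the 50+ pixel threshold described above


def species_crop(box: Box, frame_w: int, frame_h: int,
                 min_size: int = MIN_BIRD_SIZE, pad: int = 20) -> Optional[Box]:
    """Return a padded crop region for species ID, or None if the bird is too small.

    Hypothetical helper: the padding value and the "shorter side" rule
    are assumptions made for this sketch.
    """
    x1, y1, x2, y2 = box
    if min(x2 - x1, y2 - y1) < min_size:
        return None  # too small: keep the generic "Bird" label
    # Grow the box by `pad` pixels on every side, clamped to the frame.
    return (max(0, x1 - pad), max(0, y1 - pad),
            min(frame_w, x2 + pad), min(frame_h, y2 + pad))
```

A box that passes the gate would be cropped and sent to the VLM. A box that returns `None` keeps the generic label and no VLM request is made.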
The camera feed and YOLO detection run in separate threads. The video is always smooth at full camera fps (~30fps). YOLO processes independently at ~10-15fps on Apple Silicon. You never see lag.
When running, the stream shows:
- Live camera feed at native resolution (1280×720)
- Green bounding boxes on detected birds with confidence percentage
- Species label (from Moondream VLM) on each box
- HUD overlay: bird count, camera fps, YOLO fps, detection counter, last species identified
- Timestamp in the corner
| Component | Minimum | Recommended |
|---|---|---|
| Mac | Any Mac with webcam | Apple Silicon (M1/M2/M3/M4) |
| Camera | Built-in MacBook camera | USB webcam for dedicated outdoor setup |
| RAM | 8GB | 16GB |
| Network | Not required for local use | WiFi for streaming to other devices |
Intel Macs work but expect ~5-8fps YOLO processing instead of 10-15fps.
- Python 3.10 or newer: check with `python3 --version`
- macOS: camera permissions require macOS Security & Privacy settings
- Moondream Station (optional): for species identification. Without it, you still get bird detection with bounding boxes, just no species names.
- It cannot access the camera without your explicit permission. macOS requires you to grant camera access interactively; no script can bypass this. You must run a command in Terminal and click "Allow."
- It cannot stream to the internet by default. The feed is only accessible on your local network. This is intentional for privacy. See the Security section below if you want remote access.
- Species ID requires proximity. Moondream is a general-purpose VLM, not a bird-specific model. It works best when birds are close to the camera: perched on a feeder, railing, or branch within about 15 feet. Distant birds and fast fly-bys are detected with bounding boxes but labeled generically as "Bird" rather than attempting an inaccurate species guess. Common backyard birds (jays, sparrows, robins, finches, hawks) at close range get reliable species identification.
- It cannot run in the background on macOS. The camera permission is tied to the foreground Terminal process. The script must run in an open Terminal window.
git clone https://github.com/MS-707/bird-watcher-skill.git
cd bird-watcher-skill
Recommended: use a virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Alternative: system-wide install (not recommended)
pip3 install -r requirements.txt
If you get an externally-managed-environment error on newer Python:
pip3 install --break-system-packages -r requirements.txt
⚠️ Warning: `--break-system-packages` bypasses Python's environment isolation and can cause conflicts with system packages. Use a virtual environment instead whenever possible.
Verify everything installed:
python3 -c "import cv2, flask, ultralytics, requests; print('All dependencies OK')"
Species identification requires Moondream Station running locally. Without it, birds are still detected with bounding boxes but labeled generically as "Bird" instead of by species name.
- Visit https://moondream.ai for installation instructions
- Once installed and running, verify with:
curl http://localhost:2020/health
You should see a JSON response with "server": "moondream-station".
- Bird Watcher will automatically detect Moondream on startup and enable species ID.
This is the most important step. Run this command in Terminal (not from a script):
python3 -c "import cv2; cap = cv2.VideoCapture(0); print('Camera:', cap.isOpened()); cap.release()"
macOS will show a permission dialog. Click Allow. You only need to do this once.
If it prints `Camera: True`, you're good. If it prints `Camera: False`, go to System Settings → Privacy & Security → Camera and enable Terminal (or Python).
python3 main.py
The script will print a URL like:
🐦 Bird Watcher Live Stream v3
Stream URL: http://YOUR_LOCAL_IP:8888?token=abc123xyz
Open that URL on your phone or any device on the same WiFi. The token is generated fresh each time you start the stream, so only people with the URL can view your camera feed.
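A per-session token of this kind can be produced and checked with the standard library alone. The sketch below is an assumption about how such a scheme might look, not the code in `stream_server.py`. The helper names are hypothetical, and only the URL shape (`?token=...`) is taken from the example output above.

```python
import hmac
import secrets
from typing import Optional


def new_stream_token(nbytes: int = 16) -> str:
    """Generate a fresh, URL-safe random token for one streaming session."""
    return secrets.token_urlsafe(nbytes)


def token_matches(supplied: Optional[str], expected: str) -> bool:
    """Compare tokens in constant time; a missing token never matches."""
    if supplied is None:
        return False
    return hmac.compare_digest(supplied.encode(), expected.encode())


def stream_url(host: str, port: int, token: str) -> str:
    """Build a URL in the same shape as the one printed at startup."""
    return f"http://{host}:{port}?token={token}"
```

Using `hmac.compare_digest` rather than `==` avoids leaking how many leading characters of a guess were correct through timing differences.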
Note: `bird_watcher_stream.py` is kept as a backwards-compatible wrapper that calls `main.py`.
All settings can be passed via CLI flags or environment variables:
# CLI flags
python3 main.py --port 9999 # Custom port
python3 main.py --model yolo11n.pt # Nano: fastest, less accurate
python3 main.py --model yolo11m.pt # Medium: slower, more accurate
python3 main.py --confidence 0.20 # Higher = fewer false positives
python3 main.py --persist 5 # Seconds to keep bounding box visible
python3 main.py --no-save # Don't save detection frames to disk
# Environment variables
BIRDWATCH_PORT=9999 python3 main.py
BIRDWATCH_MODEL=yolo11n.pt python3 main.py
BIRDWATCH_CONFIDENCE=0.20 python3 main.py
MOONDREAM_URL=http://192.168.1.50:2020 python3 main.py

| Environment Variable | Default | Description |
|---|---|---|
| `BIRDWATCH_PORT` | 8888 | HTTP port for stream server |
| `BIRDWATCH_MODEL` | yolo11s.pt | YOLO model file |
| `BIRDWATCH_CONFIDENCE` | 0.15 | Detection confidence threshold |
| `BIRDWATCH_PERSIST` | 3 | Seconds to keep bounding boxes visible |
| `BIRDWATCH_TOKEN` | (random) | Stream authentication token |
| `BIRDWATCH_MAX_FILES` | 500 | Max saved detection frames before cleanup |
| `BIRDWATCH_MAX_VIEWERS` | 5 | Max concurrent stream viewers |
| `BIRDWATCH_MIN_BIRD_SIZE` | 50 | Min pixel size to trigger species ID |
| `MOONDREAM_URL` | http://localhost:2020 | Moondream VLM endpoint |
| `BIRDWATCH_DURATION` | 1800 | Batch mode: total run time in seconds |
| `BIRDWATCH_INTERVAL` | 8 | Batch mode: seconds between captures |
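One common way to combine CLI flags with environment variables is to feed each environment value in as the argparse default, so that an explicit flag wins over the environment and the environment wins over the built-in default. The sketch below shows that pattern for three of the settings. The precedence order and the `build_config` name are assumptions, since the actual behaviour of `config.py` is not documented here.

```python
import argparse
import os


def build_config(argv=None, env=None):
    """Resolve settings as: CLI flag > environment variable > built-in default.

    Hypothetical sketch; the precedence order is an assumption.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Bird Watcher config sketch")
    parser.add_argument("--port", type=int,
                        default=int(env.get("BIRDWATCH_PORT", "8888")))
    parser.add_argument("--model",
                        default=env.get("BIRDWATCH_MODEL", "yolo11s.pt"))
    parser.add_argument("--confidence", type=float,
                        default=float(env.get("BIRDWATCH_CONFIDENCE", "0.15")))
    # argv=None makes argparse read sys.argv[1:], the normal CLI behaviour.
    return parser.parse_args(argv)
```

For example, `build_config(["--port", "7000"], env={"BIRDWATCH_PORT": "9999"})` resolves the port to 7000 because the flag overrides the environment.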
| Model | Size | FPS (M1) | FPS (Intel) | Accuracy | Use Case |
|---|---|---|---|---|---|
| `yolo11n.pt` | 5MB | 15-25 | 8-12 | Good | Smooth streaming, less accurate |
| `yolo11s.pt` | 18MB | 10-15 | 5-8 | Better | Recommended balance |
| `yolo11m.pt` | 39MB | 5-8 | 2-4 | Great | Serious detection, still watchable |
| `yolo11l.pt` | 87MB | 2-4 | <2 | Excellent | Maximum accuracy, slideshow fps |
Models auto-download on first run. They detect "bird" as one of 80 COCO object classes. No custom training needed for general bird detection.
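Because the model reports all 80 COCO classes, the detector has to discard everything that is not a bird. A minimal sketch of that filter on plain data follows. The `Detection` tuple and `keep_birds` helper are hypothetical names for illustration. The class index 14 is the position of "bird" in the COCO label order used by Ultralytics' pretrained models, and 0.15 is the default threshold from the configuration table.

```python
from typing import List, NamedTuple, Tuple

BIRD_CLASS_ID = 14  # index of "bird" in the 80-class COCO label order


class Detection(NamedTuple):
    """Hypothetical container for one detection result."""
    class_id: int
    confidence: float
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def keep_birds(detections: List[Detection], threshold: float = 0.15) -> List[Detection]:
    """Keep only bird-class detections at or above the confidence threshold."""
    return [d for d in detections
            if d.class_id == BIRD_CLASS_ID and d.confidence >= threshold]
```

When calling an Ultralytics model directly, the same effect can usually be had by passing `classes=[14]` and `conf=0.15` to the prediction call, which filters inside the library rather than afterwards.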
bird-watcher-skill/
├── main.py                   ← Primary entry point (live stream)
├── bird_watcher_stream.py    ← Backwards-compatible wrapper → main.py
├── bird_watcher_batch.py     ← Batch detection mode (interval captures)
├── config.py                 ← argparse CLI + env var configuration
├── camera.py                 ← Camera capture thread + HUD overlay
├── detector.py               ← YOLO detection thread + frame saving
├── species_id.py             ← Moondream VLM species identification
├── storage.py                ← Directory management + cleanup
├── stream_server.py          ← Flask MJPEG server + auth
├── requirements.txt
├── yolo11s.pt                ← YOLO model (auto-downloaded)
├── SKILL.md                  ← OpenClaw skill definition
└── detections/               ← Saved detection frames (auto-managed)
┌──────────────────────────────────────────────┐
│  Camera Thread                               │
│  cv2.VideoCapture(0) → 30fps raw frames      │
│  Overlays latest YOLO boxes onto each frame  │
│  Encodes as JPEG → MJPEG stream              │
└───────────────────┬──────────────────────────┘
                    │ shares frames via lock
┌───────────────────▼──────────────────────────┐
│  YOLO Thread                                 │
│  Pulls latest frame independently            │
│  Runs YOLOv11 detection (bird class only)    │
│  Stores bounding box coordinates + conf      │
│  Saves detection frames to disk              │
└───────────────────┬──────────────────────────┘
                    │ on bird detection (5s cooldown)
┌───────────────────▼──────────────────────────┐
│  Moondream Thread                            │
│  Crops detected bird region + padding        │
│  Sends to local VLM for species ID           │
│  Updates species label on HUD                │
└──────────────────────────────────────────────┘
The camera thread never waits for YOLO. YOLO never waits for Moondream. Each runs at its own natural speed. The video feed is always smooth.
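The decoupling described above comes down to a "latest value" hand-off guarded by a lock: each thread publishes its newest result and reads the other's newest result, and neither ever blocks waiting for the other to finish. The sketch below shows the pattern with integers standing in for frames and a tuple standing in for a YOLO result, so it runs without a camera. All class and function names are hypothetical, and the Moondream stage and its 5-second cooldown are left out for brevity.

```python
import threading
import time


class SharedState:
    """Holds only the most recent frame and boxes; older values are dropped."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._boxes = []

    def publish_frame(self, frame):
        with self._lock:
            self._frame = frame

    def latest_frame(self):
        with self._lock:
            return self._frame

    def publish_boxes(self, boxes):
        with self._lock:
            self._boxes = list(boxes)

    def latest_boxes(self):
        with self._lock:
            return list(self._boxes)


def camera_loop(state, stop, fps=30):
    frame_id = 0
    while not stop.is_set():
        frame_id += 1                 # stand-in for reading a camera frame
        state.publish_frame(frame_id)
        state.latest_boxes()          # a real loop would draw these on the frame
        time.sleep(1 / fps)


def detector_loop(state, stop, fps=10):
    while not stop.is_set():
        frame = state.latest_frame()  # always the newest frame, never a backlog
        if frame is not None:
            state.publish_boxes([("bird", frame)])  # stand-in for YOLO output
        time.sleep(1 / fps)


def run_for(seconds=0.5):
    """Run both loops briefly and return the final frame id and boxes."""
    state, stop = SharedState(), threading.Event()
    threads = [threading.Thread(target=loop, args=(state, stop), daemon=True)
               for loop in (camera_loop, detector_loop)]
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join()
    return state.latest_frame(), state.latest_boxes()
```

Because the lock is held only for a reference swap or copy, the camera loop's pace is set by its own sleep interval, not by how long detection takes.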
Every time a bird is detected, two files are saved to ./detections/:
detections/
├── orig_20260321_152401_728656.jpg    ← original frame, no annotations
├── det_20260321_152401_728656.jpg     ← annotated with bounding boxes
├── orig_20260321_153211_614977.jpg
├── det_20260321_153211_614977.jpg
└── session_20260321_160000.json       ← session summary (batch mode only)
Files auto-rotate after 500 frames to prevent filling your disk. Oldest files are deleted first. Adjust via the BIRDWATCH_MAX_FILES environment variable or --max-files flag.
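The naming scheme and the oldest-first cleanup can be sketched together. The timestamp format below reproduces the file names shown above (date, time, microseconds). The pruning function is an assumption about how rotation might work, not the code in `storage.py`: it counts individual `.jpg` files against the limit and orders them by the timestamp embedded in the name. Both helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Tuple


def detection_names(now: datetime) -> Tuple[str, str]:
    """Return the (original, annotated) file names for one detection."""
    stamp = now.strftime("%Y%m%d_%H%M%S_%f")
    return f"orig_{stamp}.jpg", f"det_{stamp}.jpg"


def prune_detections(folder: Path, max_files: int = 500) -> List[str]:
    """Delete the oldest .jpg files until at most max_files remain.

    Assumption: age is taken from the timestamp after the first underscore,
    which sorts correctly as plain text. Returns the deleted file names.
    """
    files = sorted(folder.glob("*.jpg"), key=lambda p: p.name.partition("_")[2])
    excess = files[:max(0, len(files) - max_files)]
    for path in excess:
        path.unlink()
    return [p.name for p in excess]
```

Sorting on the embedded timestamp rather than file modification time keeps the `orig_` and `det_` halves of a detection adjacent, so they tend to be removed together.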
For unattended monitoring (no live stream, just detection logging):
python3 bird_watcher_batch.py --duration 3600 --interval 10
This captures a frame every 10 seconds for 1 hour (about 360 frames), runs YOLO and Moondream on each frame, saves any detections, and prints a summary at the end. It is a good way to learn when birds visit your feeder.
This skill accesses your camera and streams video on your local network. Please understand:
- Camera access is gated by macOS permissions. No script can access your camera without your explicit consent via the system dialog.
- Stream authentication: a random token is generated each time you start the stream. Only devices with the full URL (including token) can view the feed. The token is printed in your Terminal when the stream starts.
- Local network only: the stream is NOT accessible from the internet by default. It binds to your local IP address. Only devices on your WiFi can connect.
- No cloud services: YOLO runs locally via PyTorch. Moondream runs locally. No images or video are sent to any external server. Everything stays on your machine.
- Detection frames on disk: saved frames contain images from your camera. They're stored in the `detections/` folder. Be aware of this if you share your computer or back up to cloud storage. Auto-cleanup removes old files after 500 frames.
- Viewer limit: maximum 5 concurrent viewers to prevent resource exhaustion.
- Flask development server: the built-in web server is suitable for home use but not hardened for public internet exposure. Do not expose this directly to the internet without a reverse proxy and proper TLS.
If you want remote access (viewing from outside your home network), consider:
- Tailscale: free, creates a private VPN between your devices
- An SSH tunnel: `ssh -L 8888:localhost:8888 your-mac-ip`
- Do NOT use ngrok or port forwarding without understanding the privacy implications of exposing your camera feed
Bird Watcher works standalone, but it's designed to integrate with the OpenClaw agent ecosystem:
- Wildlife Census: separate OpenClaw skill for species logging and life lists. If installed, every Bird Watcher detection is automatically logged with species, count, and timestamp
- Telegram Alerts: your OpenClaw agent can send detection photos to Telegram when a bird is spotted
- Scheduled Sessions: set up cron jobs to run batch detection during peak feeding times (early morning, late afternoon)
| Problem | Solution |
|---|---|
| `Camera: False` or black screen | Run the camera permission command interactively in Terminal. Click Allow. |
| `OpenCV: not authorized to capture video` | Must run in foreground Terminal, not via nohup, background, or exec. |
| Stream works on Mac but not phone | macOS firewall is blocking. Disable temporarily or add Python to allowed apps. |
| Low fps (<5) | Use `--model yolo11n.pt` for faster processing. Close GPU-heavy apps. |
| No species identification | Moondream Station not running. The stream still works; you just get "Bird" instead of species names. |
| `Address already in use` | Previous instance still running. `kill $(lsof -ti:8888)` or use `--port 9999`. |
| Birds not being detected | Lower threshold: `--confidence 0.10`. Move camera closer to feeder. Ensure birds aren't too small in frame. |
| `ModuleNotFoundError` | Run `pip3 install -r requirements.txt` again. Check you're using the right Python. |
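For the "No species identification" row, the Moondream health endpoint can also be probed from Python instead of curl. The sketch below uses only the standard library and is not the project's own startup check. The function names are hypothetical, and the only facts taken from this document are the `/health` path, the default `http://localhost:2020` address, and the `"server": "moondream-station"` field in the response.

```python
import json
import urllib.request


def is_moondream(payload) -> bool:
    """True if a decoded /health response identifies Moondream Station."""
    return isinstance(payload, dict) and payload.get("server") == "moondream-station"


def moondream_available(base_url: str = "http://localhost:2020",
                        timeout: float = 2.0) -> bool:
    """Return True if Moondream Station answers its health check.

    Connection errors, timeouts, and malformed JSON all count as "not available".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_moondream(json.load(resp))
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # json.JSONDecodeError is a ValueError subclass.
        return False
```

If `moondream_available()` returns `False` while Moondream Station appears to be running, the `MOONDREAM_URL` setting is worth checking against the address the server is listening on.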
- YOLOv11 by Ultralytics: the object detection model
- Moondream: local vision-language model for species identification
- OpenClaw: the agent framework this skill is designed for
- Birds-YOLO: research paper on YOLOv11 bird detection that inspired the approach
Issues and pull requests welcome. If you improve the detection, add support for new platforms (Linux, Windows), or create a better dashboard UI, we'd love to see it.
MIT. See LICENSE for details.
Created by Mark Starr & Victor as an open-source OpenClaw skill. Free to use, modify, and share.