
✋ Gesture-Controlled Media Player

A Practical Computer Vision–Based HCI Project

Ever found yourself wanting to pause or skip a video without reaching for the keyboard—maybe while just being lazy in the best possible way?
This project attempts to turn that idea into a working system.

The Gesture-Controlled Media Player is a real-time hand gesture–based media control interface that uses a webcam to interpret simple, intuitive hand gestures and translate them into playback commands such as play/pause and seek forward/backward.

Under the hood, it combines MediaPipe hand landmark detection, geometric reasoning, and OS-level keyboard automation. The focus is on robustness, clarity, and real-time performance, not flashy but fragile tricks.


What It Can Do

| Gesture | Action |
| --- | --- |
| ✊ Closed fist (held briefly) | Play / Pause |
| ☝️ Index finger on the left half of the screen | Seek backward |
| ☝️ Index finger on the right half of the screen | Seek forward |

Other nice-to-haves:

  • Built-in cooldowns to avoid accidental multiple triggers
  • Live visual overlays so you know what the system is detecting
  • Works with any browser-based video player, not just YouTube

🧠 Project Idea (In Plain Terms)

This project explores vision-based human–computer interaction (HCI) by mapping where your hand is in space to what action the system performs over time.

Instead of relying on heavy, opaque gesture classifiers, the system uses:

  • Explicit hand landmark geometry
  • Per-finger state estimation
  • Simple spatial reasoning
  • Time-based stability checks

The result is a system that is easier to understand, debug, and extend.


▶️ Setup and Execution

1. Clone the Repository

```bash
git clone https://github.com/SohamB-42/gesture-controlled-media-player.git
cd gesture-controlled-media-player
```

2. Install Dependencies

```bash
pip install opencv-python mediapipe pyautogui
```

3. Project Requirements

Ensure that HandTrackingModule.py (MediaPipe wrapper) is present in the project directory.

4. Run the Application

```bash
python main.py
```

Press q to exit cleanly.


Application Compatibility

The system controls media playback by simulating standard keyboard inputs using pyautogui. As a result, it works with any application that responds to common media keys, provided the application window is in focus.

Supported Examples

  • Web-based players (YouTube, Netflix, Prime Video, Coursera, etc.)
  • Desktop media players (VLC, Windows Media Player)
  • Presentation software (PowerPoint, Google Slides)
  • Custom video players and demo applications

How Compatibility Is Achieved

The system sends the following key events:

  • Space to play / pause
  • ← / → to seek backward / forward

This design keeps the project platform-agnostic and application-independent, avoiding tight coupling with any specific media player or API.


🔍 How It Works Internally

1. Hand Detection and Tracking

  • Detects and tracks up to two hands simultaneously
  • Each hand is represented by 21 two-dimensional landmarks
  • Frames are horizontally flipped for intuitive left–right interaction

2. Finger State Estimation

Each finger is classified as extended or not extended using relative landmark positions:

  • Index to pinky fingers use vertical (Y-axis) comparisons
  • The thumb uses horizontal (X-axis) comparisons to account for mirrored camera input

This produces a compact representation:

`[thumb, index, middle, ring, pinky]`
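
A minimal sketch of this classification, assuming MediaPipe's landmark indexing (fingertips at 4, 8, 12, 16, 20; the joint two indices below each tip) and a mirrored frame; the repository's `HandTrackingModule.py` may differ in details:

```python
TIP_IDS = [4, 8, 12, 16, 20]  # MediaPipe fingertip landmark indices

def fingers_up(lm, mirrored=True):
    """Return [thumb, index, middle, ring, pinky] as 1 (extended) / 0 (curled).

    `lm` is a list of 21 (x, y) tuples in image coordinates (y grows
    downward). Simplified sketch of the logic described above.
    """
    states = []
    # Thumb: horizontal comparison against its IP joint (landmark 3).
    # On a mirrored frame an extended right-hand thumb points left.
    if mirrored:
        states.append(1 if lm[4][0] < lm[3][0] else 0)
    else:
        states.append(1 if lm[4][0] > lm[3][0] else 0)
    # Other fingers: fingertip above the PIP joint (smaller y) = extended.
    for tip in TIP_IDS[1:]:
        states.append(1 if lm[tip][1] < lm[tip - 2][1] else 0)
    return states
```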

3. Closed Fist Detection (Play / Pause)

A closed fist gesture is identified using multiple checks:

  • Number of extended non-thumb fingers
  • Palm compactness (distance between index and pinky fingertips)
  • Thresholds scaled relative to frame width

The gesture must be held briefly, and a cooldown is enforced to prevent rapid toggling.
This makes play/pause intentional rather than accidental.
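
The hold-plus-cooldown timing can be sketched as a small stateful class, using the README's default values; this is an illustration of the debouncing idea, not the repository's exact implementation:

```python
import time

class FistDetector:
    """Debounced fist detection: the fist must be held for `hold` seconds,
    and successive toggles are separated by at least `cooldown` seconds."""

    def __init__(self, hold=0.14, cooldown=0.9):
        self.hold = hold
        self.cooldown = cooldown
        self.hold_start = None            # when the current fist began
        self.last_toggle = -float("inf")  # time of the last play/pause

    def update(self, is_fist, now=None):
        """Feed one frame's fist/no-fist verdict; return True exactly
        when a play/pause toggle should fire."""
        now = time.monotonic() if now is None else now
        if not is_fist:
            self.hold_start = None        # gesture broken: restart the hold
            return False
        if self.hold_start is None:
            self.hold_start = now
        if (now - self.hold_start >= self.hold
                and now - self.last_toggle >= self.cooldown):
            self.last_toggle = now
            return True
        return False
```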


4. Directional Seeking Using Spatial Context

The camera frame is divided vertically into two regions:

|  SEEK BACKWARD  |  SEEK FORWARD  |

When the index finger is extended:

  • Presence in the left region triggers a backward seek
  • Presence in the right region triggers a forward seek

Each direction has its own cooldown for controlled interaction.
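
The zone logic with per-direction cooldowns can be sketched as follows (an illustrative reconstruction using the README's default `SEEK_COOLDOWN` of 0.45 s, not the project's exact code):

```python
class SeekController:
    """Map the extended index fingertip's x position to a seek direction,
    with an independent cooldown per direction."""

    def __init__(self, frame_width, cooldown=0.45):
        self.frame_width = frame_width
        self.cooldown = cooldown
        self.last = {"left": -float("inf"), "right": -float("inf")}

    def update(self, index_x, now):
        # Left half of the (mirrored) frame seeks backward, right half forward.
        side = "left" if index_x < self.frame_width / 2 else "right"
        if now - self.last[side] < self.cooldown:
            return None                   # this direction is still cooling down
        self.last[side] = now
        return "seek_backward" if side == "left" else "seek_forward"
```

Keeping a separate timestamp per direction means a quick back-then-forward correction is not blocked by the backward seek's cooldown.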


5. Media Control via Keyboard Automation

The system simulates standard keyboard inputs using pyautogui:

| Action | Key |
| --- | --- |
| Play / Pause | Space |
| Seek backward | ← |
| Seek forward | → |

Because it uses standard keys, the system is platform- and application-agnostic.
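
The dispatch itself reduces to a small action-to-key mapping, assuming pyautogui's standard key names `"space"`, `"left"`, and `"right"` (the import is deferred because pyautogui needs a display at import time):

```python
KEYMAP = {
    "play_pause": "space",
    "seek_backward": "left",
    "seek_forward": "right",
}

def send(action):
    """Press the key bound to a recognized gesture in the focused window."""
    import pyautogui  # deferred so the mapping is inspectable headless
    pyautogui.press(KEYMAP[action])
```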


⚙️ Configuration & Customization

The system is designed to be user-configurable, allowing gesture sensitivity and responsiveness to be tuned based on personal preference, camera quality, and lighting conditions.

Key parameters can be adjusted directly in the source code:

```python
CAMERA_ID = 0          # Change if using an external webcam
DRAW = True            # Toggle on-screen visual overlays

SEEK_COOLDOWN = 0.45      # Delay between consecutive seek actions (seconds)
FIST_HOLD_SECONDS = 0.14  # How long a fist must be held to trigger play/pause
FIST_COOLDOWN = 0.9       # Minimum delay between play/pause toggles
```

🛠️ Tech Stack

  • Python
  • OpenCV – video capture and rendering
  • MediaPipe Hands – real-time hand landmark detection
  • pyautogui – keyboard automation
  • Basic geometry & timing logic

⚠️ Known Limitations

  • Requires reasonable lighting conditions
  • Extreme hand rotations can reduce accuracy
  • Gesture set is intentionally minimal to prioritize reliability

📄 License

Open-source and free to use, modify, and extend.
