
cindyzli/wubwub




Anyone can be a DJ!
Devpost »

Cindy Li · Cindy Yang · Elise Zhu

Table of Contents
  1. About The Project
  2. Technologies
  3. Contact
  4. Acknowledgments

About The Project


wubwub is a web-based, gesture-controlled music mixing interface that brings the energy of a DJ setup straight into the browser. Built on computer vision, it needs no keyboard or mouse: every control responds to your hand movements!

Features

  • Dual Spinning CDs – Showcases album art for the current and upcoming songs.
  • Gesture-Controlled Sliders – Adjust bass boost and volume with vertical hand movements.
  • Nightcore Switch – Triggerable by gesture, this not only alters playback but also transforms the site’s theme (day ↔ night) in neon DJ style.


  • Dynamic Song Queue – Paste YouTube links to add tracks to the queue on the fly.
  • Sound Bites – Quick-trigger sound effect buttons for live mixing, editable through a pop-up menu.


  • LED Visualizer – A color bar displays a dynamic LED color matched to the song’s mood, with synced flashing along the border of the interface.

Built With

MongoDB Gemini Web Audio API Mediapipe OpenCV yt-dlp Arduino Adafruit Python Figma Tailwind Vite Flask

(back to top)

Technologies

MongoDB

MongoDB stores the active song queue, sound bite configurations, and user session data. It allows the system to persist state across reloads and ensures seamless queue management when multiple tracks are added.
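The queue logic above can be sketched in Python. The document shape and field names here are assumptions for illustration; in the real app these operations would map to pymongo calls (noted in the comments):

```python
import time

# Hedged sketch of the queue document shape and FIFO pop. With pymongo this
# would be roughly:
#   queue.insert_one(track)
#   queue.find_one_and_delete({}, sort=[("added_at", 1)])
def make_track(youtube_url, title, thumbnail):
    # one document per queued track; field names are assumptions
    return {
        "url": youtube_url,
        "title": title,
        "thumbnail": thumbnail,
        "added_at": time.time(),
    }

def pop_next(tracks):
    # FIFO: the oldest track plays next (mirrors a sorted find_one_and_delete)
    if not tracks:
        return None
    nxt = min(tracks, key=lambda t: t["added_at"])
    tracks.remove(nxt)
    return nxt
```

Because each track carries its own `added_at` timestamp, the queue order survives a reload: re-reading the collection and sorting by that field reconstructs exactly the same FIFO.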


Gemini

Gemini is used to enrich the music experience by generating color hues that match the mood of each song. These hues are displayed on both the website and LED strip and flash to the beat!
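Because the model replies in free-form text, the hue has to be parsed out defensively before it is sent to the website and the LED strip. A minimal sketch of that parsing step (the prompt, model name, and fallback color are assumptions; the actual request would go through the Gemini client library):

```python
import re

# Matches a 6-digit hex color like "#3366FF" anywhere in the model's reply.
HEX_RE = re.compile(r"#[0-9a-fA-F]{6}")

def extract_mood_color(reply, fallback="#ff00ff"):
    """Pull the first hex color out of a free-form model reply.

    Falls back to a neon default if the model didn't include one,
    so the LED strip never receives an invalid frame.
    """
    m = HEX_RE.search(reply)
    return m.group(0).lower() if m else fallback
```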

Web Audio API

The Web Audio API is a low-level audio processing interface built into modern browsers. Instead of just playing audio files, it exposes a graph-based system of audio nodes that can be connected, modified, and rerouted in real time.

In our project, audio streams resolved by yt-dlp are fed into AudioContext nodes, where we apply effects like biquad filters for bass boosting, playback rate adjustments for Nightcore, and gain nodes for volume control. Because the API runs natively in the browser, these transformations are highly performant and sample-accurate, giving us professional-grade sound manipulation without external software.
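To illustrate what a "lowshelf" BiquadFilterNode computes when boosting bass, here is the Audio EQ Cookbook low-shelf filter in plain Python. This is purely an illustration of the math; the project itself runs the equivalent filter natively in the browser via the Web Audio API:

```python
import math

def lowshelf_coeffs(fs, f0, gain_db, S=1.0):
    """Audio EQ Cookbook low-shelf biquad coefficients, normalized by a0."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / 2 * math.sqrt((A + 1 / A) * (1 / S - 1) + 2)
    cosw = math.cos(w0)
    b0 = A * ((A + 1) - (A - 1) * cosw + 2 * math.sqrt(A) * alpha)
    b1 = 2 * A * ((A - 1) - (A + 1) * cosw)
    b2 = A * ((A + 1) - (A - 1) * cosw - 2 * math.sqrt(A) * alpha)
    a0 = (A + 1) + (A - 1) * cosw + 2 * math.sqrt(A) * alpha
    a1 = -2 * ((A - 1) + (A + 1) * cosw)
    a2 = (A + 1) + (A - 1) * cosw - 2 * math.sqrt(A) * alpha
    return [c / a0 for c in (b0, b1, b2, a1, a2)]

def biquad(samples, coeffs):
    """Direct Form I biquad: y[n] = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A useful sanity check: at a gain of 0 dB the coefficients collapse to an identity filter, so the samples pass through unchanged.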

Mediapipe and OpenCV

Mediapipe provides a pretrained, GPU-optimized pipeline for tracking hand landmarks in real time (21 key points per hand). Under the hood, it uses deep learning models to estimate the 3D positions of these landmarks from a webcam feed. We use these points to detect gestures: for example, vertical hand movement maps to volume, and a toggle gesture flips the Nightcore switch.

Meanwhile, OpenCV handles lower-level video stream preprocessing (frame capture, smoothing, thresholding). Mediapipe alone can track landmarks, but OpenCV gives us fine control to stabilize and clean the input data. Together, they provide a fast, accurate gesture recognition system that works entirely on consumer webcams.
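The vertical-movement-to-volume mapping can be sketched as a small stateful filter over the landmark stream. Mediapipe's hand model uses index 0 for the wrist and normalized coordinates where y grows downward; the smoothing factor here is an assumption, not the project's tuned value:

```python
class VolumeFromHand:
    """Map a Mediapipe wrist landmark to a smoothed 0..1 volume level.

    Landmarks are (x, y) pairs normalized to 0..1, one per hand keypoint;
    index 0 is the wrist. Exponential smoothing stabilizes jittery frames,
    playing the same role as the OpenCV-side cleanup described above.
    """

    WRIST = 0

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # 0 < alpha <= 1; higher reacts faster
        self.volume = 0.0

    def update(self, landmarks):
        y = landmarks[self.WRIST][1]
        # y grows downward, so a higher hand means a louder volume
        target = 1.0 - min(max(y, 0.0), 1.0)
        self.volume += self.alpha * (target - self.volume)
        return self.volume
```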


yt-dlp

yt-dlp is a Python-based tool that bypasses YouTube’s standard playback UI to fetch the direct media streams and metadata. Under the hood, it parses YouTube’s page structure, extracts the dynamic streaming manifests (DASH/HLS), and resolves the highest-quality audio link.

We use this to retrieve audio in a format directly consumable by the Web Audio API, along with thumbnails/metadata for the spinning CD visuals. Since fetching and parsing manifests can take time, we threaded this process so the user doesn’t experience UI freezes while adding new tracks.
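A minimal sketch of that extraction step, assuming the yt_dlp package. The extractor is injectable so the function can be exercised without network access; the info-dict field names (`url`, `title`, `thumbnail`) are what yt-dlp typically returns, but treat them as assumptions:

```python
def get_audio_info(url, extractor=None):
    """Resolve a direct audio stream URL plus CD-art metadata for one track."""
    if extractor is None:
        # Real extraction path (assumes yt_dlp is installed):
        import yt_dlp

        def extractor(u):
            opts = {"format": "bestaudio/best", "quiet": True}
            with yt_dlp.YoutubeDL(opts) as ydl:
                return ydl.extract_info(u, download=False)

    info = extractor(url)
    return {
        "stream_url": info.get("url"),       # fed to the Web Audio front end
        "title": info.get("title"),
        "thumbnail": info.get("thumbnail"),  # shown on the spinning CD
    }
```

In the real app this call runs on a worker thread (see the Sockets and Multithreading section), since manifest parsing can take a few seconds.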

Arduino and Adafruit

Arduino drives connected Adafruit hardware to sync the digital experience with the physical world. LED strips update in real time with the color visualizer, bringing the board’s energy into a live glowing border. This is communicated through serial instructions from the Python server.
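One way the server side of that serial link could look. The frame layout (a `C` command byte, three channel bytes, newline) is an assumption made for illustration, not the project's actual protocol:

```python
def encode_color_frame(r, g, b):
    """Pack one RGB update into a 5-byte serial frame: b'C' + R,G,B + b'\\n'."""
    for v in (r, g, b):
        if not 0 <= v <= 255:
            raise ValueError("channel out of range 0..255")
    return bytes([ord("C"), r, g, b, ord("\n")])

# Real usage on the Python server (assumes pyserial and a port name):
#   import serial
#   port = serial.Serial("/dev/ttyACM0", 115200)
#   port.write(encode_color_frame(255, 0, 128))
# and the Arduino sketch reads 5 bytes per frame and updates the strip.
```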


Sockets and Multithreading

Instead of having the front end constantly poll a REST API for updates (which is inefficient and introduces latency), we used WebSockets to push gesture recognition results directly from the CV backend to the React interface. This gives near real-time responsiveness — the moment a hand moves, the slider updates.

We also applied multithreading in the backend: yt-dlp runs in its own thread so video metadata extraction and audio URL resolution don’t block the main Flask event loop. This ensures that gesture control and LED updates remain smooth while new songs are loading in parallel.
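The push side can be sketched as a small emitter that the CV thread calls on every frame. The event name and payload fields are assumptions; with Flask-SocketIO the `emit` callable would wrap something like `socketio.emit("gesture", payload)`:

```python
import time

class GestureEmitter:
    """Push gesture updates over a WebSocket, throttled to a frame budget.

    Throttling matters because the CV loop can produce far more updates
    per second than the UI needs, and flooding the socket adds latency.
    """

    def __init__(self, emit, min_interval=0.03):
        self.emit = emit                # e.g. lambda p: socketio.emit("gesture", p)
        self.min_interval = min_interval  # ~30 pushes/sec at 0.03
        self._last = None

    def push(self, kind, value):
        now = time.monotonic()
        if self._last is not None and now - self._last < self.min_interval:
            return False                # dropped: too soon after the last push
        self._last = now
        self.emit({"kind": kind, "value": value})
        return True
```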

Figma and Tailwind

Figma was used to design the UI with an emphasis on neon, futuristic DJ vibes. Tailwind CSS translated those designs into a clean, responsive, and customizable interface that feels both modern and immersive.

Vite & React

React powers the interactive UI — from the dual spinning CDs to the live queue and sound bite grid. Vite makes development fast and efficient with hot reloading and optimized builds for smooth browser performance.

Flask

Flask serves as the project’s backend framework, connecting all the pieces together. It manages the API endpoints, handles YouTube-DLP calls, talks to MongoDB, and coordinates real-time communication with the front end and hardware.
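A minimal sketch of what the queue endpoint might look like, assuming Flask; the route path, request fields, and in-memory queue are illustrative assumptions, not the project's actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
QUEUE = []  # stand-in for the MongoDB-backed queue described above

@app.route("/queue", methods=["POST"])
def add_to_queue():
    """Accept a YouTube URL and append it to the song queue."""
    url = request.get_json(force=True).get("url")
    if not url:
        return jsonify(error="missing url"), 400
    QUEUE.append(url)
    # In the real app this is where the yt-dlp worker thread is kicked off.
    return jsonify(position=len(QUEUE)), 201
```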

Contact

Cindy Li (audio processing, hardware, YouTube extraction) - cl2674@cornell.edu

Cindy Yang (design, mongodb, hardware, gemini, threading) - cwyang@umich.edu

Elise Zhu (gesture recognition, sockets, sound bites) - eyz7@georgetown.edu

About

Won first place at Big Red Hacks '25
