
cindyzli/wubwub




Anyone can be a DJ!
Devpost »

Cindy Li · Cindy Yang · Elise Zhu

Table of Contents
  1. About The Project
  2. Technologies
  3. Contact
  4. Acknowledgments

About The Project


wubwub is a web-based, gesture-controlled music mixing interface that brings the energy of a DJ setup straight into the browser. Built on computer vision, it needs no keyboard or mouse: every control responds to your hand movements!

Features

  • Dual Spinning CDs – Showcases album art for the current and upcoming songs.
  • Gesture-Controlled Sliders – Adjust bass boost and volume with vertical hand movements.
  • Nightcore Switch – Triggerable by gesture, this not only alters playback but also transforms the site’s theme (day ↔ night) in neon DJ style.


  • Dynamic Song Queue – Paste YouTube links to add tracks to the queue on the fly.
  • Sound Bites – Quick-trigger sound effect buttons for live mixing, editable through a pop-up menu.


  • LED Visualizer – A color bar displays a dynamic LED color matched to the song’s mood, with synced flashing along the border of the interface.

Built With

MongoDB Gemini Web Audio API Mediapipe OpenCV yt-dlp Arduino Adafruit Python Figma Tailwind Vite Flask

(back to top)

Technologies

MongoDB

MongoDB stores the active song queue, sound bite configurations, and user session data. It allows the system to persist state across reloads and ensures seamless queue management when multiple tracks are added.
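The queue logic above can be sketched in Python. The document shape and field names here are assumptions for illustration; in the real app these operations would map to pymongo calls (noted in the comments):

```python
import time

# Hedged sketch of the queue document shape and FIFO pop. With pymongo this
# would be roughly:
#   queue.insert_one(track)
#   queue.find_one_and_delete({}, sort=[("added_at", 1)])
def make_track(youtube_url, title, thumbnail):
    # one document per queued track; field names are assumptions
    return {
        "url": youtube_url,
        "title": title,
        "thumbnail": thumbnail,
        "added_at": time.time(),
    }

def pop_next(tracks):
    # FIFO: the oldest track plays next (mirrors a sorted find_one_and_delete)
    if not tracks:
        return None
    nxt = min(tracks, key=lambda t: t["added_at"])
    tracks.remove(nxt)
    return nxt
```

Because each track carries its own `added_at` timestamp, the queue order survives a reload: re-reading the collection and sorting by that field reconstructs exactly the same FIFO.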


Gemini

Gemini is used to enrich the music experience by generating color hues that match the mood of each song. These hues are displayed on both the website and LED strip and flash to the beat!
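Because the model replies in free-form text, the hue has to be parsed out defensively before it is sent to the website and the LED strip. A minimal sketch of that parsing step (the prompt, model name, and fallback color are assumptions; the actual request would go through the Gemini client library):

```python
import re

# Matches a 6-digit hex color like "#3366FF" anywhere in the model's reply.
HEX_RE = re.compile(r"#[0-9a-fA-F]{6}")

def extract_mood_color(reply, fallback="#ff00ff"):
    """Pull the first hex color out of a free-form model reply.

    Falls back to a neon default if the model didn't include one,
    so the LED strip never receives an invalid frame.
    """
    m = HEX_RE.search(reply)
    return m.group(0).lower() if m else fallback
```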

Web Audio API

The Web Audio API is a low-level audio processing interface built into modern browsers. Instead of just playing audio files, it exposes a graph-based system of audio nodes that can be connected, modified, and rerouted in real time.

In our project, audio streams resolved by yt-dlp are fed into AudioContext nodes, where we apply effects like biquad filters for bass boosting, playback rate adjustments for Nightcore, and gain nodes for volume control. Because the API runs natively in the browser, these transformations are highly performant and sample-accurate, giving us professional-grade sound manipulation without external software.
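To illustrate what a "lowshelf" BiquadFilterNode computes when boosting bass, here is the Audio EQ Cookbook low-shelf filter in plain Python. This is purely an illustration of the math; the project itself runs the equivalent filter natively in the browser via the Web Audio API:

```python
import math

def lowshelf_coeffs(fs, f0, gain_db, S=1.0):
    """Audio EQ Cookbook low-shelf biquad coefficients, normalized by a0."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / 2 * math.sqrt((A + 1 / A) * (1 / S - 1) + 2)
    cosw = math.cos(w0)
    b0 = A * ((A + 1) - (A - 1) * cosw + 2 * math.sqrt(A) * alpha)
    b1 = 2 * A * ((A - 1) - (A + 1) * cosw)
    b2 = A * ((A + 1) - (A - 1) * cosw - 2 * math.sqrt(A) * alpha)
    a0 = (A + 1) + (A - 1) * cosw + 2 * math.sqrt(A) * alpha
    a1 = -2 * ((A - 1) + (A + 1) * cosw)
    a2 = (A + 1) + (A - 1) * cosw - 2 * math.sqrt(A) * alpha
    return [c / a0 for c in (b0, b1, b2, a1, a2)]

def biquad(samples, coeffs):
    """Direct Form I biquad: y[n] = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A useful sanity check: at a gain of 0 dB the coefficients collapse to an identity filter, so the samples pass through unchanged.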

Mediapipe and OpenCV

Mediapipe provides a pretrained, GPU-optimized pipeline for tracking hand landmarks in real time (21 key points per hand). Under the hood, it uses deep learning models to estimate the 3D positions of these landmarks from a webcam feed. We use these points to detect gestures: for example, vertical hand movement maps to volume, and a toggle gesture flips the Nightcore switch.

Meanwhile, OpenCV handles lower-level video stream preprocessing (frame capture, smoothing, thresholding). Mediapipe alone can track landmarks, but OpenCV gives us fine control to stabilize and clean the input data. Together, they provide a fast, accurate gesture recognition system that works entirely on consumer webcams.
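The vertical-movement-to-volume mapping can be sketched as a small stateful filter over the landmark stream. Mediapipe's hand model uses index 0 for the wrist and normalized coordinates where y grows downward; the smoothing factor here is an assumption, not the project's tuned value:

```python
class VolumeFromHand:
    """Map a Mediapipe wrist landmark to a smoothed 0..1 volume level.

    Landmarks are (x, y) pairs normalized to 0..1, one per hand keypoint;
    index 0 is the wrist. Exponential smoothing stabilizes jittery frames,
    playing the same role as the OpenCV-side cleanup described above.
    """

    WRIST = 0

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # 0 < alpha <= 1; higher reacts faster
        self.volume = 0.0

    def update(self, landmarks):
        y = landmarks[self.WRIST][1]
        # y grows downward, so a higher hand means a louder volume
        target = 1.0 - min(max(y, 0.0), 1.0)
        self.volume += self.alpha * (target - self.volume)
        return self.volume
```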


yt-dlp

yt-dlp is a Python-based tool that bypasses YouTube’s standard playback UI to fetch the direct media streams and metadata. Under the hood, it parses YouTube’s page structure, extracts the dynamic streaming manifests (DASH/HLS), and resolves the highest-quality audio link.

We use this to retrieve audio in a format directly consumable by the Web Audio API, along with thumbnails/metadata for the spinning CD visuals. Since fetching and parsing manifests can take time, we threaded this process so the user doesn’t experience UI freezes while adding new tracks.
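A minimal sketch of that extraction step, assuming the yt_dlp package. The extractor is injectable so the function can be exercised without network access; the info-dict field names (`url`, `title`, `thumbnail`) are what yt-dlp typically returns, but treat them as assumptions:

```python
def get_audio_info(url, extractor=None):
    """Resolve a direct audio stream URL plus CD-art metadata for one track."""
    if extractor is None:
        # Real extraction path (assumes yt_dlp is installed):
        import yt_dlp

        def extractor(u):
            opts = {"format": "bestaudio/best", "quiet": True}
            with yt_dlp.YoutubeDL(opts) as ydl:
                return ydl.extract_info(u, download=False)

    info = extractor(url)
    return {
        "stream_url": info.get("url"),       # fed to the Web Audio front end
        "title": info.get("title"),
        "thumbnail": info.get("thumbnail"),  # shown on the spinning CD
    }
```

In the real app this call runs on a worker thread (see the Sockets and Multithreading section), since manifest parsing can take a few seconds.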

Arduino and Adafruit

Arduino drives connected Adafruit hardware to sync the digital experience with the physical world. LED strips update in real time with the color visualizer, bringing the board’s energy into a live glowing border. This is communicated through serial instructions from the Python server.
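One way the server side of that serial link could look. The frame layout (a `C` command byte, three channel bytes, newline) is an assumption made for illustration, not the project's actual protocol:

```python
def encode_color_frame(r, g, b):
    """Pack one RGB update into a 5-byte serial frame: b'C' + R,G,B + b'\\n'."""
    for v in (r, g, b):
        if not 0 <= v <= 255:
            raise ValueError("channel out of range 0..255")
    return bytes([ord("C"), r, g, b, ord("\n")])

# Real usage on the Python server (assumes pyserial and a port name):
#   import serial
#   port = serial.Serial("/dev/ttyACM0", 115200)
#   port.write(encode_color_frame(255, 0, 128))
# and the Arduino sketch reads 5 bytes per frame and updates the strip.
```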


Sockets and Multithreading

Instead of having the front end constantly poll a REST API for updates (which is inefficient and introduces latency), we used WebSockets to push gesture recognition results directly from the CV backend to the React interface. This gives near real-time responsiveness — the moment a hand moves, the slider updates.

We also applied multithreading in the backend: yt-dlp runs in its own thread so video metadata extraction and audio URL resolution don’t block the main Flask event loop. This ensures that gesture control and LED updates remain smooth while new songs are loading in parallel.
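The push side can be sketched as a small emitter that the CV thread calls on every frame. The event name and payload fields are assumptions; with Flask-SocketIO the `emit` callable would wrap something like `socketio.emit("gesture", payload)`:

```python
import time

class GestureEmitter:
    """Push gesture updates over a WebSocket, throttled to a frame budget.

    Throttling matters because the CV loop can produce far more updates
    per second than the UI needs, and flooding the socket adds latency.
    """

    def __init__(self, emit, min_interval=0.03):
        self.emit = emit                # e.g. lambda p: socketio.emit("gesture", p)
        self.min_interval = min_interval  # ~30 pushes/sec at 0.03
        self._last = None

    def push(self, kind, value):
        now = time.monotonic()
        if self._last is not None and now - self._last < self.min_interval:
            return False                # dropped: too soon after the last push
        self._last = now
        self.emit({"kind": kind, "value": value})
        return True
```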

Figma and Tailwind

Figma was used to design the UI with an emphasis on neon, futuristic DJ vibes. Tailwind CSS translated those designs into a clean, responsive, and customizable interface that feels both modern and immersive.

Vite & React

React powers the interactive UI — from the dual spinning CDs to the live queue and sound bite grid. Vite makes development fast and efficient with hot reloading and optimized builds for smooth browser performance.

Flask

Flask serves as the project’s backend framework, connecting all the pieces together. It manages the API endpoints, handles YouTube-DLP calls, talks to MongoDB, and coordinates real-time communication with the front end and hardware.
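A minimal sketch of what the queue endpoint might look like, assuming Flask; the route path, request fields, and in-memory queue are illustrative assumptions, not the project's actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
QUEUE = []  # stand-in for the MongoDB-backed queue described above

@app.route("/queue", methods=["POST"])
def add_to_queue():
    """Accept a YouTube URL and append it to the song queue."""
    url = request.get_json(force=True).get("url")
    if not url:
        return jsonify(error="missing url"), 400
    QUEUE.append(url)
    # In the real app this is where the yt-dlp worker thread is kicked off.
    return jsonify(position=len(QUEUE)), 201
```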

Contact

Cindy Li (audio processing, hardware, YouTube extraction) - cl2674@cornell.edu

Cindy Yang (design, mongodb, hardware, gemini, threading) - cwyang@umich.edu

Elise Zhu (gesture recognition, sockets, sound bites) - eyz7@georgetown.edu

About

Won first place at Big Red Hacks '25
