MonkeyFacer is a web application built with React + TypeScript + Vite. It captures webcam frames, extracts face / hand / pose landmarks using the pre-trained MediaPipe Holistic model, classifies a small set of gestures with project-defined heuristics over those landmarks, and displays a representative image for the current gesture.
The application does not include a custom training pipeline: MediaPipe Holistic is the ML component that provides landmarks, and gesture classification is performed with deterministic code that inspects the landmark coordinates.
- [`src/ml/HolisticDetectors.tsx`](https://github.com/Raulmora22/MonkeyFacer/blob/master/src/ml/HolisticDetectors.tsx) — MediaPipe Holistic integration, camera loop, canvas drawing, and the call into gesture classification.
- [`src/func/gestures.ts`](https://github.com/Raulmora22/MonkeyFacer/blob/master/src/func/gestures.ts) — gesture classification logic and the mapping from gesture → image.
- [`src/stores/GestureStore.ts`](https://github.com/Raulmora22/MonkeyFacer/blob/master/src/stores/GestureStore.ts) — Zustand store maintaining the current gesture.
- [`src/components/ImageDisplay.tsx`](https://github.com/Raulmora22/MonkeyFacer/blob/master/src/components/ImageDisplay.tsx) — component that renders the image corresponding to the detected gesture.
- [`src/App.tsx`](https://github.com/Raulmora22/MonkeyFacer/blob/master/src/App.tsx) — application root that mounts `HolisticDetectors` and `ImageDisplay`.
- `public/MonkeyFacer/images/` — images referenced by the app (smile, eureca, etc.).
- App mounting: `src/App.tsx` renders two main UI parts side by side: `HolisticDetectors` and `ImageDisplay`.
- Camera and Holistic model: `HolisticDetectors`
  - Creates a hidden `<video>` element (`videoRef`) used as the camera source.
  - Creates a `<canvas>` element (`canvasRef`) used to draw the camera frame and landmark visualizations.
  - Instantiates MediaPipe `Holistic` and configures `locateFile` to load model files from a CDN.
  - Uses `Camera` from `@mediapipe/camera_utils` to stream frames from the webcam.
  - On every frame, sends the video frame to the Holistic instance (`HandsMesh.send({ image: video })`).
- Receiving results: the Holistic instance calls `onResults(results)` with a `Results` object that contains:
  - `results.image` — the input image/frame,
  - `results.faceLandmarks` — array of face landmark points,
  - `results.leftHandLandmarks` and `results.rightHandLandmarks`,
  - `results.poseLandmarks` — pose landmarks (shoulders, wrists, etc.),
  - other fields as provided by MediaPipe Holistic.
- Drawing: in `onResults`, the code
  - clears the canvas and draws `results.image` scaled to the canvas size,
  - uses `drawConnectors` and `drawLandmarks` to render the face mesh, eyes, brows, irises, face oval, lips, and hand connections using the MediaPipe drawing utilities,
  - applies visual styles (colors, line widths) when drawing connectors/landmarks.
- Gesture classification: after drawing, the code calls `classifyGesture(results)` (defined in `src/func/gestures.ts`). `classifyGesture` inspects `results` (face / pose landmarks) and returns a gesture label: `"eureca"`, `"smile"`, or `"serious"`. The returned gesture is stored in a global store via `setGesture(detectedGesture)` (Zustand). A sketch of this drawing + classification handler follows the list.
- UI update: `ImageDisplay` reads `currentGesture` from `useGestureStore`. Based on the gesture, it loads and displays the corresponding image from `/MonkeyFacer/images/...`.
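To make the drawing and classification steps concrete, here is a minimal sketch of a results handler in the spirit of `HolisticDetectors.tsx`. It is a reconstruction, not the project's verbatim code: the function name `handleResults`, the chosen connection sets, colors, and line widths are illustrative, but the structure (clear, draw frame, draw landmarks, classify, update the store) follows the description above.

```tsx
import { drawConnectors, drawLandmarks } from "@mediapipe/drawing_utils";
import {
  FACEMESH_TESSELATION,
  FACEMESH_FACE_OVAL,
  FACEMESH_LIPS,
  HAND_CONNECTIONS,
  type Results,
} from "@mediapipe/holistic";
import { classifyGesture } from "../func/gestures";
import { useGestureStore } from "../stores/GestureStore";

function handleResults(results: Results, canvas: HTMLCanvasElement) {
  const ctx = canvas.getContext("2d");
  if (!ctx) return;

  // Draw the current camera frame scaled to the canvas.
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);

  // Overlay landmark visualizations (the real component also draws eyes, brows, irises, …).
  if (results.faceLandmarks) {
    drawConnectors(ctx, results.faceLandmarks, FACEMESH_TESSELATION, { color: "#C0C0C070", lineWidth: 1 });
    drawConnectors(ctx, results.faceLandmarks, FACEMESH_FACE_OVAL, { color: "#E0E0E0" });
    drawConnectors(ctx, results.faceLandmarks, FACEMESH_LIPS, { color: "#E0E0E0" });
  }
  for (const hand of [results.leftHandLandmarks, results.rightHandLandmarks]) {
    if (hand) {
      drawConnectors(ctx, hand, HAND_CONNECTIONS, { color: "#00FF00", lineWidth: 2 });
      drawLandmarks(ctx, hand, { color: "#FF0000", lineWidth: 1 });
    }
  }

  // Classify the frame and push the result into the Zustand store.
  const detectedGesture = classifyGesture(results);
  useGestureStore.getState().setGesture(detectedGesture);
}
```

In the component this would be registered on the Holistic instance, e.g. `HandsMesh.onResults((r) => handleResults(r, canvasRef.current!))`.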
- Holistic instance created with `locateFile: file => "https://cdn.jsdelivr.net/npm/@mediapipe/holistic/" + file`, so model files are loaded from the jsDelivr CDN.
- Options set on the Holistic instance:
  - `modelComplexity: 2`
  - `smoothLandmarks: false` (value used in the code)
  - `minDetectionConfidence: 0.5`
  - `minTrackingConfidence: 0.5`
- The camera is started with width 640 and height 480.
These options control model complexity and detection/tracking confidence thresholds and determine the landmark output used by the gesture classification logic.
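In code form, the configuration described above would look roughly like this (a sketch; the instance name `HandsMesh` mirrors the call shown earlier, and the comments are explanatory rather than taken from the project):

```ts
import { Holistic } from "@mediapipe/holistic";

const HandsMesh = new Holistic({
  // Load the model/wasm assets from the jsDelivr CDN.
  locateFile: (file) => "https://cdn.jsdelivr.net/npm/@mediapipe/holistic/" + file,
});

HandsMesh.setOptions({
  modelComplexity: 2,          // largest, most accurate model variant
  smoothLandmarks: false,      // no temporal smoothing (value used in the code)
  minDetectionConfidence: 0.5,
  minTrackingConfidence: 0.5,
});
```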
Gestures and how they are detected are defined in `src/func/gestures.ts`. The project defines three gesture types.

Type declaration:

```ts
type GestureType = "smile" | "serious" | "eureca";
```

- `"eureca"` — hands up
  - Uses `results.poseLandmarks`.
  - Landmark indices used:
    - left shoulder: `poseLandmarks[11]`
    - right shoulder: `poseLandmarks[12]`
    - left wrist: `poseLandmarks[15]`
    - right wrist: `poseLandmarks[16]`
  - Detection rule: if `leftWrist.y < leftShoulder.y` OR `rightWrist.y < rightShoulder.y`, the function returns `"eureca"`. (MediaPipe y coordinates grow downward, so a smaller y means the wrist is above the shoulder.)
- "smile" — smile detection (mouth aspect ratio)
- Uses
results.faceLandmarks. - Face landmark indices used:
- left mouth corner:
face[291] - right mouth corner:
face[61] - upper lip:
face[13] - lower lip:
face[14]
- left mouth corner:
- Detection steps:
- Compute mouth width = distance(left corner, right corner)
- Compute mouth height = distance(upper lip, lower lip)
- Compute mouthAspectRatio = mouthHeight / mouthWidth
- If
mouthAspectRatio > SMILE_THRESHOLD (0.35), returns"smile".
- "serious" — neutral / fallback
- If neither hands-up nor smile conditions match,
classifyGesturereturns"serious".
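The two detectors described above can be sketched as follows. This is a reconstruction from the rules listed here, not the verbatim project code; the helper names `detectHandsUp`, `detectSmile`, and `distance` follow the summary below, but `src/func/gestures.ts` may differ in detail.

```ts
import type { NormalizedLandmark, Results } from "@mediapipe/holistic";

const SMILE_THRESHOLD = 0.35;

// Euclidean distance between two normalized landmarks (x/y only).
function distance(a: NormalizedLandmark, b: NormalizedLandmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// "eureca": either wrist above its shoulder (smaller y = higher on screen).
function detectHandsUp(results: Results): boolean {
  const pose = results.poseLandmarks;
  if (!pose) return false;
  const leftShoulder = pose[11];
  const rightShoulder = pose[12];
  const leftWrist = pose[15];
  const rightWrist = pose[16];
  return leftWrist.y < leftShoulder.y || rightWrist.y < rightShoulder.y;
}

// "smile": mouth height relative to mouth width exceeds the threshold.
function detectSmile(results: Results): boolean {
  const face = results.faceLandmarks;
  if (!face) return false;
  const mouthWidth = distance(face[291], face[61]); // left / right mouth corners
  const mouthHeight = distance(face[13], face[14]); // upper / lower lip
  if (mouthWidth === 0) return false;
  return mouthHeight / mouthWidth > SMILE_THRESHOLD;
}
```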
Classification function (summary):
```ts
export function classifyGesture(results: Results): GestureType | null {
  if (detectHandsUp(results)) return "eureca";
  if (detectSmile(results)) return "smile";
  return "serious";
}
```

Gesture → image mapping (from `gestureToImage`):
```ts
{
  smile: "/MonkeyFacer/images/smile.png",
  serious: "/MonkeyFacer/images/xd.png",
  eureca: "/MonkeyFacer/images/eureca.png",
}
```

- A Zustand store in `src/stores/GestureStore.ts` holds the current gesture:
  - `currentGesture: GestureType | null`
  - `setGesture: (gesture: GestureType | null) => void`
- Components subscribe to this store to get live updates (e.g., `ImageDisplay` reads `currentGesture` to display the image).
- Video element:
  - A `<video>` element is created and kept hidden; it serves as the direct camera source for MediaPipe.
- Canvas:
  - A `<canvas>` displays the drawn frame and all landmarks/connections.
  - The canvas is rendered with a horizontal flip (CSS class `scale-x-[-1]`), so the drawn output is mirrored relative to the incoming camera image (see the markup sketch after this list).
- ImageDisplay:
  - Shows a static image that corresponds to the currently detected gesture.
  - The displayed image is taken from `/MonkeyFacer/images/...` paths.
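A minimal JSX sketch of the video/canvas layout, assuming refs named `videoRef` and `canvasRef` and Tailwind utility classes as used in the project (the component name `DetectorSurface`, the wrapper classes, and the exact sizes are illustrative; the real markup lives in `HolisticDetectors.tsx`):

```tsx
import { useRef } from "react";

export default function DetectorSurface() {
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);

  return (
    <div className="relative w-[640px] h-[480px]">
      {/* Hidden camera source consumed by MediaPipe */}
      <video ref={videoRef} className="hidden" playsInline muted />
      {/* Mirrored canvas showing the frame plus landmark overlays */}
      <canvas ref={canvasRef} width={640} height={480} className="scale-x-[-1] rounded-md" />
    </div>
  );
}
```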
- Gesture store (Zustand):
```ts
import { create } from "zustand";

type GestureType = "smile" | "serious" | "eureca";

type GestureStore = {
  currentGesture: GestureType | null;
  setGesture: (gesture: GestureType | null) => void;
};

export const useGestureStore = create<GestureStore>((set) => ({
  currentGesture: null,
  setGesture: (gesture) => set({ currentGesture: gesture }),
}));
```

- ImageDisplay (reads store and shows mapped image):
```tsx
import { useGestureStore } from "../stores/GestureStore";
import { gestureToImage } from "../func/gestures";

export default function ImageDisplay() {
  const currentGesture = useGestureStore((state) => state.currentGesture);
  const imageSrc = currentGesture
    ? gestureToImage[currentGesture]
    : "/MonkeyFacer/images/ahhh.png";

  return (
    <div className="w-[640px] h-[480px] shrink-0">
      <img src={imageSrc} alt="Gesture Representation" className="w-full h-full object-cover rounded-md" />
    </div>
  );
}
```

- Holistic integration (key steps):
  - Create Holistic, set options, register `onResults`.
  - Create `Camera(videoRef.current, { onFrame: () => HandsMesh.send({ image: videoRef.current }) })` and call `camera.start()`.
  - In `onResults`, draw the image and landmarks, call `classifyGesture`, and call `setGesture` (see the wiring sketch below).
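Those steps can be sketched roughly as follows. This is a simplified reconstruction, not the project's exact component: the drawing details are elided (see the handler sketch earlier), and the cleanup logic and class names are assumptions; the instance name `HandsMesh` is the one used in the calls quoted above.

```tsx
import { useEffect, useRef } from "react";
import { Holistic, type Results } from "@mediapipe/holistic";
import { Camera } from "@mediapipe/camera_utils";
import { classifyGesture } from "../func/gestures";
import { useGestureStore } from "../stores/GestureStore";

export default function HolisticDetectors() {
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const setGesture = useGestureStore((state) => state.setGesture);

  useEffect(() => {
    const HandsMesh = new Holistic({
      locateFile: (file) => "https://cdn.jsdelivr.net/npm/@mediapipe/holistic/" + file,
    });
    HandsMesh.setOptions({
      modelComplexity: 2,
      smoothLandmarks: false,
      minDetectionConfidence: 0.5,
      minTrackingConfidence: 0.5,
    });

    HandsMesh.onResults((results: Results) => {
      // Draw results.image and the landmark overlays onto canvasRef (omitted here),
      // then classify the frame and publish the gesture to the store.
      const detected = classifyGesture(results);
      setGesture(detected);
    });

    const camera = new Camera(videoRef.current!, {
      onFrame: async () => {
        await HandsMesh.send({ image: videoRef.current! });
      },
      width: 640,
      height: 480,
    });
    camera.start();

    return () => {
      camera.stop();
      HandsMesh.close();
    };
  }, [setGesture]);

  return (
    <div>
      <video ref={videoRef} className="hidden" playsInline muted />
      <canvas ref={canvasRef} width={640} height={480} className="scale-x-[-1]" />
    </div>
  );
}
```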
- Clone the repository:

  ```bash
  git clone https://github.com/Raulmora22/MonkeyFacer.git
  cd MonkeyFacer
  ```

- Install dependencies (the project uses pnpm in package.json, but npm or yarn can be used depending on your environment):

  ```bash
  pnpm install
  ```

- Start the development server:

  ```bash
  pnpm run dev
  ```

- Open the app in your browser (default Vite dev URL, e.g. http://localhost:5173).
When the app runs it will request access to your webcam. The Holistic model will process frames and the UI will update with the detected gesture image.