# FaceTrackingKit

A Swift framework for real-time face tracking on iOS, built for research. Supports both ARKit (TrueDepth camera devices) and MediaPipe (any device with a front camera).

## Features
- Real-time blend shapes, face mesh vertices, gaze tracking, and light estimation
- FACS Action Units — 14 Action Unit intensities computed automatically from blend shapes
- Head pose — pitch, yaw, roll in radians (ARKit only)
- Event markers — timestamped labels for aligning face data with experimental stimuli
- Async stream API for live frame access
- Built-in session storage with CSV, JSON Lines, and HDF5 export
- Optional image and depth map capture
- MediaPipe model is downloaded and cached automatically
## Requirements
- iOS 17+
- Swift 6.0+
## Installation

Add FaceTrackingKit to your project via Swift Package Manager:

```swift
dependencies: [
    .package(url: "https://github.com/digital-medicine/FaceTrackingKit", from: "0.1.0")
]
```

## Providers

**ARKit** — High accuracy; requires a Face ID device (iPhone/iPad with TrueDepth camera). Provides 52 blend shapes, gaze tracking, light estimation, and depth maps.

**MediaPipe** — Works on any iOS device with a front camera. Provides 51 blend shapes and 478 face landmarks. The model (~4 MB) is downloaded automatically on first use.
## Camera Permission

Add to your Info.plist:

```xml
<key>NSCameraUsageDescription</key>
<string>This app uses the camera for face tracking.</string>
```
## Quick Start

```swift
import FaceTrackingKit

// Create a tracker — pick one:
let tracker = FaceTracker(provider: .arKit())     // ARKit (Face ID devices)
let tracker = FaceTracker(provider: .mediaPipe()) // MediaPipe (any device)

// Start a session
try await tracker.start(participant: "P001")

// Mark experimental events
tracker.addEvent("stimulus_onset")

// Read frames in real time
for await frame in tracker.frames {
    if let aus = frame.actionUnits {
        print("AU12 (smile): \(aus[.au12] ?? 0)")
    }
    if let pose = frame.headPose {
        print("Head yaw: \(pose.yaw) rad")
    }
}

// Stop and get a summary
let result = try await tracker.stop()
print("Captured \(result.frameCount) frames over \(result.duration)s")
```

## Exporting Data

```swift
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!

let exportResult = try await tracker.export(
    session: result.sessionID,
    to: documentsURL,
    options: .init(tabularFormat: .csv, includeImages: false)
)
print("Exported to: \(exportResult.directory.path)")
```
This produces:

```
session_P001_2025-06-15/
  session.json     — session metadata
  blendshapes.csv  — one row per frame, one column per blend shape
  metadata.csv     — timestamps, gaze, light, head pose, action units
  events.json      — timestamped event markers (if any)
```

## Configuration

### ARKit

```swift
let tracker = FaceTracker(provider: .arKit(.init(
    captureBlendShapes: true,
    captureLookAtPoint: true,
    captureLightEstimation: true,
    captureDistanceToScreen: true,
    captureHeadPose: true,
    vertices: .all(precision: .float32),
    captureImages: .everyNthFrame(10),
    captureDepthMaps: .none,
    maxTrackedFaces: 1
)))
```

### MediaPipe

```swift
let tracker = FaceTracker(provider: .mediaPipe(.init(
    captureBlendShapes: true,
    vertices: .all(precision: .float32),
    captureImages: .everyNthFrame(10),
    maxTrackedFaces: 1
)))
```

To use a bundled model file instead of the automatic download:
```swift
let tracker = FaceTracker(provider: .mediaPipe(.init(
    modelPath: Bundle.main.url(forResource: "face_landmarker", withExtension: "task")!
)))
```

### Vertex capture options

```swift
.none                                              // No vertices (default)
.all(precision: .float32)                          // All vertices, full precision
.all(precision: .float16)                          // All vertices, half precision (~50% smaller)
.subset(indices: [0, 10, 20], precision: .float32) // Specific vertex indices
.stride(4, precision: .float32)                    // Every 4th vertex
```

### Image capture options

```swift
.none             // No images (default)
.everyFrame       // Every frame (~100-150 MB/min)
.everyNthFrame(5) // Every 5th frame
```

## Session Management

```swift
// List all stored sessions
let sessions = try await tracker.listSessions()
for session in sessions {
    print("\(session.participant): \(session.frameCount) frames, \(session.storageSizeBytes) bytes")
}

// Delete a session
try await tracker.deleteSession(session.sessionID)
```

## Pause and Resume

```swift
try await tracker.start(participant: "P001")

// Pause tracking (data is preserved)
try tracker.pause()

// Resume tracking (appends to the same session)
try await tracker.resume()

let result = try await tracker.stop()
```

## Error Handling

```swift
do {
    try await tracker.start(participant: "P001")
} catch FaceTrackerError.providerUnavailable(let name) {
    print("Provider \(name) is not supported on this device")
} catch FaceTrackerError.modelDownloadFailed(let error) {
    print("Could not download MediaPipe model: \(error)")
} catch FaceTrackerError.permissionDenied {
    print("Camera access was denied")
}
```

## FACS Action Units

FaceTrackingKit automatically computes 14 FACS Action Unit intensities from blend shapes (based on EMFACS Table 1, Aldenhoven et al.). Action units are available on every frame when blend shapes are enabled — no additional configuration needed.
```swift
for await frame in tracker.frames {
    if let aus = frame.actionUnits {
        print("AU6 (cheek raise): \(aus[.au6] ?? 0)")
        print("AU12 (smile): \(aus[.au12] ?? 0)")
    }
}
```

You can also compute action units from arbitrary blend shape dictionaries:

```swift
let aus = FaceTracker.actionUnits(from: blendShapes)
```

| AU | Name | Blend Shapes |
|---|---|---|
| AU1 | Inner Brow Raise | browInnerUp |
| AU2 | Outer Brow Raise | mean(browOuterUpLeft, browOuterUpRight) |
| AU4 | Brow Lowerer | mean(browDownLeft, browDownRight) |
| AU5 | Upper Lid Raise | mean(eyeWideLeft, eyeWideRight) |
| AU6 | Cheek Raise | mean(cheekSquintLeft, cheekSquintRight) |
| AU7 | Lid Tightener | mean(eyeSquintLeft, eyeSquintRight) |
| AU9 | Nose Wrinkler | mean(noseSneerLeft, noseSneerRight) |
| AU12 | Lip Corner Puller | mean(mouthSmileLeft, mouthSmileRight) |
| AU14 | Dimpler | mean(mouthDimpleLeft, mouthDimpleRight) |
| AU15 | Lip Corner Depressor | mean(mouthFrownLeft, mouthFrownRight) |
| AU16 | Lower Lip Depressor | mean(mouthLowerDownLeft, mouthLowerDownRight) |
| AU20 | Lip Stretcher | mean(mouthStretchLeft, mouthStretchRight) |
| AU23 | Lip Tightener | mouthPucker |
| AU26 | Jaw Drop | jawOpen |
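The table above is simple enough to reproduce outside the framework. The sketch below is a hypothetical stand-in for `FaceTracker.actionUnits(from:)`, assuming blend shapes arrive as a `[String: Float]` dictionary keyed by ARKit-style names (the framework may instead key them by an enum):

```swift
import Foundation

// Sketch: compute the 14 AU intensities per the mapping table above.
// Keys are ARKit-style blend shape names; missing shapes default to 0.
func actionUnits(from blendShapes: [String: Float]) -> [String: Float] {
    func value(_ name: String) -> Float { blendShapes[name] ?? 0 }
    func mean(_ a: String, _ b: String) -> Float { (value(a) + value(b)) / 2 }

    return [
        "au1":  value("browInnerUp"),
        "au2":  mean("browOuterUpLeft", "browOuterUpRight"),
        "au4":  mean("browDownLeft", "browDownRight"),
        "au5":  mean("eyeWideLeft", "eyeWideRight"),
        "au6":  mean("cheekSquintLeft", "cheekSquintRight"),
        "au7":  mean("eyeSquintLeft", "eyeSquintRight"),
        "au9":  mean("noseSneerLeft", "noseSneerRight"),
        "au12": mean("mouthSmileLeft", "mouthSmileRight"),
        "au14": mean("mouthDimpleLeft", "mouthDimpleRight"),
        "au15": mean("mouthFrownLeft", "mouthFrownRight"),
        "au16": mean("mouthLowerDownLeft", "mouthLowerDownRight"),
        "au20": mean("mouthStretchLeft", "mouthStretchRight"),
        "au23": value("mouthPucker"),
        "au26": value("jawOpen")
    ]
}

let aus = actionUnits(from: ["mouthSmileLeft": 0.8, "mouthSmileRight": 0.6])
print(aus["au12"]!) // ≈ 0.7
```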
## Head Pose

Head orientation (pitch, yaw, roll) is extracted from the ARKit face anchor transform. Enabled by default; ARKit only.
```swift
for await frame in tracker.frames {
    if let pose = frame.headPose {
        print("Pitch: \(pose.pitch), Yaw: \(pose.yaw), Roll: \(pose.roll)")
    }
}
```

Disable with `captureHeadPose: false` in `ARKitConfiguration`. Not available for MediaPipe.
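For intuition about what the extraction involves: pitch, yaw, and roll are Tait-Bryan angles recovered from the rotation part of the anchor's 4×4 transform. The sketch below assumes an `R = Rz(roll) · Ry(yaw) · Rx(pitch)` composition and plain nested arrays; the convention FaceTrackingKit actually uses is not specified here and may differ:

```swift
import Foundation

// Sketch: recover Tait-Bryan angles (pitch about x, yaw about y, roll about z)
// from a 3x3 rotation matrix, assuming R = Rz(roll) * Ry(yaw) * Rx(pitch).
func eulerAngles(from r: [[Double]]) -> (pitch: Double, yaw: Double, roll: Double) {
    let yaw   = asin(max(-1, min(1, -r[2][0]))) // clamp against rounding
    let pitch = atan2(r[2][1], r[2][2])
    let roll  = atan2(r[1][0], r[0][0])
    return (pitch, yaw, roll)
}

// Build a rotation from known angles to verify the round trip.
func rotation(pitch a: Double, yaw b: Double, roll g: Double) -> [[Double]] {
    let (ca, sa, cb, sb, cg, sg) = (cos(a), sin(a), cos(b), sin(b), cos(g), sin(g))
    // R = Rz(g) * Ry(b) * Rx(a)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb,     cb * sa,                cb * ca]
    ]
}

let pose = eulerAngles(from: rotation(pitch: 0.1, yaw: 0.2, roll: 0.3))
print(pose) // ≈ (0.1, 0.2, 0.3)
```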
## Event Markers

Record timestamped event markers to align face data with experimental stimuli (e.g., stimulus onset, trial boundaries). Events are written to `events.json` in the export directory.
```swift
try await tracker.start(participant: "P001")

tracker.addEvent("baseline_start")
// ... present stimulus ...
tracker.addEvent("stimulus_onset")
// ... wait ...
tracker.addEvent("stimulus_offset")

let result = try await tracker.stop()
```

Events are no-ops when no session is active. The timestamps use the same epoch-seconds clock as `Frame.timestamp`.
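Because events and frames share one clock, aligning them in post-hoc analysis reduces to a nearest-timestamp lookup. This helper is not part of the framework (the function name is hypothetical); a minimal sketch:

```swift
import Foundation

// Sketch: find the frame whose timestamp is closest to an event, via
// binary search over the sorted frame timestamps (epoch seconds).
func nearestFrameIndex(to eventTime: Double, in timestamps: [Double]) -> Int {
    precondition(!timestamps.isEmpty)
    var lo = 0, hi = timestamps.count - 1
    while lo < hi {
        let mid = (lo + hi) / 2
        if timestamps[mid] < eventTime { lo = mid + 1 } else { hi = mid }
    }
    // lo is the first frame at or after the event; compare with the one before.
    if lo > 0, eventTime - timestamps[lo - 1] < timestamps[lo] - eventTime {
        return lo - 1
    }
    return lo
}

let frameTimes = [0.0, 0.033, 0.066, 0.100].map { 1718451234.5 + $0 }
print(nearestFrameIndex(to: 1718451234.57, in: frameTimes)) // 2
```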
## Export Formats

### CSV

`blendshapes.csv` — one row per frame with columns for each blend shape:

```
timestamp,frameIndex,browDownLeft,browDownRight,...,noseSneerRight
1718451234.5,0,0.012,0.009,...,0.003
1718451234.53,1,0.014,0.011,...,0.002
```

`metadata.csv` — includes head pose and action units alongside other metadata:

```
timestamp,frameIndex,isFaceTracked,...,headPose.pitch,headPose.yaw,headPose.roll,au1,au2,...,au26
```
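For a quick sanity check of a CSV export in Swift, Foundation alone is enough, since the numeric layout has no quoted fields. A sketch using a reduced, illustrative column set:

```swift
import Foundation

// Sketch: parse a blendshapes.csv-style export into per-frame dictionaries.
// Assumes no quoted fields, which holds for this numeric layout.
func parseCSV(_ text: String) -> [[String: Double]] {
    let lines = text.split(separator: "\n").map(String.init)
    guard let header = lines.first else { return [] }
    let columns = header.split(separator: ",").map(String.init)
    return lines.dropFirst().map { line in
        let values = line.split(separator: ",").map { Double(String($0)) ?? .nan }
        return Dictionary(uniqueKeysWithValues: zip(columns, values))
    }
}

// Reduced sample in the same shape as the export above.
let sample = """
timestamp,frameIndex,browDownLeft,browDownRight
1718451234.5,0,0.012,0.009
1718451234.53,1,0.014,0.011
"""
let frames = parseCSV(sample)
print(frames.count, frames[1]["browDownLeft"]!) // 2 0.014
```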
### JSON Lines

With JSON Lines export, each frame is written as one JSON object per line:

```
{"timestamp":1718451234.5,"frameIndex":0,"browDownLeft":0.012,...}
{"timestamp":1718451234.53,"frameIndex":1,"browDownLeft":0.014,...}
```

### HDF5

Export to a single HDF5 file for Python/scientific analysis workflows. Readable by h5py, MATLAB, and R. No external dependencies required.
```swift
let exportResult = try await tracker.export(
    session: result.sessionID,
    to: documentsURL,
    options: .init(tabularFormat: .hdf5)
)
// exportResult.hdf5File → session.h5
```

The HDF5 file contains these datasets:
| Dataset | Shape | Type | Description |
|---|---|---|---|
| `/blendshapes` | N × 52 | Float32 | Blend shape values per frame |
| `/metadata` | N × M | Float32 | Head pose, AUs, gaze, light, emotions |
| `/timestamps` | N | Float64 | Frame timestamps (epoch seconds) |
| `/frame_indices` | N | Int32 | Sequential frame numbers |
| `/events` | K | String | Event markers (if recorded) |
Each dataset has a `columns` attribute with comma-separated column names.

Reading in Python:
```python
import h5py

with h5py.File("session.h5", "r") as f:
    timestamps = f["timestamps"][:]
    blendshapes = f["blendshapes"][:]
    columns = f["blendshapes"].attrs["columns"].decode().split(",")
```

When HDF5 is selected, CSV/JSONL files are not written.
## Provider Comparison

| Feature | ARKit | MediaPipe |
|---|---|---|
| Device requirement | TrueDepth camera (Face ID) | Any front camera |
| Blend shapes | 52 (incl. tongueOut) | 51 |
| Action units | Yes (14 AUs, from blend shapes) | Yes (14 AUs, from blend shapes) |
| Head pose | Yes (pitch, yaw, roll) | No |
| Vertices | ~1220 (camera space, meters) | 478 (normalized 0-1) |
| Gaze tracking | Yes | No |
| Light estimation | Yes | No |
| Distance to screen | Yes | No |
| Depth maps | Yes | No |
| Event markers | Yes | Yes |
| Model download | None needed | Auto (~4 MB, cached) |
## License

See LICENSE for details.