Five project ideas from the oak-examples repo
that work on our hardware today.
SVA MFA Interaction Design · Spring 2026
Four detectors running on your OAK-D cameras, all controllable via Discord:
| Detector | What It Sees | Pipeline |
|---|---|---|
| person_detector.py | People in frame, count | YOLOv6 (single stage) |
| fatigue_detector.py | Drowsiness, head tilt | YuNet → MediaPipe landmarks |
| gaze_detector.py | Where someone is looking | YuNet → Head pose → Gaze ADAS |
| whiteboard_reader.py | Text on a whiteboard | PaddlePaddle detect + recognize |
What it does
Detects hands in frame, maps 21 keypoints per hand, then classifies the pose into a gesture.
Built-in gestures:
FIST, OK, PEACE, ONE, TWO, THREE, FOUR, FIVE
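Several of the built-in gestures reduce to counting extended fingers over the 21 keypoints. A minimal sketch, assuming MediaPipe's landmark ordering and normalized image coordinates (y grows downward, hand upright); the thumb heuristic and the coordinate assumptions are illustrative, not the example's actual logic:

```python
# Hypothetical gesture classifier over 21 (x, y) hand landmarks.
# Assumed MediaPipe ordering: 4 = thumb tip, 8/12/16/20 = finger tips,
# 6/10/14/18 = the joint below each tip, 5 = index knuckle.
FINGER_TIPS = (8, 12, 16, 20)
FINGER_PIPS = (6, 10, 14, 18)

def count_extended(lm):
    """lm: list of 21 (x, y) tuples in normalized image coordinates."""
    fingers = sum(lm[tip][1] < lm[pip][1]          # tip above its joint
                  for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))
    # Thumb sticks out sideways: its tip ends up farther from the
    # index knuckle than the joint below it.
    thumb = abs(lm[4][0] - lm[5][0]) > abs(lm[3][0] - lm[5][0])
    return fingers + thumb

def classify(lm):
    names = {0: "FIST", 1: "ONE", 2: "TWO", 3: "THREE",
             4: "FOUR", 5: "FIVE"}
    return names.get(count_extended(lm), "UNKNOWN")
```

A bare count can't separate PEACE from TWO or detect OK — those need the actual geometry (which fingers are up, thumb-to-index distance).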
Example location
On our hardware
Works on RVC2
Default: ~8 fps
Discord interaction ideas
- !gesture — what gesture is the camera seeing right now?
- !vote — thumbs up / thumbs down to vote on something
- !gesture-trigger add PEACE "lights on" — bind a gesture to an action

Wilder ideas
What it does
Detects people with YOLO, then estimates 17 body keypoints per person — head, shoulders, elbows, wrists, hips, knees, ankles. A full skeleton.
Alternative: YOLOv8 Pose (single-stage, 17 keypoints in one pass)
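A hand-raise check on those 17 keypoints is just a coordinate comparison. A sketch assuming COCO-17 ordering (5/6 = shoulders, 9/10 = wrists) and pixel coordinates where y grows downward; the confidence threshold is an illustrative assumption:

```python
# Hypothetical wrist-above-shoulder check on COCO-17 keypoints.
L_SHOULDER, R_SHOULDER = 5, 6
L_WRIST, R_WRIST = 9, 10

def hand_raised(kpts, min_conf=0.3):
    """kpts: 17 (x, y, confidence) tuples for one person."""
    for wrist, shoulder in ((L_WRIST, L_SHOULDER), (R_WRIST, R_SHOULDER)):
        _, wy, wc = kpts[wrist]
        _, sy, sc = kpts[shoulder]
        # In image space, "above" means a smaller y value.
        if wc >= min_conf and sc >= min_conf and wy < sy:
            return True
    return False
```

The same shape of comparison (relative keypoint heights and angles) drives posture and activity guesses.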
Example location
On our hardware
Works on RVC2
Default: ~5 fps
Discord interaction ideas
- !hand-raised — is anyone raising their hand? (wrist above shoulder)
- !posture — standing, sitting, or slouching?
- !activity — classify what the person is doing based on keypoint positions

Wilder ideas
What it does
Detects objects with YOLO, then extracts a visual "fingerprint" (embedding) for each one. DeepSORT matches fingerprints across frames so each object keeps its ID — even if it leaves and returns.
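Once IDs persist across frames, queries like dwell time or "who left?" are pure bookkeeping over timestamps. A sketch assuming the tracker emits a set of integer track IDs each frame; the class and method names here are invented for illustration:

```python
import time

class TrackLog:
    """Records first and most recent sighting per track ID."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.first_seen = {}
        self.last_seen = {}

    def update(self, ids):
        """Call once per frame with the set of currently tracked IDs."""
        now = self.clock()
        for tid in ids:
            self.first_seen.setdefault(tid, now)
            self.last_seen[tid] = now

    def dwell(self, tid):
        """Seconds between first and most recent sighting."""
        return self.last_seen[tid] - self.first_seen[tid]

    def gone_for(self, seconds):
        """IDs not seen for at least `seconds`."""
        now = self.clock()
        return sorted(t for t, last in self.last_seen.items()
                      if now - last >= seconds)
```

The injectable `clock` is there so the log can be tested without waiting in real time.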
Example location
On our hardware
Works on RVC2
Default: ~5 fps
Discord interaction ideas
- !track — list everyone currently tracked with their ID
- !who-left — report IDs that disappeared in the last N minutes
- !dwell-time — how long has person #3 been in frame?

Wilder ideas
What it does
Detects faces or bodies, computes a unique embedding, then compares it against previously seen embeddings using cosine similarity. Recognizes the same person appearing again — even after leaving and coming back.
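The "compare against previously seen embeddings" step fits in a few lines. A minimal sketch of cosine-similarity matching against a named gallery; the 0.6 threshold is an illustrative assumption (real re-id thresholds are tuned per model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def best_match(embedding, gallery, threshold=0.6):
    """gallery: {name: reference embedding}. Returns a name or None."""
    best, best_sim = None, threshold
    for name, ref in gallery.items():
        sim = cosine(embedding, ref)
        if sim > best_sim:
            best, best_sim = name, sim
    return best
```

A `!register "Alex"` command would simply store the current embedding under that name in the gallery.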
Two modes: face re-identification (SCRFD/YuNet detection + ArcFace embeddings) and full-body re-identification (OSNet embeddings).
Example location
On our hardware
Works on RVC2
Default: ~2 fps
Discord interaction ideas
- !attendance — who has the camera seen today?
- !register "Alex" — name the current face for future recognition
- !seen "Alex" — when was Alex last spotted?

Wilder ideas
What it does
Instead of drawing a bounding box, segmentation classifies every pixel — is it a person, or is it background? You get a precise silhouette, not a rectangle.
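With a per-pixel mask in hand, the silhouette, background, and depth-band ideas are a few lines of array indexing each. A sketch assuming a 0/1 person mask of shape H×W, a BGR frame of shape H×W×3, and depth in millimeters; the 0.5–2 m band is an illustrative assumption:

```python
import numpy as np

def silhouette(frame, mask, color=(255, 255, 255)):
    """White person silhouette on a black background."""
    out = np.zeros_like(frame)
    out[mask.astype(bool)] = color
    return out

def background_only(frame, mask, fill=(0, 0, 0)):
    """The frame with every person pixel blanked out."""
    out = frame.copy()
    out[mask.astype(bool)] = fill
    return out

def depth_band_mask(mask, depth_mm, near=500, far=2000):
    """Person pixels that also fall within [near, far] millimeters."""
    return mask.astype(bool) & (depth_mm >= near) & (depth_mm <= far)
```

`silhouette` backs !silhouette and !privacy-mode, `background_only` backs !background, and `depth_band_mask` is the segmentation-plus-depth combination behind !depth-mask.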
Two examples available: background blur (DeepLab V3+ segmentation) and depth crop (segmentation + StereoDepth).
Example locations
On our hardware
Works on RVC2
Blur: ~4 fps
Depth crop: ~10 fps
Discord interaction ideas
- !silhouette — screenshot showing only person outlines
- !privacy-mode on — switch from full frame to silhouette-only capture
- !background — extract and share just the background (people removed)
- !depth-mask — combine segmentation + depth to isolate by distance

Wilder ideas
Each idea is useful alone. Together, they start to describe a room that understands what's happening inside it.
Tracking + Pose
Person #3 raised their hand 12 seconds ago and is still waiting.
Re-id + Fatigue
Alex looks tired today. Send a private DM instead of a public alert.
Gesture + Segmentation
Privacy-safe voting: count raised fists from silhouettes, no faces stored.
Our OAK-D cameras use the RVC2 chip (Myriad X). It works — but it's not fast. Here's what to expect:
| Example | FPS on RVC2 | Good For |
|---|---|---|
| Hand gestures | ~8 fps | Interactive commands, voting |
| Human pose | ~5 fps | Posture checks, hand-raise detection |
| DeepSORT tracking | ~5 fps | Arrivals/departures, dwell time |
| Re-identification | ~2 fps | Attendance, periodic check-ins |
| Segmentation (blur) | ~4 fps | Privacy screenshots, silhouettes |
| Segmentation (depth crop) | ~10 fps | Distance-based isolation |
The OAK 4 line uses the RVC4 platform (Qualcomm QCS8550) — 52 TOPS vs. ~1.4 TOPS on our current Myriad X. Every pipeline above would hit 30 fps.
| Model | Price | Depth |
|---|---|---|
| OAK 4 S | ~$749 | No |
| OAK 4 D | ~$849 | Yes |
| OAK 4 D Pro | ~$949 | Yes + laser |
8 GB RAM, 128 GB storage, 48 MP RGB. USB + PoE built into every unit.
What it unlocks
Alternatively: offload to a GPU. Stream frames from the current OAK-D to your PC or a cloud GPU (RunPod) and run inference there. Same speed boost, no new hardware.
All five examples live in the oak-examples repo and follow the same pattern:
To make it a Smart Objects detector, follow the existing pattern:
- Copy an existing detector (e.g., fatigue_detector.py)
- Support the --discord / --log / --display flags
- Add its !commands to discord_bot.py

| Idea | Path in oak-examples | Models |
|---|---|---|
| Hand gestures | neural-networks/pose-estimation/hand-pose/ | MediaPipe Palm + Hand Landmarker |
| Human pose | neural-networks/pose-estimation/human-pose/ | YOLOv6 + Lite-HRNet |
| Object tracking | neural-networks/object-tracking/deepsort-tracking/ | YOLOv6 + OSNet + DeepSORT |
| Re-identification | neural-networks/reidentification/human-reidentification/ | SCRFD/YuNet + OSNet/ArcFace |
| Segmentation | neural-networks/segmentation/blur-background/ | DeepLab V3+ |
| Depth crop | neural-networks/segmentation/depth-crop/ | DeepLab V3+ + StereoDepth |
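Under that pattern, a new detector's entry point might look like the sketch below. Only the three flag names come from the existing detectors; the parser structure and help text are assumptions:

```python
import argparse

def build_parser():
    """Hypothetical argument parser for a Smart Objects detector."""
    p = argparse.ArgumentParser(description="Smart Objects detector")
    p.add_argument("--discord", action="store_true",
                   help="answer !commands via discord_bot.py")
    p.add_argument("--log", action="store_true",
                   help="append detections to a status file")
    p.add_argument("--display", action="store_true",
                   help="show an annotated preview window")
    return p

# e.g.  args = build_parser().parse_args(["--discord", "--log"])
```

Keeping the three flags independent means the same script runs headless on the camera host, logs for later analysis, or previews locally while you debug.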
The camera already sees. Your job is to decide what it should say.
Start from an existing example. Write a status file.
Add a Discord command. Make the camera conversational.