
Student Status Detection

An AI project for detecting and logging the status of students in a classroom.

Environment

This project can run on an NVIDIA GPU or a CPU without modifying the code.

cudatoolkit==12.1
pytorch==2.3.0 
ultralytics==8.2.45

(You can also run it without installing ultralytics: this repository already bundles the ultralytics code we use.)
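
For reference, one possible way to install the pinned versions (the CUDA 12.1 wheel index shown here is an assumption about how you obtain PyTorch; CPU-only wheels also work, per the note above):

# PyTorch 2.3.0 built against CUDA 12.1
pip install torch==2.3.0 --index-url https://download.pytorch.org/whl/cu121
pip install ultralytics==8.2.45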

Usage

Simply pass an RTMP URL or a file path to main.py as a command-line argument and run it.

python main.py your_rtmp_url_or_file_path
# example-1: python main.py rtmp://172.0.0.1:1935/live/8888
# example-2: python main.py ../test_video.mp4
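
For orientation, a minimal sketch of how such an entry point can consume that argument (the actual main.py may differ; cv2.VideoCapture accepts both RTMP URLs and local file paths when OpenCV is built with FFmpeg support):

import sys
import cv2

def main(source: str) -> None:
    # VideoCapture handles network streams and local files alike
    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise RuntimeError(f"Cannot open video source: {source}")
    while True:
        ok, frame = cap.read()
        if not ok:  # stream ended or connection dropped
            break
        # ... detection / classification / state accumulation on `frame` ...
    cap.release()

if __name__ == "__main__":
    main(sys.argv[1])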

Solution

Our overall approach has three steps: first, detect and crop the faces in the video; second, run a binary classification (head up vs. head down) on each cropped face; third, accumulate the per-frame judgments over time to decide each student's state. A sketch of the third step follows.
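
As an illustration of the accumulation step, a minimal sketch using a sliding window (the window size and threshold are illustrative assumptions, not the repository's actual values):

from collections import deque

class StateAccumulator:
    """Smooths noisy per-frame UP/DOWN judgments over a sliding window."""
    def __init__(self, window: int = 30, threshold: float = 0.7):
        self.history = deque(maxlen=window)  # recent binary judgments
        self.threshold = threshold

    def update(self, head_down: bool) -> str:
        self.history.append(head_down)
        down_ratio = sum(self.history) / len(self.history)
        return "DOWN" if down_ratio >= self.threshold else "UP"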

Dataset: For the first step, face detection, we use the WIDER FACE dataset; for the second step, head-pose classification, we use the CAS-PEAL dataset. Dataset structure:

dataset
├─ClassifierData
│  ├─train
│  │  ├─DOWN
│  │  └─UP
│  └─val
│      ├─DOWN
│      └─UP
└─DetectionData
    ├─train
    │  ├─images
    │  └─labels
    └─val
        ├─images
        └─labels
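
The labels folders use the standard YOLO annotation format: one .txt file per image, one line per face. Since detection here is single-class, the class index is always 0 (the coordinate values below are made up for illustration):

# <class> <x_center> <y_center> <width> <height>, all normalized to [0, 1]
0 0.512 0.334 0.081 0.102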

Live-Streaming: First, a streaming server is built with Nginx; then the video is encoded and muxed by FFmpeg and pushed to the server over the RTMP protocol.
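
As an illustration (not necessarily the project's exact command), a typical FFmpeg push to such a server looks like:

# -re reads the input at its native frame rate; RTMP requires the FLV container
ffmpeg -re -i input.mp4 -c:v libx264 -f flv rtmp://172.0.0.1:1935/live/8888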

Detection: In the object-detection step, we use the YOLOv9c model for single-class detection; head pose is not distinguished at this stage.
To train the detection model, run python trainDetection.py.
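
A minimal equivalent of that training script, assuming it wraps the standard ultralytics API (the dataset YAML path and hyperparameters below are placeholders):

from ultralytics import YOLO

# Start from pretrained YOLOv9c weights and fine-tune on the face dataset
model = YOLO("yolov9c.pt")
model.train(data="dataset/detection.yaml", epochs=100, imgsz=640)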

Classification: We use the backbone of YOLOv8m for classification. Captured face crops are first resized and padded to 192×192, and all faces detected in a frame are stacked into a single batch for batch inference.
To train the classification model, run python trainClassifier.py.
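
A sketch of the resize-and-pad preprocessing and batching described above (letterbox-style; the padding color, interpolation, and normalization are assumptions):

import cv2
import numpy as np
import torch

def letterbox_192(face: np.ndarray) -> np.ndarray:
    """Resize a face crop to fit 192x192, padding to preserve aspect ratio."""
    h, w = face.shape[:2]
    scale = 192 / max(h, w)
    resized = cv2.resize(face, (int(w * scale), int(h * scale)))
    canvas = np.zeros((192, 192, 3), dtype=np.uint8)
    nh, nw = resized.shape[:2]
    top, left = (192 - nh) // 2, (192 - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

def make_batch(faces: list) -> torch.Tensor:
    """Stack all face crops from one frame into a single NCHW batch."""
    arr = np.stack([letterbox_192(f) for f in faces])  # N, H, W, C
    return torch.from_numpy(arr).permute(0, 3, 1, 2).float() / 255.0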

Cluster: To locate the faces, we cluster the faces detected in the first 500 frames, take each cluster center as a face reference position, and number each reference position as a person ID. When a new frame arrives, the detected faces are assigned to their nearest cluster centers. This keeps IDs stable and is robust to face occlusion over a few frames; the drawback is that a person's position in the camera view cannot change too much.
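
A sketch of that assignment step, assuming the reference centers have already been computed from the first 500 frames (function and variable names are illustrative):

import numpy as np

def assign_ids(detections: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Assign each detected face to the nearest reference position.

    detections: (N, 2) face centers in the current frame
    centers:    (K, 2) reference positions from the first 500 frames
    Returns an (N,) array of person IDs (indices into centers).
    """
    # Pairwise squared distances between detections and reference centers
    d2 = ((detections[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)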

Why not use object tracking? Multi-object tracking can usually keep target IDs stable, but once a target is lost for a long time, for example when a student goes to the restroom and returns to the same seat, the tracker assigns a new ID, making the accumulated state information inconsistent.

Example

Test Video
