This repository serves as a comprehensive and flexible toolbox for human mesh estimation. It supports multiple input modalities and workflows, including:
- Estimating proxy meshes from human-centric images and videos.
- Estimating SMPL-X parameters from colored meshes (experimental).
- Recovering SMPL-X parameters from a given SMPL-X mesh.
Setting up the environment from scratch can be somewhat involved due to the dependencies required by this project. For convenience, we provide a pre-built Docker image on Docker Hub:
```bash
docker pull zjwfufu/eva-tracker:latest
```
After launching a container with this image, simply run:
```bash
source activate
conda activate eva_track
```
to enter the pre-configured environment.
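If you want to confirm the container environment is usable before running the tracker, a minimal check along these lines can help (a convenience sketch, not part of the official setup; the versions bundled in the image may differ from the ones pinned below):

```python
# Minimal sanity check for the container environment (hypothetical helper,
# not shipped with the repository). Run it after `conda activate eva_track`.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```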
**Note**
A few components may still need to be re-installed:
- Packages previously installed via `pip install -e .` should be re-installed inside the activated environment:

  ```bash
  # sam2
  pip install -e ./modules/models/image_segmenter/sam2 --no-deps

  # osx modified mmpose
  cd ./modules/models/mesh_estimator/osx/transformer_utils
  pip install -e .
  cd ../../../../../
  pip install --upgrade setuptools
  ```

- PyTorch3D may need to be rebuilt depending on your GPU architecture and CUDA configuration (a quick import check is sketched after this list).
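As a rough way to tell whether the shipped PyTorch3D build matches your machine, you can try importing its compiled extension; this is only a heuristic, and the module name `pytorch3d._C` is an internal detail that may change between releases:

```python
# Rough check that PyTorch3D's compiled ops load on this machine (assumption:
# the internal extension module is named `pytorch3d._C`, as in current releases).
try:
    from pytorch3d import _C  # noqa: F401  (compiled CUDA/C++ extension)
    print("PyTorch3D compiled ops loaded; no rebuild needed.")
except ImportError as err:
    print("PyTorch3D needs to be rebuilt for this GPU/CUDA setup:", err)
```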
If you prefer to install the environment manually, you can follow the step-by-step instructions below:
- Clone this repository and create the conda environment

  ```bash
  git clone https://github.com/zjwfufu/eva_tracker
  conda create -n eva_track python=3.10
  ```

- Install PyTorch and PyTorch3D

  ```bash
  pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118
  pip install -U xformers==0.0.26.post1 --index-url https://download.pytorch.org/whl/cu118
  pip install "git+https://github.com/facebookresearch/pytorch3d.git"
  ```

- Install other requirements

  ```bash
  pip install -r requirements.txt
  pip install chumpy==0.70 --no-build-isolation
  ```

- Install SAM2

  ```bash
  pip install -e ./modules/models/image_segmenter/sam2 --no-deps
  ```

- Install OSX dependencies

  - Install mmcv-full==1.7.0

    Install mmcv-full 1.7.0 compiled with C++17. You can download the source code from this link, which has been patched to compile with C++17 during installation.

    After downloading and extracting the archive, navigate to the source directory and run:

    ```bash
    MMCV_WITH_OPS=1 pip install . -v
    ```

  - Install modified mmpose

    ```bash
    cd ./modules/models/mesh_estimator/osx/transformer_utils
    pip install -e .
    cd ../../../../../
    pip install --upgrade setuptools
    ```

  - Install mmengine

    ```bash
    pip install mmengine==0.10.7
    ```

- Install TrustNCG optimizer

  ```bash
  pip install "git+https://github.com/vchoutas/torch-trust-ncg.git"
  ```

- Fix numpy version (a quick import check for the finished environment is sketched after this list)

  ```bash
  pip install numpy==1.23.0
  ```
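Once all steps are done, an import check like the one below can catch missing or mismatched packages early. It is only a convenience sketch, not part of the repository; the expected versions in the comments are the ones pinned above.

```python
# Convenience sketch: verify the key dependencies installed above are importable
# and that the pinned versions survived later installs (e.g. numpy 1.23.0).
import numpy, torch, pytorch3d, mmcv, mmpose, mmengine

print("numpy    :", numpy.__version__)     # expected 1.23.0
print("torch    :", torch.__version__)     # expected 2.3.0
print("mmcv     :", mmcv.__version__)      # expected 1.7.0
print("mmpose   :", mmpose.__version__)
print("mmengine :", mmengine.__version__)  # expected 0.10.7
print("pytorch3d:", pytorch3d.__version__)
```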
We provide a full archive of all the pretrained weights required by this project, available at this link. Please download and extract the archive into `./modules/weights`. After extraction, the directory structure should be as follows:
```
./modules/weights/
├── face_tracker
│   ├── emica
│   │   ├── EMICA-CVT_flame2020_notexture.pt
│   │   └── ins_scrfd_10g_bnkps.onnx
│   ├── flame
│   │   ├── canonical.obj
│   │   └── FLAME_with_eye.pt
│   ├── mask2former
│   │   ├── config.json
│   │   ├── gitattributes
│   │   ├── model.safetensors
│   │   ├── preprocessor_config.json
│   │   ├── pytorch_model.bin
│   │   └── README.md
│   ├── matting
│   │   └── stylematte_synth.pt
│   └── vgghead
│       ├── lmks_2d.pt
│       └── vgg_heads_l.trcd
├── hand_tracker
│   ├── hamer.ckpt
├── human_template
│   ├── flame
│   │   ├── 2019
│   │   ├── flame_dynamic_embedding.npy
│   │   ├── FLAME_FEMALEl.pkl
│   │   ├── FLAME_MALE.pkl
│   │   ├── FLAME_NEUTRAL.pkl
│   │   ├── flame_static_embedding.pkl
│   │   ├── FLAME_texture.npz
│   │   └── Readme.pdf
│   ├── flame_assets
│   │   ├── flame
│   │   ├── flame_arkit_bs.npy
│   │   ├── pred_expression.json
│   │   ├── shoulder_mesh.obj
│   │   └── teeth_jawopen_offset.npy
│   ├── mano
│   │   ├── MANO_LEFT.pkl
│   │   └── MANO_RIGHT.pkl
│   ├── mano_mean_params.npz
│   ├── pose_estimate
│   │   └── multiHMR_896_L.pt
│   ├── smpl
│   │   ├── SMPL_FEMALE.pkl
│   │   ├── SMPL_MALE.pkl
│   │   └── SMPL_NEUTRAL.pkl
│   ├── smpl_mean_params.npz
│   ├── smplx
│   │   ├── MANO_SMPLX_vertex_ids.pkl
│   │   ├── SMPLX_FEMALE.npz
│   │   ├── SMPL-X__FLAME_vertex_ids.npy
│   │   ├── smplx_flip_correspondences.npz
│   │   ├── SMPLX_MALE.npz
│   │   ├── SMPLX_NEUTRAL.npz
│   │   ├── smplx_npz.zip
│   │   ├── SMPLX_to_J14.pkl
│   │   ├── smplx_uv
│   │   └── version.txt
│   └── smplx_points
├── image_matting
│   └── BiRefNet-general-epoch_244.pth
├── image_segmenter
│   └── sam2.1_hiera_large.pt
├── keypoint_detector
│   └── sapiens_1b_coco_wholebody_best_coco_wholebody_AP_727_torchscript.pt2
└── mesh_estimator
    ├── multiHMR_896_L.pt
    └── osx_l.pth.tar
```
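Since a missing weight file usually only surfaces as an error at runtime, a small script along these lines can verify the extraction; the handful of paths checked here are taken from the tree above, and the list is illustrative rather than exhaustive:

```python
# Illustrative check that a few of the expected weight files exist after
# extraction (paths taken from the tree above; extend the list as needed).
from pathlib import Path

WEIGHTS_ROOT = Path("./modules/weights")
expected = [
    "face_tracker/emica/EMICA-CVT_flame2020_notexture.pt",
    "hand_tracker/hamer.ckpt",
    "human_template/smplx/SMPLX_NEUTRAL.npz",
    "image_segmenter/sam2.1_hiera_large.pt",
    "mesh_estimator/osx_l.pth.tar",
]

missing = [p for p in expected if not (WEIGHTS_ROOT / p).exists()]
print("All checked files present." if not missing else f"Missing: {missing}")
```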
Estimating human proxy meshes (FLAME, MANO, SMPL-X) from monocular observations is inherently ill-posed and highly sensitive to initialization and priors.
**Estimator backbone.** Most pipelines rely on a pretrained HMR model to provide an initial SMPL-X estimate. However, many existing HMR models are primarily supervised with 2D signals during training, which introduces depth ambiguities. As a result, the predicted bodies may lean forward/backward, and the relative distances between hands, arms, and torso can become inconsistent, causing misalignment. More advanced models such as SAM-3D-Body show promising potential for alleviating these issues.
**Temporal smoothness.** When tracking trimmed videos, temporal smoothness must be applied carefully. For tasks that only require per-frame accuracy, a simple second-order smoothness term is often sufficient. For applications like animation or cross-reenactment, temporal regularization requires more careful tuning to avoid over-smoothing important dynamics.
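For reference, a second-order smoothness term of the kind mentioned above can be written as a penalty on the discrete acceleration of the per-frame parameters. The snippet below is a generic sketch; the tensor shapes, names, and weighting are illustrative, not the ones used in this codebase:

```python
# Generic sketch of a second-order temporal smoothness penalty: for a sequence
# of per-frame parameters `theta` with shape (T, D), penalize the discrete
# acceleration theta[t-1] - 2*theta[t] + theta[t+1]. Weight `w` is illustrative.
import torch

def second_order_smoothness(theta: torch.Tensor, w: float = 1.0) -> torch.Tensor:
    accel = theta[:-2] - 2.0 * theta[1:-1] + theta[2:]   # (T-2, D)
    return w * accel.pow(2).mean()

# Example: smooth a random sequence of SMPL-X-style pose parameters.
poses = torch.randn(30, 63, requires_grad=True)           # 30 frames, 63 pose dims
loss = second_order_smoothness(poses, w=0.1)
loss.backward()
```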
**Blinking.** This project does not explicitly model eyelid poses. Some blink motions represented in FLAME’s expression blendshapes may be suppressed by strong smoothness terms and thus become less noticeable during optimization.
This project is built on many amazing open-source projects:
and many research works:
Thanks to all the authors for their great work.
If you find this project useful in your research, please cite it with the following BibTeX entries:
```bibtex
@misc{zhang2025evatracker,
  title={Expressive and Versatile Avatar Tracker},
  author={Zhang, Jiawei},
  year={2025},
  month={dec},
  url={https://github.com/zjwfufu/EVA_tracker}
}

@article{zhang2025bringingportrait3dpresence,
  title={Bringing Your Portrait to 3D Presence},
  author={Zhang, Jiawei and Chu, Lei and Li, Jiahao and Zang, Zhenyu and Li, Chong and Li, Xiao and Cao, Xun and Zhu, Hao and Lu, Yan},
  journal={arXiv preprint arXiv:2511.22553},
  year={2025}
}
```