Expressive and Versatile Avatar Tracker

This repository serves as a comprehensive and flexible toolbox for human mesh estimation. It supports multiple input modalities and workflows, including:

  • Estimating proxy meshes from human-centric images and videos.
  • Estimating SMPL-X parameters from colored meshes (experimental).
  • Recovering SMPL-X parameters from a given SMPL-X mesh.
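
As a rough illustration of the last workflow, the sketch below fits SMPL-X parameters to a target SMPL-X mesh by minimizing a vertex loss with the smplx package. This is a generic, minimal example under assumed paths and shapes, not this repository's actual pipeline or API:

import torch
import smplx  # pip install smplx

# The model path below follows the weights layout described later in this README;
# the target vertex file is hypothetical and only for illustration.
model = smplx.create("./modules/weights/human_template", model_type="smplx",
                     gender="neutral", use_pca=False)

# Target SMPL-X vertices taken from the given mesh, shape (10475, 3).
target_vertices = torch.load("target_smplx_vertices.pt")

betas = torch.zeros(1, 10, requires_grad=True)
body_pose = torch.zeros(1, 63, requires_grad=True)
global_orient = torch.zeros(1, 3, requires_grad=True)

optimizer = torch.optim.Adam([betas, body_pose, global_orient], lr=1e-2)
for step in range(500):
    optimizer.zero_grad()
    out = model(betas=betas, body_pose=body_pose, global_orient=global_orient)
    loss = ((out.vertices[0] - target_vertices) ** 2).mean()
    loss.backward()
    optimizer.step()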

Installation

Setting up the environment from scratch can be somewhat involved due to the dependencies required by this project. For convenience, we provide a pre-built Docker image on Docker Hub:

docker pull zjwfufu/eva-tracker:latest

After launching a container with this image, simply run:

source activate
conda activate eva_track

to enter the pre-configured environment.
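
A typical way to launch a container from this image with GPU access is shown below; the mount path and entrypoint are only examples, so adjust them to your setup:

docker run --gpus all -it \
    -v /path/to/EVA-Tracker:/workspace \
    zjwfufu/eva-tracker:latest /bin/bash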

Note

A few components may still need to be re-installed:

  • Packages previously installed via pip install -e . should be re-installed inside the activated environment.

    # sam2
    pip install -e ./modules/models/image_segmenter/sam2 --no-deps
    
    # osx modified mmpose
    cd ./modules/models/mesh_estimator/osx/transformer_utils
    pip install -e .
    cd ../../../../../
    pip install --upgrade setuptools
    
  • PyTorch3D may need to be rebuilt depending on your GPU architecture and CUDA configuration.
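
    A typical rebuild looks like the command below; the architecture value (here 8.6, e.g. for RTX 30-series GPUs) is only an example and should match your hardware:

    TORCH_CUDA_ARCH_LIST="8.6" pip install --force-reinstall --no-cache-dir \
        "git+https://github.com/facebookresearch/pytorch3d.git"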

If you prefer to install the environment manually, you can follow the step-by-step instructions below:

  • Clone this repository, create a conda environment, and activate it

    git clone https://github.com/zjwfufu/eva_tracker
    cd eva_tracker
    
    conda create -n eva_track python=3.10
    conda activate eva_track
    
  • Install PyTorch and PyTorch3D

    pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118
    
    pip install -U xformers==0.0.26.post1 --index-url https://download.pytorch.org/whl/cu118
    
    pip install "git+https://github.com/facebookresearch/pytorch3d.git"
    
  • Install other requirements

    pip install -r requirements.txt
    
    pip install chumpy==0.70 --no-build-isolation
    
  • Install SAM2

    pip install -e ./modules/models/image_segmenter/sam2 --no-deps
    
  • Install OSX dependencies

    • Install mmcv-full==1.7.0

      Install mmcv-full 1.7.0 compiled with C++17. You can download the source code from this link, which has been patched to compile with C++17 during installation.

      After downloading and extracting the archive, navigate to the source directory and run:

      MMCV_WITH_OPS=1 pip install . -v
      
    • Install modified mmpose

      cd ./modules/models/mesh_estimator/osx/transformer_utils
      pip install -e .
      cd ../../../../../
      pip install --upgrade setuptools
      
    • Install mmengine

      pip install mmengine==0.10.7
      
  • Install TrustNCG optimizer

    pip install "git+https://github.com/vchoutas/torch-trust-ncg.git"
    
  • Fix numpy version

    pip install numpy==1.23.0
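
After the manual installation, a quick sanity check (a minimal sketch, not part of this repository) can confirm that the main dependencies import and that CUDA is visible:

import numpy, torch, pytorch3d, mmcv, mmengine

print("numpy", numpy.__version__)        # expected 1.23.0
print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
print("pytorch3d", pytorch3d.__version__)
print("mmcv", mmcv.__version__)          # expected 1.7.0
print("mmengine", mmengine.__version__)  # expected 0.10.7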
    

Weights

We provide a full archive of all the pretrained weights required by this project, available at this link. Please download and extract the archive into ./modules/weights. After extraction, the directory structure should look as follows:

./modules/weights/
├── face_tracker
│   ├── emica
│   │   ├── EMICA-CVT_flame2020_notexture.pt
│   │   └── ins_scrfd_10g_bnkps.onnx
│   ├── flame
│   │   ├── canonical.obj
│   │   └── FLAME_with_eye.pt
│   ├── mask2former
│   │   ├── config.json
│   │   ├── gitattributes
│   │   ├── model.safetensors
│   │   ├── preprocessor_config.json
│   │   ├── pytorch_model.bin
│   │   └── README.md
│   ├── matting
│   │   └── stylematte_synth.pt
│   └── vgghead
│       ├── lmks_2d.pt
│       └── vgg_heads_l.trcd
├── hand_tracker
│   └── hamer.ckpt
├── human_template
│   ├── flame
│   │   ├── 2019
│   │   ├── flame_dynamic_embedding.npy
│   │   ├── FLAME_FEMALEl.pkl
│   │   ├── FLAME_MALE.pkl
│   │   ├── FLAME_NEUTRAL.pkl
│   │   ├── flame_static_embedding.pkl
│   │   ├── FLAME_texture.npz
│   │   └── Readme.pdf
│   ├── flame_assets
│   │   ├── flame
│   │   ├── flame_arkit_bs.npy
│   │   ├── pred_expression.json
│   │   ├── shoulder_mesh.obj
│   │   └── teeth_jawopen_offset.npy
│   ├── mano
│   │   ├── MANO_LEFT.pkl
│   │   └── MANO_RIGHT.pkl
│   ├── mano_mean_params.npz
│   ├── pose_estimate
│   │   └── multiHMR_896_L.pt
│   ├── smpl
│   │   ├── SMPL_FEMALE.pkl
│   │   ├── SMPL_MALE.pkl
│   │   └── SMPL_NEUTRAL.pkl
│   ├── smpl_mean_params.npz
│   ├── smplx
│   │   ├── MANO_SMPLX_vertex_ids.pkl
│   │   ├── SMPLX_FEMALE.npz
│   │   ├── SMPL-X__FLAME_vertex_ids.npy
│   │   ├── smplx_flip_correspondences.npz
│   │   ├── SMPLX_MALE.npz
│   │   ├── SMPLX_NEUTRAL.npz
│   │   ├── smplx_npz.zip
│   │   ├── SMPLX_to_J14.pkl
│   │   ├── smplx_uv
│   │   └── version.txt
│   └── smplx_points
├── image_matting
│   └── BiRefNet-general-epoch_244.pth
├── image_segmenter
│   └── sam2.1_hiera_large.pt
├── keypoint_detector
│   └── sapiens_1b_coco_wholebody_best_coco_wholebody_AP_727_torchscript.pt2
└── mesh_estimator
    ├── multiHMR_896_L.pt
    └── osx_l.pth.tar
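
To confirm the extraction succeeded, an optional helper like the sketch below (not part of this repository) can verify that a few representative files are in place:

from pathlib import Path

# A few representative checkpoints from the archive above; extend as needed.
expected = [
    "face_tracker/emica/EMICA-CVT_flame2020_notexture.pt",
    "hand_tracker/hamer.ckpt",
    "human_template/smplx/SMPLX_NEUTRAL.npz",
    "image_segmenter/sam2.1_hiera_large.pt",
    "mesh_estimator/osx_l.pth.tar",
]
root = Path("./modules/weights")
missing = [p for p in expected if not (root / p).exists()]
print("All expected weights found." if not missing else f"Missing: {missing}")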

Usage

Discussions

Estimating human proxy meshes (FLAME, MANO, SMPL-X) from monocular observations is inherently ill-posed and highly sensitive to initialization and priors.

Estimator backbone. Most pipelines rely on a pretrained HMR model to provide an initial SMPL-X estimate. However, many existing HMR models are primarily supervised with 2D signals during training, which introduces depth ambiguities. As a result, the predicted bodies may lean forward or backward, and the relative distances between hands, arms, and torso can become inconsistent, causing misalignment. More advanced models such as SAM-3D-Body show promise in alleviating these issues.

Temporal smoothness. When tracking trimmed videos, temporal smoothness must be applied carefully. For tasks that only require per-frame accuracy, a simple second-order smoothness term is often sufficient (see the sketch below). For applications like animation or cross-reenactment, temporal regularization requires more careful tuning to avoid over-smoothing important dynamics.
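
As a concrete but generic example, a second-order smoothness term over a sequence of pose or expression parameters can be written as below; this is one common formulation, not necessarily the exact term used in this project:

import torch

def second_order_smoothness(params: torch.Tensor) -> torch.Tensor:
    """Penalize frame-to-frame acceleration of a (T, D) parameter sequence."""
    accel = params[2:] - 2.0 * params[1:-1] + params[:-2]
    return (accel ** 2).mean()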

Blinking. This project does not explicitly model eyelid poses. Some blink motions represented in FLAME’s expression blendshapes may be suppressed by strong smoothness terms and thus become less noticeable during optimization.

Acknowledgement

This project is built on many amazing open-source projects and research works. Thanks to all the authors for their great work.

Cite

If you find this project useful in your research, please cite it with the following BibTeX entries:

@misc{zhang2025evatracker,
  title={Expressive and Versatile Avatar Tracker},
  author={Zhang, Jiawei},
  year={2025},
  month={dec},
  url={https://github.com/zjwfufu/EVA_tracker}
}
@article{zhang2025bringingportrait3dpresence,
  title={Bringing Your Portrait to 3D Presence},
  author={Zhang, Jiawei and Chu, Lei and Li, Jiahao and Zang, Zhenyu and Li, Chong and Li, Xiao and Cao, Xun and Zhu, Hao and Lu, Yan},
  journal={arXiv preprint arXiv:2511.22553},
  year={2025}
}
