⚠️ Warning:
The latest version of PFD-kit has been restructured to integrate with the ASE package, offering enhanced functionality and compatibility with modern workflows. Users are encouraged to transition to this updated version for the best experience. For those who wish to access the older version of PFD-kit, it remains available.
PFD-kit is a cloud-based workflow for generating deep-learning force fields from pre-trained atomic models (P) through fine-tuning (F) and distillation (D). Compared with training force fields from scratch, PFD requires significantly less training data by leveraging the transferable knowledge of large pre-trained models. This reduces computational cost by an order of magnitude, making PFD well suited for high-throughput calculations.
The workflow is especially useful for complex systems (e.g., high-entropy alloys, surfaces/interfaces) that are traditionally difficult to model. With its cloud-ready design, PFD-kit can be a powerful tool for computational materials science.
For complete tutorials and the user guide, please refer to our Online Documentation.
PFD-kit is built on the dflow package and incorporates components of the DPGEN2 workflow. PFD-kit currently supports the Deep Potential and MatterSim (fine-tuning only) models, with additional models (e.g., MACE) planned.
PFD-kit provides an automated workflow for generating machine-learning force fields from pre-trained atomic models via fine-tuning and knowledge distillation.
The PFD workflow begins with a pre-trained large atomic model (LAM) such as DPA or MatterSim. Fine-tuning adapts the model to a specific material domain with far less data than traditional from-scratch training. This greatly reduces computational cost while enabling accurate predictions for materials with highly complex structures and chemistries.
While fine-tuned models are accurate, their large architectures can be inefficient for large-scale simulations. To address this, PFD-kit applies knowledge distillation: a lightweight model is trained on data generated and labeled by the fine-tuned model. The distilled model achieves similar accuracy within the target domain but runs much faster during inference.
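To make the distillation idea concrete, here is a minimal Python sketch of the teacher–student pattern. The `teacher_energy` function and the use of kernel ridge regression as the "student" are illustrative assumptions, not PFD-kit's internal API; in the actual workflow the teacher is the fine-tuned force field and the student is a lightweight force-field model.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(seed=0)

def teacher_energy(descriptors):
    """Stand-in for the expensive fine-tuned model's energy prediction."""
    return np.sin(descriptors).sum(axis=1)  # placeholder physics

# 1. Generate candidate structures (here: random descriptor vectors).
candidates = rng.uniform(-1.0, 1.0, size=(500, 8))

# 2. Label them with the slow-but-accurate teacher.
labels = teacher_energy(candidates)

# 3. Train a lightweight student on the teacher's labels.
student = KernelRidge(kernel="rbf", alpha=1e-3)
student.fit(candidates, labels)

# 4. Inside the sampled domain, the student now approximates the
#    teacher at a fraction of the inference cost.
test = rng.uniform(-1.0, 1.0, size=(10, 8))
print(np.abs(student.predict(test) - teacher_energy(test)).max())
```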
PFD-kit automates the entire fine-tuning and distillation process. It includes data-selection algorithms that maximize efficiency using entropy-based measures of atomic environments. Built on dflow, it supports both local and cloud execution.
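The sketch below illustrates one way an entropy-style diversity criterion can drive data selection: structures whose descriptors fall in sparsely populated regions of descriptor space are the most "novel" and are picked first. The descriptor vectors and the greedy lowest-density rule are illustrative assumptions, not PFD-kit's actual selection algorithm.

```python
import numpy as np

def select_diverse(descriptors, n_select, bandwidth=0.5):
    """Greedily pick the structure whose descriptor has the lowest
    Gaussian-kernel density w.r.t. already-selected structures,
    i.e. the largest novelty (a simple entropy-style criterion)."""
    n = len(descriptors)
    selected = [0]               # seed with the first structure (arbitrary)
    density = np.zeros(n)        # accumulated kernel density from selections
    for _ in range(n_select - 1):
        last = descriptors[selected[-1]]
        d2 = np.sum((descriptors - last) ** 2, axis=1)
        density += np.exp(-d2 / (2.0 * bandwidth**2))
        density[selected] = np.inf            # never re-pick a structure
        selected.append(int(np.argmin(density)))
    return selected

# Toy usage: 200 structures described by 8-dimensional feature vectors.
rng = np.random.default_rng(1)
feats = rng.normal(size=(200, 8))
print(select_diverse(feats, n_select=10))
```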
PFD-kit can be built and installed from source:
```bash
pip install git+https://github.com/ruoyuwang1995nya/pfd-kit.git
```

PFD jobs can be submitted through the command-line interface using the `submit` subcommand:
```bash
pfd submit input.json
```

PFD-kit supports two major types of workflows, fine-tuning and distillation, whose parameters are defined in the `input.json` script. Users also need to prepare the required input files, such as pre-trained model files, material structure files, and training scripts. For a complete guide to PFD-kit usage and an explanation of the input script, please refer to the online documentation.
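To show the general shape of such an input script, the snippet below writes a minimal hypothetical `input.json`. Every key here is an illustrative placeholder; the authoritative schema (task names, model options, exploration settings) is defined in the online documentation.

```python
import json

# Hypothetical skeleton of a fine-tuning input script.
# Field names are illustrative ONLY -- consult the PFD-kit online
# documentation for the real schema before submitting a job.
config = {
    "task": "finetune",                      # or a distillation task
    "inputs": {
        "base_model": "pretrained_dpa.pt",   # pre-trained model file
        "init_confs": ["structures/*.vasp"], # initial material structures
    },
    "train": {
        "backend": "deepmd",                 # e.g., Deep Potential
        "numb_steps": 100000,
    },
}

with open("input.json", "w") as f:
    json.dump(config, f, indent=2)
```

The resulting file would then be passed to `pfd submit input.json` as shown above.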
If you use PFD-kit in a publication, please cite the following paper:
```bibtex
@misc{wang2025pfdkit,
      author        = {Ruoyu Wang and Yuxiang Gao and Hongyu Wu and Zhicheng Zhong},
      title         = {Pre-training, Fine-tuning, and Distillation (PFD): Automatically Generating Machine Learning Force Fields from Universal Models},
      year          = {2025},
      eprint        = {2502.20809},
      archivePrefix = {arXiv},
      primaryClass  = {cond-mat.mtrl-sci},
      url           = {https://arxiv.org/abs/2502.20809}
}
```
