Dataset Preprocessing

Download

URLs for downloading the SiN and HfO datasets will be made available.

Convenient Preprocessing Script

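To train models with this framework, users must first prepare a database in LMDB (`.lmdb`) format.

The `run_preprocess_data.sh` script builds this database with a single command:

```bash
./run_preprocess_data.sh $DATA $OUTDATA_TYPE
```

Both `DATA` (the dataset name, e.g., `SiN` or `HfO`) and `OUTDATA_TYPE` (`cloud` or `graph`) must be specified.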
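The positional interface `DATA OUTDATA_TYPE [CUTOFF MAX_NEIGHBORS]` can be sketched with Python's `argparse`. This is a minimal illustration, not the repository's actual parser: the function name `parse_preprocess_args`, the restriction of `DATA` to `SiN`/`HfO`, and the rule that `graph` output requires both a cutoff and a neighbor limit are assumptions drawn from the examples in this README.

```python
import argparse


def parse_preprocess_args(argv=None):
    """Hypothetical mirror of `DATA OUTDATA_TYPE [CUTOFF MAX_NEIGHBORS]`."""
    parser = argparse.ArgumentParser(description="Build an LMDB dataset (sketch).")
    parser.add_argument("data", choices=["SiN", "HfO"], help="dataset name")
    parser.add_argument("outdata_type", choices=["cloud", "graph"],
                        help="cloud: coordinates only; graph: coordinates + edges")
    # Optional positionals, only meaningful for graph output (assumption).
    parser.add_argument("cutoff", nargs="?", type=float, default=None,
                        help="cutoff radius for edge generation")
    parser.add_argument("max_neighbors", nargs="?", type=int, default=None,
                        help="maximum number of neighbors kept per atom")
    args = parser.parse_args(argv)
    if args.outdata_type == "graph" and (args.cutoff is None or args.max_neighbors is None):
        # parser.error prints a usage message and exits with status 2.
        parser.error("graph output requires CUTOFF and MAX_NEIGHBORS")
    return args
```

For example, `parse_preprocess_args(["SiN", "graph", "6.0", "50"])` yields `cutoff=6.0` and `max_neighbors=50`, while `["HfO", "cloud"]` leaves both as `None`.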

■ Option 1 (`cloud`)

Saves atomic coordinates only.

```bash
./run_preprocess_data.sh SiN cloud
./run_preprocess_data.sh HfO cloud
```
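LMDB stores raw bytes under byte-string keys, so each `cloud` sample has to be serialized before it is written. The sketch below assumes, hypothetically, that a sample is a dict of atomic numbers and Cartesian positions pickled under its integer index; the field names and the helpers `make_cloud_entry` and `to_lmdb_kv` are illustrative and may not match what `preprocess.py` actually writes.

```python
import pickle


def make_cloud_entry(atomic_numbers, positions):
    """Hypothetical cloud-type sample: element numbers plus Cartesian coordinates."""
    if len(atomic_numbers) != len(positions):
        raise ValueError("expected exactly one position per atom")
    return {
        "atomic_numbers": [int(z) for z in atomic_numbers],
        "positions": [tuple(float(c) for c in p) for p in positions],
    }


def to_lmdb_kv(index, entry):
    """Encode a sample as a (key, value) pair of bytes, as LMDB requires."""
    key = str(index).encode("ascii")
    value = pickle.dumps(entry, protocol=pickle.HIGHEST_PROTOCOL)
    return key, value
```

With the `lmdb` Python package, each pair could then be written with `txn.put(key, value)` inside a `with env.begin(write=True) as txn:` block.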

■ Option 2 (`graph`)

Saves coordinates and edges. The two trailing arguments set the cutoff radius (6.0 here) and the maximum number of neighboring atoms kept per atom (50 here) used to generate the edges.

```bash
./run_preprocess_data.sh SiN graph 6.0 50
./run_preprocess_data.sh HfO graph 6.0 50
```

Some models can build graphs from atomic coordinates on the fly, but others (such as NequIP, Allegro, and MACE) cannot.

For the latter models, users should generate the graphs in advance and save them in the `.lmdb` file.

When a cutoff radius is given, atom-cloud data can be converted into graph data and saved together with its edge indices.
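The cloud-to-graph conversion can be sketched as a radius-graph construction: every pair of atoms closer than the cutoff becomes an edge, and each atom keeps only its `max_neighbors` nearest neighbors. This is a brute-force O(N²) sketch under stated assumptions: it ignores periodic boundary conditions (which real SiN and HfO structures may require), and the `[sources, targets]` edge-index layout and the function name `build_edge_index` are illustrative choices, not necessarily what `preprocess.py` does.

```python
import math


def build_edge_index(positions, cutoff, max_neighbors):
    """Hypothetical cloud-to-graph conversion (non-periodic, brute force).

    Returns [sources, targets]: for each center atom i, an edge j -> i is
    added for each of its `max_neighbors` nearest atoms j with distance < cutoff.
    """
    sources, targets = [], []
    for i, p_i in enumerate(positions):
        # Collect (distance, index) for every other atom inside the cutoff.
        candidates = []
        for j, p_j in enumerate(positions):
            if j == i:
                continue
            d = math.dist(p_i, p_j)
            if d < cutoff:
                candidates.append((d, j))
        candidates.sort()  # nearest first; ties broken by atom index
        for _, j in candidates[:max_neighbors]:
            sources.append(j)
            targets.append(i)
    return [sources, targets]
```

For instance, `build_edge_index(positions, 6.0, 50)` corresponds to the `graph 6.0 50` arguments in the Option 2 example. Capping neighbors per atom bounds the edge count in dense regions, at the cost of making the graph asymmetric when two atoms disagree on whether they are among each other's nearest neighbors.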

Split Information (train/valid/test)

Each dataset is split into training, validation, and test sets in an 8:1:1 ratio.

The details are described in our benchmark paper.
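An 8:1:1 split can be sketched as a seeded random shuffle of sample indices. This assumes uniform random splitting; the procedure actually used is the one described in the benchmark paper, which may differ. The helper `split_811` is hypothetical.

```python
import random


def split_811(num_samples, seed=0):
    """Hypothetical 8:1:1 train/valid/test split of sample indices.

    Assumes a uniform, seeded random shuffle; the remainder after integer
    division goes to the test set.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    n_train = num_samples * 8 // 10
    n_valid = num_samples // 10
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test
```

Fixing the seed keeps the split reproducible across runs, and integer arithmetic avoids floating-point rounding in the split sizes.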
