
🐛 Issue: BERT encoder appears unused despite text_encoder_type='bert' #255

@ntg7creation

Description

Hi, and thanks for the great work on this project!

I'm currently working with the training code and noticed something potentially inconsistent. While the documentation and flags suggest support for --text_encoder_type bert, it looks like the dataset is still loading GloVe embeddings via this line:

self.w_vectorizer = WordVectorizer(pjoin(opt.cache_dir, 'glove'), 'our_vab')

This occurs in:

data_loaders/humanml/data/dataset.py

This raises a few questions:

Is BERT actually used anywhere in the dataset loading or preprocessing pipeline?

If BERT is supported, where is it being applied?

Is there a separate dataset class or flow for BERT-based encoding?

I’d love clarification so I can ensure the correct embeddings are used during training.
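For reference, this is roughly what I'd expect a BERT-based text encoding step to look like, in contrast to the GloVe `WordVectorizer` path above (a minimal sketch using HuggingFace `transformers`; the function name and the point where it would hook into the dataset are my assumptions, not code from this repo):

```python
# Hypothetical sketch of a BERT-based text encoding step, for comparison with
# the GloVe WordVectorizer line quoted above. Not taken from this repo; the
# function name and integration point are assumptions on my part.
import torch
from transformers import BertModel, BertTokenizer


def encode_text_with_bert(captions, device="cpu"):
    """Return contextual per-token BERT features for a batch of caption strings."""
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased").to(device).eval()

    # Tokenize with padding/truncation so the batch stacks into a single tensor.
    inputs = tokenizer(
        captions, return_tensors="pt", padding=True, truncation=True
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state: (batch, seq_len, 768) contextual token embeddings,
    # as opposed to the fixed (typically 300-d) GloVe vectors that
    # WordVectorizer returns.
    return outputs.last_hidden_state, inputs["attention_mask"]


# Example usage:
# feats, mask = encode_text_with_bert(["a person walks forward and waves"])
```

If something like this is already applied elsewhere (e.g. inside the model rather than the dataset), a pointer to that code path would fully answer my questions.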
