No file existence assurance  #24

@GilPasi

The generate_embedding.py file's first operations are loading the models and defining the stop words:

word2vec_model = gensim.models.KeyedVectors.load_word2vec_format('./crawl-300d-2M.vec', binary=False)
stop_words = set(stopwords.words('english'))

Both of these are very time-consuming; together they took me almost 20 minutes on Google Colab.
As a result, if an error occurs afterwards, the whole wait was in vain.
For example, the run may crash at line 126 because a file does not exist:

    template_df = pd.read_csv(f'./{dataset}/{dataset}.log_templates.csv')

Is it possible to add a mechanism that ensures the required files exist before the 'heavy' operations run, in order to save newcomers some time?
Simply adding something like

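    from pathlib import Path

    # Check the template file before the slow model loading starts.
    template_file = Path(f"./{dataset}/{dataset}.log_templates.csv")
    if not template_file.exists():  # exists is a method and must be called
        raise FileNotFoundError(f"Template file does not exist: {template_file}")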

at the top of the file would be very useful.
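
To make the idea a bit more concrete, here is a minimal sketch of a check that could run before the model is loaded. It only assumes the two paths already mentioned in this issue; the helper names `find_missing_files` and `require_files` are hypothetical and not part of the repository. It reports every missing file in one error instead of stopping at the first one.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the given paths (as strings) that are not existing regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


def require_files(paths):
    """Raise FileNotFoundError naming every missing file; do nothing if all exist."""
    missing = find_missing_files(paths)
    if missing:
        raise FileNotFoundError("Missing required files: " + ", ".join(missing))
```

Something like `require_files(['./crawl-300d-2M.vec', f'./{dataset}/{dataset}.log_templates.csv'])` placed above the `load_word2vec_format` call would then fail within a fraction of a second rather than after the long load. This presumes `dataset` is already defined at that point in the script, which I have not verified.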
