Description
The first operations in generate_embedding.py are loading the word2vec model and building the stop-word set:
```python
word2vec_model = gensim.models.KeyedVectors.load_word2vec_format('./crawl-300d-2M.vec', binary=False)
stop_words = set(stopwords.words('english'))
```
Both steps are time-consuming: together they took me almost 20 minutes on Google Colab.
As a result, if an error occurs afterwards, the whole wait was in vain.
For example, the run crashes at line 126 if the template file does not exist:
```python
template_df = pd.read_csv(f'./{dataset}/{dataset}.log_templates.csv')
```
Could a check be added that verifies the required files exist before the "heavy" operations run, to save newcomers some time?
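Simply adding something like this at the top of the file would be very useful:
```python
from pathlib import Path

template_file = Path(f"./{dataset}/{dataset}.log_templates.csv")
# exists() must be called; the bare method object is always truthy.
if not template_file.exists():
    raise FileNotFoundError(f"Template file does not exist: {template_file}")
```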
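A slightly more general sketch, assuming the script reads the two files mentioned in this issue (`./crawl-300d-2M.vec` and the per-dataset template CSV). `required_paths` and `check_required_files` are hypothetical helper names, not part of the repository. The idea is to collect every missing file and report them all at once, so a newcomer sees every problem in a single run before any model is loaded.

```python
from pathlib import Path
from typing import Iterable, List


def required_paths(dataset: str) -> List[Path]:
    """Hypothetical helper: files the script reads later (paths taken from the issue)."""
    return [
        Path("./crawl-300d-2M.vec"),
        Path(f"./{dataset}/{dataset}.log_templates.csv"),
    ]


def check_required_files(paths: Iterable[Path]) -> None:
    """Raise FileNotFoundError listing every path that is missing or not a regular file."""
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("Required file(s) not found: " + ", ".join(missing))


# Intended usage, at the very top of generate_embedding.py,
# before the word2vec model and stop words are loaded:
#   check_required_files(required_paths(dataset))
```

Using `is_file()` rather than `exists()` also rejects a directory that happens to have the expected name, which `pd.read_csv` would fail on anyway.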