Fix a bug caused by commas in the embeddings.w2v.txt file#5

Open
yanshengjia wants to merge 1 commit into nusnlp:master from yanshengjia:master

@yanshengjia

No description provided.

@yanshengjia
Author

I hit this error when running nea with the --emb option:

Traceback (most recent call last):
  File "train_nea.py", line 162, in <module>
    model = create_model(args, train_y.mean(axis=0), overal_maxlen, vocab)
  File "/Users/yanshengjia/Desktop/aes/nea/nea/models.py", line 138, in create_model
    emb_reader = EmbReader(args.emb_path, emb_dim=args.emb_dim)
  File "/Users/yanshengjia/Desktop/aes/nea/nea/w2vEmbReader.py", line 39, in __init__
    assert len(tokens) == self.emb_dim + 1, 'The number of dimensions does not match the header info'
AssertionError: The number of dimensions does not match the header info

The error is caused by commas in En_vectors.txt.

The floats in En_vectors.txt are expected to be separated by spaces, not commas.

The original code only handles space-separated floats, so a comma-separated line produces the wrong number of tokens and the assertion fails.
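
To illustrate the kind of tolerant parsing described above, here is a minimal sketch. It is not the code in this pull request. The function name `parse_embedding_line` is hypothetical, and the sketch assumes each line starts with the word, followed by whitespace, followed by the values separated by commas and/or whitespace.

```python
import re


def parse_embedding_line(line, emb_dim):
    """Split one text-format embedding line into (word, vector).

    Hypothetical helper, not part of nea. Assumes the word comes first and is
    followed by whitespace; the values may be separated by commas, whitespace,
    or both.
    """
    # Split off the word first so a vocabulary entry that is itself "," survives.
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError('Line has no vector values: %r' % line)
    word, rest = parts

    # Treat any run of commas and/or whitespace as one separator.
    values = [t for t in re.split(r'[,\s]+', rest) if t]

    # Same check as the assertion in the traceback, but raised as an exception.
    if len(values) != emb_dim:
        raise ValueError('The number of dimensions does not match the header info')

    return word, [float(v) for v in values]
```

Both `parse_embedding_line("the 0.1,0.2,0.3", 3)` and `parse_embedding_line("the 0.1 0.2 0.3", 3)` return the same word and vector, so the same reader could load either variant of the file.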

@izk8

izk8 commented Nov 4, 2017

Ran into the same issue. Thanks, Yanshengjia!

@yanshengjia
Author

You are welcome!

@jkdufair

jkdufair commented May 8, 2018

👍 This would help a lot!

@nayna-porwal

nayna-porwal commented Jul 4, 2020

I ran into the same problem, but I have a word2vec.bin file rather than a .txt file. Is there any solution?

4 participants