Description
Hello,
Thanks for this fantastic implementation.
I'm wondering whether it's possible to use sentences as training units, since the context window is normally restricted to a single sentence, right? If we use whole documents instead, then with a window of 5 the last word of a sentence gets a right-hand context of 5 words from the following sentence, which shouldn't be included.
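To make the concern concrete, here is a minimal plain-Python sketch of skip-gram co-occurrence counting. It is not svd2vec's actual code, and the helpers `skipgram_pairs` and `count_per_sentence` are hypothetical names used only for illustration. It assumes each input list is one training unit, so windows never cross from one list into the next. It uses a toy window of 2 instead of 5.

```python
from collections import Counter
from typing import Iterable, List


def skipgram_pairs(tokens: List[str], window: int) -> Counter:
    """Count (word, context) pairs within `window` positions of each word in one token list."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs[(word, tokens[j])] += 1
    return pairs


def count_per_sentence(sentences: Iterable[List[str]], window: int) -> Counter:
    """Sum pair counts unit by unit, so no window crosses a unit boundary."""
    total = Counter()
    for sentence in sentences:
        total.update(skipgram_pairs(sentence, window))
    return total


sentences = [["the", "cat", "sat"], ["a", "dog", "ran"]]
document = [tok for s in sentences for tok in s]  # both sentences merged into one unit

doc_counts = count_per_sentence([document], window=2)
sent_counts = count_per_sentence(sentences, window=2)

# Document as the unit: "sat" (end of sentence 1) picks up "a" and "dog" as right context.
print(doc_counts[("sat", "a")], doc_counts[("sat", "dog")])    # 1 1
# Sentence as the unit: those cross-sentence pairs are never counted.
print(sent_counts[("sat", "a")], sent_counts[("sat", "dog")])  # 0 0
```

Under this assumption, passing a list of sentences rather than a list of documents is enough to keep windows inside sentence boundaries.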
One could argue that it suffices to pass a list of sentences (each one a list of tokens) as input. However,
```python
from svd2vec import svd2vec

documents = ["this is a test right left".split(),
             "this is the second test left right".split()]
svd = svd2vec(documents, window=2, min_count=1, size=2)
```

gives:
```
test_svd.py 3 <module>
  svd = svd2vec(documents, window=2, min_count=1,size=2)
core.py 146 __init__
  self.weighted_count_matrix_file = self.skipgram_weighted_count_matrix()
core.py 234 skipgram_weighted_count_matrix
  (self.vocabulary_len, self.vocabulary_len), np.dtype('float16'))
temporary_array.py 17 __init__
  matrix = self.load(erase=True)
temporary_array.py 23 load
  return np.memmap(self.file_name, shape=self.shape, dtype=self.dtype, mode='w+')
memmap.py 267 __new__
  mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: cannot mmap an empty file
```
As one might expect, the error disappears when the input is larger:
```python
from svd2vec import svd2vec

documents = ["this is a test right left".split() * 100,
             "this is the second test left right".split() * 100]
svd = svd2vec(documents, window=2, min_count=1, size=2)
```

Thanks again!