We plan to build a dictionary of all the words used in the abstracts of the papers in our collection and then remove the uninformative ones (stopwords, etc.). We can then turn each abstract into a vector in which each element is the normalized frequency of the corresponding word in that abstract.
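The plan above can be sketched in a few lines of Python. This is only an illustration, not the code in this repository: the stopword list, the function names (`tokenize`, `build_vocabulary`, `vectorize`), and the two sample abstracts are all hypothetical. It also assumes that "normalized frequency" means a word's count divided by the number of retained words in that abstract.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "we", "to", "for"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_vocabulary(abstracts):
    """Return a sorted list of every non-stopword that occurs in any abstract."""
    return sorted({w for text in abstracts for w in tokenize(text)})


def vectorize(text, vocabulary):
    """Map one abstract to a vector of word frequencies.

    Assumption: each count is divided by the number of retained tokens,
    so the entries of a non-empty abstract sum to 1.
    """
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[w] / total for w in vocabulary]


# Made-up sample abstracts, used only to exercise the functions.
abstracts = [
    "We study quantum error correction in the presence of noise.",
    "Noise in neural networks is studied for image classification.",
]
vocabulary = build_vocabulary(abstracts)
vectors = [vectorize(a, vocabulary) for a in abstracts]
```

Each entry of `vectors` has one element per word in `vocabulary`, so all abstracts are represented in the same space and can be compared directly. Other normalizations (for example dividing by the vector's Euclidean norm, or TF-IDF weighting) would fit the same structure.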
sraeisi/arxiv