sfmoraa / NIS3356 Public


A course design on information content security, intended to crawl and analyze specified topics on social networks.

.idea
CrawlingStuff
analysis
cluster
model
output
plot
.gitignore
LDA.py
README.md
T-SNE.py
TFIDF.py
bert.py
final_process.py
wrod2vec.py


NIS3356

A course design on information content security, intended to crawl and analyze specified topics on social networks.

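Directory overview

analysis: data analysis, including gender ratio, time-trend analysis, and regional analysis
cluster: clustering analysis, mainly K-means clustering
CrawlingStuff: web crawler
model: Chinese text-processing models, including BERT, TF-IDF, and Word2Vec
output: output results
plot: plotting functions and the generated figures
LDA.py: example of running LDA analysis on the data
T-SNE.py: example of running t-SNE on the clustering results
TFIDF.py: example of processing files with the TF-IDF model
bert.py: example of processing files with the BERT model
word2vec.py: example of processing files with the Word2Vec model (the file appears in the repository listing as wrod2vec.py)
final_process.py: end-to-end example that uses BERT and K-means to sort text data into categories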
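
The vectorize-then-cluster idea behind final_process.py can be illustrated with a small standard-library sketch. This is not the repository's code: it swaps TF-IDF vectors in for BERT embeddings so that it runs without any model download, it assumes the text has already been segmented into tokens, and the names `tfidf_vectors`, `squared_distance`, and `kmeans` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into L2-normalized TF-IDF vectors."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # A smoothed inverse document frequency.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[tok] / len(doc) * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=20):
    """Plain K-means with a deterministic farthest-first initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector that is farthest from its nearest chosen centroid.
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Toy, pre-segmented posts (illustrative data only).
docs = [
    ["天气", "晴朗"],
    ["天气", "晴朗", "天气"],
    ["股票", "上涨"],
    ["股票", "上涨", "上涨"],
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

To move closer to the BERT-based pipeline described above, the `tfidf_vectors` step would be replaced by sentence embeddings while the clustering step stays the same.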

CrawlingStuff notes

Run main_crawl.py. The search content/topic, the path where results are stored, and the search date range can all be set freely.
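
As a rough illustration of these three settings, the sketch below holds them in a dictionary and expands the date range into individual days. The key names, the sample values, and the `iter_days` helper are assumptions made for this example; they are not taken from main_crawl.py.

```python
from datetime import date, timedelta


def iter_days(start, end):
    """Yield every calendar day from start to end, both inclusive."""
    if end < start:
        raise ValueError("end date precedes start date")
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


# Hypothetical settings; the real script may name or store these differently.
config = {
    "keyword": "example topic",
    "output_path": "output/example_topic.csv",
    "start": date(2023, 12, 30),
    "end": date(2024, 1, 2),
}

days = list(iter_days(config["start"], config["end"]))
for day in days:
    print(config["keyword"], day.isoformat())
```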

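The crawler targets the day-by-day results of Weibo advanced search within the specified date range. Each record contains the text of a comment and the comment's publication time, expressed relative to the moment of crawling.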
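
Because the stored publication time is relative to the crawl moment, downstream analysis such as time trends may first need absolute timestamps. The sketch below shows one possible conversion. The relative formats it handles ("刚刚", "5分钟前", "2小时前", and similar) are assumptions about what the data might look like, not something the README specifies, and `to_absolute` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta

# Assumed relative-time units mapped to timedelta keyword arguments.
_UNITS = {"秒": "seconds", "分钟": "minutes", "小时": "hours", "天": "days"}
_PATTERN = re.compile(r"^(\d+)\s*(秒|分钟|小时|天)前$")


def to_absolute(relative_text, crawled_at):
    """Convert an assumed relative time string into an absolute datetime.

    Returns None when the text does not match any assumed format, so that
    unrecognized values can be inspected rather than silently guessed.
    """
    text = relative_text.strip()
    if text == "刚刚":  # "just now"
        return crawled_at
    match = _PATTERN.match(text)
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return crawled_at - timedelta(**{_UNITS[unit]: amount})


crawled_at = datetime(2024, 1, 1, 12, 0, 0)
print(to_absolute("5分钟前", crawled_at))
```

Recording the crawl time alongside each record is what makes this conversion possible; without it the relative values cannot be anchored.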

