doodoo-lsj/Lecture-Recommendation-NLP


Lecture-Recommendation-NLP

A lecture recommendation service built on Yonsei University course syllabi and lecture reviews from Everytime.

Install

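  1. Required libraries

    • Selenium
    • konlpy
    • re (Python standard library, no installation needed)
    • pickle (Python standard library, no installation needed)
    • keras
    • sklearn (installed as scikit-learn)
    • gensim
    • torch
    • tqdm
    • SentenceTransformer (installed as sentence-transformers)
  2. Required programs

    • Chromedriver (required by Selenium for crawling)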

Crawling

Code for crawling lecture reviews and syllabus data from the Yonsei University Everytime site (https://everytime.kr/; an ID and password are required).

crawling_review.ipynb and crawling syllabus.ipynb use, as examples, the lecture reviews of the Department of Applied Statistics and the syllabi of the College of Commerce and Economics, respectively.

Crawling requires Selenium and Chromedriver to be installed.

The crawler's loops can take quite a long time to run.

Automated Labeling

labeling_w2v.ipynb covers the following:

  • Training word2vec on the individual lecture reviews
  • Extracting related words for each lecture-feature category: mostly nouns, selected from the top-n words most similar in word2vec
  • Checking each review for related words and marking it as relevant (Y)
  • Final labels: -1 (not relevant) / 1 (positive) / 0 (negative) / 0.5 (ambiguous)
    • reviews_label.csv: final label data for each individual review
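The relevance-marking and labeling steps above can be sketched in a few lines of Python. This is a minimal sketch, not the notebook's code. The category names and word lists are hypothetical placeholders; in the project the words come from word2vec's top-n neighbours, filtered to nouns. Plain substring matching is also an assumption, chosen because Korean particles attach directly to nouns (e.g. 과제가 still contains 과제).

```python
from typing import Dict, List, Optional

# Hypothetical lecture-feature categories and their related words.
# In the project these come from word2vec's top-n similar words (mostly nouns);
# the lists below are illustrative placeholders only.
RELATED_WORDS = {
    "assignment": ["과제", "레포트"],         # "assignment", "report"
    "exam": ["시험", "중간고사", "기말고사"],  # "exam", "midterm", "final"
}

# Final label codes as described in this README.
LABELS = {"not_relevant": -1, "positive": 1, "negative": 0, "ambiguous": 0.5}


def mark_relevance(review: str, related: Dict[str, List[str]]) -> Dict[str, str]:
    """Mark a category 'Y' if any of its related words occurs in the review text."""
    return {
        category: "Y" if any(word in review for word in words) else ""
        for category, words in related.items()
    }


def initial_label(mark: str) -> Optional[float]:
    """Unmarked reviews get -1; marked ones still need a sentiment label (1 / 0 / 0.5)."""
    return LABELS["not_relevant"] if mark != "Y" else None


print(mark_relevance("과제가 많지만 시험은 쉬워요", RELATED_WORDS))  # {'assignment': 'Y', 'exam': 'Y'}
print(initial_label(mark_relevance("교수님이 친절해요", RELATED_WORDS)["exam"]))  # -1
```

Only the relevance flag is automated here. Deciding between 1, 0, and 0.5 for a relevant review is left open, because the README does not say how that step is done.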

1D CNN

1D_CNN.ipynb covers the following:

  • Integer encoding
  • 1D CNN modeling
  • Prediction with the trained model

The integer-encoded data, tokenizer pickle, and CNN models used in the project are uploaded separately in the 1D CNN folder.
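The integer-encoding and 1D CNN steps can be sketched with the standard library and NumPy. The sketch assumes a frequency-ordered vocabulary with 0 for padding and 1 for unknown tokens, which mirrors a Keras-style tokenizer with pre-padding. It also assumes a common text-CNN layout: embedding lookup, one 1D convolution with ReLU, global max pooling, and a sigmoid output. The helper names, layer sizes, and random untrained weights are hypothetical. The notebook presumably builds and trains its model in Keras, since keras is among the required libraries. The pickle round-trip only stands in for saving and reloading the tokenizer.

```python
import pickle
from collections import Counter

import numpy as np


def build_vocab(tokenized_texts):
    """Map tokens to integer ids: 0 = padding, 1 = unknown, then by descending frequency."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, _ in counts.most_common():  # ties keep first-seen order
        vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Integer-encode one tokenized review, mapping unseen tokens to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


def pad(ids, max_len):
    """Keep the last max_len ids and left-pad with 0 (like Keras' default 'pre' style)."""
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


# Hypothetical layer sizes and random (untrained) weights, only to show the shapes.
EMB_DIM, N_FILTERS, KERNEL = 8, 4, 3
rng = np.random.default_rng(0)


def init_params(vocab_size):
    return {
        "emb": rng.normal(size=(vocab_size, EMB_DIM)),
        "conv_w": rng.normal(size=(N_FILTERS, KERNEL, EMB_DIM)),
        "conv_b": np.zeros(N_FILTERS),
        "dense_w": rng.normal(size=N_FILTERS),
    }


def forward(ids, p):
    """Embedding -> 1D conv ('valid') + ReLU -> global max pool -> sigmoid probability.

    Requires len(ids) >= KERNEL.
    """
    x = p["emb"][np.asarray(ids)]                                          # (seq_len, EMB_DIM)
    windows = np.stack([x[i:i + KERNEL] for i in range(len(ids) - KERNEL + 1)])
    feats = np.einsum("wke,fke->wf", windows, p["conv_w"]) + p["conv_b"]  # (windows, N_FILTERS)
    feats = np.maximum(feats, 0.0)                                         # ReLU
    logit = feats.max(axis=0) @ p["dense_w"]                               # global max pooling
    return float(1.0 / (1.0 + np.exp(-logit)))


texts = [["강의", "좋아요"], ["강의", "어려워요"]]
vocab = build_vocab(texts)
vocab = pickle.loads(pickle.dumps(vocab))  # stand-in for saving/loading the tokenizer pickle
ids = pad(encode(["강의", "재밌어요"], vocab), max_len=5)
print(ids)  # [0, 0, 0, 2, 1]
print(forward(ids, init_params(len(vocab))))  # a probability between 0 and 1; weights are untrained
```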

KoBERT

๊ฐ•์˜ํ‰ ์˜ˆ์ธก๊ธฐ with KoBERT.ipynb ํŒŒ์ผ์€ ์•„๋ž˜์˜ ๋‚ด์šฉ์„ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค.

  • KoBERT ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
  • ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ๋ฐ ํ˜•ํƒœ ๋ณ€ํ™˜
  • ํ† ํฐํ™”(์ˆœ์„œ ์ž„๋ฒ ๋”ฉ ๋“ฑ) ์ž‘์—… ๋ฐ ํ•™์Šต ๋ชจ๋ธ ์„ค๋ฆฝ
  • Train & Test - ๊ฐ•์˜ํ‰ ์˜ˆ์ธก ๋ด‡
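For the tokenization step, a BERT-style classifier typically receives each review as follows:

  • Sub-word ids wrapped in [CLS] ... [SEP] and padded to a fixed length
  • An attention mask
  • Segment ids

Order information enters the model through position embeddings, which are looked up by each slot's index. The sketch below builds these inputs from ids that are already tokenized. The special-token ids, the BertInputs container, and max_len are hypothetical placeholders; the real ids come from the KoBERT vocabulary and tokenizer.

```python
from typing import List, NamedTuple

# Hypothetical special-token ids; the real ones come from KoBERT's vocabulary.
CLS_ID, SEP_ID, PAD_ID = 2, 3, 1


class BertInputs(NamedTuple):
    input_ids: List[int]
    attention_mask: List[int]   # 1 = real token, 0 = padding
    token_type_ids: List[int]   # segment ids; a single review uses segment 0 throughout
    position_ids: List[int]     # slot index, used by the model to look up position embeddings


def build_inputs(token_ids: List[int], max_len: int) -> BertInputs:
    """Wrap sub-word ids in [CLS] ... [SEP], truncate/pad to max_len, and build the masks."""
    body = token_ids[: max_len - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    n_real = len(ids)
    return BertInputs(
        input_ids=ids + [PAD_ID] * (max_len - n_real),
        attention_mask=[1] * n_real + [0] * (max_len - n_real),
        token_type_ids=[0] * max_len,
        position_ids=list(range(max_len)),
    )


example = build_inputs([10, 11, 12], max_len=8)
print(example.input_ids)       # [2, 10, 11, 12, 3, 1, 1, 1]
print(example.attention_mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```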

๊ฐ•์˜ํ‰ ์ถ”์ฒœ์‹œ์Šคํ…œ with KoBERT.ipynb ํŒŒ์ผ์€ ์œ„ ํŒŒ์ผ๊ณผ ๊ฐ™์€ ๋งฅ๋ฝ์œผ๋กœ ํ˜๋Ÿฌ๊ฐ‘๋‹ˆ๋‹ค.

  • ์ฐจ์ด์  : 5๊ฐœ์˜ ๋ชจ๋ธ์„ ์„ž์–ด์„œ ๊ฒฐ๊ณผ๊ฐ’์„ ๋‚ด์•ผํ•ด ๋‹ค์„ฏ ๋ฒˆ์˜ Train/Test ์ž‘์—…์ด ์ผ์–ด๋‚จ.
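One way to picture how five models could be combined into one result: assume each model predicts a score for one lecture criterion, and a weighted average of those scores ranks the lectures. The criterion names, the weighted average, and the equal default weights are all assumptions for illustration. The README does not say how the notebook mixes the five outputs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical names for the five criteria, assuming one model per criterion.
CRITERIA = ["assignment", "exam", "grading", "lecture_quality", "team_project"]


def combine_scores(per_model: Dict[str, float],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the five per-criterion predictions (equal weights by default)."""
    weights = weights or {c: 1.0 for c in CRITERIA}
    total = sum(weights[c] for c in CRITERIA)
    return sum(weights[c] * per_model[c] for c in CRITERIA) / total


def rank_lectures(predictions: Dict[str, Dict[str, float]],
                  weights: Optional[Dict[str, float]] = None) -> List[Tuple[str, float]]:
    """Score every lecture with combine_scores and sort best first."""
    scored = [(name, combine_scores(p, weights)) for name, p in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


predictions = {
    "Lecture A": {c: 1.0 for c in CRITERIA},
    "Lecture B": {**{c: 0.0 for c in CRITERIA}, "exam": 1.0},
}
print(rank_lectures(predictions))  # [('Lecture A', 1.0), ('Lecture B', 0.2)]
```

Passing non-uniform weights is one way a user's priorities (e.g. caring only about exams) could change the ranking without retraining any of the five models.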

Word2Vec/Doc2Vec + NN

The Word2Vec/Doc2Vec + NN files cover the following:

  • preprocess_tokenize.ipynb: tokenizes the lecture review data
    • produces reviewtokenized.csv
  • preprocess_w2v.ipynb: applies a word2vec model to the tokenized review data in two ways: once to each individual review, and once to all reviews of the same lecture merged together
    • produces w2vnormver.csv / w2vaggver.csv
  • preprocess_d2v.ipynb: applies a doc2vec model to the tokenized review data
    • produces d2v.normver.csv
  • NN.ipynb: a neural network model that takes each vectorized dataset and predicts a value for each criterion
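The two word2vec variants in preprocess_w2v.ipynb can be pictured as mean-pooling word vectors. The per-review version (presumably w2vnormver) averages the vectors of one review's tokens. The aggregated version (presumably w2vaggver) first merges all reviews of a lecture and then averages once. Mean pooling, the toy vectors, and the helper names are assumptions. With a trained gensim model, model.wv can be passed in place of the dictionary, since it supports both the in operator and indexing by word.

```python
import numpy as np


def mean_vector(tokens, wv, dim):
    """Average the vectors of in-vocabulary tokens; a zero vector if none are known."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def per_review_vectors(reviews, wv, dim):
    """Per-review variant (assumed): one vector for each individual review."""
    return [(lecture, mean_vector(tokens, wv, dim)) for lecture, tokens in reviews]


def per_lecture_vectors(reviews, wv, dim):
    """Aggregated variant (assumed): merge all reviews of a lecture, then average once."""
    merged = {}
    for lecture, tokens in reviews:
        merged.setdefault(lecture, []).extend(tokens)
    return {lecture: mean_vector(tokens, wv, dim) for lecture, tokens in merged.items()}


# Toy 2-d vectors standing in for a trained word2vec model.
wv = {"좋다": np.array([1.0, 0.0]), "어렵다": np.array([0.0, 1.0])}  # "good", "hard"
reviews = [("Stats101", ["좋다"]), ("Stats101", ["어렵다", "어렵다", "어렵다"])]
print(per_review_vectors(reviews, wv, dim=2))   # review vectors [1, 0] and [0, 1]
print(per_lecture_vectors(reviews, wv, dim=2))  # {'Stats101': array([0.25, 0.75])}
```

The aggregated vector weights longer reviews more heavily. Here it gives [0.25, 0.75], whereas averaging the two review vectors would give [0.5, 0.5].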

SentenceBERT

SentenceBERT.ipynb covers the following:

  • Loading the syllabus data
  • Data preprocessing and format conversion
  • Tokenization and Bag of Words
  • Extracting n-grams with CountVectorizer
  • Extracting the 5 keywords most similar to the document with SBERT
  • Applying Max Sum Similarity on top of SBERT to extract 5 keywords that are highly similar to the document but dissimilar to each other
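The two SBERT keyword steps can be sketched on precomputed embeddings, following the approach used in KeyBERT. First take the nr_candidates keywords closest to the document. Then choose the combination of top_n keywords whose summed pairwise similarity is lowest. In the notebook, the embeddings would come from a SentenceTransformer model's encode method and the candidates from CountVectorizer n-grams. Here, tiny hand-made 2-d vectors stand in for them, and the function names and the nr_candidates default are assumptions.

```python
import itertools

import numpy as np


def cosine_sim(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def top_n_keywords(doc_emb, cand_embs, candidates, top_n=5):
    """Plain variant: the top_n candidate keywords closest to the document embedding."""
    sims = cosine_sim(doc_emb.reshape(1, -1), cand_embs)[0]
    return [candidates[i] for i in np.argsort(sims)[::-1][:top_n]]


def max_sum_sim(doc_emb, cand_embs, candidates, top_n=5, nr_candidates=10):
    """Max Sum Similarity: from the nr_candidates keywords closest to the document,
    pick the top_n combination whose summed pairwise similarity is lowest."""
    doc_sims = cosine_sim(doc_emb.reshape(1, -1), cand_embs)[0]
    pool = np.argsort(doc_sims)[::-1][:nr_candidates]
    if top_n > len(pool):
        raise ValueError("top_n must not exceed the candidate pool size")
    pool_sims = cosine_sim(cand_embs[pool], cand_embs[pool])
    best = min(
        itertools.combinations(range(len(pool)), top_n),
        key=lambda combo: sum(pool_sims[i, j] for i, j in itertools.combinations(combo, 2)),
    )
    return [candidates[pool[i]] for i in best]


candidates = ["a", "b", "c", "d"]  # placeholder keywords
cand_embs = np.array([[1.0, 0.1], [1.0, 0.12], [1.0, -0.3], [0.0, 1.0]])
doc_emb = np.array([1.0, 0.0])
print(top_n_keywords(doc_emb, cand_embs, candidates, top_n=2))                # ['a', 'b']
print(max_sum_sim(doc_emb, cand_embs, candidates, top_n=2, nr_candidates=3))  # ['b', 'c']
```

With the toy vectors, the plain top-2 returns the near-duplicates a and b. Max Sum Similarity replaces one of them with c to diversify the keywords. The exhaustive search over combinations is only practical when nr_candidates is small.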

TF-IDF

TF-IDF.ipynb covers the following:

  • Loading the syllabus data
  • Data preprocessing and format conversion
  • Tokenization and Bag of Words
  • Building a TF-IDF DataFrame of the words in each course overview
  • Given an input keyword, extracting the lectures whose course overview contains it
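A plain-Python sketch of the TF-IDF table and the keyword lookup, assuming the course overviews are already tokenized. Raw term counts are weighted by the smoothed IDF ln((1 + N) / (1 + df)) + 1. This matches scikit-learn's TfidfVectorizer default before its L2 normalisation, but the notebook may use a different variant. A dict of dicts stands in for the DataFrame, and the lecture names and tokens are made up.

```python
import math
from collections import Counter


def tfidf_table(docs):
    """docs: {lecture: [tokens]} -> {lecture: {token: tf-idf weight}}.

    tf = raw count in the overview; idf = ln((1 + N) / (1 + df)) + 1 (smoothed).
    """
    n = len(docs)
    df = Counter(tok for tokens in docs.values() for tok in set(tokens))
    idf = {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}
    return {
        lecture: {tok: count * idf[tok] for tok, count in Counter(tokens).items()}
        for lecture, tokens in docs.items()
    }


def lectures_with_keyword(table, keyword):
    """Lectures whose overview contains the keyword, highest TF-IDF weight first."""
    hits = [(lecture, w[keyword]) for lecture, w in table.items() if keyword in w]
    return [lecture for lecture, _ in sorted(hits, key=lambda item: item[1], reverse=True)]


overviews = {  # hypothetical tokenized course overviews
    "Regression Analysis": ["회귀", "회귀", "모형"],  # "regression", "model"
    "Intro to Statistics": ["회귀", "확률"],          # "regression", "probability"
    "Corporate Finance": ["주식", "기업"],            # "stocks", "firm"
}
table = tfidf_table(overviews)
print(lectures_with_keyword(table, "회귀"))  # ['Regression Analysis', 'Intro to Statistics']
```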
