klue-sts

Solving Semantic Textual Similarity task for KLUE Benchmark dataset within 12 days

NLU Sentence Similarity (STS)

Corporate assignment, weeks 3-4

Assignment goals

  • Train a similarity model for Korean sentences and serve it
  • Semantic textual similarity (STS)
  • Input: two Korean sentences
  • Output: a semantic similarity score

Training dataset

  • KLUE-STS
    • AIRBNB: reviews
    • policy: news
    • ParaKQC: smart-home queries

Notes

  • Use only the train set, splitting it into train/val for training
  • Publicly available pretrained models may be used (cite the source)

Deliverables

  • Trained model (free choice of model): pretrained KLUE RoBERTa Base
  • Report on the training approach
    • Which model was chosen
    • Which parameters were tuned
    • What training process was followed
  • Dev set score (similarity predictions for every sentence pair in the dev set, plus the F1 score)
  • Server code that uses the model through a REST API to score the similarity of two sentences

Process

  1. Data EDA (exploratory data analysis)
  2. Data Preprocessing
    • Cleansing
    • Using the khaiii morphological analyzer
  3. Data Augmentation
    • EDA (easy data augmentation)
  4. Selecting and loading the pretrained model
    • klue-RoBERTa-base
  5. Fine-Tuning
    • Training with the transformers.Trainer class
  6. Hyperparameter Search
    • Optuna
  7. Evaluation Metric
    • Pearson's r score, F1 score
  8. Serving
    • FastAPI

Data EDA & Preprocess

EDA - Exploration

The KLUE STS data is a Korean dataset built for the STS task and covers three domains: AIRBNB (colloquial reviews), Policy (formal news), and ParaKQC (colloquial smart-home queries). It contains 13,224 examples in total: 11,668 Train, 519 Dev, and 1,037 Test, roughly a 20:1:2 ratio. In this analysis we split the Train data into Train and Dev, and treated the provided Dev data as Test. We split the 11,668 Train examples at a 9:1 ratio into 10,494 and 1,167 (the ratio follows a reference), so the final Train:Dev:Test counts are 10,494:1,167:519.

The dataset has six columns (guid, source, sentence1, sentence2, labels, annotations), and labels holds three values: real-label, label, and binary-label. real-label is the real-valued similarity of the two sentences on a scale from 0 to 5, label is real-label rounded to one decimal place, and binary-label is 0 when the score is below the threshold of 3 and 1 when it is 3 or higher.

Figure: binary-label is well balanced.

Figure: for the integer-valued label, values close to 0 are the most frequent.

The binary label, which serves as the target for the F1-score metric in the KLUE STS task, is split 0.55 : 0.45, so the dataset is reasonably balanced. We found duplicate rows in the Train set and kept only one copy of each: there were 5 duplicates in which 'Sentence1', 'Sentence2', and all label values were identical. There were no missing values.
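The duplicate removal and 9:1 split described above can be sketched in a few lines. This is a minimal illustration on toy rows, not the project's actual code: the dictionary keys, the seed, and the rounding rule for the validation size are assumptions, and it does not claim to reproduce the exact 10,494 / 1,167 counts.

```python
import random


def drop_duplicates(rows):
    """Keep the first copy of rows whose two sentences and label all match."""
    seen, unique = set(), []
    for row in rows:
        key = (row["sentence1"], row["sentence2"], row["label"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def train_val_split(rows, val_ratio=0.1, seed=42):
    """Shuffle a copy of rows and split it into (train, val) at 9:1 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]


# Toy data (hypothetical): 20 distinct pairs plus 2 exact duplicates.
rows = [
    {"sentence1": f"s1-{i}", "sentence2": f"s2-{i}", "label": float(i % 6)}
    for i in range(20)
]
rows += rows[:2]

unique = drop_duplicates(rows)
train, val = train_val_split(unique)
print(len(rows), len(unique), len(train), len(val))
```

Fixing the seed keeps the split reproducible, which matters when several preprocessing variants are compared on the same validation rows.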

Preprocess

Because the KLUE STS dataset includes AirBnB reviews written by ordinary users, we expected spelling and spacing correction to be useful and ran that preprocessing. In practice it either damaged the meaning or, with an average of only 0.56 spelling errors fixed, did not pay off for the time spent, so it is not reflected in the final dataset. Since the task measures semantic similarity in Korean, we removed characters that could distort the meaning: English, Chinese characters, Japanese, special characters, and other non-Korean text. We spent the most time in this project on morphological analysis, because we judged that comparing the meaning of sentences requires splitting them into morphemes centered on the nouns and verbs that carry the meaning, and that a good analyzer could raise performance considerably. The preprocessing steps were as follows.

  1. We used pyKoSpace to correct spacing, but it sometimes damaged the meaning, so we did not apply it.

  2. We used py-hanspell to correct spelling.

    1. After applying it to the whole dataset, we flagged every corrected spot as an error. The average was 0.56 spelling errors, that is, fewer than one error on average, so we did not apply it.
  3. We removed special characters, English, Japanese, and Chinese characters.

  4. We split morphemes with 'Khaiii', Kakao's morphological analyzer.

    1. As stopwords, we removed parts judged to have little effect on sentence meaning, such as conjunctions and particles. Following the part-of-speech tag list, we removed the various particles and endings.

    2. We chose the deep-learning-based khaiii over the other analyzers, namely Komoran, Kkma, Okt (Open Korean Text), and Hannanum from the Python Korean package konlpy, and mecab, based on an experimental performance comparison. We referred to the konlpy comparison post in the Reference list.

      1. Analysis time

      As the number of sentences grows, the Kkma analyzer takes a long time, and apart from Kkma, mecab is the fastest. The deep-learning-based khaiii is the next fastest. For a single sentence, every analyzer except Komoran is fast, taking between 0.0001 and 0.0016 seconds.

      2. Performance

        Most sentences in the KLUE STS dataset already have correct spelling and spacing. We tested the analyzers on a few sentences from the dataset.

        Sentence 1: '그냥 모든게 다 완벽했던 에어비엔비 였어요' ("It was just an Airbnb where everything was perfect")
        Sentence 2: '너가 생각하긴 거실을 가장 효과적으로 청소하려면 어떻게 해야될 것 같아?' ("What do you think is the most effective way to clean the living room?")

        Analyzer: Okt
          Sentence 1: [('그냥', 'Noun'), ('모든', 'Noun'), ('게', 'Josa'), ('다', 'Adverb'), ('완벽했던', 'Adjective'), ('에어', 'Noun'), ('비엔비', 'Noun'), ('였어요', 'Verb')]
          Sentence 2: [('너', 'Noun'), ('가', 'Josa'), ('생각', 'Noun'), ('하긴', 'Verb'), ('거실', 'Noun'), ('을', 'Josa'), ('가장', 'Noun'), ('효과', 'Noun'), ('적', 'Suffix'), ('으로', 'Josa'), ('청소', 'Noun'), ('하려면', 'Verb'), ('어떻게', 'Adjective'), ('해야', 'Verb'), ('될', 'Verb'), ('것', 'Noun'), ('같아', 'Adjective'), ('?', 'Punctuation')]
          After removal: 1. 그냥 모든 다 완벽했던 에어 비엔비 였어요 2. 너 생각 하긴 거실 가장 효과 적 청소 하려면 어떻게 해야 될 것 같아 ?

        Analyzer: kkma
          Sentence 1: [('그냥', 'MAG'), ('모든', 'MDT'), ('것', 'NNB'), ('이', 'JKS'), ('다', 'MAG'), ('완벽', 'NNG'), ('하', 'XSV'), ('었', 'EPT'), ('더', 'EPT'), ('ㄴ', 'ETD'), ('에어', 'NNG'), ('비', 'NNG'), ('에', 'JKM'), ('는', 'JX'), ('비', 'NNG'), ('이', 'VCP'), ('었', 'EPT'), ('어요', 'EFN')]
          Sentence 2: [('너', 'NP'), ('가', 'JKS'), ('생각', 'NNG'), ('하', 'XSV'), ('기', 'ETN'), ('는', 'JKS'), ('거실', 'NNG'), ('을', 'JKO'), ('가장', 'MAG'), ('효과적', 'NNG'), ('으로', 'JKM'), ('청소', 'NNG'), ('하', 'XSV'), ('려면', 'ECE'), ('어떻', 'VA'), ('게', 'ECD'), ('하', 'VV'), ('어야', 'ECD'), ('되', 'VV'), ('ㄹ', 'ETD'), ('것', 'NNB'), ('같', 'VA'), ('아', 'ECD'), ('?', 'SF')]
          After removal: 1. 그냥 모든 다 완벽 었 더 ㄴ 에어 비 에 비 이 었 어요 2. 너 생각 거실 가장 효과적 으로 청소 려면 어떻 게 하 어야 되 ㄹ 같 아 ?

        Analyzer: khaiii
          Sentence 1: [('그냥', 'MAG'), ('모', 'VA'), ('든', 'MM'), ('게', 'JKB'), ('다', 'MAG'), ('완벽', 'NNG'), ('하', 'XSA'), ('였', 'EP'), ('던', 'ETM'), ('에어비엔비', 'NNG'), ('이', 'VCP'), ('었', 'EP'), ('어요', 'EC')]
          Sentence 2: [('너', 'NP'), ('가', 'JKS'), ('생각', 'NNG'), ('하', 'XSV'), ('기', 'ETN'), ('ㄴ', 'JX'), ('거실', 'NNG'), ('을', 'JKO'), ('가장', 'MAG'), ('효과', 'NNG'), ('적', 'XSN'), ('으로', 'JKB'), ('청소', 'NNG'), ('하', 'XSV'), ('려면', 'EC'), ('어떻', 'VA'), ('게', 'EC'), ('하', 'VV'), ('여야', 'EC'), ('되', 'XSV'), ('ㄹ', 'ETM'), ('것', 'NNB'), ('같', 'VA'), ('아', 'EF'), ('?', 'SF')]
          After removal: 1. 그냥 모 든 다 완벽 에어비엔비 이 2. 너 생각 거실 가장 효과 청소 어떻 하 같 ?

        Analyzer: mecab
          Sentence 1: [('그냥', 'MAG'), ('모든', 'MM'), ('게', 'NNB+JKS'), ('다', 'MAG'), ('완벽', 'NNG'), ('했', 'XSA+EP'), ('던', 'ETM'), ('에어', 'NNG'), ('비', 'XPN'), ('엔비', 'NNG'), ('였', 'VCP+EP'), ('어요', 'EF')]
          Sentence 2: [('너', 'NP'), ('가', 'JKS'), ('생각', 'NNG'), ('하', 'XSV'), ('긴', 'ETN+JX'), ('거실', 'NNG'), ('을', 'JKO'), ('가장', 'MAG'), ('효과', 'NNG'), ('적', 'XSN'), ('으로', 'JKB'), ('청소', 'NNG'), ('하', 'XSV'), ('려면', 'EC'), ('어떻게', 'MAG'), ('해야', 'VV+EC'), ('될', 'VV+ETM'), ('것', 'NNB'), ('같', 'VA'), ('아', 'EF'), ('?', 'SF')]
          After removal: 1. 그냥 모든 게 다 완벽 했 에어 비 엔비 였 2. 너 생각 긴 거실 가장 효과 청소 어떻게 해야 될 같 ?

        Comparing the results on these two sentences from different domains, the analyzer that seemed to perform better was khaiii, which recognized the nouns correctly and produced a more compact result. This time we used khaiii, which was also easier for us to use, but in the next project we would like to try mecab, which is more widely used as a morphological analyzer. We selected khaiii and ran morpheme splitting on the whole dataset.
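Steps 3 and 4 of the preprocessing list can be sketched as follows. The character class (Hangul syllables, digits, and whitespace are kept) and the tag rule (drop tags starting with `J`, `E`, or `XS`, plus `NNB`) are assumptions inferred from the khaiii "after removal" row in the comparison above, not the project's actual stopword list. The tagged input is copied from that row instead of calling khaiii, so the snippet runs without the analyzer installed.

```python
import re

# Assumption: keep Hangul syllables, digits, and whitespace; drop everything else.
NON_KOREAN = re.compile(r"[^가-힣0-9\s]+")


def clean_text(text):
    """Remove English, Chinese characters, Japanese, and special characters."""
    return " ".join(NON_KOREAN.sub(" ", text).split())


def is_stop_tag(tag):
    """Assumed rule: particles (J*), endings (E*), suffixes (XS*), bound nouns (NNB)."""
    return tag.startswith(("J", "E", "XS")) or tag == "NNB"


def remove_stop_morphs(tagged):
    """Join the morphemes whose POS tag is not treated as a stopword tag."""
    return " ".join(morph for morph, tag in tagged if not is_stop_tag(tag))


# khaiii output for sentence 1, copied from the comparison above.
khaiii_sentence1 = [
    ("그냥", "MAG"), ("모", "VA"), ("든", "MM"), ("게", "JKB"), ("다", "MAG"),
    ("완벽", "NNG"), ("하", "XSA"), ("였", "EP"), ("던", "ETM"),
    ("에어비엔비", "NNG"), ("이", "VCP"), ("었", "EP"), ("어요", "EC"),
]

print(clean_text("Wi-Fi가 빨라요!! 最高"))    # 가 빨라요
print(remove_stop_morphs(khaiii_sentence1))  # 그냥 모 든 다 완벽 에어비엔비 이
```

Filtering by tag prefix keeps the rule short; an explicit set of tags would be the safer choice if finer control over individual endings is needed.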

We trained on the datasets produced by applying the four preprocessing steps to the KLUE STS data. We kept a separate model for each step and monitored performance with Wandb. In those runs, the model with batch size 128 trained on the dataset processed by the khaiii morphological analyzer recorded the highest validation score.

  • Number of examples after preprocessing: train, val (10494, 1167)

Data Augmentation

The KLUE-STS dataset consists of about 10,000 Korean sentence pairs. Other STS projects, however, train on clearly larger datasets of around 30k to 40k pairs, so to improve performance, prevent overfitting, and help the model generalize, we augmented the data with EDA (Easy Data Augmentation) ([EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks (2019)]). The idea is the same as the one often used in CV projects, where data is increased by adding some noise or applying transformations that do not harm the meaning. In NLP, EDA augments the dataset by applying one of the following four operations, chosen at random, to each sentence.

  1. SR (Synonym Replacement): choose n words that are not stopwords and replace each with a randomly chosen synonym.
  2. RI (Random Insertion): choose a random word that is not a stopword and insert a random synonym of it at a random position. Repeat n times.
  3. RS (Random Swap): swap the positions of two random words in the sentence. Repeat n times.
  4. RD (Random Deletion): delete each word in the sentence with probability p.

Because the degree to which the meaning changes depends on sentence length, SR, RI, and RS determine n from the sentence length l with the formula n = αl. α is the parameter for the fraction of words changed in a sentence and takes the same value as p in RD. The reference paper recommends α according to the size of the train set; since our original data has about 10,000 examples, we set α to 0.1 and generated 4 augmented sentences per original sentence.
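The four operations and the rule n = αl can be sketched as below. This is an illustrative sketch, not the KorEDA code: the toy synonym dictionary is hypothetical, words missing from it are simply skipped in place of a real stopword check, and `max(1, ...)` follows the reference EDA implementation so that short sentences still receive one change.

```python
import random


def num_changes(length, alpha=0.1):
    """n = alpha * l, with at least one change per sentence."""
    return max(1, int(alpha * length))


def synonym_replacement(words, n, synonyms, rng):
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def random_insertion(words, n, synonyms, rng):
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in synonyms]
        if not candidates:
            break
        new_word = rng.choice(synonyms[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), new_word)
    return out


def random_swap(words, n, rng):
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept or [rng.choice(words)]  # never return an empty sentence


def eda(sentence, synonyms, alpha=0.1, num_aug=4, seed=0):
    """Return num_aug augmented sentences, one random operation each."""
    rng = random.Random(seed)
    words = sentence.split()
    n = num_changes(len(words), alpha)
    ops = [
        lambda: synonym_replacement(words, n, synonyms, rng),
        lambda: random_insertion(words, n, synonyms, rng),
        lambda: random_swap(words, n, rng),
        lambda: random_deletion(words, alpha, rng),
    ]
    return [" ".join(rng.choice(ops)()) for _ in range(num_aug)]


# Hypothetical toy synonym dictionary, for illustration only.
toy_synonyms = {"숙소": ["집", "방"], "위치": ["장소"]}
augmented = eda("숙소 위치 가 정말 좋았어요", toy_synonyms)
print(augmented)
```

In an STS setting each augmented sentence would then be paired with the unchanged partner sentence and the original label.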

Figure: the smaller the full dataset (500), the larger the performance gain.

Of the augmentation operations, RI and SR need a wordnet for looking up synonyms. The KorEDA project, which adapts the paper to Korean, uses the Korean WordNet (KWN) distributed by KAIST. We tried it as is, but words were often not replaced with proper synonyms and many duplicate sentences were produced. We therefore needed a new synonym dictionary and used NIKLex, the lexical relation data from the Modu Corpus provided by the National Institute of Korean Language, for RI and SR. NIKLex records lexical relations such as synonyms, antonyms, hypernyms, and hyponyms, as rated by a total of 50,000 language users. Of its 200,000 base pairs, 60,000 are synonym pairs, which is much richer than KWN with its 9,714 words, and this resolved the earlier problems.

Following the KorEDA project code, we augmented Sentence1 by transforming it without altering its meaning, and paired each augmented sentence with the Sentence2 that accompanied the original Sentence1. The preprocessed Train set had 10,494 sentence pairs, and augmentation expanded it to 61,389 examples. Duplicate sentences produced by augmentation were deleted.

Because of the limits of the Colab environment we could not finish a run on the augmented data before the project deadline, but we were able to try it while extending the project afterwards.

Select Model

When discussing which model to use, we searched studies and papers on the STS task and studied the best-performing models. We weighed benchmark scores such as KLUE's benchmark scores, Tunib-Electra's benchmark scores, and KoElectra's benchmark scores, and also tried models built with Sentence Transformers, which are regarded as performing well on STS in other languages. In the end we used KLUE-RoBERTa-Base as the pretrained model. We selected it on two criteria: suitability for Korean, and model size and development environment.

  1. Suitability for Korean

    KLUE-RoBERTa-base is a RoBERTa base embedding model pretrained on the KLUE corpus, so we judged it to be well suited to measuring Korean sentence similarity on the KLUE dataset and expected strong performance. It is also among the top-scoring baselines on the KLUE STS benchmark leaderboard, so we expected to be able to concentrate on fine-tuning to improve performance. As general-purpose alternatives with strong results on many tasks including STS, we also tried pretrained Sentence-BERT models such as all-mpnet-base-v2 and all-MiniLM-L6-v2. all-mpnet-base-v2 starts from the pretrained microsoft/mpnet-base (https://huggingface.co/microsoft/mpnet-base) model and is fine-tuned on 1B sentence pairs. It is trained with contrastive learning, a sentence-embedding approach in which, given one sentence of a pair, the model must pick out its true partner from a set of randomly sampled sentences. This approach is mainly effective for datasets with unbalanced labels, but we found it less suitable than a model that has learned Korean context.

  2. Model size and development environment

    The models pretrained on the KLUE data are a BERT base model and RoBERTa models, the latter in small, base, and large variants that differ in embedding size, number of layers, and number of heads. Among the KLUE benchmark baselines, the highest STS score, a Pearson's r of 93.35, belongs to RoBERTa-large. However, we trained on Google Colab from a personal laptop (2019 Intel MacBook Pro), and with the large model we kept running into memory errors even after adjusting the batch size. To work efficiently within the resources available, we chose RoBERTa-base, which scores 92.5, about 1.65 higher than BERT among the base models. We also took API response time into account and decided on a model that is as light as possible while keeping a reasonable number of layers.

    Model               Embedding Size  Hidden Size  # Layers  # Heads
    KLUE-BERT-base      768             768          12        12
    KLUE-RoBERTa-base   768             768          12        12
    KLUE-RoBERTa-small  768             768          6         12
    KLUE-RoBERTa-large  1024            1024         24        16

Training Model & Hyperparameter tuning

Training Model

We fine-tuned with the Trainer class from Hugging Face transformers. First, a teammate who works as an NLP practitioner told us it is widely used in industry, and since it was new to us we wanted to use it in this project. We ended up using it in depth, to the point of having touched nearly every parameter in the official documentation. Second, its biggest advantage is that training is easy to set up by passing parameters such as batch size, optimizer, and evaluator through TrainingArguments. Freezing layers, or training separate models scored with f1_score or pearsonr, took only a single variable, which was convenient. Finally, the Trainer docs and tutorials are very well written, so whenever an error occurred or we wanted to try something, we could understand it easily and continue fine-tuning. As a student of NLP, I appreciated being able to learn not only the training process but also, intuitively, the role each parameter plays in it.

TrainingArguments

TrainingArguments lets you set parameters in four broad groups: Dataset, Optimizer, Regularization, and Evaluation. I set the number of training epochs to 10 and, considering the data size, set the eval epoch to a smaller value of 8. The objective function was an MSE loss based on cosine similarity, and the optimizer was AdamW. I set the learning rate to 6e-5 and weight decay to 0.01 (the TrainingArguments defaults are 5e-5 and 0.0). With the push_to_hub parameter, the model is uploaded to the HuggingFace Hub when training finishes, so the trained model can later be pulled straight from the Hub.

The fp16 parameter was new to me: it makes the model train with 16-bit floating point instead of 32-bit floating point. It is enabled simply by setting it to True, and its advantage was that training ran about 60% faster with similar model performance.
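The settings named in this section map onto `TrainingArguments` fields roughly as follows. This is a sketch under stated assumptions, not the project's configuration: `output_dir` is a hypothetical name, the batch size of 128 is the value mentioned for the best preprocessing run, and evaluation and loss settings are left out because the README does not give them in enough detail. The `transformers` import is deferred so the settings can be inspected without the library installed.

```python
# Values taken from the text; output_dir is a hypothetical placeholder.
TRAINING_KWARGS = {
    "output_dir": "klue-sts-roberta-base",  # hypothetical local dir / Hub repo name
    "num_train_epochs": 10,
    "per_device_train_batch_size": 128,     # batch size reported for the best run
    "learning_rate": 6e-5,
    "weight_decay": 0.01,
    "fp16": True,         # 16-bit training; requires a CUDA GPU
    "push_to_hub": True,  # requires being logged in to the Hugging Face Hub
}


def build_training_args(**overrides):
    """Create TrainingArguments from the settings above (needs transformers)."""
    from transformers import TrainingArguments  # imported lazily on purpose

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})


print(TRAINING_KWARGS["learning_rate"], TRAINING_KWARGS["weight_decay"])
```

On a machine without a GPU or Hub credentials, `build_training_args(fp16=False, push_to_hub=False)` would be the variant to try.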

Hyperparameter tuning

We used the hyperparameter_search method of Trainer to find the best hyperparameter values. Trainer supports hyperparameter optimization frameworks such as Ray Tune, Optuna, and SigOpt, and I chose Optuna. It is simple to install and use, and the framework itself is small and lightweight. The search range can be specified with familiar Python syntax, including conditionals and loops, and it can be used concisely with almost no code changes, in a way the whole team can understand. We defined the hyperparameters for Optuna to search as a function.
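A search-space function for `Trainer.hyperparameter_search` with the Optuna backend might look like the sketch below. It covers the five parameters this README reports tuning; every range is an illustrative placeholder, since the actual ranges are not given. The function only calls methods on the `trial` object it receives, so it has no imports of its own.

```python
def optuna_hp_space(trial):
    """Search space over the five tuned parameters (ranges are placeholders)."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 3, 10),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [32, 64, 128]
        ),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, 500),
    }


# Usage sketch, assuming `trainer` is a transformers.Trainer created with
# model_init=... (required for hyperparameter search) and that optuna is installed:
#
# best_run = trainer.hyperparameter_search(
#     hp_space=optuna_hp_space,
#     backend="optuna",
#     direction="maximize",  # e.g. maximize Pearson's r on the validation split
#     n_trials=20,           # placeholder trial count
# )
```

The dictionary keys must match `TrainingArguments` field names, because Trainer applies each sampled value to the training arguments of that trial.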

We monitored the performance of the models produced by the hyperparameter search with Weights & Biases (Wandb). It visualizes how performance changes with the hyperparameter values, so you can see whether training is going well and which parameter combination is best. I learned only late about features such as Sweeps, which let you set a range per hyperparameter and see each one's influence at a glance, and I intend to use them in the next project.


Among the hyperparameter search methods (grid search, random search, and Bayesian methods), Optuna uses a Bayesian approach and is quite fast.

The parameters we tuned with Optuna are learning_rate, train_epochs, batch_size, weight_decay, and warmup_steps. We chose these five based on the article 'Hyperparameter Optimization for HuggingFace Transformers' on the Wandb website.

In a search over 13 hyperparameters across 60 trials, warmup_steps had the highest importance. It also had the strongest negative correlation: the lower warmup_steps was, the higher the accuracy. This experiment identified the five most important parameters, and we ran the hyperparameter search over the ranges that had the largest effect.

Our reasoning for each hyperparameter was as follows.

  • learning_rate: if it is too small, convergence needs many iterations, which is inefficient, and too many iterations can overfit the training set and prevent a robust model. If it is too large, training can diverge instead of converging to the global minimum. Because the dataset is small and we set a large batch size to prevent overfitting, finding a learning rate that matches it was important.
  • num_train_epochs: too much training can overfit the training data, so it should not be set too high.
  • per_device_train_batch_size: too large and it is limited by memory; too small and the weights are updated more often, so the Reference notes that it needs to be an appropriate size.
  • warmup_steps: according to the Reference, this is the number of steps at the start of training during which the learning rate stays low before reaching its regular value. It lets an optimizer such as Adam, which we use, compute accurate gradient statistics.
  • weight_decay: the Reference says it is used to avoid exploding gradients and to prevent overfitting. It is usually set when using the SGD or Adam optimizer, and since our optimizer is also Adam we decided to specify it.

We therefore searched for the best values of these hyperparameters and tuned the model based on the results.
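To make the role of warmup_steps concrete, here is a small sketch of a linear warmup followed by linear decay, which is the shape of the `linear` scheduler that Trainer uses by default. The function name and the step counts in the example are illustrative assumptions; the base learning rate of 6e-5 is the value quoted earlier.

```python
def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear ramp-up, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # After warmup, decay linearly so the rate reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Example with hypothetical step counts: 100 warmup steps out of 1000.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, 6e-5, 100, 1000))
```

A smaller warmup_steps value means the model spends fewer updates at a reduced learning rate, which is one possible reading of why lower values scored better in a short fine-tuning run.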

Result

Evaluation

💡 Pearson's r score of 0.932

F1 score of 0.728

We evaluated on the provided validation set, which was not used in training, and recorded the final scores above. For F1, we trained with the Trainer class metric set to 'f1' without substantially modifying the model, and we believe the model was not trained properly in that setup, which is why this score is so low. The Pearson's r score corresponds to second place on the KLUE-STS Leaderboard. Using the threshold of 3, predicting binary-label as 1 when the similarity score is 3 or higher and 0 otherwise should raise the F1 score.
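The two metrics, and the thresholding idea from the last sentence, can be written out in plain Python. This is a sketch for illustration: the gold and predicted scores are made-up values, and in practice a library such as scipy or scikit-learn would normally be used for these computations.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between gold and predicted similarity scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def binarize(scores, threshold=3.0):
    """1 if the 0-5 similarity score reaches the threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up scores on the 0-5 scale, for illustration only.
gold = [0.5, 2.0, 3.0, 4.5, 5.0]
pred = [1.0, 3.2, 2.8, 4.0, 4.9]

r = pearson_r(gold, pred)
f1 = f1_score(binarize(gold), binarize(pred))
print(round(r, 3), round(f1, 3))
```

Deriving the binary prediction from the regression output this way needs no second model, so both metrics come from a single fine-tuned regressor.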

Serving

We served the fine-tuned Klue-RoBERTa-Base model with the FastAPI framework. Given two Korean sentences as input, the model measures their similarity and returns the inference result. The REST API server code for the STS task is available at https://github.com/honeybeat1/klue-sts-serving. Because the fine-tuned model and tokenizer are downloaded from huggingface, the first request takes some time to load. After starting the server, open the local address (http://127.0.0.1:8000/docs) and use the POST method to send the Sentence1 and Sentence2 you want to compare in JSON format. The inference result contains the two sentences, the real-valued similarity score measured with cosine similarity, and a binary label that answers 'Are the two sentences similar?' using a threshold of 3. The API returns 'Yes' when the similarity score is 3 or higher and 'No' when it is lower.
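The request and response flow described above could be wired up roughly like this. It is a sketch and not the code in the klue-sts-serving repository: the route `/sts`, the field names, and the `score_fn` callback (anything that returns a score on the 0 to 5 scale for two sentences) are assumptions. FastAPI and pydantic are imported inside `create_app` so the response logic can be exercised without them.

```python
SIMILARITY_THRESHOLD = 3.0


def build_response(sentence1, sentence2, score, threshold=SIMILARITY_THRESHOLD):
    """Assemble the JSON body: both sentences, the score, and a Yes/No label."""
    return {
        "sentence1": sentence1,
        "sentence2": sentence2,
        "similarity": score,
        "is_similar": "Yes" if score >= threshold else "No",
    }


def create_app(score_fn):
    """Build a FastAPI app around score_fn(sentence1, sentence2) -> float."""
    from fastapi import FastAPI
    from pydantic import BaseModel

    class SentencePair(BaseModel):
        sentence1: str
        sentence2: str

    app = FastAPI()

    @app.post("/sts")  # hypothetical route name
    def predict(pair: SentencePair):
        score = float(score_fn(pair.sentence1, pair.sentence2))
        return build_response(pair.sentence1, pair.sentence2, score)

    return app


print(build_response("문장 하나", "문장 둘", 3.4))
```

With a real scoring function, `app = create_app(score_fn)` can be started with uvicorn, after which the interactive docs page at `/docs` offers a form for sending the POST request.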


Works to try Next time

  • Under the time pressure of 12 days and limited resources, we fine-tuned with the Trainer module. Next time we would like to add a separate bidirectional layer such as an LSTM and see whether it improves the model. Also, since this was my first NLP project, I asked a senior colleague working in the field many questions about practical and SOTA models. If we build the tokenizer from scratch instead of using a pretrained model, I would like to try the sentencepiece model.
  • As epochs went on the model overfit, with validation loss rising while train loss fell, and because we had not added an early stopping parameter, time was repeatedly wasted. We were short of hands: while the model was running we kept researching hyperparameters and data preprocessing, which is how this happened. Next time we plan to focus more on project productivity.
  • This was our first time using Wandb, and it was a pity that we could not make deeper use of the visualization features it provides. Next time we want to monitor the influence of each parameter and its effect on accuracy together with performance.
  • For data augmentation, we learned this time that round-trip translation / back translation is also commonly used in addition to EDA, and we definitely want to apply it next time.

Acknowledgement

Role: paper research / modeling / data preprocessing / data augmentation / hyperparameter tuning / other code work / report writing

In this NLP project I had the valuable experience of completing everything from the very first data preprocessing code to the final API server code. I had never before handled a team project from scratch through to the server code, and this was the first time I worked on every part. Having learned first-hand what process an NLP project goes through, I feel, now at the end of the project, as if I have reached a mountain peak. The more I studied, the more fun it became and the more things I wanted to apply, so I really enjoyed the project. I especially liked the experience of using new tools while reading their docs closely. At first I did not even know Wandb existed and was recording each try separately in Notion. Now I can monitor the performance of many tries with Wandb.

What I liked about taking part in this advanced course is that, although it was my first NLP project, I could keep up because the basics were taught step by step like a tutorial over 5 weeks. I could not apply all of the preprocessing and models I wanted to try from papers and searching within the time available, but while writing up the project report I felt proud to have dug intensively into the STS task in particular, building on the basics learned through 3 weeks of daily assignments. It was the first time I had taken part in every stage of a project, from the first import statement to the API code, so although it was physically tiring, it was a rewarding period in which I learned a great deal. Thank you.

Reference

  1. A disciplined approach to neural network hyper-parameters https://arxiv.org/abs/1803.09820
  2. How to Tune Hyper-Parameters in Deep Learning | by Neil Zhang | Medium
  3. Sentence Transformers https://arxiv.org/abs/1908.10084
  4. Why do high learning rate diverges the weight updates? https://medium.com/@prash24goel/why-do-high-learning-rate-diverges-the-weight-updates-c39d9b3b326d
  5. Max and min of the learning rate https://arxiv.org/pdf/1506.01186.pdf
  6. Kakao morphological analyzer Khaiii https://brunch.co.kr/@kakao-it/308
  7. KLUE benchmark https://arxiv.org/pdf/2105.09680.pdf
  8. Weight decay https://medium.com/analytics-vidhya/deep-learning-basics-weight-decay-3c68eb4344e9#:~:text=Weight decay is a regularization,weights and not the bias.
  9. KoELECTRA https://github.com/monologg/KoELECTRA
  10. tunib-electra https://github.com/tunib-ai/tunib-electra
  11. EDA : Easy Data Augmentation for Boosting Performance on Text Classification Tasks https://github.com/jasonwei20/eda_nlp
  12. [konlpy] Noun analysis speed comparison by morphological analyzer https://needjarvis.tistory.com/691
