WEBAI-IAMAI/TEXT2POKEMON-Instruct-pix2pix

TEXT2POKEMON-Instruct-pix2pix

We have fine-tuned the Instruct Pix2Pix model to create a system that can colorize characters in images using only text.

๐Ÿ†Prize - ๋ฐ์ดํ„ฐ๊ณผํ•™ ๊ฒฝ์ง„๋Œ€ํšŒ ์šฐ์ˆ˜์ƒ

Table of contents

๐Ÿถ dataset
๐Ÿ“ท image-captioning
๐Ÿ’บ fine-tuning
๐ŸŽ› WEB SERVICE

datasets

datasets overview

This project develops and tests a technique that uses the Instruct Pix2Pix model to colorize black-and-white images according to a user's caption. The goal is to apply colors to a black-and-white image as the caption describes and output a color image. The technique can be applied in many areas, such as historical photographs and artworks.

datasets sources

The dataset combines caption data from svjack/pokemon-blip-captions-en-ja with color and black-and-white images from Sketch2Pokemon, collected as of July 2023. For license and copyright information on data use, see:
svjack/pokemon-blip-captions-en-ja · Datasets at Hugging Face
Sketch2Pokemon

Captions were collected from svjack/pokemon-blip-captions-en-ja, and paired black-and-white and color images were collected from Sketch2Pokemon.

The collected black-and-white and color images are 256x256 PNG files. (The sizes do not need to match.) The data were fed to the model through a CSV file that matches each black-and-white image with its color image and a caption describing the image. The training data consist of 826 black-and-white images, their 826 paired color images, and 307 captions.

| | Black-and-white images | Color images | Captions |
|---|---|---|---|
| Count | 826 | 826 | 307 |
| Size | 28.6MB | 37.6MB | 15.1KB |

์‚ฌ์šฉ์ž๋Š” ์ž์‹ ์˜ ๋ฐ์ดํ„ฐ์…‹์„ ๊ตฌ์กฐ์— ๋งž๊ฒŒ ์ค€๋น„ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ์ด๋ฏธ์ง€์— ๋งž๊ฒŒ ์ด๋ฏธ์ง€ ๊ฒฝ๋กœ๋ฅผ ์ •ํ™•ํ•˜๊ฒŒ ๋งค์นญ ์‹œํ‚ค๊ณ , ์ด๋ฏธ์ง€๋ฅผ ์„ค๋ช…ํ•˜๋Š” ์บก์…˜์„ ์ถ”๊ฐ€ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

datasets structure

Each entry in the dataset groups a black-and-white image, a color image, and a caption describing the color image.

  • Black-and-white image: the original color image with its color information removed. These are the images the model colorizes.
  • Color image: the original color image paired with the black-and-white image. The model uses these images to learn coloring patterns.
  • Caption: a short sentence describing the original color image, including its color and feature information.

ex_data.jpg

👉 a drawing of a green pokemon with red eyes

The black-and-white image, color image, and caption above form one example.

This dataset is what training requires. The more color information and features the captions contain, the more they help improve the model.

image captioning

🖼️ image captioning

We used an image captioning model to automatically generate suitable captions for our dataset.

Fine-tuning Instruct Pix2Pix requires a large amount of caption data, so we had to select an image captioning model that can express the information in a Pokémon image as text. Two models were shortlisted: "CNN-LSTM" and "CLIP".

The key was to choose the model that better understands the color information in a Pokémon image and can extract it as a caption.

Candidate models

CNN-LSTM

CLIP

์œ„ ๋ชจ๋ธ์˜ ํŠน์ง•์„ Encoder, Decoder ์œ ๋ฌด๋กœ ๋ฐœ์ƒํ•˜๋Š” ์ฐจ์ด๋กœ ์„ค๋ช…์„ ํ•  ์ˆ˜ ์žˆ๋‹ค.

(1)Encoder(CLIP)

ํŠน์ • input data๊ฐ€ ๋“ค์–ด์™”์„ ๋•Œ ๊ทธ๊ฒƒ์„ ์ž˜ ์š”์•ฝ์„ ํ•ด์ค˜์„œ ์ž ์žฌ์ ์ธ ์ •๋ณด(latent vector)์— ์ดˆ์ ์„ ๋งž์ถ”์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ƒ๋Œ€์ ์œผ๋กœ ์ •๋ณด์˜ ์ž์œ ๋„๋Š” ๋†’์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค. ->โ€zero-shotโ€ ์„ฑ๋Šฅ ์šฐ์ˆ˜

(2)Encoder, Decoder(CNN-LSTM)

****์šฐ์„  encoder๋กœ latent vector๋ฅผ ์ถ”์ถœํ•˜๊ณ  decoder๋ฅผ ํ†ตํ•ด์„œ ์ •์ œ๋œ caption์„ ์ž˜ ์ถ”์ถœํ•  ์ˆ˜ ์žˆ๋‹ค.

-> ์„ฑ๋Šฅ ๋ฐธ๋Ÿฐ์Šค ์šฐ์ˆ˜

🧐 Model evaluation and reason for selection

When we ran the models, CNN-LSTM produced unknown tokens and failed to express the important color information in the images. CLIP, which focuses on the latent vector, performed better than CNN-LSTM, so we generated 826 captions with CLIP.

CLIP์€ meta์—์„œ 2022๋…„์— ๋ฐœํ‘œํ•œ ๋ชจ๋ธ๋กœ 10์–ต๊ฐœ์˜ ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜์œผ๋กœ noise๊ฐ€ ์กฐ๊ธˆ ์žˆ์ง€๋งŒ ํ‘œํ˜„์˜ ๋ฒ”์œ„๊ฐ€ ๋ณด๋‹ค ๋‹ค์–‘ํ•œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

| CNN-LSTM | CLIP |
|---|---|
| Turn it into a drawing with a body, face, and horns. | Turn it into a close up of a cartoon bird with a red head and white wings, style of pokemon, werecrow, ultra-high resolution, kid named finger, cleanest image, wildfire, metalhead, soaring, tuxedo, black white red, folklore |
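Both sample captions above begin with "Turn it into", which suggests that plain captions were rewritten as edit instructions before training. The helper below is a minimal sketch of that rewriting under this assumption; the function name and its behavior are hypothetical and not taken from the repository.

```python
def to_edit_instruction(caption, prefix="Turn it into "):
    """Prefix a plain caption so that it reads as an edit instruction."""
    caption = " ".join(caption.split())  # collapse stray whitespace
    if caption.lower().startswith(prefix.lower()):
        return caption  # already phrased as an instruction
    return prefix + caption


instruction = to_edit_instruction("a drawing of a green pokemon with red eyes")
print(instruction)
```

The check for an existing prefix makes the helper safe to run twice over the same caption file.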

🛠️ FINE TUNING

With the training dataset fully built, we fine-tune Instruct Pix2Pix to obtain a model optimized for Pokémon colorization. We followed the GitHub and Hugging Face reference below.

GitHub - huggingface/instruction-tuned-sd: Code for instruction-tuning Stable Diffusion.

Training ran on PyTorch 1.13.1 (CUDA 11.6) with an RTX 4090 GPU.

We used xformers 0.0.16 for memory-efficient training.

📖 Training results

Untitled

These results show that, after Instruct Pix2Pix is fine-tuned on captions describing each picture together with the before and after images, the fine-tuned model can take a prompt asking for a given color on a given part, recognize the parts of the sketch in detail, and color them accordingly. Comparing models trained on CNN-LSTM captions with models trained on CLIP captions also shows that the results differ markedly with caption quality. Because Instruct Pix2Pix has broad editing ability and strong zero-shot performance, the results suggest that training on data of higher quality and quantity would noticeably improve performance.
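The prompt-driven coloring described above could be invoked roughly as follows. This is a sketch under assumptions, not the project's service code: it assumes the fine-tuned weights were saved as a diffusers pipeline in a local directory (the default `cartoonization-finetuned` mirrors the output directory name used in this README's training command), that a CUDA GPU is available, and that `build_prompt` is a hypothetical helper. The inference settings are illustrative defaults, not values taken from the repository.

```python
def build_prompt(subject, part_colors):
    """Compose an edit instruction from a mapping of body part -> color."""
    described = ", ".join(f"{color} {part}" for part, color in part_colors.items())
    return f"Turn it into a {subject} with {described}"


def colorize(sketch_path, prompt, model_dir="cartoonization-finetuned"):
    """Colorize one sketch with a fine-tuned Instruct Pix2Pix pipeline."""
    # Heavy imports are deferred so build_prompt works without a GPU stack.
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline
    from PIL import Image

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        model_dir, torch_dtype=torch.float16
    ).to("cuda")
    sketch = Image.open(sketch_path).convert("RGB")
    result = pipe(
        prompt,
        image=sketch,
        num_inference_steps=20,     # illustrative value
        image_guidance_scale=1.5,   # how closely to follow the input sketch
    )
    return result.images[0]


prompt = build_prompt("cartoon bird", {"head": "red", "wings": "white"})
print(prompt)
# colorize("./pokemon_pix2pix_dataset/trainA/0002.png", prompt).save("colored.png")
```

Raising `image_guidance_scale` generally keeps the output closer to the input sketch, while lowering it gives the text prompt more influence.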

fine-tuning code

# Base model, training dataset on the Hugging Face Hub, and output directory.
# Replace DATASET_ID with your own dataset ID.
export MODEL_ID="timbrooks/instruct-pix2pix"
export DATASET_ID="instruction-tuning-sd/cartoonization"
export OUTPUT_DIR="cartoonization-finetuned"

# Fine-tune with fp16 mixed precision, EMA weights, and xformers attention.
accelerate launch --mixed_precision="fp16" finetune_instruct_pix2pix.py \
--pretrained_model_name_or_path="$MODEL_ID" \
--dataset_name="$DATASET_ID" \
--use_ema \
--enable_xformers_memory_efficient_attention \
--resolution=256 --random_flip \
--train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing \
--max_train_steps=15000 \
--checkpointing_steps=5000 --checkpoints_total_limit=1 \
--learning_rate=5e-05 --lr_warmup_steps=0 \
--mixed_precision=fp16 \
--val_image_url="./pokemon_pix2pix_dataset/trainA/0002.png" \
--validation_prompt="a cartoon character with a potted plant on his head" \
--seed=42 \
--output_dir="$OUTPUT_DIR" \
--report_to=wandb \
--push_to_hub
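To put the batch and step flags in perspective, the arithmetic below estimates how much data the run sees. It assumes a single GPU (the README names one RTX 4090) and that training used the 826 image pairs listed in the dataset table; both are assumptions, since the command as shown points at a different dataset ID.

```python
train_batch_size = 2
gradient_accumulation_steps = 4
max_train_steps = 15000
num_gpus = 1      # assumption: one RTX 4090
num_pairs = 826   # assumption: the image pairs from the dataset table

# Samples contributing to each optimizer step.
effective_batch = train_batch_size * gradient_accumulation_steps * num_gpus
# Total samples processed over the whole run.
samples_seen = effective_batch * max_train_steps
# Approximate number of passes over the dataset.
epochs = samples_seen / num_pairs

print(effective_batch, samples_seen, round(epochs, 1))
```

Under these assumptions the run makes roughly 145 passes over the data, which is worth keeping in mind when checking validation images for overfitting.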

🖥️ Web Service

main image1 main image2 main search image
Collect page (Main page)
Shows each Pokémon's name and attribute information
Selecting a Pokémon item moves to the Create page
Top: search bar and category buttons
Search by name and attribute / search by category / Pokémon selection section

image2
Create page
Enter the name of the Pokémon to create / enter the coloring text in the Prompt field
Clicking the Poké Ball image moves to the Loading page

loading image
Loading page
Shows the status while loading
Moves to the Result page when the model finishes

result image
Result page
View and save the generated image

๐Ÿ“image resource

CollectPage Header background by Pinterest

pokeball image, search-icon image by Flaticon

CollectPage main background by Pxfuel

๐Ÿ“font resource

๋‘˜๊ธฐ๋งˆ์š”๊ณ ๋”•
