DreamBooth3D#

Information

0. Abstract#

  • DreamBooth3D : personalizes a text-to-3D generative model from 3-6 casually captured images of a subject

  • A combination of DreamBooth and DreamFusion

    • DreamBooth : personalizing text-to-image models

    • DreamFusion : text-to-3D generation

  • Naively combining the two methods yields unsatisfactory 3D results for the subject, because the personalized T2I model overfits to the input viewpoints of the subject

  • DreamBooth3D overcomes this with a 3-stage optimization strategy that jointly leverages the 3D consistency of NeRF and the personalization capability of the T2I model

  • The result is high-quality, subject-specific 3D assets that support text-driven modifications, such as new poses and colors, not seen in any input image of the subject


1. Introduction#

  • Background

    • 3D asset generation has applications in VR, film, games, and many other fields, but it is hard to precisely control the identity, geometry, and appearance of a 3D asset generated from a text prompt alone.

    • In particular, there is a need for the ability to generate 3D assets that reflect the characteristics of a specific subject.

    • Many studies have shown successful results on subject personalization of T2I models, but they provide neither 3D asset generation nor 3D control.

    • DreamBooth3D proposes subject-driven text-to-3D generation from a small number (3-6) of casually captured images.

    ⇒ Optimize a NeRF and a T2I model together to generate subject-specific 3D assets!

  • Problem

    • Optimizing a NeRF with a T2I model personalized to the subject produces several failure cases.

    • Main issue : the personalized T2I model overfits to the camera viewpoints of the limited subject images.

    • Such a model does not give enough guidance to optimize a coherent 3D NeRF from arbitrary, continuous viewpoints.

  • Solution

    • DreamBooth3D proposes an effective 3-stage optimization scheme.

    • It builds on DreamBooth and DreamFusion.


    [STEP 1]

    • Partially fine-tune a DreamBooth model.

    • Optimize a NeRF with DreamFusion, guided by this model.

    • The partially fine-tuned DreamBooth model does not overfit to the given subject views, but it also does not capture every subject-specific detail.

    • The resulting NeRF asset is therefore 3D-consistent but does not fully reflect the subject's characteristics.

    [STEP 2]

    • Fully fine-tune a DreamBooth model to capture fine details.

    • Feed multi-view renderings of the Stage-1 NeRF into the fully trained DreamBooth model.

    • This produces a set of pseudo multi-view images of the subject.

    [STEP 3]

    • Further fine-tune the Stage-1 DreamBooth model using the given subject images together with the pseudo multi-view images.

    • Use the further-optimized DreamBooth model for the final optimization of the NeRF 3D volume.

    • During the final NeRF optimization, a weak reconstruction loss on the pseudo multi-view dataset serves as an additional regularizer.

    • Jointly optimizing the NeRF and the T2I model over the three stages prevents the DreamBooth model from overfitting to specific viewpoints of the subject, while ensuring that the resulting NeRF stays faithful to the subject's identity.


  • Results

    • Sample results show that the approach generates realistic 3D assets that closely resemble the given subject while respecting the context in the input text prompt.

    • Compared with several baselines, quantitative and qualitative results show that DreamBooth3D generations are more 3D-consistent and capture subject details better.


3. Approach#

Problem setup.


Fig. 721 Input and Output#

  • Input : a set of subject images and a text prompt

    • \(\left\{I_i \in \mathbb{R}^{n \times 3}\right\}(i \in\{1, \ldots, k\})\) : a set of \(k\) subject images, each with \(n\) pixels

    • A text prompt \(T\) that supplies context or a semantic variation (e.g., sleeping, standing)


🌟 Goal: generate 3D assets that are faithful to the text prompt while reflecting the identity (geometry and appearance) of the given subject#

  • The 3D assets are optimized based on Neural Radiance Fields (NeRF), which consist of an MLP network \(\mathcal{M}\) that encodes a radiance field in a 3D volume

  • Because only a few casual subject images are given, the problem is considerably more under-constrained and difficult than the typical 3D reconstruction setting, which relies on multi-view image captures

  • The technique builds on recent advances in T2I personalization and text-to-3D optimization

    ⇒ DreamBooth personalization + DreamFusion text-to-3D optimization

3.1. Preliminaries#


3.1.1 T2I diffusion models#

  • T2I diffusion models : Imagen, Stable Diffusion, DALL-E 2, etc.

  • T2I diffusion model \(\mathcal{D}_\theta(\epsilon, \mathbf{c})\)

    • input : an initial noise \(\epsilon\) and a text embedding \(\mathbf{c}\) of the prompt

      • an initial noise \(\epsilon \sim \mathcal{N}(0,1)\)

      • text embedding \(\mathbf{c}=\Theta(T)\) (a given prompt \(T\) with a text encoder \(\Theta\))

    • output : an image generated to reflect the prompt

  • Images generated by a T2I diffusion model generally match the prompt, but fine-grained control within the generated image is difficult. → DreamBooth addresses this


3.1.2 DreamBooth T2I Personalization#

Fig. 722 Generating personalized images that match a context given as text, from a small set (3-5) of images of a specific subject#

  • The T2I diffusion model is personalized by fine-tuning the network on \(\left\{I_i\right\}\), where \(\left\{I_i\right\}\) is a small set of casual captures

  • DreamBooth diffusion loss : used to fine-tune the T2I model

    \[ \mathcal{L}_d=\mathbb{E}_{\epsilon, t}\left[w_t\left\|\mathcal{D}_\theta\left(\alpha_t I_i+\sigma_t \epsilon, \mathbf{c}\right)-I_i\right\|^2\right], \]
    • \(t \sim \mathcal{U}[0,1]\) : the time-step in the diffusion process

    • \(w_t, \alpha_t, \sigma_t\) : the corresponding scheduling parameters

  • DreamBooth class prior preserving loss

    DreamBooth optionally uses a class prior preserving loss to prevent overfitting to \(\left\{I_i\right\}\), which improves diversity, and to avoid language drift

  • Final loss : reconstruction loss + class prior preservation loss, where \(x_{pr}\) is an image generated by the original model from the class prompt \(c_{pr}\) and \(\lambda\) weights the prior term

\[ \mathbb{E}_{x, c, \epsilon, \epsilon^{\prime}, t}\left[w_t\left\|\hat{x}_\theta\left(\alpha_t x+\sigma_t \epsilon, c\right)-x\right\|_2^2+\lambda w_{t^{\prime}}\left\|\hat{x}_\theta\left(\alpha_{t^{\prime}} x_{pr}+\sigma_{t^{\prime}}\epsilon^{\prime}, c_{pr}\right)-x_{pr}\right\|_2^2\right] \]
  • (example) overfitting

Fig. 723 overfitting#

  • (example) language drift

Fig. 724 language-drift#
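
The combined DreamBooth objective in this section (a reconstruction term on a subject image plus a \(\lambda\)-weighted class-prior term) can be sketched on toy vectors. This is only an illustration under stated assumptions, not the authors' implementation: the linear schedule \(\alpha_t = 1-t\), \(\sigma_t = t\), the constant weight \(w_t = 1\), the `denoiser` callable, and the prompt strings are all hypothetical choices made for the example.

```python
import random


def denoise_loss(denoiser, image, cond, t, eps, w=1.0):
    """w_t * || D(alpha_t * I + sigma_t * eps, c) - I ||^2 for a single sample."""
    alpha_t, sigma_t = 1.0 - t, t  # toy linear schedule (assumption)
    noisy = [alpha_t * x + sigma_t * e for x, e in zip(image, eps)]
    pred = denoiser(noisy, cond, t)
    return w * sum((p - x) ** 2 for p, x in zip(pred, image))


def dreambooth_loss(denoiser, subject, prior, c_subject, c_prior, lam, rng):
    """Reconstruction term on a subject image plus lam * prior-preservation term."""
    t, t_pr = rng.random(), rng.random()            # independent time-steps
    eps = [rng.gauss(0.0, 1.0) for _ in subject]    # noise for the subject image
    eps_pr = [rng.gauss(0.0, 1.0) for _ in prior]   # noise for the class-prior image
    recon = denoise_loss(denoiser, subject, c_subject, t, eps)
    prior_term = denoise_loss(denoiser, prior, c_prior, t_pr, eps_pr)
    return recon + lam * prior_term
```

A denoiser that already reproduces both the subject image and the class-prior image gives zero loss, while a denoiser that merely echoes its noisy input is penalized by both terms.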


3.1.3 DreamFusion#


Fig. 725 DreamFusion process / (source : DreamFusion)#

  • Using a T2I diffusion model, DreamFusion optimizes a volume represented by a NeRF \(\mathcal{M}_\phi\) (\(\phi\) : parameters) so that random views of the volume correspond to the prompt \(T\)

  • normals : normals computed from the gradient of the density are used to randomly relight the model with Lambertian shading, which improves geometric realism

  • \(\mathcal{M}_\phi\) : mapping (camera and light locations → albedo & density)

    • Given a random view \(v\) and a random light direction, volume rendering produces a shaded image \(\hat{I}_v\)

  • To optimize the NeRF parameters \(\phi\) so that the rendered images look like the text prompt \(T\), DreamFusion introduces score distillation sampling (SDS)

  • score distillation sampling (SDS)

    \[ \nabla_\phi \mathcal{L}_{SDS}=\mathbb{E}_{\epsilon, t}\left[w_t\left(\mathcal{D}_\theta\left(\alpha_t \hat{I}_v+\sigma_t \epsilon, \mathbf{c}\right)-\hat{I}_v\right) \frac{\partial \hat{I}_v}{\partial \phi}\right] . \]
  • This pushes noisy versions of the rendered images toward lower-energy states of the T2I diffusion model

  • By randomly choosing diverse views and backpropagating through the NeRF, the renderings are encouraged to look like images that the T2I model \(\mathcal{D}_\theta\) would generate for the given prompt

  • DreamBooth3D uses exactly the same experimental setup as DreamFusion
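
The SDS update can be made concrete with a toy sketch. Everything here is an assumption for illustration: the "NeRF" is just a parameter vector, the "renderer" scales it by a per-view gain (so \(\partial \hat{I}_v / \partial \phi\) is that gain), the schedule is \(\alpha_t = 1-t\), \(\sigma_t = t\) with \(w_t = 1\), and `denoiser` is a hypothetical stand-in for \(\mathcal{D}_\theta\). The term is computed as written in the SDS formula and added to \(\phi\), which moves the render toward the denoiser's output; treat that sign convention as a choice of this sketch.

```python
import random


def render(phi, gain):
    """Toy differentiable 'renderer': a view only rescales the parameters."""
    return [gain * p for p in phi]


def sds_direction(phi, gain, denoiser, cond, t, eps, w=1.0):
    """w_t * (D(alpha_t * I_v + sigma_t * eps, c) - I_v) * dI_v/dphi for one view."""
    alpha_t, sigma_t = 1.0 - t, t  # toy linear schedule (assumption)
    img = render(phi, gain)
    noisy = [alpha_t * x + sigma_t * e for x, e in zip(img, eps)]
    pred = denoiser(noisy, cond, t)
    # For this toy renderer dI_v/dphi is diagonal with entries equal to `gain`.
    return [w * (d - x) * gain for d, x in zip(pred, img)]


def optimize(phi, denoiser, cond, steps, lr, rng):
    """Repeatedly sample a random view, time-step and noise, then step phi."""
    phi = list(phi)
    for _ in range(steps):
        gain = rng.uniform(0.8, 1.2)   # stands in for a random camera/light
        t = rng.uniform(0.02, 0.98)
        eps = [rng.gauss(0.0, 1.0) for _ in phi]
        step = sds_direction(phi, gain, denoiser, cond, t, eps)
        phi = [p + lr * s for p, s in zip(phi, step)]
    return phi
```

With a denoiser that always returns the same target image, the parameters drift from their initial value toward a render that resembles that target from every sampled view.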

3.2 Failure of Naive Dreambooth+Fusion#


  • A straightforward approach to subject-driven text-to-3D generation

    1. Personalize a T2I model for the subject

    2. Use the personalized T2I model for text-to-3D optimization

  • That is, DreamBooth optimization (personalization) ⇒ DreamFusion optimization

  • BUT this naive DreamBooth+Fusion combination produces unsatisfactory results


Key Issue

  • DreamBooth tends to overfit to the subject views present in the training images, which reduces viewpoint diversity in the generated images.

  • As the number of fine-tuning steps increases, subject likeness increases (👍) BUT generated viewpoints move closer to the input exemplar views (👎) ⇒ the ability to generate images from diverse viewpoints degrades.


  • An SDS loss on the NeRF that relies on such a DreamBooth model is not sufficient to obtain a coherent 3D NeRF asset.

  • DreamBooth+Fusion NeRF models end up with the same subject view (e.g., the face of a dog) imprinted across different viewpoints.

    • “Janus problem” : the failure mode in which the same front view, such as a face, appears on several sides of the 3D asset

3.3. Dreambooth3D Optimization#


Fig. 726 DreamBooth3D Overview#

  • DreamBooth3D Overview

stage-1 (left): DreamBooth is first partially trained, and the resulting model is used to optimize an initial NeRF

stage-2 (middle): multi-view images are rendered from the initial NeRF along random viewpoints and then translated into pseudo multi-view subject images using a fully trained DreamBooth model

stage-3 (right): the partial DreamBooth is further fine-tuned on the multi-view images, and the resulting multi-view DreamBooth is used to optimize the final NeRF 3D asset with the SDS loss and a multi-view reconstruction loss

  • To address the issue above and achieve successful subject-driven text-to-3D generation, DreamBooth3D is proposed, based on an effective 3-stage optimization scheme


3.3.1 Stage 1: 3D with Partial DreamBooth#

Fig. 727 Stage-1 : 3D with Partial DreamBooth#

  • Train a DreamBooth model \(\hat{\mathcal{D}}_\theta\) on the input subject images


🌟 Observation: early checkpoints of the DreamBooth T2I model (i.e., partially fine-tuned models) do not overfit to the given subject views

⇒ partial DreamBooth (a partially fine-tuned DreamBooth)

  • Under the partial DreamBooth model, DreamFusion can produce a more coherent 3D NeRF

  • SDS loss used for NeRF optimization :

    • \(\nabla_\phi \mathcal{L}_{SDS}=\mathbb{E}_{\epsilon, t}\left[w_t\left(\hat{\mathcal{D}}_\theta^{\text {partial }}\left(\alpha_t \hat{I}_v+\sigma_t \epsilon, \mathbf{c}\right)-\hat{I}_v\right) \frac{\partial \hat{I}_v}{\partial \phi}\right]\)

      • \(\hat{\mathcal{D}}_\theta^{\text {partial }}\): partial DreamBooth

      • The SDS loss is used to optimize an initial NeRF asset for the given text prompt

  • Neither the partial DreamBooth model nor the resulting NeRF fully resembles the input subject


🌟 In short, the initial NeRF from Stage 1 is a 3D model of the subject class that only partially resembles the given subject while remaining faithful to the given text prompt


3.3.2 Stage 2: Multi-view Data Generation#


🌟 Stage-2 Multi-view Data Generation : the most important part of the approach

Pseudo multi-view subject images are generated using the 3D-consistent initial NeRF and the fully-trained DreamBooth


  1. Render multiple images \(\left\{\hat{I}_v \in \mathbb{R}^{n \times 3}\right\}\) from the initial NeRF along diverse random viewpoints \(\{v\}\) to obtain multi-view renderings

  2. Run the forward diffusion process on each rendering, adding a fixed amount of noise to bring it to time-step \(t_{pseudo}\)

  3. Run the reverse diffusion process with the fully-trained DreamBooth model \(\hat{\mathcal{D}}_\theta\) to generate samples

    • The sampling process is carried out independently for each view

    • Because sampling is conditioned on a noisy render of the initial NeRF, the generated images represent the subject well while covering a wide range of viewpoints ⇒ the noisy render anchors each sample to its viewpoint, and the reverse process fills in subject detail

    • BUT the reverse diffusion process can add different details to different views, so the resulting images are not multi-view consistent

      ⇒ a collection of pseudo multi-view images


🔑 Key insight

  1. DreamBooth can effectively generate unseen views of the subject when the initial NeRF renderings are close to those unseen views

  2. DreamBooth can effectively generate output images that resemble the subject more closely than the noisy input renderings do


  • Points to check in the Stage-2 sample outputs

    • Sample outputs of Img2Img translation using the fully-trained DreamBooth

    • The outputs look more like the subject images while keeping the viewpoint of the input NeRF renderings

    • Unlike prior work, Img2Img translation is used in combination with DreamBooth and NeRF 3D assets (prior work used Img2Img translation only for image editing applications)
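
The noise-then-denoise translation in Stage 2 can be sketched as follows. This is a toy sketch under assumptions, not the paper's sampler: the schedule \(\alpha_t = 1-t\), \(\sigma_t = t\), the deterministic DDIM-style update, the step count, and the `denoiser` callable (standing in for the fully-trained DreamBooth) are all hypothetical. The value of \(t_{pseudo}\) is left as a parameter because it is not given in these notes.

```python
import random


def pseudo_view(render, denoiser, cond, t_pseudo, n_steps, rng):
    """Noise a NeRF render up to t_pseudo, then denoise it back to t = 0."""
    if not 0.0 < t_pseudo < 1.0:
        raise ValueError("t_pseudo must lie strictly between 0 and 1")
    alpha = lambda t: 1.0 - t  # toy linear schedule (assumption)
    sigma = lambda t: t

    # Forward diffusion: add a fixed amount of noise to the render.
    eps = [rng.gauss(0.0, 1.0) for _ in render]
    z = [alpha(t_pseudo) * x + sigma(t_pseudo) * e for x, e in zip(render, eps)]

    # Reverse diffusion: evenly spaced time-steps from t_pseudo down to 0.
    times = [t_pseudo * (1.0 - k / n_steps) for k in range(n_steps + 1)]
    for t, s in zip(times[:-1], times[1:]):
        x_hat = denoiser(z, cond, t)  # predicted clean image
        eps_hat = [(zi - alpha(t) * xi) / sigma(t) for zi, xi in zip(z, x_hat)]
        z = [alpha(s) * xi + sigma(s) * ei for xi, ei in zip(x_hat, eps_hat)]
    return z
```

Each view would be passed through `pseudo_view` independently, which is why the outputs need not agree with one another in fine detail.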

3.3.3 Stage 3: Final NeRF with Multi-view DreamBooth#

Fig. 728 Stage-3 : Final NeRF with Multi-view DreamBooth. Final NeRF optimization using the SDS and multi-view reconstruction losses#

Building the new dataset \(I^{\text{aug}}\)

  • Formed by combining the pseudo multi-view images \(\left\{I_v^{\text {pseudo }}\right\}\) with the input subject images \(\left\{I_i\right\}\)

\[ I^{\text {aug }}=\left\{I_v^{\text {pseudo }}\right\} \cup\left\{I_i\right\} \]

\(I^{\text {aug}}\) is used to optimize the final multi-view DreamBooth model

  1. Start from the Stage-1 partial DreamBooth \(\hat{\mathcal{D}}_{\theta^*}\) (written \(\hat{\mathcal{D}}_\theta^{\text {partial }}\) in Stage 1)

  2. Further fine-tune \(\hat{\mathcal{D}}_{\theta^*}\) on the augmented data \(I^{\text {aug}}\)

  3. This yields the multi-view DreamBooth \(\hat{\mathcal{D}}_\theta^{\mathrm{multi}}\)

The \(\hat{\mathcal{D}}_\theta^{\text {multi }}\) model is then used to optimize the NeRF 3D asset with the DreamFusion SDS loss

  • Compared with the Stage-1 partial DreamBooth, the multi-view DreamBooth generalizes better across views and preserves the subject better, so it yields a NeRF with considerably improved subject identity

  • BUT when only the SDS loss is used, the optimized NeRF assets

    • have good geometric likeness to the given subject

    • frequently show color saturation artifacts

    • To address this, a new weak reconstruction loss using \(\left\{I_v^{\mathrm{pseudo}}\right\}\) is introduced

    • Color saturation artifacts :

      • Defects in which excessive color saturation leads to unrealistic or distorted colors

      • They occur when the model over-emphasizes particular colors

      • They occur when color values are mispredicted, giving unrealistic colors

      • They occur when colors are not kept consistent across viewpoints


Reconstruction loss

  • Since the camera parameters \(\left\{P_v\right\}\) from which \(\left\{I_v^{\mathrm{pseudo}}\right\}\) were generated are known, the training of the second NeRF MLP \(\mathcal{F}_\gamma\) is additionally regularized with a reconstruction loss

    \[ \mathcal{L}_{recon }=\left\|\Gamma\left(\mathcal{F}_\gamma, P_v\right)-I_v^{\text {pseudo }}\right\|_p, \]
    • \(\Gamma\left(\mathcal{F}_\gamma, P_v\right)\) : the function that renders an image from the NeRF \(\mathcal{F}_\gamma\) along the camera viewpoint \(P_v\)

  • Purpose of the reconstruction loss

    • Pulls the color distribution of the generated volume closer to that of the image exemplars

    • Improves subject likeness in unseen views

  • Final NeRF loss function

\[ \mathcal{L}=\lambda_{\text {recon }} \mathcal{L}_{\text {recon }}+\lambda_{\text {SDS }} \mathcal{L}_{\text {SDS }}+\lambda_{\text {nerf }} \mathcal{L}_{\text {nerf }} \]
  • \(\mathcal{L}_{\text {nerf }}\) is the additional NeRF regularization used in Mip-NeRF360 [2]
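
The reconstruction term and the weighted sum can be written out directly. This is a minimal sketch on flat pixel lists: the norm order `p`, the weight values, and the scalar placeholders for \(\mathcal{L}_{SDS}\) and \(\mathcal{L}_{nerf}\) are assumptions for illustration (SDS is defined above through its gradient, so a scalar stand-in is used here).

```python
def recon_loss(rendered, pseudo, p=2):
    """|| Gamma(F_gamma, P_v) - I_v^pseudo ||_p for one view, on flat pixel lists."""
    return sum(abs(r - q) ** p for r, q in zip(rendered, pseudo)) ** (1.0 / p)


def total_loss(l_recon, l_sds, l_nerf, lam_recon, lam_sds, lam_nerf):
    """Weighted sum of the reconstruction, SDS and NeRF-regularization terms."""
    return lam_recon * l_recon + lam_sds * l_sds + lam_nerf * l_nerf
```

Keeping `lam_recon` small relative to `lam_sds` is what makes the reconstruction term a weak regularizer: it nudges colors toward the pseudo views without forcing the NeRF to reproduce their view-inconsistent details.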


4. Experiments#


Implementation Details.

  • Models used:

    • T2I : the Imagen T2I model

    • Text-encoding: T5-XXL

    • NeRF : DreamFusion

  • Training time: on a 4-core TPUv4, completing the 3-stage optimization takes about 3 hours per prompt

  • Training iterations:

    • Partial DreamBooth model (\(\hat{\mathcal{D}}_\theta^{\text {partial }}\)) : 150 iterations

    • Full DreamBooth model (\(\hat{\mathcal{D}}_\theta\)) : best performance at 800 iterations

  • pseudo multi-view data generation : 20 images are rendered from viewpoints sampled uniformly at a fixed radius from the origin

  • Stage-3 multi-view DreamBooth \(\hat{\mathcal{D}}_\theta^{\mathrm{multi}}\): the partially trained \(\hat{\mathcal{D}}_{\theta^*}\) model is fine-tuned for an additional 150 iterations in Stage 3

  • Hyperparams : see the supplementary material
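
The viewpoint sampling for the 20 pseudo multi-view renders can be sketched as cameras placed on a circle around the origin. The notes only state "uniformly sampled at a fixed radius from the origin", so evenly spaced azimuths, a single elevation, and the default radius are assumptions of this example.

```python
import math


def sample_cameras(n_views=20, radius=1.0, elevation_deg=0.0):
    """Camera positions at a fixed distance from the origin, evenly spaced in azimuth."""
    el = math.radians(elevation_deg)
    cams = []
    for k in range(n_views):
        az = 2.0 * math.pi * k / n_views
        cams.append((
            radius * math.cos(el) * math.cos(az),
            radius * math.cos(el) * math.sin(az),
            radius * math.sin(el),
        ))
    return cams
```

Each position, together with a look-at direction toward the origin, would give the camera parameters \(P_v\) that Stage 3 reuses for the reconstruction loss.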


Datasets.

  • Training data: the personalized text-to-3D models are trained on a publicly available image collection

    • It consists of 30 different image collections, each containing 4-6 casual images of a subject (dogs, toys, backpacks, sunglasses, cartoon characters, etc.)

  • Rare-object analysis: additional images were collected to analyze performance on rare subjects such as an “owl showpiece”

  • Each 3D model is optimized for 3-6 prompts to demonstrate 3D contextualization


Baselines.

  • Latent-NeRF

    • Learns a 3D NeRF model with an SDS loss in the latent feature space of Stable Diffusion rather than in RGB pixel space

    • As a baseline, Latent-NeRF is run with a fully DreamBooth-trained T2I model

  • DreamFusion+DreamBooth: a single-stage approach that first trains a DreamBooth diffusion model and then optimizes a 3D NeRF with DreamFusion

  • The proposed 3-stage optimization method : “DreamBooth3D”


Evaluation Metrics.

  • CLIP R-Precision

    • Measures how accurately the text prompt can be retrieved given renderings of the scene

    • CLIP ViT-B/16, ViT-B/32, and ViT-L-14 models are used for evaluation

  • A user study is also conducted (described below)
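
As a rough illustration of the retrieval metric, the sketch below computes a top-1 R-Precision from a table of similarity scores. In practice the scores would come from a CLIP model comparing each rendering with every candidate prompt; the function name and the input layout are hypothetical choices for this example.

```python
def r_precision(similarity, true_index):
    """Fraction of renderings whose highest-scoring prompt is the one used to create them.

    similarity[i][j] is the score between rendering i and candidate prompt j;
    true_index[i] is the index of the prompt that rendering i was generated from.
    """
    hits = 0
    for row, target in zip(similarity, true_index):
        best = max(range(len(row)), key=row.__getitem__)  # top-1 retrieved prompt
        hits += int(best == target)
    return hits / len(similarity)
```

For three renderings where two retrieve their own prompt and one retrieves a different prompt, the function returns 2/3.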

4.1. Results#

Visual Results

  • Comparison of DreamBooth3D with the Latent-NeRF and DreamBooth+Fusion baselines

    • Latent-NeRF : works reasonably in some cases (the duck) but fails to produce a coherent 3D model in most cases

    • DreamBooth+Fusion : shows the same appearance and structure from multiple viewpoints, i.e., the same subject view is repeated around the asset

    • DreamBooth3D : produces 360° consistent 3D assets that capture the geometric structure and appearance details of the given subject well


Initial vs. Final NeRF

  • Results of the initial NeRF (Stage 1) and the final NeRF (Stage 3)

  • Initial NeRF : only partially resembles the given subject, but is 3D-consistent

  • Final NeRF : resembles the given subject more closely while keeping a consistent 3D structure

  • These examples suggest that the 3-stage optimization of DreamBooth3D is necessary


User Study.

→ DreamBooth3D and the baselines are compared on three aspects, based on answers to the following questions
  1. Subject fidelity: “Which 3D item looks more like the subject?”

  2. 3D consistency and plausibility: “Which 3D item has a more plausible and consistent geometry?”

  3. Prompt fidelity: “Which video better reflects the provided prompt?”

  • Study protocol

    • For 3D consistency and subject fidelity, rotating video results are shown for each of the 30 subjects in the dataset, and 11 users answer for each pair

    • For prompt fidelity, videos are generated for 54 unique prompt-subject pairs, and 21 users answer

  • Final results

    • The final result is computed by majority voting

    • DreamBooth3D is significantly preferred over the baselines in 3D consistency, subject fidelity, and prompt fidelity.
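
The majority-vote aggregation can be sketched as below. The data layout (a mapping from each comparison to the list of user choices) and the method labels are hypothetical; ties are broken by whichever label was seen first, which is an arbitrary choice of this example rather than something stated in the notes.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent choice among the users for one comparison."""
    return Counter(votes).most_common(1)[0][0]


def preference_rate(responses, method):
    """Fraction of comparisons whose majority vote goes to `method`."""
    winners = [majority_vote(v) for v in responses.values()]
    return sum(w == method for w in winners) / len(winners)
```

Applied per question (subject fidelity, 3D consistency, prompt fidelity), this gives the share of comparisons in which each method is preferred.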

4.2. Sample Applications#

  • Recontextualization

    • Sample results in which 3D models of different dog subjects are recontextualized with simple prompts

    • The context given in the text prompt is reflected consistently across all subjects

    • The pose and local deformations of the output 3D models are highly realistic, even though those poses do not appear in the input images

  • Color/Material Editing

    • Editing the color and the material of the subject

  • Accessorization

    • Adding accessories to the subject

  • Stylization

    • A cream-colored shoe is stylized by changing its color and adding frills

  • Cartoon-to-3D

    • Converts non-photorealistic subject images (e.g., flat 2D characters) into plausible 3D shapes

    • Plausible 3D results are generated even though all subject images are frontal views

4.3. Limitations#


Fig. 729 limitations#

  1. The optimized 3D representations are sometimes oversaturated and oversmoothed

    1. This is partly caused by SDS-based optimization with a high guidance weight

    2. It is also caused by being restricted to a relatively low image resolution of 64×64 pixels

    3. Efficiency improvements in diffusion models and NeRF may make it possible to scale to higher resolutions

  2. Janus problem : when the input images show no viewpoint variation, the optimized 3D representation can look frontal from several inconsistent viewpoints

  3. Thin object structures, such as sunglasses, are difficult to reconstruct

5. Conclusion#

  • DreamBooth3D is proposed as a method for subject-driven text-to-3D generation

  • Given a small set of casual images of a subject (with no additional information such as camera poses), it generates subject-specific 3D assets that follow the context given in the input text prompt (sleeping, jumping, red, etc.)

  • Extensive experiments on the DreamBooth dataset show that the method generates realistic 3D assets that closely resemble the given subject while reflecting the context in the input text prompt

  • It outperforms several baselines in both quantitative and qualitative evaluations