4.6 💻 Data Pipeline Hands-on

Example Process

Source Data

  • The data was collected by crawling the list of current property listings from Naver Real Estate.

Data Information

Data Lake

  • The Data Lake was used to store the original, unmodified source data.

  • The data is stored as raw CSV files, so queries and data exploration are not possible in the Data Lake.
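
The Data Lake step above can be sketched roughly as follows. This is a minimal illustration, not the setup used in this exercise: a temporary local directory stands in for the Data Lake storage, and the column names (`listing_id`, `district`, `price`) and sample rows are hypothetical. It shows that a raw CSV carries no schema, since every value comes back as a plain string.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical listing records; the real columns are not given in the text.
listings = [
    {"listing_id": 1, "district": "Gangnam", "price": 95000},
    {"listing_id": 2, "district": "Mapo", "price": 72000},
]


def save_raw_csv(rows, lake_dir, name):
    """Write rows unchanged to a CSV file under lake_dir and return its path."""
    path = Path(lake_dir) / name
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_raw_csv(path):
    """Read the CSV back; every field is returned as a string."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


lake_dir = tempfile.mkdtemp()  # local stand-in for the Data Lake (assumption)
raw_path = save_raw_csv(listings, lake_dir, "listings.csv")
raw_rows = read_raw_csv(raw_path)
```

Reading the file back gives `"95000"` rather than `95000`: types and structure only appear once the data is loaded into a system that enforces a schema.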

Data Warehouse

  • The data stored in the Data Lake was converted into structured data as it was loaded into the Data Warehouse.

  • In the Data Warehouse, the data can be explored through queries.
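
The load-and-query idea above can be sketched as follows. To keep the example runnable without cloud credentials, an in-memory SQLite database stands in for the Data Warehouse; this is a substitute for illustration, not the warehouse used in this exercise. The table name, columns, and sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw CSV content, as it might sit in the Data Lake.
RAW_CSV = """listing_id,district,price
1,Gangnam,95000
2,Mapo,72000
3,Gangnam,88000
"""


def load_to_warehouse(conn, raw_csv):
    """Create a typed table and load the CSV rows into it, casting each field."""
    conn.execute(
        "CREATE TABLE listings ("
        "listing_id INTEGER PRIMARY KEY, district TEXT, price INTEGER)"
    )
    rows = [
        (int(r["listing_id"]), r["district"], int(r["price"]))
        for r in csv.DictReader(io.StringIO(raw_csv))
    ]
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the Data Warehouse (assumption)
loaded = load_to_warehouse(conn, RAW_CSV)

# Once the data is structured, it can be explored with SQL.
avg_by_district = conn.execute(
    "SELECT district, AVG(price) FROM listings "
    "GROUP BY district ORDER BY district"
).fetchall()
```

The casting during the load is what turns the raw strings into structured data; the aggregate query afterwards is the kind of exploration that was not possible on the raw files.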

Visualize

  • Finally, the BigQuery table was loaded into Looker Studio and visualized.
