Get predictions for a forecast model

This page shows you how to make forecasts using your trained forecast model.

To create forecasts, make a batch prediction request directly to your forecast model, specifying an input source and an output location to store the forecast results.

Forecasting with AutoML isn't compatible with endpoint deployment or online predictions. To request online predictions from your forecast model, use Tabular Workflow for Forecasting.

You can request a prediction with explanations (also called feature attributions) to see how your model arrived at a prediction. The local feature importance values tell you how much each feature contributed to the prediction result. For a conceptual overview, see Feature attributions for forecasting.

Before you begin

Before you can make a forecast, you must train a forecast model.

Input data

The input data for a batch prediction request is the data that your model uses to create forecasts. You can provide input data in one of two formats:

  • CSV objects in Cloud Storage
  • BigQuery tables

We recommend that you use the same format for your input data as you used to train the model. For example, if you trained your model using data in BigQuery, it is best to use a BigQuery table as the input for your batch prediction. Because Vertex AI treats all CSV input fields as strings, mixing training and input data formats can cause errors.

Your data source must contain tabular data that includes all of the columns, in any order, that were used to train the model. You can include columns that weren't in the training data, or that were in the training data but excluded from training. These extra columns are included in the output but don't affect the forecast results.

Input data requirements

Input for forecast models must meet the following requirements:

  • All values in the time column are present and valid.
  • The data frequency of the input data must match that of the training data. If rows are missing from a time series, you must insert them manually, guided by appropriate domain knowledge.
  • Time series with duplicate timestamps are removed from predictions. To include them, remove any duplicate timestamps.
  • Provide historical data for each time series to forecast. For the most accurate forecasts, the amount of data should equal the context window that was set during model training. For example, if the context window is 14 days, provide at least 14 days of historical data. If you provide less data, Vertex AI pads the data with empty values.
  • The forecast starts on the first row of a time series (sorted by time) that has a null value in the target column. The null values must be contiguous within the time series. For example, when the target column is sorted by time, a single time series can't look like 1, 2, null, 3, 4, null, null. For CSV files, Vertex AI treats an empty string as null. For BigQuery, null values are supported natively.
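Some of the requirements above can be checked locally before you submit a job. The following is a minimal sketch, not part of the Vertex AI API: the helper name `check_series` and the row layout (a list of `(timestamp, target)` pairs for a single time series, with `None` standing for null) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def check_series(rows: List[Tuple[str, Optional[float]]]) -> List[str]:
    """Return the problems found in one time series.

    rows holds (timestamp, target) pairs; None marks a null target.
    Hypothetical helper for illustration, not a Vertex AI function.
    """
    problems = []
    # ISO-8601 timestamps of equal precision sort chronologically as strings.
    ordered = sorted(rows, key=lambda row: row[0])

    timestamps = [ts for ts, _ in ordered]
    if any(not ts for ts in timestamps):
        problems.append("missing timestamp")
    if len(set(timestamps)) != len(timestamps):
        problems.append("duplicate timestamp")

    # Once the first null target appears, every later target must be null too.
    seen_null = False
    for _, target in ordered:
        if target is None:
            seen_null = True
        elif seen_null:
            problems.append("non-contiguous null targets")
            break
    return problems


# The invalid pattern from the text: 1, 2, null, 3, 4, null, null
bad = [("2024-01-0%d" % (i + 1), v)
       for i, v in enumerate([1.0, 2.0, None, 3.0, 4.0, None, None])]
print(check_series(bad))
```

The sketch doesn't check data frequency or the length of the history against the context window, because both depend on settings chosen at training time.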

BigQuery table

If you choose a BigQuery table as the input, make sure of the following:

  • BigQuery data source tables must be no larger than 100 GB.
  • If the table is in a different project, you must grant the BigQuery Data Editor role to the Vertex AI service account in that project.

CSV file

If you choose a CSV object in Cloud Storage as the input, make sure of the following:

  • The data source must begin with a header row that contains the column names.
  • Each data source object must be no larger than 10 GB. You can include multiple files, up to a combined maximum of 100 GB.
  • If the Cloud Storage bucket is in a different project, you must grant the Storage Object Creator role to the Vertex AI service account in that project.
  • You must enclose all strings in double quotation marks (").
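As an illustration of the header and quoting requirements, the following sketch writes rows with Python's standard `csv` module. The function name `to_quoted_csv` and the column names are assumptions for the example. Quoting every field, numbers included, is a simplification that relies on the statement above that Vertex AI treats all CSV input fields as strings.

```python
import csv
import io
from typing import Iterable, Sequence


def to_quoted_csv(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Serialize rows as CSV text with a header row and every field quoted."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field, strings included, in double quotation marks.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# The empty string in the last row stands for a null target to be forecast.
text = to_quoted_csv(
    ["date", "store", "sales"],
    [["2019-11-24", "store_0", 12.5], ["2019-11-25", "store_0", ""]],
)
print(text)
```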

Output format

The output format of your batch prediction request doesn't need to match the format that you used for the input. For example, if you used a BigQuery table as the input, you can write the forecast results to CSV objects in Cloud Storage.

Make a batch prediction request to your model

To make a batch prediction request, you can use the Google Cloud console or the Vertex AI API. The input data source can be CSV objects stored in a Cloud Storage bucket or a BigQuery table. Depending on the amount of data that you submit as input, a batch prediction task can take some time to complete.

Google Cloud console

Use the Google Cloud console to request a batch prediction.

  1. In the Google Cloud console, in the Vertex AI section, go to the Batch predictions page.

    Go to the Batch predictions page

  2. Click Create to open the New batch prediction window.
  3. For Define your batch prediction, complete the following steps:
    1. Enter a name for the batch prediction.
    2. For Model name, select the name of the model to use for this batch prediction.
    3. For Version, select the version of the model.
    4. For Select source, select whether your source input data is a CSV file in Cloud Storage or a table in BigQuery.
      • For CSV files, specify the Cloud Storage location of the CSV input file.
      • For BigQuery tables, specify the ID of the project that contains the table, the BigQuery dataset ID, and the BigQuery table or view ID.
    5. For Batch prediction output, select CSV or BigQuery.
      • For CSV, specify the Cloud Storage bucket where Vertex AI stores your output.
      • For BigQuery, you can specify a project ID or an existing dataset:
        • To specify the project ID, enter the project ID in the Google Cloud project ID field. Vertex AI creates a new output dataset for you.
        • To specify an existing dataset, enter its BigQuery path in the Google Cloud project ID field, for example bq://projectid.datasetid.
      • Optional. If your output destination is BigQuery or JSONL in Cloud Storage, you can enable feature attributions in addition to predictions. To do this, select Enable feature attributions for this model. Feature attributions aren't supported for CSV in Cloud Storage. Learn more.
  4. Optional: Model Monitoring analysis for batch predictions is available in Preview. To add a skew detection configuration to your batch prediction job, see the Prerequisites.
    1. Click to toggle on Enable model monitoring for this batch prediction.
    2. Select a Training data source. Enter the data path or location for the training data source that you selected.
    3. Optional: Under Alert thresholds, specify the thresholds at which to trigger alerts.
    4. For Notification emails, enter one or more comma-separated email addresses to receive alerts when the model exceeds an alerting threshold.
    5. Optional: For Notification channels, add Cloud Monitoring channels to receive alerts when the model exceeds an alerting threshold. You can select existing Cloud Monitoring channels or create a new one by clicking Manage notification channels. The console supports PagerDuty, Slack, and Pub/Sub notification channels.
  5. Click Create.

API: BigQuery

REST

Use the batchPredictionJobs.create method to request a batch prediction.

Before using any of the request data, make the following replacements:

  • LOCATION_ID: The region where the model is stored and the batch prediction job runs. For example, us-central1.
  • PROJECT_ID: Your project ID.
  • BATCH_JOB_NAME: The display name for the batch job.
  • MODEL_ID: The ID of the model to use for making predictions.
  • INPUT_URI: A reference to the BigQuery data source. Use the following form:
    bq://bqprojectId.bqDatasetId.bqTableId
    
  • OUTPUT_URI: A reference to the BigQuery destination where the predictions are written. Specify the project ID and, optionally, an existing dataset ID. Use the following form:
    bq://bqprojectId.bqDatasetId
    If you specify only the project ID, Vertex AI creates a new output dataset for you. Use the following form:
    bq://bqprojectId
  • GENERATE_EXPLANATION: The default value is false. Set it to true to enable feature attributions. For more information, see Feature attributions for forecasting.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs

Request JSON body:

{
  "displayName": "BATCH_JOB_NAME",
  "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "OUTPUT_URI"
    }
  },
  "generate_explanation": GENERATE_EXPLANATION
}
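If you generate request.json from a script, the body can be assembled as a plain dictionary. This is a minimal sketch under stated assumptions: the function name `build_bigquery_batch_request` is hypothetical, and the field names are copied from the request body shown above rather than from a client library.

```python
import json


def build_bigquery_batch_request(
    project_id: str,
    location_id: str,
    model_id: str,
    batch_job_name: str,
    input_uri: str,
    output_uri: str,
    generate_explanation: bool = False,
) -> dict:
    """Build the JSON body for a BigQuery-to-BigQuery batch prediction job."""
    # Both the source and the destination are BigQuery references.
    for uri in (input_uri, output_uri):
        if not uri.startswith("bq://"):
            raise ValueError(f"expected a bq:// URI, got {uri!r}")
    return {
        "displayName": batch_job_name,
        "model": f"projects/{project_id}/locations/{location_id}/models/{model_id}",
        "inputConfig": {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        },
        "outputConfig": {
            "predictionsFormat": "bigquery",
            "bigqueryDestination": {"outputUri": output_uri},
        },
        "generate_explanation": generate_explanation,
    }


# Placeholder values; replace them with your own before writing request.json.
body = build_bigquery_batch_request(
    "PROJECT_ID", "us-central1", "MODEL_ID", "BATCH_JOB_NAME",
    "bq://bqprojectId.bqDatasetId.bqTableId", "bq://bqprojectId",
)
print(json.dumps(body, indent=2))
```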

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and run the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs/67890",
  "displayName": "batch_job_1 202005291958",
  "model": "projects/12345/locations/us-central1/models/5678",
  "state": "JOB_STATE_PENDING",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://12345"
    }
  },
  "dedicatedResources": {
    "machineSpec": {
      "machineType": "n1-standard-32",
      "acceleratorCount": "0"
    },
    "startingReplicaCount": 2,
    "maxReplicaCount": 6
  },
  "manualBatchTuningParameters": {
    "batchSize": 4
  },
  "outputInfo": {
    "bigqueryOutputDataset": "bq://12345.reg_model_2020_10_02_06_04"
  },
  "createTime": "2020-09-30T02:58:44.341643Z",
  "updateTime": "2020-09-30T02:58:44.341643Z"
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

In the following sample, replace INSTANCES_FORMAT and PREDICTIONS_FORMAT with `bigquery`. To learn how to replace the other placeholders, see the `REST & CMD LINE` tab of this section.
import com.google.cloud.aiplatform.v1.BatchPredictionJob;
import com.google.cloud.aiplatform.v1.BigQueryDestination;
import com.google.cloud.aiplatform.v1.BigQuerySource;
import com.google.cloud.aiplatform.v1.JobServiceClient;
import com.google.cloud.aiplatform.v1.JobServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.ModelName;
import com.google.gson.JsonObject;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;

public class CreateBatchPredictionJobBigquerySample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String displayName = "DISPLAY_NAME";
    String modelName = "MODEL_NAME";
    String instancesFormat = "INSTANCES_FORMAT";
    String bigquerySourceInputUri = "BIGQUERY_SOURCE_INPUT_URI";
    String predictionsFormat = "PREDICTIONS_FORMAT";
    String bigqueryDestinationOutputUri = "BIGQUERY_DESTINATION_OUTPUT_URI";
    createBatchPredictionJobBigquerySample(
        project,
        displayName,
        modelName,
        instancesFormat,
        bigquerySourceInputUri,
        predictionsFormat,
        bigqueryDestinationOutputUri);
  }

  static void createBatchPredictionJobBigquerySample(
      String project,
      String displayName,
      String model,
      String instancesFormat,
      String bigquerySourceInputUri,
      String predictionsFormat,
      String bigqueryDestinationOutputUri)
      throws IOException {
    JobServiceSettings settings =
        JobServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (JobServiceClient client = JobServiceClient.create(settings)) {
      JsonObject jsonModelParameters = new JsonObject();
      Value.Builder modelParametersBuilder = Value.newBuilder();
      JsonFormat.parser().merge(jsonModelParameters.toString(), modelParametersBuilder);
      Value modelParameters = modelParametersBuilder.build();
      BigQuerySource bigquerySource =
          BigQuerySource.newBuilder().setInputUri(bigquerySourceInputUri).build();
      BatchPredictionJob.InputConfig inputConfig =
          BatchPredictionJob.InputConfig.newBuilder()
              .setInstancesFormat(instancesFormat)
              .setBigquerySource(bigquerySource)
              .build();
      BigQueryDestination bigqueryDestination =
          BigQueryDestination.newBuilder().setOutputUri(bigqueryDestinationOutputUri).build();
      BatchPredictionJob.OutputConfig outputConfig =
          BatchPredictionJob.OutputConfig.newBuilder()
              .setPredictionsFormat(predictionsFormat)
              .setBigqueryDestination(bigqueryDestination)
              .build();
      String modelName = ModelName.of(project, location, model).toString();
      BatchPredictionJob batchPredictionJob =
          BatchPredictionJob.newBuilder()
              .setDisplayName(displayName)
              .setModel(modelName)
              .setModelParameters(modelParameters)
              .setInputConfig(inputConfig)
              .setOutputConfig(outputConfig)
              .build();
      LocationName parent = LocationName.of(project, location);
      BatchPredictionJob response = client.createBatchPredictionJob(parent, batchPredictionJob);
      System.out.format("response: %s\n", response);
      System.out.format("\tName: %s\n", response.getName());
    }
  }
}

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from google.cloud import aiplatform


def create_batch_prediction_job_bigquery_sample(
    project: str,
    location: str,
    model_resource_name: str,
    job_display_name: str,
    bigquery_source: str,
    bigquery_destination_prefix: str,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    my_model = aiplatform.Model(model_resource_name)

    batch_prediction_job = my_model.batch_predict(
        job_display_name=job_display_name,
        bigquery_source=bigquery_source,
        bigquery_destination_prefix=bigquery_destination_prefix,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job

API: Cloud Storage

REST

Use the batchPredictionJobs.create method to request a batch prediction.

Before using any of the request data, make the following replacements:

  • LOCATION_ID: The region where the model is stored and the batch prediction job runs. For example, us-central1.
  • PROJECT_ID: Your project ID.
  • BATCH_JOB_NAME: The display name for the batch job.
  • MODEL_ID: The ID of the model to use for making predictions.
  • URI: The path (URI) to the Cloud Storage object that contains the input data. There can be more than one. Each URI has the following form:
    gs://bucketName/pathToFileName
    
  • OUTPUT_URI_PREFIX: The path to the Cloud Storage destination where the predictions are written. Vertex AI writes batch predictions to a timestamped subdirectory of this path. Set this value to a string of the following form:
    gs://bucketName/pathToOutputDirectory
    
  • GENERATE_EXPLANATION: The default value is false. Set it to true to enable feature attributions. This option is available only if your output destination is JSONL. Feature attributions aren't supported for CSV in Cloud Storage. For more information, see Feature attributions for forecasting.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs

Request JSON body:

{
  "displayName": "BATCH_JOB_NAME",
  "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "csv",
    "gcsSource": {
      "uris": [
        URI1,...
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "csv",
    "gcsDestination": {
      "outputUriPrefix": "OUTPUT_URI_PREFIX"
    }
  },
  "generate_explanation": GENERATE_EXPLANATION
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and run the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs/67890",
  "displayName": "batch_job_1 202005291958",
  "model": "projects/12345/locations/us-central1/models/5678",
  "state": "JOB_STATE_PENDING",
  "inputConfig": {
    "instancesFormat": "csv",
    "gcsSource": {
      "uris": [
        "gs://bp_bucket/reg_mode_test"
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "csv",
    "gcsDestination": {
      "outputUriPrefix": "OUTPUT_URI_PREFIX"
    }
  },
  "dedicatedResources": {
    "machineSpec": {
      "machineType": "n1-standard-32",
      "acceleratorCount": "0"
    },
    "startingReplicaCount": 2,
    "maxReplicaCount": 6
  },
  "outputInfo": {
    "gcsOutputDataset": "OUTPUT_URI_PREFIX/prediction-batch_job_1 202005291958-2020-09-30T02:58:44.341643Z"
  },
  "createTime": "2020-09-30T02:58:44.341643Z",
  "updateTime": "2020-09-30T02:58:44.341643Z"
}

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from typing import Sequence, Union

from google.cloud import aiplatform


def create_batch_prediction_job_sample(
    project: str,
    location: str,
    model_resource_name: str,
    job_display_name: str,
    gcs_source: Union[str, Sequence[str]],
    gcs_destination: str,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    my_model = aiplatform.Model(model_resource_name)

    batch_prediction_job = my_model.batch_predict(
        job_display_name=job_display_name,
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job

Retrieve batch prediction results

Vertex AI sends the output of a batch prediction to the destination that you specified, which can be either BigQuery or Cloud Storage.

Cloud Storage output for feature attributions isn't currently supported.

BigQuery

Output dataset

If you use BigQuery, the output of the batch prediction is stored in an output dataset. If you provided a dataset to Vertex AI, the dataset name (BQ_DATASET_NAME) is the name that you provided earlier. If you didn't provide an output dataset, Vertex AI created one for you. To find its name (BQ_DATASET_NAME), follow these steps:

  1. In the Google Cloud console, go to the Vertex AI Batch predictions page.

    Go to the Batch predictions page

  2. Select the prediction that you created.
  3. The output dataset is shown in Export location. The dataset name has the following format: prediction_MODEL_NAME_TIMESTAMP

Output tables

The output dataset contains one or more of the following three output tables:

  • Predictions table

    This table contains a row for every row in your input data where a prediction was requested (that is, where TARGET_COLUMN_NAME = null). For example, if your input included 14 null entries for the target column (such as sales for the next 14 days), your prediction request returns 14 rows, one sales figure for each day. If your prediction request exceeds the model's forecast horizon, Vertex AI returns predictions only up to the forecast horizon.

  • Errors validation table

    This table contains a row for each non-critical error encountered during the aggregation phase that takes place before batch prediction. Each non-critical error corresponds to a row in the input data that Vertex AI couldn't return a forecast for.

  • Errors table

    This table contains a row for each non-critical error encountered during batch prediction. Each non-critical error corresponds to a row in the input data that Vertex AI couldn't return a forecast for.

Predictions table

The name of the table (BQ_PREDICTIONS_TABLE_NAME) is formed by appending the timestamp of when the batch prediction job started to `predictions_`: predictions_TIMESTAMP

To retrieve the predictions table:

  1. In the console, go to the BigQuery page.
    Go to BigQuery
  2. Run the following query:
    SELECT * FROM BQ_DATASET_NAME.BQ_PREDICTIONS_TABLE_NAME

Vertex AI stores predictions in the predicted_TARGET_COLUMN_NAME.value column.

If you trained the model with Temporal Fusion Transformer (TFT), you can find the TFT interpretability output in the predicted_TARGET_COLUMN_NAME.tft_feature_importance column.

This column is further divided into the following:

  • context_columns: The forecasting features whose context window values serve as inputs to the TFT Long Short-Term Memory (LSTM) encoder.
  • context_weights: The feature importance weights associated with each of the context_columns for the predicted instance.
  • horizon_columns: The forecasting features whose forecast horizon values serve as inputs to the TFT Long Short-Term Memory (LSTM) decoder.
  • horizon_weights: The feature importance weights associated with each of the horizon_columns for the predicted instance.
  • attribute_columns: The forecasting features that are time-invariant.
  • attribute_weights: The weights associated with each of the attribute_columns.

If your model is optimized for quantile loss and your set of quantiles includes the median, predicted_TARGET_COLUMN_NAME.value is the prediction value at the median. Otherwise, predicted_TARGET_COLUMN_NAME.value is the prediction value at the lowest quantile in the set. For example, if your set of quantiles is [0.1, 0.5, 0.9], value is the prediction for quantile 0.5. If your set of quantiles is [0.1, 0.9], value is the prediction for quantile 0.1.
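The selection rule described above can be written out as a short sketch. The function name `headline_value` and the sample numbers are assumptions for illustration. The code only mirrors the rule as stated here and isn't Vertex AI's implementation.

```python
from typing import Sequence


def headline_value(quantiles: Sequence[float], predictions: Sequence[float]) -> float:
    """Pick the single value reported for a quantile-loss model.

    Returns the prediction at the median (0.5) when it is in the set,
    otherwise the prediction at the lowest quantile.
    """
    if not quantiles or len(quantiles) != len(predictions):
        raise ValueError("quantiles and predictions must be non-empty and aligned")
    by_quantile = dict(zip(quantiles, predictions))
    chosen = 0.5 if 0.5 in by_quantile else min(by_quantile)
    return by_quantile[chosen]


# Illustrative numbers only.
print(headline_value([0.1, 0.5, 0.9], [10.0, 20.0, 30.0]))  # median is in the set
print(headline_value([0.1, 0.9], [10.0, 30.0]))             # falls back to 0.1
```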

Additionally, Vertex AI stores quantile values and predictions in the following columns:

  • predicted_TARGET_COLUMN_NAME.quantile_values: The values of the quantiles, which are set during model training. For example, these can be 0.1, 0.5, and 0.9.
  • predicted_TARGET_COLUMN_NAME.quantile_predictions: The prediction values associated with the quantile values.

If your model uses probabilistic inference, predicted_TARGET_COLUMN_NAME.value contains the minimizer of the optimization objective. For example, if your optimization objective is minimize-rmse, predicted_TARGET_COLUMN_NAME.value contains the mean. If it is minimize-mae, predicted_TARGET_COLUMN_NAME.value contains the median.
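To see why the mean pairs with RMSE and the median with MAE, the following sketch compares both point estimates on a small, made-up sample. The sample values and helper names are assumptions. The sketch illustrates the general statistical fact only and says nothing about how Vertex AI computes the value internally.

```python
import math
import statistics
from typing import Sequence

# Made-up draws from a skewed predictive distribution.
samples = [2.0, 3.0, 4.0, 5.0, 11.0]


def rmse(point: float, values: Sequence[float]) -> float:
    """Root mean squared error of a single point estimate."""
    return math.sqrt(sum((v - point) ** 2 for v in values) / len(values))


def mae(point: float, values: Sequence[float]) -> float:
    """Mean absolute error of a single point estimate."""
    return sum(abs(v - point) for v in values) / len(values)


mean_value = statistics.mean(samples)
median_value = statistics.median(samples)

# The mean gives the lower RMSE; the median gives the lower MAE.
print(rmse(mean_value, samples) < rmse(median_value, samples))
print(mae(median_value, samples) < mae(mean_value, samples))
```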

If your model uses probabilistic inference with quantiles, Vertex AI stores quantile values and predictions in the following columns:

  • predicted_TARGET_COLUMN_NAME.quantile_values: The values of the quantiles, which are set during model training. For example, these can be 0.1, 0.5, and 0.9.
  • predicted_TARGET_COLUMN_NAME.quantile_predictions: The prediction values associated with the quantile values.

If you enabled feature attributions, you can find them in the predictions table as well. To access the attributions for a feature BQ_FEATURE_NAME, run the following query:

SELECT explanation.attributions[OFFSET(0)].featureAttributions.BQ_FEATURE_NAME FROM BQ_DATASET_NAME.BQ_PREDICTIONS_TABLE_NAME
  

For more information, see Feature attributions for forecasting.

Errors validation table

The name of the table (BQ_ERRORS_VALIDATION_TABLE_NAME) is formed by appending the timestamp of when the batch prediction job started to `errors_validation_`: errors_validation_TIMESTAMP

To retrieve the errors validation table:
  1. In the console, go to the BigQuery page.
    Go to BigQuery
  2. Run the following query:
    SELECT * FROM BQ_DATASET_NAME.BQ_ERRORS_VALIDATION_TABLE_NAME

The error messages are stored in the following column:
  • errors_TARGET_COLUMN_NAME

Errors table

The name of the table (BQ_ERRORS_TABLE_NAME) is formed by appending the timestamp of when the batch prediction job started to `errors_`: errors_TIMESTAMP

To retrieve the errors table:
  1. In the console, go to the BigQuery page.
    Go to BigQuery
  2. Run the following query:
    SELECT * FROM BQ_DATASET_NAME.BQ_ERRORS_TABLE_NAME

The errors are stored in the following columns:
  • errors_TARGET_COLUMN_NAME.code
  • errors_TARGET_COLUMN_NAME.message

Cloud Storage

If you specified Cloud Storage as your output destination, the results of your batch prediction request are returned as CSV objects in a new folder in the bucket that you specified. The folder name is the name of your model, prefixed with `prediction_` and followed by the timestamp of when the batch prediction job started. You can find the Cloud Storage folder name in the Batch predictions tab for your model.

The Cloud Storage folder contains two kinds of objects:
  • Prediction objects

    The prediction objects are named `predictions_1.csv`, `predictions_2.csv`, and so on. They contain a header row with the column names, and one row for every forecast returned. The number of forecast values depends on your prediction input and the forecast horizon. For example, if your input included 14 null entries for the target column (such as sales for the next 14 days), your prediction request returns 14 rows, one sales figure for each day. If your prediction request exceeds the model's forecast horizon, Vertex AI returns predictions only up to the forecast horizon.

    The forecast values are returned in a column named `predicted_TARGET_COLUMN_NAME`. For quantile forecasts, the output column contains the quantile predictions and quantile values in JSON format.

  • Error objects

    The error objects are named `errors_1.csv`, `errors_2.csv`, and so on. They contain a header row, and one row for every row in your input data that Vertex AI couldn't return a forecast for (for example, because a non-nullable feature was null).

Note: If the results are large, they are split into multiple objects.

Sample feature attribution queries in BigQuery

Example 1: Determine attributions for a single prediction

Consider the following question:

How much did an advertisement for a product increase the predicted sales on November 24 at a given store?

The corresponding query is as follows:

SELECT
  * EXCEPT(explanation, predicted_sales),
  ROUND(predicted_sales.value, 2) AS predicted_sales,
  ROUND(
    explanation.attributions[OFFSET(0)].featureAttributions.advertisement,
    2
  ) AS attribution_advertisement
FROM
  `project.dataset.predictions`
WHERE
  product = 'product_0'
  AND store = 'store_0'
  AND date = '2019-11-24'

Example 2: Determine global feature importance

Consider the following question:

How much did each feature contribute to the predicted sales overall?

You can compute global feature importance manually by aggregating the local feature importance attributions. The corresponding query is as follows:

WITH

/*
* Aggregate from (id, date) level attributions to global feature importance.
*/
attributions_aggregated AS (
 SELECT
   SUM(ABS(attributions.featureAttributions.date)) AS date,
   SUM(ABS(attributions.featureAttributions.advertisement)) AS advertisement,
   SUM(ABS(attributions.featureAttributions.holiday)) AS holiday,
   SUM(ABS(attributions.featureAttributions.sales)) AS sales,
   SUM(ABS(attributions.featureAttributions.store)) AS store,
   SUM(ABS(attributions.featureAttributions.product)) AS product,
 FROM
   project.dataset.predictions,
   UNNEST(explanation.attributions) AS attributions

),

/*
* Calculate the normalization constant for global feature importance.
*/
attributions_aggregated_with_total AS (
 SELECT
   *,
   date + advertisement + holiday + sales + store + product AS total
 FROM
   attributions_aggregated
)

/*
* Calculate the normalized global feature importance.
*/
SELECT
 ROUND(date / total, 2) AS date,
 ROUND(advertisement / total, 2) AS advertisement,
 ROUND(holiday / total, 2) AS holiday,
 ROUND(sales / total, 2) AS sales,
 ROUND(store / total, 2) AS store,
 ROUND(product / total, 2) AS product,
FROM
 attributions_aggregated_with_total
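The same aggregation can be reproduced outside BigQuery, for example on rows that were already exported. The following sketch assumes each row has been reduced to a dictionary that maps a feature name to its local attribution. The function name `global_feature_importance` and the sample values are hypothetical.

```python
from typing import Dict, Iterable


def global_feature_importance(rows: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate local attributions into normalized global importance.

    Mirrors the query: sum absolute attributions per feature, then divide
    each sum by the total over all features and round to two decimals.
    """
    totals: Dict[str, float] = {}
    for attributions in rows:
        for feature, value in attributions.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all attributions are zero; importance is undefined")
    return {feature: round(total / grand_total, 2) for feature, total in totals.items()}


# Illustrative attributions for two predictions.
rows = [
    {"advertisement": 6.0, "holiday": -2.0},
    {"advertisement": -3.0, "holiday": 1.0},
]
print(global_feature_importance(rows))
```

Absolute values are used so that positive and negative contributions don't cancel out, which matches the `SUM(ABS(...))` terms in the query.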

Example batch prediction output in BigQuery

In an example dataset of liquor sales, there are four stores in the city of "Ida Grove": "Ida Grove Food Pride", "Discount Liquors of Ida Grove", "Casey's General Store #3757", and "Brew Ida Grove". store_name is the series identifier, and predictions for the target column sale_dollars are requested for three of the four stores. A validation error is generated because no forecast was requested for "Discount Liquors of Ida Grove".

The following is an extract from the input dataset used for prediction:

Figure: Sample input dataset for prediction

The following is an extract from the prediction results:

Figure: Sample prediction results

The following is an extract from the validation errors:

Figure: Sample validation errors

Example batch prediction output for a quantile-loss optimized model

The following example shows the batch prediction output for a model optimized for quantile loss. In this scenario, the forecast model predicted sales for the next 14 days for each store.

Figure: Sample batch prediction output for a quantile-loss optimized model

The quantile values are given in the predicted_Sales.quantile_values column. In this example, the model predicted values at the 0.1, 0.5, and 0.9 quantiles.

The prediction values are given in the predicted_Sales.quantile_predictions column. This is an array of sales values that map to the quantile values in the predicted_Sales.quantile_values column. In the first row, the probability that the sales value is lower than 4484.04 is 10%. The probability that the sales value is lower than 5615.64 is 50%. The probability that the sales value is lower than 6853.29 is 90%. The prediction for the first row, represented as a single value, is 5615.64.
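As a reading aid, the following sketch pairs the quantile values with the predictions from the first row described above and derives a prediction interval from the outer quantiles. The variable names are assumptions. The only inputs are the three numbers quoted in the text.

```python
# Values from the first row of the example output.
quantile_values = [0.1, 0.5, 0.9]
quantile_predictions = [4484.04, 5615.64, 6853.29]

# Pair each quantile with its predicted sales value.
by_quantile = dict(zip(quantile_values, quantile_predictions))

median_forecast = by_quantile[0.5]
lower_q, upper_q = min(by_quantile), max(by_quantile)
interval = (by_quantile[lower_q], by_quantile[upper_q])

# Share of outcomes expected to fall between the outer quantiles.
coverage = round(upper_q - lower_q, 2)

print(f"median forecast: {median_forecast}")
print(f"{coverage:.0%} interval: {interval[0]} to {interval[1]}")
```

Because 10% of outcomes are expected below the 0.1 quantile and 10% above the 0.9 quantile, the range between them covers about 80% of expected outcomes.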