Serve inferences with NVIDIA Triton

This page describes how to use Vertex AI to serve inference requests with NVIDIA Triton Inference Server. NVIDIA Triton Inference Server (Triton) is an open source inference-serving solution from NVIDIA that is optimized for both CPUs and GPUs and simplifies the inference serving process.

NVIDIA Triton on Vertex AI

Vertex AI supports deploying models on a Triton inference server that runs in a custom container published by NVIDIA GPU Cloud (NGC) as the NVIDIA Triton inference server image. Triton images from NVIDIA include all the required packages and configurations that meet the Vertex AI requirements for custom serving container images. The image contains the Triton inference server with support for TensorFlow, PyTorch, TensorRT, ONNX, and OpenVINO models. The image also includes the FIL (Forest Inference Library) backend, which supports running ML frameworks such as XGBoost, LightGBM, and Scikit-Learn.

Triton loads the models and exposes inference, health, and model management REST endpoints that use standard inference protocols. When you deploy a model to Vertex AI, Triton recognizes the Vertex AI environment and adopts the Vertex AI inference protocol for health checks and inference requests.

The following list summarizes the key features and use cases of NVIDIA Triton inference server:

  • Support for multiple deep learning and machine learning frameworks: Triton supports deploying multiple models and a mix of frameworks and model formats. These include TensorFlow (SavedModel and GraphDef), PyTorch (TorchScript), TensorRT, ONNX, OpenVINO, the FIL backend for frameworks such as XGBoost, LightGBM, and Scikit-Learn, and any custom Python or C++ model format.
  • Concurrent multiple-model execution: Triton lets multiple models, multiple instances of the same model, or both run concurrently on the same compute resource with zero or more GPUs.
  • Model ensembling (chaining or pipelining): A Triton ensemble supports use cases where multiple models are composed as a pipeline (or a DAG, directed acyclic graph) with input and output tensors connected between them. In addition, with the Triton Python backend you can include any pre-processing, post-processing, or control flow logic defined with Business Logic Scripting (BLS).
  • Run on CPU and GPU backends: Triton supports inference for models deployed on nodes with CPUs and GPUs.
  • Dynamic batching of inference requests: For models that support batching, Triton includes built-in scheduling and batching algorithms. These algorithms dynamically combine individual inference requests into batches on the server side to improve inference throughput and increase GPU utilization.
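As a rough illustration of the dynamic batching idea in the list above, the following toy sketch groups queued requests into batches in arrival order. It is not Triton's scheduler: the function name `form_batches`, the request IDs, and the single `max_batch_size` knob are assumptions made for this example, and a real server-side batcher takes more factors into account.

```python
from collections import deque
from typing import Deque, List


def form_batches(request_ids: List[str], max_batch_size: int) -> List[List[str]]:
    """Group queued requests, in arrival order, into batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    queue: Deque[str] = deque(request_ids)
    batches: List[List[str]] = []
    while queue:
        # Take as many requests as fit into one batch, or whatever is left.
        size = min(max_batch_size, len(queue))
        batches.append([queue.popleft() for _ in range(size)])
    return batches


if __name__ == "__main__":
    # Five hypothetical requests combined into batches of up to two.
    print(form_batches(["r1", "r2", "r3", "r4", "r5"], max_batch_size=2))
```

Each batch would then be passed to the model as a single call, which is where the throughput gain comes from.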

For more information about NVIDIA Triton inference server, see the Triton documentation.

Available NVIDIA Triton container images

The following table shows the Triton Docker images that are available in the NVIDIA NGC catalog. Choose an image based on the model framework, the backend, and the container image size that you use.

xx and yy refer to the major and minor versions of Triton, respectively.

NVIDIA Triton image     Supports
xx.yy-py3               Full container with support for TensorFlow, PyTorch, TensorRT, ONNX, and OpenVINO models
xx.yy-pyt-python-py3    PyTorch and Python backends only
xx.yy-tf2-python-py3    TensorFlow 2.x and Python backends only
xx.yy-py3-min           Customize the Triton container as needed
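
The tag scheme in the table can be turned into a full image reference with a small helper. This is a minimal sketch: the `nvcr.io/nvidia/tritonserver` path is the one used later in this tutorial, while the function name and the variant labels (`full`, `pytorch`, `tensorflow2`, `minimal`) are hypothetical names chosen for this example.

```python
# Hypothetical variant labels mapped to the tag suffixes listed in the table.
TRITON_IMAGE_VARIANTS = {
    "full": "py3",
    "pytorch": "pyt-python-py3",
    "tensorflow2": "tf2-python-py3",
    "minimal": "py3-min",
}


def triton_image_uri(version: str, variant: str = "full") -> str:
    """Build an NGC Triton image URI from an xx.yy version and a variant label."""
    major, sep, minor = version.partition(".")
    if not (sep and major.isdigit() and minor.isdigit()):
        raise ValueError(f"version must look like xx.yy, got {version!r}")
    if variant not in TRITON_IMAGE_VARIANTS:
        raise ValueError(f"unknown variant {variant!r}")
    return f"nvcr.io/nvidia/tritonserver:{version}-{TRITON_IMAGE_VARIANTS[variant]}"


if __name__ == "__main__":
    print(triton_image_uri("22.01"))
    print(triton_image_uri("22.01", "minimal"))
```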

Get started: Serve inferences with NVIDIA Triton

The following figure shows the high-level architecture of Triton on Vertex AI inference.

[Figure: triton-on-vertex-ai-prediction]

  • An ML model to be served by Triton is registered with Vertex AI Model Registry. The model's metadata references the location of the model artifacts in Cloud Storage, the custom serving container, and its configuration.
  • The model from Vertex AI Model Registry is deployed to a Vertex AI inference endpoint that runs Triton inference server as a custom container on compute nodes with CPUs and GPUs.
  • Inference requests reach the Triton inference server through the Vertex AI inference endpoint and are routed to the appropriate scheduler.
  • The backend performs inference by using the inputs provided in the batched requests and returns a response.
  • Triton provides readiness and liveness health endpoints, which make it possible to integrate Triton into deployment environments such as Vertex AI.

This tutorial shows how to use a custom container that runs NVIDIA Triton inference server to deploy a machine learning (ML) model on Vertex AI and serve online inferences. You deploy a container that runs Triton to serve inferences from an object detection model from TensorFlow Hub that was pre-trained on the COCO 2017 dataset. You can then use Vertex AI to detect objects in an image.

This tutorial is also available in notebook form:

Open in Colab | Open in Colab Enterprise | View on GitHub | Open in Vertex AI Workbench

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how the products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API and the Artifact Registry API.

  5. In the Google Cloud console, activate Cloud Shell.

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  6. We recommend that you use Cloud Shell to interact with Google Cloud throughout this tutorial. If you want to use a different Bash shell instead of Cloud Shell, perform the following additional configuration:

    1. Install the Google Cloud CLI.

    2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    3. To initialize the gcloud CLI, run the following command:

      gcloud init

    4. Install Docker by following the Artifact Registry documentation.

    Build and push the container image

    To use a custom container, you must specify a Docker container image that meets the custom container requirements. This section describes how to create the container image and push it to Artifact Registry.

    Download model artifacts

    Model artifacts are files created by ML training that you can use to serve inferences. They contain, at a minimum, the structure and weights of your trained ML model. The format of the model artifacts depends on the ML framework that you use for training.

    For this tutorial, instead of training a model from scratch, you download an object detection model from TensorFlow Hub that was trained on the COCO 2017 dataset. To serve the TensorFlow SavedModel format, Triton expects the model repository to be organized in the following structure:

    └── model-repository-path
           └── model_name
                  ├── config.pbtxt
                  └── 1
                      └── model.savedmodel
                            └── <saved-model-files>
    

    The config.pbtxt file describes the model configuration for the model. By default, you must provide a model configuration file that contains the required settings. However, if Triton is started with the --strict-model-config=false option, then in some cases Triton can generate the model configuration automatically, and you don't need to provide it explicitly. Specifically, TensorRT, TensorFlow SavedModel, and ONNX models don't require a model configuration file, because Triton can derive all the required settings automatically. All other model types must provide a model configuration file.
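
Before copying a repository anywhere, it can help to check the layout described above locally. The following is a minimal sketch, assuming a TensorFlow SavedModel repository; the function name `check_savedmodel_repository` and its `require_config` flag are hypothetical and not part of Triton. The flag mirrors the rule that config.pbtxt is optional only when Triton is allowed to generate the configuration.

```python
import tempfile
from pathlib import Path
from typing import List


def check_savedmodel_repository(repo: Path, require_config: bool = False) -> List[str]:
    """Return a list of layout problems; an empty list means no problems were found."""
    model_dirs = sorted(p for p in repo.iterdir() if p.is_dir()) if repo.is_dir() else []
    if not model_dirs:
        return [f"no model directories found under {repo}"]
    problems: List[str] = []
    for model_dir in model_dirs:
        if require_config and not (model_dir / "config.pbtxt").is_file():
            problems.append(f"{model_dir.name}: missing config.pbtxt")
        # Version directories are numeric, such as "1".
        versions = sorted(p for p in model_dir.iterdir() if p.is_dir() and p.name.isdigit())
        if not versions:
            problems.append(f"{model_dir.name}: no numeric version directory")
        for version in versions:
            if not (version / "model.savedmodel" / "saved_model.pb").is_file():
                problems.append(
                    f"{model_dir.name}/{version.name}: missing model.savedmodel/saved_model.pb"
                )
    return problems


if __name__ == "__main__":
    # Build a throwaway repository that follows the expected structure.
    with tempfile.TemporaryDirectory() as tmp:
        saved_model = Path(tmp) / "object_detector" / "1" / "model.savedmodel"
        saved_model.mkdir(parents=True)
        (saved_model / "saved_model.pb").touch()
        print(check_savedmodel_repository(Path(tmp)))
        print(check_savedmodel_repository(Path(tmp), require_config=True))
```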

    # Download and organize model artifacts according to the Triton model repository spec
    mkdir -p models/object_detector/1/model.savedmodel/
    curl -L "https://tfhub.dev/tensorflow/faster_rcnn/resnet101_v1_640x640/1?tf-hub-format=compressed" | \
        tar -zxvC ./models/object_detector/1/model.savedmodel/
    ls -ltr ./models/object_detector/1/model.savedmodel/
    

    After you download the model locally, the model repository is organized as follows:

    ./models
    └── object_detector
        └── 1
            └── model.savedmodel
                ├── saved_model.pb
                └── variables
                    ├── variables.data-00000-of-00001
                    └── variables.index
    

    Copy model artifacts to a Cloud Storage bucket

    The downloaded model artifacts, including the model configuration file if you provide one, are pushed to the Cloud Storage bucket specified by MODEL_ARTIFACTS_REPOSITORY. You use this location when you create the Vertex AI model resource.

    gcloud storage cp ./models/object_detector MODEL_ARTIFACTS_REPOSITORY/ --recursive
    

    Create an Artifact Registry repository

    Create an Artifact Registry repository to store the container image that you create in the next section.

    Enable the Artifact Registry API service for your project.

    gcloud services enable artifactregistry.googleapis.com
    

    To create the Artifact Registry repository, run the following command in your shell:

    gcloud artifacts repositories create getting-started-nvidia-triton \
        --repository-format=docker \
        --location=LOCATION_ID \
        --description="NVIDIA Triton Docker repository"
    

    Replace LOCATION_ID with the region where Artifact Registry stores your container image. Later, you must create the Vertex AI model resource on a regional endpoint that matches this region, so choose a region where Vertex AI has a regional endpoint, such as us-central1.

    When the operation completes, the command prints the following output:

    Created repository [getting-started-nvidia-triton].
    

    Build the container image

    NVIDIA provides Docker images for building a container image that runs Triton and meets the Vertex AI custom container requirements for serving. You can use docker to pull the image and tag it with the Artifact Registry path that the image will be pushed to.

    NGC_TRITON_IMAGE_URI="nvcr.io/nvidia/tritonserver:22.01-py3"
    docker pull $NGC_TRITON_IMAGE_URI
    docker tag $NGC_TRITON_IMAGE_URI LOCATION_ID-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference
    

    Replace the following:

    • LOCATION_ID: the region of your Artifact Registry repository, as specified in the previous section
    • PROJECT_ID: the ID of your Google Cloud project
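
The tag path follows a fixed pattern, so it can be assembled programmatically if you script these steps. This is a minimal sketch: the repository and image names are the ones used in this tutorial, while the function name and the example project ID `my-project` are hypothetical.

```python
def artifact_registry_image_uri(
    location_id: str,
    project_id: str,
    repository: str = "getting-started-nvidia-triton",
    image: str = "vertex-triton-inference",
) -> str:
    """Assemble LOCATION_ID-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE."""
    parts = {
        "location_id": location_id,
        "project_id": project_id,
        "repository": repository,
        "image": image,
    }
    for name, value in parts.items():
        # Reject empty values and values that would change the path structure.
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty string without '/'")
    return f"{location_id}-docker.pkg.dev/{project_id}/{repository}/{image}"


if __name__ == "__main__":
    print(artifact_registry_image_uri("us-central1", "my-project"))
```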

    The command might run for several minutes.

    Prepare a payload file for testing inference requests

    To send an inference request to the container's server, use Python to prepare a payload from a sample image file. Run the following Python script to generate the payload file:

    import io
    import json

    # Install the required packages before running:
    #   pip install --upgrade pillow numpy requests
    import numpy as np
    import requests
    from PIL import Image


    def generate_payload(image_url, payload_file="instances.json"):
        """Download an image and write an inference request payload to a JSON file."""
        # Download the image, fail early on HTTP errors, and resize it.
        response = requests.get(image_url, timeout=30)
        response.raise_for_status()
        image = Image.open(io.BytesIO(response.content)).convert("RGB")
        image = image.resize((200, 200))

        # Convert the image to a uint8 array of shape (200, 200, 3).
        image_tensor = np.asarray(image, dtype=np.uint8)
        # Prepend the batch dimension: [1, 200, 200, 3].
        image_shape = [1] + list(image_tensor.shape)

        # Build the inference request.
        payload = {
            "id": "0",
            "inputs": [
                {
                    "name": "input_tensor",
                    "shape": image_shape,
                    "datatype": "UINT8",
                    "parameters": {},
                    "data": image_tensor.tolist(),
                }
            ],
        }

        # Save the payload as a JSON file.
        with open(payload_file, "w") as f:
            json.dump(payload, f)
        print(f"Payload generated at {payload_file}")

        return payload_file


    if __name__ == "__main__":
        image_url = "https://github.com/tensorflow/models/raw/master/research/object_detection/test_images/image2.jpg"
        payload_file = generate_payload(image_url)
    

    The Python script generates the payload and prints the following output:

    Payload generated at instances.json
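
A payload like the one generated above can be sanity-checked before it is sent. The sketch below compares each input's declared shape with the shape of its nested data. The helper names are hypothetical, and the rule that a leading batch dimension of 1 may be omitted from the data nesting is an assumption taken from how this tutorial's script builds the payload.

```python
from typing import Any, List


def nested_shape(data: Any) -> List[int]:
    """Infer the shape of a regular nested list, as produced by ndarray.tolist()."""
    shape: List[int] = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return shape


def check_payload(payload: dict) -> None:
    """Raise ValueError if an input's declared shape disagrees with its data."""
    for tensor in payload["inputs"]:
        declared = list(tensor["shape"])
        actual = nested_shape(tensor["data"])
        # The tutorial prepends a batch dimension of 1 to the shape without
        # wrapping the data in another list, so both forms are accepted here.
        if declared != actual and declared != [1] + actual:
            raise ValueError(
                f"input {tensor['name']!r}: declared shape {declared}, data shape {actual}"
            )


if __name__ == "__main__":
    # A tiny 2x2 RGB "image" standing in for the real 200x200 one.
    sample = {
        "inputs": [
            {
                "name": "input_tensor",
                "shape": [1, 2, 2, 3],
                "datatype": "UINT8",
                "data": [[[0, 0, 0], [1, 1, 1]], [[2, 2, 2], [3, 3, 3]]],
            }
        ]
    }
    check_payload(sample)
    print("payload shape is consistent")
```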
    

    Run the container locally (optional)

    Before you push the container image to Artifact Registry to use it with Vertex AI, you can run it as a container in your local environment to verify that the server works as expected.

    1. To run the container image locally, run the following command in your shell:

      docker run -t -d -p 8000:8000 --rm \
        --name=local_object_detector \
        -e AIP_MODE=True \
        LOCATION_ID-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
        --model-repository MODEL_ARTIFACTS_REPOSITORY \
        --strict-model-config=false
      

      Replace LOCATION_ID, PROJECT_ID, and MODEL_ARTIFACTS_REPOSITORY with the same values that you used in the previous sections.

      This command runs the container in detached mode and maps port 8000 of the container to port 8000 of the local environment. The Triton image from NGC configures Triton to use port 8000.

    2. To send a health check to the container's server, run the following command in your shell:

      curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/v2/health/ready
      

      If successful, the server returns the status code 200.

    3. To send an inference request to the container's server by using the payload that you generated earlier and to get the inference response, run the following command:

      curl -X POST \
          -H "Content-Type: application/json" \
          -d @instances.json \
          localhost:8000/v2/models/object_detector/infer |
             jq -c '.outputs[] | select(.name == "detection_classes")'
      

      This request uses one of the test images that are included with the TensorFlow object detection example.

      If successful, the server returns the following inference:

      {"name":"detection_classes","datatype":"FP32","shape":[1,300],"data":[38,1,...,44]}
      
    4. To stop the container, run the following command in your shell:

      docker stop local_object_detector
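
The jq filter used in step 3 can also be expressed in Python when the response is processed in a script. This is a minimal sketch with hypothetical helper names; the sample response below is made up for illustration and is much shorter than a real one.

```python
import json
import math
from typing import Optional


def select_output(response_text: str, name: str) -> Optional[dict]:
    """Return the output tensor with the given name, or None if it is absent."""
    response = json.loads(response_text)
    for output in response.get("outputs", []):
        if output.get("name") == name:
            return output
    return None


def element_count_matches(output: dict) -> bool:
    """Check that the flat data list has as many elements as the shape implies."""
    return math.prod(output["shape"]) == len(output["data"])


if __name__ == "__main__":
    # Illustrative response with two outputs of two detections each.
    sample = json.dumps(
        {
            "outputs": [
                {"name": "detection_scores", "datatype": "FP32", "shape": [1, 2], "data": [0.9, 0.8]},
                {"name": "detection_classes", "datatype": "FP32", "shape": [1, 2], "data": [38.0, 1.0]},
            ]
        }
    )
    classes = select_output(sample, "detection_classes")
    print(classes["data"], element_count_matches(classes))
```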
      

    Push the container image to Artifact Registry

    Configure Docker to access Artifact Registry. Then push your container image to your Artifact Registry repository.

    1. To give your local Docker installation permission to push to Artifact Registry in your chosen region, run the following command in your shell:

      gcloud auth configure-docker LOCATION_ID-docker.pkg.dev
      
      • Replace LOCATION_ID with the region where you created your repository in a previous section.
    2. To push the container image that you just built to Artifact Registry, run the following command in your shell:

      docker push LOCATION_ID-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference
      

      Replace the following, as you did in the previous section:

      • LOCATION_ID: the region of your Artifact Registry repository, as specified in a previous section
      • PROJECT_ID: the ID of your Google Cloud project

    Deploy the model

    In this section, you create a model and an endpoint, and then you deploy the model to the endpoint.

    Create a model

    To create a Model resource that uses a custom container running Triton, use the gcloud ai models upload command.

    Before you create the model, read the settings for custom containers to learn whether you need to specify the optional sharedMemorySizeMb, startupProbe, and healthProbe fields for your container.

    gcloud ai models upload \
        --region=LOCATION_ID \
        --display-name=DEPLOYED_MODEL_NAME \
        --container-image-uri=LOCATION_ID-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
        --artifact-uri=MODEL_ARTIFACTS_REPOSITORY \
        --container-args='--strict-model-config=false'
    
    Replace the following:

    • LOCATION_ID: the region where you are using Vertex AI
    • PROJECT_ID: the ID of your Google Cloud project
    • DEPLOYED_MODEL_NAME: a name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.

    The --container-args='--strict-model-config=false' argument allows Triton to generate the model configuration automatically.

    Create an endpoint

    You must deploy the model to an endpoint before the model can be used to serve online inferences. If you are deploying the model to an existing endpoint, you can skip this step. The following example uses the gcloud ai endpoints create command:

    gcloud ai endpoints create \
        --region=LOCATION_ID \
        --display-name=ENDPOINT_NAME
    

    Replace the following:

    • LOCATION_ID: the region where you are using Vertex AI
    • ENDPOINT_NAME: the display name for the endpoint

    The Google Cloud CLI might take a few seconds to create the endpoint.

    Deploy the model to the endpoint

    After the endpoint is ready, deploy the model to the endpoint. When you deploy a model to an endpoint, the service associates physical resources with the model running Triton to serve online inferences.

    The following example uses the gcloud ai endpoints deploy-model command to deploy the Model to an endpoint that runs Triton on GPUs to accelerate inference serving, without splitting traffic between multiple DeployedModel resources:

    ENDPOINT_ID=$(gcloud ai endpoints list \
        --region=LOCATION_ID \
        --filter=display_name=ENDPOINT_NAME \
        --format="value(name)")
    
    MODEL_ID=$(gcloud ai models list \
        --region=LOCATION_ID \
        --filter=display_name=DEPLOYED_MODEL_NAME \
        --format="value(name)")
    
    gcloud ai endpoints deploy-model $ENDPOINT_ID \
        --region=LOCATION_ID \
        --model=$MODEL_ID \
        --display-name=DEPLOYED_MODEL_NAME \
        --machine-type=MACHINE_TYPE \
        --min-replica-count=MIN_REPLICA_COUNT \
        --max-replica-count=MAX_REPLICA_COUNT \
        --accelerator=count=ACCELERATOR_COUNT,type=ACCELERATOR_TYPE \
        --traffic-split=0=100

    Replace the following:

    • LOCATION_ID: the region where you are using Vertex AI
    • ENDPOINT_NAME: the display name for the endpoint
    • DEPLOYED_MODEL_NAME: a name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
    • MACHINE_TYPE: optional. The machine resources used for each node of this deployment. The default setting is n1-standard-2. Learn more about machine types.
    • MIN_REPLICA_COUNT: the minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never below this number of nodes.
    • MAX_REPLICA_COUNT: the maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never below the minimum number of nodes.
    • ACCELERATOR_COUNT: the number of accelerators to attach to each machine running the job. This is usually 1. If not specified, the default value is 1.
    • ACCELERATOR_TYPE: the accelerator configuration for GPU serving. When you deploy a model with a Compute Engine machine type, you can also select a GPU accelerator, and you must then specify its type. The choices are nvidia-tesla-a100, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-t4, and nvidia-tesla-v100.
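
If you script the deployment, the placeholders above can be filled in and checked before the command runs. The following is a minimal sketch that only assembles the argument list shown in this section; the function name, the replica-count check, and the example values (such as the IDs and `nvidia-tesla-t4` on `n1-standard-4`) are assumptions chosen for illustration.

```python
import shlex
from typing import List, Optional


def deploy_model_command(
    endpoint_id: str,
    model_id: str,
    region: str,
    display_name: str,
    min_replicas: int,
    max_replicas: int,
    machine_type: Optional[str] = None,
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 1,
) -> List[str]:
    """Assemble the gcloud ai endpoints deploy-model arguments from this section."""
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must not be smaller than min_replicas")
    args = [
        "gcloud", "ai", "endpoints", "deploy-model", endpoint_id,
        f"--region={region}",
        f"--model={model_id}",
        f"--display-name={display_name}",
        f"--min-replica-count={min_replicas}",
        f"--max-replica-count={max_replicas}",
        "--traffic-split=0=100",
    ]
    if machine_type:
        args.append(f"--machine-type={machine_type}")
    if accelerator_type:
        args.append(f"--accelerator=count={accelerator_count},type={accelerator_type}")
    return args


if __name__ == "__main__":
    command = deploy_model_command(
        endpoint_id="1234567890",  # hypothetical endpoint ID
        model_id="9876543210",  # hypothetical model ID
        region="us-central1",
        display_name="object-detector",
        min_replicas=1,
        max_replicas=2,
        machine_type="n1-standard-4",
        accelerator_type="nvidia-tesla-t4",
    )
    # Print a copy-pasteable command line; nothing is executed here.
    print(shlex.join(command))
```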

    The Google Cloud CLI might take a few seconds to deploy the model to the endpoint. When the model is successfully deployed, the command prints the following output:

      Deployed a model to the endpoint xxxxx. Id of the deployed model: xxxxx.
    

    Get online inferences from the deployed model

    To invoke the model through the Vertex AI inference endpoint, format the inference request as a standard inference request JSON object or as an inference request JSON object with a binary extension, and submit the request to the Vertex AI REST rawPredict endpoint.

    The following example uses the gcloud ai endpoints raw-predict command:

    ENDPOINT_ID=$(gcloud ai endpoints list \
        --region=LOCATION_ID \
        --filter=display_name=ENDPOINT_NAME \
        --format="value(name)")
    
    gcloud ai endpoints raw-predict $ENDPOINT_ID \
        --region=LOCATION_ID \
        --http-headers=Content-Type=application/json \
        --request=@instances.json
    

    Replace the following:

    • LOCATION_ID: the region where you are using Vertex AI
    • ENDPOINT_NAME: the display name for the endpoint
    For a valid request, the endpoint returns the following response:

    {
        "id": "0",
        "model_name": "object_detector",
        "model_version": "1",
        "outputs": [{
            "name": "detection_anchor_indices",
            "datatype": "FP32",
            "shape": [1, 300],
            "data": [2.0, 1.0, 0.0, 3.0, 26.0, 11.0, 6.0, 92.0, 76.0, 17.0, 58.0, ...]
        }]
    }
    

    Clean up

    To avoid incurring further Vertex AI charges and Artifact Registry charges, delete the Google Cloud resources that you created during this tutorial.

    1. To undeploy the model from the endpoint and delete the endpoint, run the following commands in your shell:

      ENDPOINT_ID=$(gcloud ai endpoints list \
          --region=LOCATION_ID \
          --filter=display_name=ENDPOINT_NAME \
          --format="value(name)")
      
      DEPLOYED_MODEL_ID=$(gcloud ai endpoints describe $ENDPOINT_ID \
          --region=LOCATION_ID \
          --format="value(deployedModels.id)")
      
      gcloud ai endpoints undeploy-model $ENDPOINT_ID \
          --region=LOCATION_ID \
          --deployed-model-id=$DEPLOYED_MODEL_ID
      
      gcloud ai endpoints delete $ENDPOINT_ID \
          --region=LOCATION_ID \
          --quiet
      

      Replace LOCATION_ID with the region where you created your model in a previous section, and ENDPOINT_NAME with the display name of your endpoint.

    2. To delete your model, run the following commands in your shell:

      MODEL_ID=$(gcloud ai models list \
          --region=LOCATION_ID \
          --filter=display_name=DEPLOYED_MODEL_NAME \
          --format="value(name)")
      
      gcloud ai models delete $MODEL_ID \
          --region=LOCATION_ID \
          --quiet
      

      Replace LOCATION_ID with the region where you created your model in a previous section.

    3. To delete your Artifact Registry repository and the container image in it, run the following command in your shell:

      gcloud artifacts repositories delete getting-started-nvidia-triton \
        --location=LOCATION_ID \
        --quiet
      

      Replace LOCATION_ID with the region where you created your Artifact Registry repository in a previous section.

    Limitations

    What's next

    • To learn more about deployment patterns that use NVIDIA Triton inference server on Vertex AI, see the NVIDIA Triton notebook tutorials.