Serve Gemma open models using TPUs on Vertex AI with Saxml

This guide shows how to serve a Gemma open large language model (LLM) using Tensor Processing Units (TPUs) on Vertex AI with Saxml. In this guide, you download the 2B and 7B parameter instruction-tuned Gemma models to Cloud Storage and deploy them to Vertex AI, which runs Saxml on TPUs.

Background

By serving Gemma using TPUs on Vertex AI with Saxml, you can take advantage of a managed AI solution that handles the low-level infrastructure and offers a cost-effective way to serve LLMs. This section describes the key technologies used in this tutorial.

Gemma

Gemma is a set of openly available, lightweight generative artificial intelligence (AI) models released under an open license. These AI models can run in your applications, hardware, mobile devices, or hosted services. You can use the Gemma models for text generation, and you can also tune them for specialized tasks.

For more information, see the Gemma documentation.

Saxml

Saxml is an experimental system that serves Paxml, JAX, and PyTorch models for inference. This tutorial covers how to serve Gemma on TPUs, which are more cost-efficient for Saxml. Setup for GPUs is similar. Saxml provides scripts to build containers for Vertex AI, which this tutorial uses.

TPU

TPUs are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate data processing frameworks such as TensorFlow, PyTorch, and JAX.

This tutorial serves the Gemma 2B and Gemma 7B models. Vertex AI hosts these models on the following single-host TPU v5e node pools:

  • Gemma 2B: Hosted on a TPU v5e node pool with a 1x1 topology, which represents one TPU chip. The machine type for the nodes is ct5lp-hightpu-1t.
  • Gemma 7B: Hosted on a TPU v5e node pool with a 2x2 topology, which represents four TPU chips. The machine type for the nodes is ct5lp-hightpu-4t.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API.

  5. In the Google Cloud console, activate Cloud Shell.

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  6. This tutorial assumes that you use Cloud Shell to interact with Google Cloud. To use a different shell instead of Cloud Shell, perform the following additional configuration:

    1. Install the Google Cloud CLI.

    2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    3. To initialize the gcloud CLI, run the following command:

      gcloud init

  7. Make sure that you have sufficient quota for TPU v5e chips for Vertex AI. By default, this quota is 0. For a 1x1 topology, it must be 1. For a 2x2 topology, it must be 4. To run both topologies, it must be 5.
  8. Create a Kaggle account if you don't already have one.
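
The quota figures in the list above follow from the chip count of each topology (rows times columns). The following is a minimal sketch of that arithmetic; chips_for_topology and required_quota are hypothetical helper names, not part of any Google Cloud tool.

```shell
# Sketch: derive the TPU v5e chip quota from the topologies you plan to run.
# Both function names are hypothetical helpers for illustration only.

# Print the chip count for a topology string such as "1x1" or "2x2".
chips_for_topology() {
  local topology="$1"
  local rows="${topology%x*}"   # text before the "x"
  local cols="${topology#*x}"   # text after the "x"
  echo $(( rows * cols ))
}

# Sum the chips needed to run several topologies at the same time.
required_quota() {
  local total=0 topology
  for topology in "$@"; do
    total=$(( total + $(chips_for_topology "$topology") ))
  done
  echo "$total"
}

# Gemma 2B on 1x1 plus Gemma 7B on 2x2
required_quota 1x1 2x2   # prints 5
```

If you plan to serve only one of the two models, pass a single topology to get the smaller quota figure.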

Get access to the model

Cloud Shell might not have sufficient resources to download the model weights. If so, you can create a Vertex AI Workbench instance to perform that task.

To get access to the Gemma models for deployment to Vertex AI, you must sign in to the Kaggle platform, sign the license consent agreement, and get a Kaggle API token. In this tutorial, you upload the Kaggle API token to Cloud Shell.

You must sign the consent agreement to use Gemma. Follow these instructions:

  1. Access the model consent page on Kaggle.com.
  2. Sign in to Kaggle if you haven't done so already.
  3. Click Request Access.
  4. In the Choose Account for Consent section, select Verify via Kaggle Account to use your Kaggle account for consent.
  5. Accept the model Terms and Conditions.

Generate an access token

To access the model through Kaggle, you need a Kaggle API token.

If you don't already have a token, follow these steps to generate a new one:

  1. In your browser, go to Kaggle settings.
  2. Under the API section, click Create New Token.

    A file named kaggle.json is downloaded.

Upload the access token to Cloud Shell

In Cloud Shell, you can upload the Kaggle API token to your Google Cloud project:

  1. In Cloud Shell, click More > Upload.
  2. Select File and click Choose Files.
  3. Open the kaggle.json file.
  4. Click Upload.
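
By default, the Kaggle command-line client reads its credentials from ~/.kaggle/kaggle.json. The following is a minimal sketch that moves the uploaded file there and restricts its permissions. It assumes the upload landed in your Cloud Shell home directory; install_kaggle_token is a hypothetical helper name.

```shell
# Sketch: put an uploaded kaggle.json where the Kaggle CLI looks for it.
# install_kaggle_token is a hypothetical helper name.
# Assumption: the upload landed in your home directory; pass another path if not.
install_kaggle_token() {
  local src="${1:-$HOME/kaggle.json}"
  if [ ! -f "$src" ]; then
    echo "kaggle.json not found at $src" >&2
    return 1
  fi
  mkdir -p "$HOME/.kaggle"
  mv "$src" "$HOME/.kaggle/kaggle.json"
  # Restrict the token to your user; the Kaggle CLI warns about looser permissions.
  chmod 600 "$HOME/.kaggle/kaggle.json"
}

# Usage, after the upload completes:
#   install_kaggle_token
```

If the file was uploaded somewhere else, pass its path as the first argument.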

Create the Cloud Storage bucket

Create a Cloud Storage bucket to store the model checkpoints.

In Cloud Shell, run the following:

    gcloud storage buckets create gs://CHECKPOINTS_BUCKET_NAME
    

Replace CHECKPOINTS_BUCKET_NAME with the name of the Cloud Storage bucket that stores the model checkpoints.

Copy the model to the Cloud Storage bucket

In Cloud Shell, run the following:

    pip install kaggle --break-system-packages
    
    # For Gemma 2B
    mkdir -p /data/gemma_2b-it
    kaggle models instances versions download google/gemma/pax/2b-it/1 --untar -p /data/gemma_2b-it
    gcloud storage cp /data/gemma_2b-it/* gs://CHECKPOINTS_BUCKET_NAME/gemma_2b-it/ --recursive
    
    # For Gemma 7B
    mkdir -p /data/gemma_7b-it
    kaggle models instances versions download google/gemma/pax/7b-it/1 --untar -p /data/gemma_7b-it
    gcloud storage cp /data/gemma_7b-it/* gs://CHECKPOINTS_BUCKET_NAME/gemma_7b-it/ --recursive
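
The upload commands later in this guide pass --ckpt_path_suffix=checkpoint_00000000, so it seems reasonable to expect an entry with that name under each model folder in the bucket. The following is a minimal sketch of a check based on that assumption; check_checkpoint is a hypothetical helper name, and the expected layout is inferred from the flag value rather than stated by the source.

```shell
# Sketch: confirm that a copied model folder contains the checkpoint directory.
# check_checkpoint is a hypothetical helper; the directory name is taken from the
# --ckpt_path_suffix value used when the model is uploaded later in this guide.
check_checkpoint() {
  local bucket="$1"   # for example CHECKPOINTS_BUCKET_NAME
  local model="$2"    # gemma_2b-it or gemma_7b-it
  if gcloud storage ls "gs://$bucket/$model/" | grep -q "checkpoint_00000000"; then
    echo "found checkpoint_00000000 under gs://$bucket/$model/"
  else
    echo "checkpoint_00000000 missing under gs://$bucket/$model/" >&2
    return 1
  fi
}

# Usage:
#   check_checkpoint CHECKPOINTS_BUCKET_NAME gemma_2b-it
```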
    

Deploy the model

Upload a model

To upload a Model resource that uses the Saxml container, run the following gcloud ai models upload command:

    Gemma 2B-it

    gcloud ai models upload \
      --region=LOCATION \
      --display-name=DEPLOYED_MODEL_NAME \
      --container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/sax-tpu:latest \
      --artifact-uri='gs://CHECKPOINTS_BUCKET_NAME/gemma_2b-it/' \
      --container-args='--model_path=saxml.server.pax.lm.params.gemma.Gemma2BFP16' \
      --container-args='--platform_chip=tpuv5e' \
      --container-args='--platform_topology=2x2' \
      --container-args='--ckpt_path_suffix=checkpoint_00000000' \
      --container-ports=8502
    

    Gemma 7B-it

    gcloud ai models upload \
      --region=LOCATION \
      --display-name=DEPLOYED_MODEL_NAME \
      --container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/sax-tpu:latest \
      --artifact-uri='gs://CHECKPOINTS_BUCKET_NAME/gemma_7b-it/' \
      --container-args='--model_path=saxml.server.pax.lm.params.gemma.Gemma7BFP16' \
      --container-args='--platform_chip=tpuv5e' \
      --container-args='--platform_topology=2x2' \
      --container-args='--ckpt_path_suffix=checkpoint_00000000' \
      --container-ports=8502
    

Replace the following:

  • LOCATION: the region where you use Vertex AI. TPUs are available only in us-west1.
  • DEPLOYED_MODEL_NAME: a name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
  • CHECKPOINTS_BUCKET_NAME: the name of the Cloud Storage bucket that stores the model checkpoints.

Create an endpoint

You must deploy the model to an endpoint before it can serve online inferences. If you are deploying the model to an existing endpoint, you can skip this step. The following example uses the gcloud ai endpoints create command:

    gcloud ai endpoints create \
      --region=LOCATION \
      --display-name=ENDPOINT_NAME
    

Replace the following:

  • LOCATION: the region where you use Vertex AI.
  • ENDPOINT_NAME: the display name for the endpoint.

The Google Cloud CLI might take a few seconds to create the endpoint.

Deploy the model to the endpoint

After the endpoint is ready, deploy the model to the endpoint.

    ENDPOINT_ID=$(gcloud ai endpoints list \
       --region=LOCATION \
       --filter=display_name=ENDPOINT_NAME \
       --format="value(name)")
    
    MODEL_ID=$(gcloud ai models list \
       --region=LOCATION \
       --filter=display_name=DEPLOYED_MODEL_NAME \
       --format="value(name)")
    
    gcloud ai endpoints deploy-model $ENDPOINT_ID \
      --region=LOCATION \
      --model=$MODEL_ID \
      --display-name=DEPLOYED_MODEL_NAME \
      --machine-type=ct5lp-hightpu-4t \
      --traffic-split=0=100
    

Replace the following:

  • LOCATION: the region where you use Vertex AI.
  • ENDPOINT_NAME: the display name for the endpoint.
  • DEPLOYED_MODEL_NAME: a name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.

Gemma 2B can be deployed on the smaller ct5lp-hightpu-1t machine. In that case, specify --platform_topology=1x1 when you upload the model.
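
The single-chip variant for Gemma 2B can be sketched as follows. The flags are copied from the Gemma 2B-it upload and the deploy-model commands in this guide; only the topology and the machine type change. The function names upload_gemma_2b_1x1 and deploy_on_1t are hypothetical wrappers added for illustration.

```shell
# Sketch: Gemma 2B on a single v5e chip. Same flags as the commands in this
# guide, except for the 1x1 topology and the ct5lp-hightpu-1t machine type.
# upload_gemma_2b_1x1 and deploy_on_1t are hypothetical helper names.
upload_gemma_2b_1x1() {
  local location="$1" model_name="$2" bucket="$3"
  gcloud ai models upload \
    --region="$location" \
    --display-name="$model_name" \
    --container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/sax-tpu:latest \
    --artifact-uri="gs://$bucket/gemma_2b-it/" \
    --container-args='--model_path=saxml.server.pax.lm.params.gemma.Gemma2BFP16' \
    --container-args='--platform_chip=tpuv5e' \
    --container-args='--platform_topology=1x1' \
    --container-args='--ckpt_path_suffix=checkpoint_00000000' \
    --container-ports=8502
}

deploy_on_1t() {
  local location="$1" endpoint_id="$2" model_id="$3" model_name="$4"
  gcloud ai endpoints deploy-model "$endpoint_id" \
    --region="$location" \
    --model="$model_id" \
    --display-name="$model_name" \
    --machine-type=ct5lp-hightpu-1t \
    --traffic-split=0=100
}

# Usage:
#   upload_gemma_2b_1x1 LOCATION DEPLOYED_MODEL_NAME CHECKPOINTS_BUCKET_NAME
#   deploy_on_1t LOCATION "$ENDPOINT_ID" "$MODEL_ID" DEPLOYED_MODEL_NAME
```

The topology passed at upload time and the machine type passed at deploy time need to agree: 1x1 with ct5lp-hightpu-1t, 2x2 with ct5lp-hightpu-4t.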

The Google Cloud CLI might take a few minutes to deploy the model to the endpoint. When the model is successfully deployed, the command prints the following output:

      Deployed a model to the endpoint xxxxx. Id of the deployed model: xxxxx.
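
The lookups that populate ENDPOINT_ID and MODEL_ID filter by display name, so they return an empty string when nothing matches and one ID per line when several resources share a name (display names are not guaranteed to be unique). The following is a minimal sketch of a guard you could run before deploy-model; require_single_id is a hypothetical helper, not a gcloud feature.

```shell
# Sketch: fail early when a display-name lookup returns nothing or several IDs.
# require_single_id is a hypothetical helper, not a gcloud feature.
require_single_id() {
  local label="$1" value="$2" count
  if [ -z "$value" ]; then
    echo "$label: no resource matched the display name" >&2
    return 1
  fi
  # Count the lines in the lookup result; each line is one resource ID.
  count=$(printf '%s\n' "$value" | wc -l | tr -d ' ')
  if [ "$count" -ne 1 ]; then
    echo "$label: $count resources matched; display names are not unique" >&2
    return 1
  fi
}

# Usage, before running deploy-model:
#   require_single_id ENDPOINT_ID "$ENDPOINT_ID" && require_single_id MODEL_ID "$MODEL_ID"
```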
    

Get online inferences from the deployed model

To invoke the model through the Vertex AI endpoint, format the inference request by using a standard inference request JSON object.

The following example uses the gcloud ai endpoints predict command:

    ENDPOINT_ID=$(gcloud ai endpoints list \
       --region=LOCATION \
       --filter=display_name=ENDPOINT_NAME \
       --format="value(name)")
    
    gcloud ai endpoints predict $ENDPOINT_ID \
      --region=LOCATION \
      --http-headers=Content-Type=application/json \
      --json-request instances.json
    

Replace the following:

  • LOCATION: the region where you use Vertex AI.
  • ENDPOINT_NAME: the display name for the endpoint.
  • instances.json: a request file with the following format: {"instances": [{"text_batch": "<your prompt>"},{...}]}
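
As an illustration of the request format described above, the following sketch writes an instances.json file with a single prompt to the current directory. The prompt text is only an example.

```shell
# Sketch: write a request file with one prompt. The prompt text is only an example.
cat > instances.json <<'EOF'
{
  "instances": [
    {"text_batch": "What are the top 5 programming languages?"}
  ]
}
EOF
```

To send several prompts in one request, add more objects with a text_batch field to the instances array.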

Clean up

To avoid incurring further Vertex AI charges and Artifact Registry charges, delete the Google Cloud resources that you created in this tutorial.

  1. To undeploy the model from the endpoint and delete the endpoint, run the following commands in your shell:

      ENDPOINT_ID=$(gcloud ai endpoints list \
         --region=LOCATION \
         --filter=display_name=ENDPOINT_NAME \
         --format="value(name)")
      
      DEPLOYED_MODEL_ID=$(gcloud ai endpoints describe $ENDPOINT_ID \
         --region=LOCATION \
         --format="value(deployedModels.id)")
      
      gcloud ai endpoints undeploy-model $ENDPOINT_ID \
        --region=LOCATION \
        --deployed-model-id=$DEPLOYED_MODEL_ID
      
      gcloud ai endpoints delete $ENDPOINT_ID \
         --region=LOCATION \
         --quiet
      

    Replace LOCATION with the region where you created your model in a previous section.

  2. To delete the model, run the following commands in your shell:

      MODEL_ID=$(gcloud ai models list \
         --region=LOCATION \
         --filter=display_name=DEPLOYED_MODEL_NAME \
         --format="value(name)")
      
      gcloud ai models delete $MODEL_ID \
         --region=LOCATION \
         --quiet
      

    Replace LOCATION with the region where you created your model in a previous section.
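
The steps above remove the endpoint and the model, but not the Cloud Storage bucket that holds the checkpoints, which continues to incur storage charges. If you no longer need the checkpoints, the following sketch deletes the bucket and its contents; delete_checkpoints_bucket is a hypothetical helper name, and the deletion is permanent.

```shell
# Sketch: remove the checkpoints bucket and everything in it.
# delete_checkpoints_bucket is a hypothetical helper name.
delete_checkpoints_bucket() {
  local bucket="$1"
  if [ -z "$bucket" ]; then
    echo "usage: delete_checkpoints_bucket BUCKET_NAME" >&2
    return 1
  fi
  # --recursive deletes the objects and then the bucket itself.
  gcloud storage rm --recursive "gs://$bucket"
}

# Usage:
#   delete_checkpoints_bucket CHECKPOINTS_BUCKET_NAME
```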

Limitations

  • On Vertex AI, Cloud TPUs are supported only in us-west1. For more information, see locations.

What's next

  • Learn how to deploy other Saxml models such as Llama2 and GPT-J.