Dataproc on GKE node pools

When you create or update a Dataproc on GKE virtual cluster, you specify one or more node pools that the virtual cluster uses to run jobs (the cluster is said to "use" or be "associated with" the specified node pools). If a specified node pool does not exist on the GKE cluster, Dataproc on GKE creates it on the GKE cluster with the settings you specify. If the node pool already exists and was created by Dataproc, it is validated to confirm that its settings match the specified settings.

Dataproc on GKE node pool settings

You can specify the following settings on node pools used by Dataproc on GKE virtual clusters (these settings are a subset of the GKE node pool settings):

  • accelerators
  • acceleratorCount
  • acceleratorType
  • gpuPartitionSize*
  • localSsdCount
  • machineType
  • minCpuPlatform
  • minNodeCount
  • maxNodeCount
  • preemptible
  • spot*

Notes:

  • gpuPartitionSize can be set in the Dataproc API GkeNodePoolAcceleratorConfig.
  • spot can be set in the Dataproc API GkeNodeConfig.
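
As a rough illustration of where these settings sit in an API request, the sketch below prints a JSON fragment shaped like a Dataproc API GkeNodePoolConfig: node-level settings such as machineType and spot go under "config" (a GkeNodeConfig), and minNodeCount and maxNodeCount go under "autoscaling". The helper name node_pool_config_json and the sample values are hypothetical; check the field layout against the API reference before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print an illustrative GkeNodePoolConfig JSON fragment.
# Arguments: machine type, minimum node count, maximum node count.
node_pool_config_json() {
  machine_type="$1"
  min_nodes="$2"
  max_nodes="$3"
  cat <<EOF
{
  "config": {
    "machineType": "${machine_type}",
    "spot": true
  },
  "autoscaling": {
    "minNodeCount": ${min_nodes},
    "maxNodeCount": ${max_nodes}
  }
}
EOF
}

# Example values only (assumptions, not recommendations).
node_pool_config_json n2-standard-8 1 10
```

Accelerator settings are omitted here to keep the fragment short.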

Node pool deletion

Deleting a Dataproc on GKE cluster does not delete the node pools that the cluster uses. To delete node pools that are no longer used by Dataproc on GKE clusters, see "Delete a node pool".
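
Because leftover node pools are removed with the GKE tooling, not Dataproc, a cleanup step might look like the dry-run sketch below. It only prints a gcloud container node-pools delete command so you can review it before running it. The helper name and the sample pool, cluster, and region names are hypothetical, and the sketch assumes a regional GKE cluster (a zonal cluster would take --zone in place of --region).

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the GKE command that would delete
# a node pool, without executing it.
# Arguments: node pool name, GKE cluster name, GKE cluster region.
print_delete_node_pool_cmd() {
  pool="$1"
  cluster="$2"
  region="$3"
  printf 'gcloud container node-pools delete %s --cluster=%s --region=%s\n' \
    "$pool" "$cluster" "$region"
}

# Example names only (assumptions).
print_delete_node_pool_cmd dp-exec-pool my-gke-cluster us-central1
# prints: gcloud container node-pools delete dp-exec-pool --cluster=my-gke-cluster --region=us-central1
```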

Node pool location

When you create or update a Dataproc on GKE virtual cluster, you can specify the zones of the node pools associated with it. The node pool zones must be located in the region of the associated virtual cluster.
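
The zone constraint can be checked locally before submitting a request. The sketch below relies on the Google Cloud naming convention that a zone name is its region name plus a one-letter suffix (for example, us-central1-a is in us-central1). The function name zone_in_region is hypothetical, and the check is a string comparison only; it does not confirm that the zone actually exists.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the zone name has the form "<region>-<letter>".
# Arguments: zone name, region name.
zone_in_region() {
  zone="$1"
  region="$2"
  case "$zone" in
    "${region}"-[a-z]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage with sample names.
zone_in_region us-central1-a us-central1 && echo "zone is in region"
zone_in_region us-east1-b us-central1 || echo "zone is outside region"
```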

Node pool role mapping

Node pool roles are defined for Spark driver and Spark executor work, and a default role covers all types of work on a node pool. A Dataproc on GKE cluster must have at least one node pool that is assigned the default role. Assigning other roles is optional.

Recommendation: Create a separate node pool for each role type, with the node type and size chosen to fit that role's requirements.
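
The "at least one default pool" rule can be sanity-checked on the pool specifications before they are passed to gcloud. The sketch below scans strings in the comma-separated name=...,roles=... form and succeeds if any of them lists the default role. The function name has_default_pool is hypothetical, and treating ";" as the separator between multiple roles is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: succeed if at least one pool spec includes the default role.
# Each argument is one spec such as "name=pool-a,roles=default,machineType=e2-standard-4".
has_default_pool() {
  for spec in "$@"; do
    # Split the spec on commas and keep the value of the roles= key.
    roles=$(printf '%s\n' "$spec" | tr ',' '\n' | sed -n 's/^roles=//p')
    # Assumption: multiple roles are separated by ';'.
    case ";${roles};" in
      *";default;"*) return 0 ;;
    esac
  done
  return 1
}

# Example usage with sample pool names.
if has_default_pool "name=ctrl,roles=default" "name=drv,min=1,max=3,roles=spark-driver"; then
  echo "default role is assigned"
fi
```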

gcloud CLI virtual cluster creation example:

gcloud dataproc clusters gke create "${DP_CLUSTER}" \
  --region=${REGION} \
  --gke-cluster=${GKE_CLUSTER} \
  --spark-engine-version=latest \
  --staging-bucket=${BUCKET} \
  --pools="name=${DP_POOLNAME},roles=default" \
  --setup-workload-identity \
  --pools="name=${DP_CTRL_POOLNAME},roles=default,machineType=e2-standard-4" \
  --pools="name=${DP_DRIVER_POOLNAME},min=1,max=3,roles=spark-driver,machineType=n2-standard-4" \
  --pools="name=${DP_EXEC_POOLNAME},min=1,max=10,roles=spark-executor,machineType=n2-standard-8"