Cluster metadata

Dataproc sets special metadata values on the instances that run in your cluster.

Metadata key: Value
dataproc-bucket: Name of the cluster's staging bucket.
dataproc-region: Region of the cluster's endpoint.
dataproc-worker-count: Number of worker nodes in the cluster. The value is 0 for single node clusters.
dataproc-cluster-name: Name of the cluster.
dataproc-cluster-uuid: UUID of the cluster.
dataproc-role: Role of the instance, either Master or Worker.
dataproc-master: Hostname of the first master node. The value is [CLUSTER_NAME]-m in a standard or single node cluster, and [CLUSTER_NAME]-m-0 in a high-availability cluster, where [CLUSTER_NAME] is the name of your cluster.
dataproc-master-additional: Comma-separated list of hostnames of the additional master nodes in a high-availability cluster. For example, in a cluster with 3 master nodes the value is [CLUSTER_NAME]-m-1,[CLUSTER_NAME]-m-2.
SPARK_BQ_CONNECTOR_VERSION or SPARK_BQ_CONNECTOR_URL: Version or URL that identifies the Spark BigQuery connector to use in Spark applications, for example 0.42.1 or gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar. A default Spark BigQuery connector version is preinstalled on clusters with Dataproc image version 2.1 or later. For details, see "Use the Spark BigQuery connector".

You can use these values to customize the behavior of initialization actions.
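As a rough illustration, an initialization action could read dataproc-role from the Compute Engine metadata server and branch on it. The sketch below assumes curl is available on the node. The helper names get_attr and setup_step_for_role and the step labels are hypothetical placeholders, not part of Dataproc.

```shell
#!/usr/bin/env bash
# Sketch of an initialization action that branches on dataproc-role.
# get_attr and setup_step_for_role are illustrative names, not Dataproc APIs.

# Read one instance attribute from the Compute Engine metadata server.
# Prints nothing and returns non-zero if the key is missing or unreachable.
get_attr() {
  curl -sf --max-time 5 -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/$1"
}

# Map a role value to a setup step label (the labels are placeholders).
setup_step_for_role() {
  case "$1" in
    Master) echo "master-setup" ;;
    Worker) echo "worker-setup" ;;
    *)      echo "unknown-role" ;;
  esac
}

ROLE="$(get_attr dataproc-role || true)"
echo "dataproc-role=${ROLE:-<unset>} -> $(setup_step_for_role "$ROLE")"
```

Keeping the role-to-step mapping in its own function makes it possible to exercise the branching logic off-cluster, where the metadata server is not reachable.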

You can provide your own metadata with the --metadata flag of the gcloud dataproc clusters create command:

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --metadata=name1=value1,name2=value2... \
    ... other flags ...
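
Custom keys passed this way should be readable on each node as instance attributes. The following is a minimal sketch, assuming curl is available and using the placeholder key name1 from the command above. The helper attr_or_default and the value fallback-value are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read a custom metadata key, falling back to a default when the key
# is absent or the metadata server cannot be reached.
# attr_or_default is an illustrative helper, not a Dataproc or gcloud command.

attr_or_default() {
  local key="$1" default="$2" value
  value="$(curl -sf --max-time 5 -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/${key}")" \
    || value="$default"
  echo "$value"
}

# name1 matches the placeholder key in the gcloud example above.
NAME1="$(attr_or_default name1 fallback-value)"
echo "name1=${NAME1}"
```

Supplying a default keeps the script usable on clusters that were created without the custom key.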