Pub/Sub Subscription to BigQuery ํ…œํ”Œ๋ฆฟ

Pub/Sub Subscription to BigQuery ํ…œํ”Œ๋ฆฟ์€ Pub/Sub ๊ตฌ๋…์—์„œ JSON ํ˜•์‹์˜ ๋ฉ”์‹œ์ง€๋ฅผ ์ฝ๊ณ  BigQuery ํ…Œ์ด๋ธ”์— ์“ฐ๋Š” ์ŠคํŠธ๋ฆฌ๋ฐ ํŒŒ์ดํ”„๋ผ์ธ์ž…๋‹ˆ๋‹ค. ํ…œํ”Œ๋ฆฟ์„ ์‚ฌ์šฉํ•˜์—ฌ Pub/Sub ๋ฐ์ดํ„ฐ๋ฅผ BigQuery๋กœ ๋น ๋ฅด๊ฒŒ ์ด๋™ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ํ…œํ”Œ๋ฆฟ์€ Pub/Sub์—์„œ JSON ํ˜•์‹์˜ ๋ฉ”์‹œ์ง€๋ฅผ ์ฝ๊ณ  BigQuery ์š”์†Œ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

์ด ์‹œ๋‚˜๋ฆฌ์˜ค์— Dataflow ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•˜๊ธฐ ์ „์— UDF๊ฐ€ ์žˆ๋Š” Pub/Sub BigQuery ๊ตฌ๋…์ด ์š”๊ตฌ์‚ฌํ•ญ์„ ์ถฉ์กฑํ•˜๋Š”์ง€ ๊ณ ๋ คํ•˜์„ธ์š”.

ํŒŒ์ดํ”„๋ผ์ธ ์š”๊ตฌ์‚ฌํ•ญ

  • Pub/Sub ๋ฉ”์‹œ์ง€์˜ data ํ•„๋“œ๋Š” ์ด JSON ๊ฐ€์ด๋“œ์— ์„ค๋ช…๋œ JSON ํ˜•์‹์„ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด {"k1":"v1", "k2":"v2"} ํ˜•์‹์œผ๋กœ ๋œ data ํ•„๋“œ ๊ฐ’์ด ์žˆ๋Š” ๋ฉ”์‹œ์ง€๋Š” k1 ๋ฐ k2๋ผ๋Š” ๋‘ ๊ฐœ์˜ ์—ด ๊ทธ๋ฆฌ๊ณ  ๋ฌธ์ž์—ด ๋ฐ์ดํ„ฐ ์œ ํ˜•์œผ๋กœ BigQuery ํ…Œ์ด๋ธ”์— ์‚ฝ์ž…๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•˜๊ธฐ ์ „์— ์ถœ๋ ฅ ํ…Œ์ด๋ธ”์ด ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ํ…Œ์ด๋ธ” ์Šคํ‚ค๋งˆ๋Š” ์ž…๋ ฅ JSON ๊ฐ์ฒด์™€ ์ผ์น˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

ํ…œํ”Œ๋ฆฟ ๋งค๊ฐœ๋ณ€์ˆ˜

ํ•„์ˆ˜ ๋งค๊ฐœ๋ณ€์ˆ˜

  • outputTableSpec: BigQuery ์ถœ๋ ฅ ํ…Œ์ด๋ธ” ์œ„์น˜๋กœ, ํ˜•์‹์€ <PROJECT_ID>:<DATASET_NAME>.<TABLE_NAME>์ž…๋‹ˆ๋‹ค.
  • inputSubscription: ์ฝ์–ด์˜ฌ Pub/Sub ์ž…๋ ฅ ๊ตฌ๋…์œผ๋กœ, ํ˜•์‹์€ projects/<PROJECT_ID>/subscriptions/<SUBSCRIPTION>์ž…๋‹ˆ๋‹ค.

์„ ํƒ์  ๋งค๊ฐœ๋ณ€์ˆ˜

  • outputDeadletterTable: ์ถœ๋ ฅ ํ…Œ์ด๋ธ”์— ๋„๋‹ฌํ•˜์ง€ ๋ชปํ•œ ๋ฉ”์‹œ์ง€์— ์‚ฌ์šฉํ•  BigQuery ํ…Œ์ด๋ธ”๋กœ, <PROJECT_ID>:<DATASET_NAME>.<TABLE_NAME> ํ˜•์‹์ž…๋‹ˆ๋‹ค. ํ…Œ์ด๋ธ”์ด ์—†์œผ๋ฉด ํŒŒ์ดํ”„๋ผ์ธ ์‹คํ–‰ ์ค‘์— ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค. ์ง€์ •ํ•˜์ง€ ์•Š์œผ๋ฉด OUTPUT_TABLE_SPEC_error_records์ด ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
  • javascriptTextTransformGcsPath: ์‚ฌ์šฉํ•  JavaScript ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜(UDF)๋ฅผ ์ •์˜ํ•˜๋Š” .js ํŒŒ์ผ์˜ Cloud Storage URI์ž…๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค๋ฉด gs://my-bucket/my-udfs/my_file.js์ž…๋‹ˆ๋‹ค.
  • javascriptTextTransformFunctionName: ์‚ฌ์šฉํ•  JavaScript ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜(UDF) ์ด๋ฆ„์ž…๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด JavaScript ํ•จ์ˆ˜๊ฐ€ myTransform(inJson) { /*...do stuff...*/ }์ด๋ฉด ํ•จ์ˆ˜ ์ด๋ฆ„์€ myTransform์ž…๋‹ˆ๋‹ค. ์ƒ˜ํ”Œ JavaScript UDF๋Š” UDF ์˜ˆ์‹œ(https://github.com/GoogleCloudPlatform/DataflowTemplates#udf-examples)๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.
  • javascriptTextTransformReloadIntervalMinutes: ์ž‘์—…์ž๊ฐ€ ํŒŒ์ผ์„ ์ƒˆ๋กœ๊ณ ์นจํ•˜๊ธฐ ์œ„ํ•ด JavaScript UDF ๋ณ€๊ฒฝ์‚ฌํ•ญ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ„๊ฒฉ์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ๋ณธ๊ฐ’์€ 0์ž…๋‹ˆ๋‹ค.

์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜

์„ ํƒ์ ์œผ๋กœ ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜(UDF)๋ฅผ ์ž‘์„ฑํ•˜์—ฌ ์ด ํ…œํ”Œ๋ฆฟ์„ ํ™•์žฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ…œํ”Œ๋ฆฟ์ด ๊ฐ ์ž…๋ ฅ ์š”์†Œ์— ๋Œ€ํ•ด UDF๋ฅผ ํ˜ธ์ถœํ•ฉ๋‹ˆ๋‹ค. ์š”์†Œ ํŽ˜์ด๋กœ๋“œ๋Š” JSON ๋ฌธ์ž์—ด๋กœ ์ง๋ ฌํ™”๋ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ Dataflow ํ…œํ”Œ๋ฆฟ์— ๋Œ€ํ•œ ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜ ๋งŒ๋“ค๊ธฐ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

ํ•จ์ˆ˜ ์‚ฌ์–‘

UDF์˜ ์‚ฌ์–‘์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

  • ์ž…๋ ฅ: JSON ๋ฌธ์ž์—ด๋กœ ์ง๋ ฌํ™”๋œ Pub/Sub ๋ฉ”์‹œ์ง€ ๋ฐ์ดํ„ฐ ํ•„๋“œ์ž…๋‹ˆ๋‹ค.
  • ์ถœ๋ ฅ: BigQuery ๋Œ€์ƒ ํ…Œ์ด๋ธ”์˜ ์Šคํ‚ค๋งˆ์™€ ์ผ์น˜ํ•˜๋Š” JSON ๋ฌธ์ž์—ด์ž…๋‹ˆ๋‹ค.
  • ํ…œํ”Œ๋ฆฟ ์‹คํ–‰

    ์ฝ˜์†”

    1. Dataflow ํ…œํ”Œ๋ฆฟ์—์„œ ์ž‘์—… ๋งŒ๋“ค๊ธฐ ํŽ˜์ด์ง€๋กœ ์ด๋™ํ•ฉ๋‹ˆ๋‹ค.
    2. ํ…œํ”Œ๋ฆฟ์—์„œ ์ž‘์—… ๋งŒ๋“ค๊ธฐ๋กœ ์ด๋™
    3. ์ž‘์—… ์ด๋ฆ„ ํ•„๋“œ์— ๊ณ ์œ ํ•œ ์ž‘์—… ์ด๋ฆ„์„ ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค.
    4. (์„ ํƒ์‚ฌํ•ญ) ๋ฆฌ์ „ ์—”๋“œํฌ์ธํŠธ์˜ ๋“œ๋กญ๋‹ค์šด ๋ฉ”๋‰ด์—์„œ ๊ฐ’์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ๋ณธ ๋ฆฌ์ „์€ us-central1์ž…๋‹ˆ๋‹ค.

      Dataflow ์ž‘์—…์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฆฌ์ „ ๋ชฉ๋ก์€ Dataflow ์œ„์น˜๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

    5. Dataflow ํ…œํ”Œ๋ฆฟ ๋“œ๋กญ๋‹ค์šด ๋ฉ”๋‰ด์—์„œ the Pub/Sub Subscription to BigQuery template์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.
    6. ์ œ๊ณต๋œ ๋งค๊ฐœ๋ณ€์ˆ˜ ํ•„๋“œ์— ๋งค๊ฐœ๋ณ€์ˆ˜ ๊ฐ’์„ ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค.
    7. (์„ ํƒ์‚ฌํ•ญ) ์ •ํ™•ํžˆ ํ•œ ๋ฒˆ ์ฒ˜๋ฆฌ์—์„œ ์ ์–ด๋„ ํ•œ ๋ฒˆ ์ŠคํŠธ๋ฆฌ๋ฐ ๋ชจ๋“œ๋กœ ์ „ํ™˜ํ•˜๋ ค๋ฉด ์ ์–ด๋„ ํ•œ ๋ฒˆ๋ฅผ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.
    8. ์ž‘์—… ์‹คํ–‰์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

    gcloud

    ์…ธ ๋˜๋Š” ํ„ฐ๋ฏธ๋„์—์„œ ํ…œํ”Œ๋ฆฟ์„ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.

    gcloud dataflow jobs run JOB_NAME \
        --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/PubSub_Subscription_to_BigQuery \
        --region REGION_NAME \
        --staging-location STAGING_LOCATION \
        --parameters \
    inputSubscription=projects/PROJECT_ID/subscriptions/SUBSCRIPTION_NAME,\
    outputTableSpec=PROJECT_ID:DATASET.TABLE_NAME,\
    outputDeadletterTable=PROJECT_ID:DATASET.TABLE_NAME

    ๋‹ค์Œ์„ ๋ฐ”๊ฟ‰๋‹ˆ๋‹ค.

    • JOB_NAME: ์„ ํƒํ•œ ๊ณ ์œ ํ•œ ์ž‘์—… ์ด๋ฆ„
    • REGION_NAME: Dataflow ์ž‘์—…์„ ๋ฐฐํฌํ•  ๋ฆฌ์ „(์˜ˆ: us-central1)
    • VERSION: ์‚ฌ์šฉํ•  ํ…œํ”Œ๋ฆฟ ๋ฒ„์ „

      ๋‹ค์Œ ๊ฐ’์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

      • latest: ๋ฒ„ํ‚ท์˜ ๋‚ ์งœ๊ฐ€ ์ง€์ •๋˜์ง€ ์•Š์€ ์ƒ์œ„ ํด๋”(gs://dataflow-templates-REGION_NAME/latest/)์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์ตœ์‹  ๋ฒ„์ „์˜ ํ…œํ”Œ๋ฆฟ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
      • ๋ฒ„์ „ ์ด๋ฆ„(์˜ˆ: 2023-09-12-00_RC00): ๋ฒ„ํ‚ท์˜ ๋‚ ์งœ๊ฐ€ ์ง€์ •๋œ ํ•ด๋‹น ์ƒ์œ„ ํด๋”(gs://dataflow-templates-REGION_NAME/)์— ์ค‘์ฒฉ๋˜์–ด ์žˆ๋Š” ํŠน์ • ๋ฒ„์ „์˜ ํ…œํ”Œ๋ฆฟ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
    • STAGING_LOCATION: ๋กœ์ปฌ ํŒŒ์ผ์˜ ์Šคํ…Œ์ด์ง• ์œ„์น˜(์˜ˆ: gs://your-bucket/staging)
    • SUBSCRIPTION_NAME: Pub/Sub ๊ตฌ๋… ์ด๋ฆ„
    • DATASET: BigQuery ๋ฐ์ดํ„ฐ ์„ธํŠธ
    • TABLE_NAME: BigQuery ํ…Œ์ด๋ธ” ์ด๋ฆ„

    API

    REST API๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ…œํ”Œ๋ฆฟ์„ ์‹คํ–‰ํ•˜๋ ค๋ฉด HTTP POST ์š”์ฒญ์„ ์ „์†กํ•ฉ๋‹ˆ๋‹ค. API ๋ฐ ์Šน์ธ ๋ฒ”์œ„์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ projects.templates.launch๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

    POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/PubSub_Subscription_to_BigQuery
    {
       "jobName": "JOB_NAME",
       "parameters": {
           "inputSubscription": "projects/PROJECT_ID/subscriptions/SUBSCRIPTION_NAME",
           "outputTableSpec": "PROJECT_ID:DATASET.TABLE_NAME"
       },
       "environment": {
           "ipConfiguration": "WORKER_IP_UNSPECIFIED",
           "additionalExperiments": []
       },
    }

    ๋‹ค์Œ์„ ๋ฐ”๊ฟ‰๋‹ˆ๋‹ค.

    • PROJECT_ID: Dataflow ์ž‘์—…์„ ์‹คํ–‰ํ•˜๋ ค๋Š” Google Cloud ํ”„๋กœ์ ํŠธ ID
    • JOB_NAME: ์„ ํƒํ•œ ๊ณ ์œ ํ•œ ์ž‘์—… ์ด๋ฆ„
    • LOCATION: Dataflow ์ž‘์—…์„ ๋ฐฐํฌํ•  ๋ฆฌ์ „(์˜ˆ: us-central1)
    • VERSION: ์‚ฌ์šฉํ•  ํ…œํ”Œ๋ฆฟ ๋ฒ„์ „

      ๋‹ค์Œ ๊ฐ’์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

      • latest: ๋ฒ„ํ‚ท์˜ ๋‚ ์งœ๊ฐ€ ์ง€์ •๋˜์ง€ ์•Š์€ ์ƒ์œ„ ํด๋”(gs://dataflow-templates-REGION_NAME/latest/)์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์ตœ์‹  ๋ฒ„์ „์˜ ํ…œํ”Œ๋ฆฟ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
      • ๋ฒ„์ „ ์ด๋ฆ„(์˜ˆ: 2023-09-12-00_RC00): ๋ฒ„ํ‚ท์˜ ๋‚ ์งœ๊ฐ€ ์ง€์ •๋œ ํ•ด๋‹น ์ƒ์œ„ ํด๋”(gs://dataflow-templates-REGION_NAME/)์— ์ค‘์ฒฉ๋˜์–ด ์žˆ๋Š” ํŠน์ • ๋ฒ„์ „์˜ ํ…œํ”Œ๋ฆฟ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
    • STAGING_LOCATION: ๋กœ์ปฌ ํŒŒ์ผ์˜ ์Šคํ…Œ์ด์ง• ์œ„์น˜(์˜ˆ: gs://your-bucket/staging)
    • SUBSCRIPTION_NAME: Pub/Sub ๊ตฌ๋… ์ด๋ฆ„
    • DATASET: BigQuery ๋ฐ์ดํ„ฐ ์„ธํŠธ
    • TABLE_NAME: BigQuery ํ…Œ์ด๋ธ” ์ด๋ฆ„

    ๋‹ค์Œ ๋‹จ๊ณ„