ํŒŒ์ดํ”„๋ผ์ธ ๋กœ๊ทธ ์ž‘์—…

Apache Beam SDK์—์„œ ๊ธฐ๋ณธ ์ œ๊ณต๋˜๋Š” ๋กœ๊น… ์ธํ”„๋ผ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•  ๋•Œ ์ •๋ณด๋ฅผ ๋กœ๊น…ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Google Cloud Console์„ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ ์‹คํ–‰ ์ค‘๊ณผ ์‹คํ–‰ ํ›„์— ๋กœ๊น… ์ •๋ณด๋ฅผ ๋ชจ๋‹ˆํ„ฐ๋ง ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

ํŒŒ์ดํ”„๋ผ์ธ์— ๋กœ๊ทธ ๋ฉ”์‹œ์ง€ ์ถ”๊ฐ€

์ž๋ฐ”

์ž๋ฐ”์šฉ Apache Beam SDK๋Š” ์˜คํ”ˆ์†Œ์Šค Simple Logging Facade for Java(SLF4J) ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ†ตํ•œ ์ž‘์—…์ž ๋ฉ”์‹œ์ง€ ๋กœ๊น…์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค. ์ž๋ฐ”์šฉ Apache Beam SDK๊ฐ€ ํ•„์š”ํ•œ ๋กœ๊น… ์ธํ”„๋ผ๋ฅผ ๊ตฌํ˜„ํ•˜๋ฏ€๋กœ, ์ž๋ฐ” ์ฝ”๋“œ์—์„œ๋Š” SLF4J API๋งŒ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ ํŒŒ์ดํ”„๋ผ์ธ ์ฝ”๋“œ ๋‚ด์—์„œ ๋ฉ”์‹œ์ง€ ๋กœ๊น…์„ ์‚ฌ์šฉ ์„ค์ •ํ•˜๋„๋ก ๋กœ๊ฑฐ๋ฅผ ์ธ์Šคํ„ด์Šคํ™”ํ•ฉ๋‹ˆ๋‹ค.

๊ธฐ์กด ์ฝ”๋“œ ๋˜๋Š” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ ๊ฒฝ์šฐ, ์ž๋ฐ”์šฉ Apache Beam SDK๋Š” ๋กœ๊น… ์ธํ”„๋ผ๋ฅผ ์ถ”๊ฐ€๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ๋‹ค์Œ ์ž๋ฐ”์šฉ ๋กœ๊น… ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ†ตํ•ด ์ƒ์„ฑ๋œ ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๊ฐ€ ์บก์ฒ˜๋ฉ๋‹ˆ๋‹ค.

Python

Python์šฉ Apache Beam SDK๋Š” ํŒŒ์ดํ”„๋ผ์ธ ์ž‘์—…์ž๊ฐ€ ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๋ฅผ ์ถœ๋ ฅํ•˜๊ฒŒ ํ•˜๋Š” logging ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ํŒจํ‚ค์ง€๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๊ฐ€์ ธ์™€์•ผ ํ•ฉ๋‹ˆ๋‹ค.

import logging

Go

Go์šฉ Apache Beam SDK๋Š” ํŒŒ์ดํ”„๋ผ์ธ ์ž‘์—…์ž๊ฐ€ ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๋ฅผ ์ถœ๋ ฅํ•˜๊ฒŒ ํ•˜๋Š” log ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ํŒจํ‚ค์ง€๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๊ฐ€์ ธ์™€์•ผ ํ•ฉ๋‹ˆ๋‹ค.

import "github.com/apache/beam/sdks/v2/go/pkg/beam/log"

์ž‘์—…์ž ๋กœ๊ทธ ๋ฉ”์‹œ์ง€ ์ฝ”๋“œ ์˜ˆ

์ž๋ฐ”

๋‹ค์Œ ์˜ˆ์‹œ์—์„œ๋Š” Dataflow ๋กœ๊น…์— SLF4J๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. Dataflow ๋กœ๊น…์„ ์œ„ํ•œ SLF4J ๊ตฌ์„ฑ์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์ž๋ฐ” ํŒ ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Apache Beam WordCount ์˜ˆ์‹œ๋Š” ์ฒ˜๋ฆฌ๋œ ํ…์ŠคํŠธ ์ค„์—์„œ 'love'๋ผ๋Š” ๋‹จ์–ด๊ฐ€ ๋ฐœ๊ฒฌ๋˜๋ฉด ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๋ฅผ ์ถœ๋ ฅํ•˜๋„๋ก ์ˆ˜์ •๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ถ”๊ฐ€๋œ ์ฝ”๋“œ๋Š” ๋‹ค์Œ ์˜ˆ์‹œ์—์„œ ๊ตต๊ฒŒ ํ‘œ์‹œ๋˜์—ˆ์Šต๋‹ˆ๋‹ค(์ฃผ๋ณ€ ์ฝ”๋“œ๋Š” ์ดํ•ด๋ฅผ ๋•๊ธฐ ์œ„ํ•ด ํฌํ•จ๋จ).

 package org.apache.beam.examples;
 // Import SLF4J packages.
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 ...
 public class WordCount {
   ...
   static class ExtractWordsFn extends DoFn<String, String> {
     // Instantiate Logger.
     // Suggestion: As shown, specify the class name of the containing class
     // (WordCount).
     private static final Logger LOG = LoggerFactory.getLogger(WordCount.class);
     ...
     @ProcessElement
     public void processElement(ProcessContext c) {
       ...
       // Output each word encountered into the output PCollection.
       for (String word : words) {
         if (!word.isEmpty()) {
           c.output(word);
         }
         // Log INFO messages when the word "love" is found.
         if(word.toLowerCase().equals("love")) {
           LOG.info("Found " + word.toLowerCase());
         }
       }
     }
   }
 ... // Remaining WordCount example code ...

Python

Apache Beam wordcount.py ์˜ˆ์‹œ๋Š” ์ฒ˜๋ฆฌ๋œ ํ…์ŠคํŠธ ์ค„์—์„œ 'love' ๋‹จ์–ด๊ฐ€ ๋ฐœ๊ฒฌ๋˜๋ฉด ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๋ฅผ ์ถœ๋ ฅํ•˜๋„๋ก ์ˆ˜์ •๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

# import Python logging module.
import logging

class ExtractWordsFn(beam.DoFn):
  def process(self, element):
    words = re.findall(r'[A-Za-z\']+', element)
    for word in words:
      yield word

      if word.lower() == 'love':
        # Log using the root logger at info or higher levels
        logging.info('Found : %s', word.lower())

# Remaining WordCount example code ...

Go

Apache Beam wordcount.go ์˜ˆ์‹œ๋Š” ์ฒ˜๋ฆฌ๋œ ํ…์ŠคํŠธ ์ค„์—์„œ 'love' ๋‹จ์–ด๊ฐ€ ๋ฐœ๊ฒฌ๋˜๋ฉด ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๋ฅผ ์ถœ๋ ฅํ•˜๋„๋ก ์ˆ˜์ •๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

func (f *extractFn) ProcessElement(ctx context.Context, line string, emit func(string)) {
    for _, word := range wordRE.FindAllString(line, -1) {
        // increment the counter for small words if length of words is
        // less than small_word_length
        if strings.ToLower(word) == "love" {
            log.Infof(ctx, "Found : %s", strings.ToLower(word))
        }

        emit(word)
    }
}

// Remaining Wordcount example

์ž๋ฐ”

์ˆ˜์ •๋œ WordCount ํŒŒ์ดํ”„๋ผ์ธ์ด ๋กœ์ปฌ ํŒŒ์ผ์— ๋ณด๋‚ธ ์ถœ๋ ฅ(--output=./local-wordcounts)์ด ์žˆ๋Š” ๊ธฐ๋ณธ DirectRunner๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋กœ์ปฌ๋กœ ์‹คํ–‰๋˜๋Š” ๊ฒฝ์šฐ, Console ์ถœ๋ ฅ์— ์ถ”๊ฐ€๋œ ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

INFO: Executing pipeline using the DirectRunner.
...
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
...
INFO: Pipeline execution complete.

๊ธฐ๋ณธ์ ์œผ๋กœ INFO ์ด์ƒ์œผ๋กœ ํ‘œ์‹œ๋œ ๋กœ๊ทธ ์ค„๋งŒ Cloud Logging์œผ๋กœ ์ „์†ก๋ฉ๋‹ˆ๋‹ค. ์ด ๋™์ž‘์„ ๋ณ€๊ฒฝํ•˜๋ ค๋ฉด ํŒŒ์ดํ”„๋ผ์ธ ์ž‘์—…์ž ๋กœ๊ทธ ์ˆ˜์ค€ ์„ค์ •์„ ์ฐธ๊ณ ํ•˜์„ธ์š”.

Python

์ˆ˜์ •๋œ WordCount ํŒŒ์ดํ”„๋ผ์ธ์ด ๋กœ์ปฌ ํŒŒ์ผ์— ๋ณด๋‚ธ ์ถœ๋ ฅ(--output=./local-wordcounts)์ด ์žˆ๋Š” ๊ธฐ๋ณธ DirectRunner๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋กœ์ปฌ๋กœ ์‹คํ–‰๋˜๋Š” ๊ฒฝ์šฐ, Console ์ถœ๋ ฅ์— ์ถ”๊ฐ€๋œ ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

INFO:root:Found : love
INFO:root:Found : love
INFO:root:Found : love

๊ธฐ๋ณธ์ ์œผ๋กœ INFO ์ด์ƒ์œผ๋กœ ํ‘œ์‹œ๋œ ๋กœ๊ทธ ์ค„๋งŒ Cloud Logging์œผ๋กœ ์ „์†ก๋ฉ๋‹ˆ๋‹ค. ์ด ๋™์ž‘์„ ๋ณ€๊ฒฝํ•˜๋ ค๋ฉด ํŒŒ์ดํ”„๋ผ์ธ ์ž‘์—…์ž ๋กœ๊ทธ ์ˆ˜์ค€ ์„ค์ •์„ ์ฐธ๊ณ ํ•˜์„ธ์š”.

ํŒŒ์ดํ”„๋ผ์ธ ๋กœ๊ทธ๋ฅผ Dataflow ๋ฐ Cloud Logging์œผ๋กœ ์ „์†กํ•˜๋Š” ์‚ฌ์ „ ๊ตฌ์„ฑ๋œ ๋กœ๊ทธ ํ•ธ๋“ค๋Ÿฌ๊ฐ€ ์‚ฌ์šฉ ์ค‘์ง€๋  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ logging.config ํ•จ์ˆ˜๋กœ ๋กœ๊น… ๊ตฌ์„ฑ์„ ๋ฎ์–ด์“ฐ์ง€ ๋งˆ์„ธ์š”.

Go

์ˆ˜์ •๋œ WordCount ํŒŒ์ดํ”„๋ผ์ธ์ด ๋กœ์ปฌ ํŒŒ์ผ์— ๋ณด๋‚ธ ์ถœ๋ ฅ(--output=./local-wordcounts)์ด ์žˆ๋Š” ๊ธฐ๋ณธ DirectRunner๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋กœ์ปฌ๋กœ ์‹คํ–‰๋˜๋Š” ๊ฒฝ์šฐ, Console ์ถœ๋ ฅ์— ์ถ”๊ฐ€๋œ ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

2022/05/26 11:36:44 Found : love
2022/05/26 11:36:44 Found : love
2022/05/26 11:36:44 Found : love

๊ธฐ๋ณธ์ ์œผ๋กœ INFO ์ด์ƒ์œผ๋กœ ํ‘œ์‹œ๋œ ๋กœ๊ทธ ์ค„๋งŒ Cloud Logging์œผ๋กœ ์ „์†ก๋ฉ๋‹ˆ๋‹ค.

๋กœ๊ทธ ๋ณผ๋ฅจ ์ œ์–ด

๋˜ํ•œ ํŒŒ์ดํ”„๋ผ์ธ ๋กœ๊ทธ ์ˆ˜์ค€์„ ๋ณ€๊ฒฝํ•˜์—ฌ ์ƒ์„ฑ๋˜๋Š” ๋กœ๊ทธ์˜ ์–‘์„ ์ค„์ผ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. Dataflow ๋กœ๊ทธ์˜ ์ผ๋ถ€ ๋˜๋Š” ์ „๋ถ€๋ฅผ ๊ณ„์† ์ˆ˜์ง‘ํ•˜์ง€ ์•Š์œผ๋ ค๋ฉด Logging ์ œ์™ธ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ Dataflow ๋กœ๊ทธ๋ฅผ ์ œ์™ธํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ BigQuery, Cloud Storage, Pub/Sub ๋“ฑ์˜ ๋‹ค๋ฅธ ๋Œ€์ƒ์œผ๋กœ ๋กœ๊ทธ๋ฅผ ๋‚ด๋ณด๋ƒ…๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ Dataflow ๋กœ๊ทธ ์ˆ˜์ง‘ ์ œ์–ด๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

ํ•œ๋„ ๋ฐ ์ œํ•œ ๋กœ๊น…

์ž‘์—…์ž ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๋Š” ์ž‘์—…์ž๋ณ„๋กœ 30์ดˆ๋งˆ๋‹ค 15,000๊ฐœ๋กœ ์ œํ•œ๋ฉ๋‹ˆ๋‹ค. ์ด ํ•œ๋„์— ๋„๋‹ฌํ•˜๋ฉด ๋กœ๊น…์ด ์ œํ•œ๋จ์„ ์•Œ๋ฆฌ๋Š” ๋‹จ์ผ ์ž‘์—…์ž ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๊ฐ€ ์ถ”๊ฐ€๋ฉ๋‹ˆ๋‹ค.

Throttling logger worker. It used up its 30s quota for logs in only 12.345s
30์ดˆ ๊ฐ„๊ฒฉ์ด ๋๋‚  ๋•Œ๊นŒ์ง€ ๋” ์ด์ƒ ๋ฉ”์‹œ์ง€๊ฐ€ ๋กœ๊น…๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ด ํ•œ๋„๋Š” Apache Beam SDK์™€ ์‚ฌ์šฉ์ž ์ฝ”๋“œ๊ฐ€ ์ƒ์„ฑํ•œ ๋กœ๊ทธ ๋ฉ”์‹œ์ง€์— ์˜ํ•ด ๊ณต์œ ๋ฉ๋‹ˆ๋‹ค.

๋กœ๊ทธ ์Šคํ† ๋ฆฌ์ง€ ๋ฐ ๋ณด๊ด€

์ž‘์—… ๋กœ๊ทธ๋Š” _Default ๋กœ๊ทธ ๋ฒ„ํ‚ท์— ์ €์žฅ๋ฉ๋‹ˆ๋‹ค. Logging API ์„œ๋น„์Šค ์ด๋ฆ„์€ dataflow.googleapis.com์ž…๋‹ˆ๋‹ค. Cloud Logging์— ์‚ฌ์šฉ๋˜๋Š” Google Cloud ๋ชจ๋‹ˆํ„ฐ๋ง ๋ฆฌ์†Œ์Šค ์œ ํ˜• ๋ฐ ์„œ๋น„์Šค์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋ชจ๋‹ˆํ„ฐ๋ง ๋ฆฌ์†Œ์Šค ๋ฐ ์„œ๋น„์Šค๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Logging์—์„œ ๋กœ๊ทธ ํ•ญ๋ชฉ์ด ๋ณด๊ด€๋˜๋Š” ๊ธฐ๊ฐ„์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ํ• ๋‹น๋Ÿ‰ ๋ฐ ํ•œ๋„: ๋กœ๊ทธ ๋ณด๊ด€ ๊ธฐ๊ฐ„์˜ ๋ณด๊ด€ ์ •๋ณด๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

์ž‘์—… ๋กœ๊ทธ ๋ณด๊ธฐ์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ํŒŒ์ดํ”„๋ผ์ธ ๋กœ๊ทธ ๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐ ๋ณด๊ธฐ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

ํŒŒ์ดํ”„๋ผ์ธ ๋กœ๊ทธ ๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐ ๋ณด๊ธฐ

Dataflow ์„œ๋น„์Šค์—์„œ ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•˜๋ฉด Dataflow ๋ชจ๋‹ˆํ„ฐ๋ง ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ์—์„œ ๋ฐฉ์ถœํ•œ ๋กœ๊ทธ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Dataflow ์ž‘์—…์ž ๋กœ๊ทธ ์˜ˆ์‹œ

์ˆ˜์ •๋œ WordCount ํŒŒ์ดํ”„๋ผ์ธ์€ ๋‹ค์Œ ์˜ต์…˜์œผ๋กœ ํด๋ผ์šฐ๋“œ์—์„œ ์‹คํ–‰๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ž๋ฐ”

--project=WordCountExample
--output=gs://<bucket-name>/counts
--runner=DataflowRunner
--tempLocation=gs://<bucket-name>/temp
--stagingLocation=gs://<bucket-name>/binaries

Python

--project=WordCountExample
--output=gs://<bucket-name>/counts
--runner=DataflowRunner
--staging_location=gs://<bucket-name>/binaries

Go

--project=WordCountExample
--output=gs://<bucket-name>/counts
--runner=DataflowRunner
--staging_location=gs://<bucket-name>/binaries

๋กœ๊ทธ ๋ณด๊ธฐ

WordCount ํด๋ผ์šฐ๋“œ ํŒŒ์ดํ”„๋ผ์ธ์€ ์ฐจ๋‹จ ์‹คํ–‰์„ ์‚ฌ์šฉํ•˜๋ฏ€๋กœ, ํŒŒ์ดํ”„๋ผ์ธ ์‹คํ–‰ ์ค‘์— ์ฝ˜์†” ๋ฉ”์‹œ์ง€๊ฐ€ ์ถœ๋ ฅ๋ฉ๋‹ˆ๋‹ค. ์ž‘์—…์ด ์‹œ์ž‘๋˜๋ฉด Google Cloud ์ฝ˜์†” ํŽ˜์ด์ง€์— ๋Œ€ํ•œ ๋งํฌ๊ฐ€ ์ฝ˜์†”๋กœ ์ถœ๋ ฅ๋˜๊ณ , ์ด์–ด์„œ ํŒŒ์ดํ”„๋ผ์ธ ์ž‘์—… ID๊ฐ€ ์ถœ๋ ฅ๋ฉ๋‹ˆ๋‹ค.

INFO: To access the Dataflow monitoring console, please navigate to
https://console.developers.google.com/dataflow/job/2017-04-13_13_58_10-6217777367720337669
Submitted job: 2017-04-13_13_58_10-6217777367720337669

์ฝ˜์†” URL์€ ์ œ์ถœ๋œ ์ž‘์—…์˜ ์š”์•ฝ ํŽ˜์ด์ง€๊ฐ€ ์žˆ๋Š” Dataflow ๋ชจ๋‹ˆํ„ฐ๋ง ์ธํ„ฐํŽ˜์ด์Šค๋กœ ์—ฐ๊ฒฐ๋ฉ๋‹ˆ๋‹ค. ํ•ด๋‹น ํ™”๋ฉด์˜ ์™ผ์ชฝ์—๋Š” ๋™์  ์‹คํ–‰ ๊ทธ๋ž˜ํ”„, ์˜ค๋ฅธ์ชฝ์—๋Š” ์š”์•ฝ ์ •๋ณด๊ฐ€ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค. ํ•˜๋‹จ ํŒจ๋„์˜ ์„ ํด๋ฆญํ•˜์—ฌ ๋กœ๊ทธ ํŒจ๋„์„ ํ™•์žฅํ•˜์„ธ์š”.

๋กœ๊ทธ ํŒจ๋„์—๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ ์ž‘์—… ์ƒํƒœ๋ฅผ ์ „์ฒด์ ์œผ๋กœ ๋ณด๊ณ ํ•˜๋Š” ์ž‘์—… ๋กœ๊ทธ๊ฐ€ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค. ์ •๋ณด์™€ ๋กœ๊ทธ ํ•„ํ„ฐ๋ง์„ ํด๋ฆญํ•˜์—ฌ ๋กœ๊ทธ ํŒจ๋„์— ํ‘œ์‹œ๋˜๋Š” ๋ฉ”์‹œ์ง€๋ฅผ ํ•„ํ„ฐ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ทธ๋ž˜ํ”„์—์„œ ํŒŒ์ดํ”„๋ผ์ธ ๋‹จ๊ณ„๋ฅผ ์„ ํƒํ•˜๋ฉด ๋ทฐ๋Š” ์‚ฌ์šฉ์ž ์ฝ”๋“œ ๋ฐ ํŒŒ์ดํ”„๋ผ์ธ ๋‹จ๊ณ„์—์„œ ์‹คํ–‰๋˜๋Š” ์ƒ์„ฑ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด ์ƒ์„ฑ๋œ ๋‹จ๊ณ„ ๋กœ๊ทธ๋กœ ๋ณ€๊ฒฝ๋ฉ๋‹ˆ๋‹ค.

์ž‘์—… ๋กœ๊ทธ๋กœ ๋Œ์•„๊ฐ€๋ ค๋ฉด ๊ทธ๋ž˜ํ”„ ๋ฐ”๊นฅ์ชฝ์„ ํด๋ฆญํ•˜๊ฑฐ๋‚˜ ์˜ค๋ฅธ์ชฝ ์ธก๋ฉด ํŒจ๋„์— ์žˆ๋Š” ๋‹จ๊ณ„ ์„ ํƒ ์ทจ์†Œ ๋ฒ„ํŠผ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋‹จ๊ณ„๋ฅผ ์ง€์›๋‹ˆ๋‹ค.

๋กœ๊ทธ ํƒ์ƒ‰๊ธฐ๋กœ ์ด๋™

๋กœ๊ทธ ํƒ์ƒ‰๊ธฐ๋ฅผ ์—ด๊ณ  ๋‹ค๋ฅธ ๋กœ๊ทธ ์œ ํ˜•์„ ์„ ํƒํ•˜๋ ค๋ฉด ๋กœ๊ทธ ํŒจ๋„์—์„œ ๋กœ๊ทธ ํƒ์ƒ‰๊ธฐ์—์„œ ๋ณด๊ธฐ(์™ธ๋ถ€ ๋งํฌ ๋ฒ„ํŠผ)๋ฅผ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

๋กœ๊ทธ ํƒ์ƒ‰๊ธฐ์—์„œ ๋‹ค์–‘ํ•œ ๋กœ๊ทธ ์œ ํ˜•์ด ํฌํ•จ๋œ ํŒจ๋„์„ ๋ณด๋ ค๋ฉด ๋กœ๊ทธ ํ•„๋“œ ์ „ํ™˜ ๋ฒ„ํŠผ์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

๋กœ๊ทธ ํƒ์ƒ‰๊ธฐ ํŽ˜์ด์ง€์—์„œ ์ฟผ๋ฆฌ๋Š” ์ž‘์—… ๋‹จ๊ณ„ ๋˜๋Š” ๋กœ๊ทธ ์œ ํ˜•๋ณ„๋กœ ๋กœ๊ทธ๋ฅผ ํ•„ํ„ฐ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•„ํ„ฐ๋ฅผ ์‚ญ์ œํ•˜๋ ค๋ฉด ์ฟผ๋ฆฌ ํ‘œ์‹œ ์ „ํ™˜ ๋ฒ„ํŠผ์„ ํด๋ฆญํ•˜๊ณ  ์ฟผ๋ฆฌ๋ฅผ ์ˆ˜์ •ํ•ฉ๋‹ˆ๋‹ค.

์ž‘์—…์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋ชจ๋“  ๋กœ๊ทธ๋ฅผ ๋ณด๋ ค๋ฉด ๋‹ค์Œ ๋‹จ๊ณ„๋ฅผ ๋”ฐ๋ฅด์„ธ์š”.

  1. ์ฟผ๋ฆฌ ํ•„๋“œ์— ๋‹ค์Œ ์ฟผ๋ฆฌ๋ฅผ ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค.

    resource.type="dataflow_step"
    resource.labels.job_id="JOB_ID"
    

    JOB_ID๋ฅผ ์ž‘์—… ID๋กœ ๋ฐ”๊ฟ‰๋‹ˆ๋‹ค.

  2. ์ฟผ๋ฆฌ ์‹คํ–‰์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

  3. ์ด ์ฟผ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•ด๋„ ์ž‘์—… ๋กœ๊ทธ๊ฐ€ ํ‘œ์‹œ๋˜์ง€ ์•Š์œผ๋ฉด ์‹œ๊ฐ„ ์ˆ˜์ •์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

  4. ์‹œ์ž‘ ์‹œ๊ฐ„๊ณผ ์ข…๋ฃŒ ์‹œ๊ฐ„์„ ์กฐ์ •ํ•œ ๋‹ค์Œ ์ ์šฉ์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

๋กœ๊ทธ ์œ ํ˜•

๋กœ๊ทธ ํƒ์ƒ‰๊ธฐ์—๋Š” ํŒŒ์ดํ”„๋ผ์ธ์˜ ์ธํ”„๋ผ ๋กœ๊ทธ๋„ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค. ์˜ค๋ฅ˜ ๋ฐ ๊ฒฝ๊ณ  ๋กœ๊ทธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ด€์ฐฐ๋œ ํŒŒ์ดํ”„๋ผ์ธ ๋ฌธ์ œ๋ฅผ ์ง„๋‹จํ•ฉ๋‹ˆ๋‹ค. ํŒŒ์ดํ”„๋ผ์ธ ๋ฌธ์ œ์™€ ๊ด€๋ จ์ด ์—†๋Š” ์ธํ”„๋ผ ๋กœ๊ทธ์˜ ์˜ค๋ฅ˜ ๋ฐ ๊ฒฝ๊ณ ๊ฐ€ ๋ฐ˜๋“œ์‹œ ๋ฌธ์ œ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฒƒ์€ ์•„๋‹™๋‹ˆ๋‹ค.

๋‹ค์Œ์€ ๋กœ๊ทธ ํƒ์ƒ‰๊ธฐ ํŽ˜์ด์ง€์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋Š” ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ๋กœ๊ทธ ์œ ํ˜•์˜ ์š”์•ฝ์ž…๋‹ˆ๋‹ค.

  • job-message ๋กœ๊ทธ์—๋Š” Dataflow์˜ ๋‹ค์–‘ํ•œ ๊ตฌ์„ฑ์š”์†Œ๊ฐ€ ์ƒ์„ฑํ•˜๋Š” ์ž‘์—… ์ˆ˜์ค€ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค. ๋Œ€ํ‘œ์ ์ธ ์˜ˆ์‹œ๋Š” ์ž๋™ ํ™•์žฅ ๊ตฌ์„ฑ, ์ž‘์—…์ž ์‹œ์ž‘ ๋˜๋Š” ์ข…๋ฃŒ ์‹œ๊ฐ„, ์ž‘์—… ๋‹จ๊ณ„ ์ง„ํ–‰ ์ƒํ™ฉ๊ณผ ์ž‘์—… ์˜ค๋ฅ˜์ž…๋‹ˆ๋‹ค. ์‚ฌ์šฉ์ž ์ฝ”๋“œ ๋น„์ •์ƒ ์ข…๋ฃŒ ๋•Œ๋ฌธ์— ๋ฐœ์ƒํ–ˆ์œผ๋ฉฐ ์ž‘์—…์ž ๋กœ๊ทธ์— ์กด์žฌํ•˜๋Š” ์ž‘์—…์ž ์ˆ˜์ค€ ์˜ค๋ฅ˜๋„ job-message ๋กœ๊ทธ๋ฅผ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค.
  • worker ๋กœ๊ทธ๋Š” Dataflow ์ž‘์—…์ž๊ฐ€ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ž‘์—…์ž๋Š” ๋Œ€๋ถ€๋ถ„์˜ ํŒŒ์ดํ”„๋ผ์ธ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค(์˜ˆ: ๋ฐ์ดํ„ฐ์— ParDo ์ ์šฉ). Worker ๋กœ๊ทธ์—๋Š” ๊ฐœ๋ฐœ์ž ์ฝ”๋“œ์™€ Dataflow์—์„œ ๋กœ๊น…ํ•œ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.
  • worker-startup ๋กœ๊ทธ๋Š” ๋Œ€๋ถ€๋ถ„์˜ Dataflow ์ž‘์—…์— ํฌํ•จ๋˜๋ฉฐ ์‹œ์ž‘ ํ”„๋กœ์„ธ์Šค์™€ ๊ด€๋ จ๋œ ๋ฉ”์‹œ์ง€๋ฅผ ์บก์ฒ˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์‹œ์ž‘ ํ”„๋กœ์„ธ์Šค์—๋Š” Cloud Storage์—์„œ ์ž‘์—…์˜ jar๋ฅผ ๋‹ค์šด๋กœ๋“œํ•˜๋Š” ๊ฒƒ๊ณผ ์ดํ›„์˜ ์ž‘์—…์ž ์‹œ์ž‘์ด ํฌํ•จ๋ฉ๋‹ˆ๋‹ค. ์ž‘์—…์ž๋ฅผ ์‹œ์ž‘ํ•˜๋Š” ๋ฐ ๋ฌธ์ œ๊ฐ€ ์žˆ์œผ๋ฉด ์ด ๋กœ๊ทธ๋ฅผ ๋จผ์ € ๋ณด๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.
  • harness ๋กœ๊ทธ์—๋Š” Runner v2 ์‹คํ–‰๊ธฐ ํ•˜๋„ค์Šค์˜ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.
  • shuffler ๋กœ๊ทธ์—๋Š” ๋ณ‘๋ ฌ ํŒŒ์ดํ”„๋ผ์ธ ์ž‘์—… ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ฉํ•˜๋Š” ์ž‘์—…์ž์˜ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.
  • system ๋กœ๊ทธ์—๋Š” ์ž‘์—…์ž VM ํ˜ธ์ŠคํŠธ ์šด์˜์ฒด์ œ์˜ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค. ์ผ๋ถ€ ์‹œ๋‚˜๋ฆฌ์˜ค์—์„œ๋Š” ํ”„๋กœ์„ธ์Šค ๋น„์ •์ƒ ์ข…๋ฃŒ๋‚˜ ๋ฉ”๋ชจ๋ฆฌ ๋ถ€์กฑ(OOM) ์ด๋ฒคํŠธ๋ฅผ ์บก์ฒ˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • docker์™€ kubelet ๋กœ๊ทธ์—๋Š” Dataflow ์ž‘์—…์ž์—์„œ ์‚ฌ์šฉ๋˜๋Š” ์ด๋“ค ๊ณต๊ฐœ ๊ธฐ์ˆ ๊ณผ ๊ด€๋ จ๋œ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.
  • nvidia-mps ๋กœ๊ทธ์—๋Š” NVIDIA Multi-Process Service(MPS) ์ž‘์—…์— ๋Œ€ํ•œ ๋ฉ”์‹œ์ง€๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

ํŒŒ์ดํ”„๋ผ์ธ ์ž‘์—…์ž ๋กœ๊ทธ ์ˆ˜์ค€ ์„ค์ •

์ž๋ฐ”

์ž๋ฐ”์šฉ Apache Beam SDK๋ฅผ ํ†ตํ•ด ์ž‘์—…์ž์— ์ƒ์„ฑ๋˜๋Š” ๊ธฐ๋ณธ SLF4J ๋กœ๊น… ์ˆ˜์ค€์€ INFO์ž…๋‹ˆ๋‹ค. INFO ์ด์ƒ(INFO , WARN, ERROR)์˜ ๋ชจ๋“  ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๊ฐ€ ๋ฐฉ์ถœ๋ฉ๋‹ˆ๋‹ค. ๋” ๋‚ฎ์€ SLF4J ๋กœ๊น… ์ˆ˜์ค€(TRACE ๋˜๋Š” DEBUG)์„ ์ง€์›ํ•˜๋„๋ก ๋‹ค๋ฅธ ๊ธฐ๋ณธ ๋กœ๊ทธ ์ˆ˜์ค€์„ ์„ค์ •ํ•˜๊ฑฐ๋‚˜ ์ฝ”๋“œ์˜ ์—ฌ๋Ÿฌ ํด๋ž˜์Šค ํŒจํ‚ค์ง€์— ๋Œ€ํ•ด ์„œ๋กœ ๋‹ค๋ฅธ ๋กœ๊ทธ ์ˆ˜์ค€์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ช…๋ น์ค„ ๋˜๋Š” ํ”„๋กœ๊ทธ๋ž˜๋งคํ‹ฑ ๋ฐฉ์‹์œผ๋กœ ์ž‘์—…์ž ๋กœ๊ทธ ์ˆ˜์ค€์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋‹ค์Œ ํŒŒ์ดํ”„๋ผ์ธ ์˜ต์…˜์ด ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค.

  • --defaultSdkHarnessLogLevel=<level>: ์ด ์˜ต์…˜์„ ์‚ฌ์šฉํ•˜๋ฉด ๋ชจ๋“  ๋กœ๊ฑฐ๋ฅผ ์ง€์ •๋œ ๊ธฐ๋ณธ ์ˆ˜์ค€์œผ๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ๋‹ค์Œ ๋ช…๋ น์ค„ ์˜ต์…˜์€ ๊ธฐ๋ณธ Dataflow INFO ๋กœ๊ทธ ์ˆ˜์ค€์„ ์žฌ์ •์˜ํ•˜๊ณ  ์ด๋ฅผ DEBUG:
    --defaultSdkHarnessLogLevel=DEBUG๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
  • --sdkHarnessLogLevelOverrides={"<package or class>":"<level>"}: ์ด ์˜ต์…˜์„ ์‚ฌ์šฉํ•˜๋ฉด ์ง€์ •๋œ ํŒจํ‚ค์ง€ ๋˜๋Š” ํด๋ž˜์Šค์˜ ๋กœ๊น… ์ˆ˜์ค€์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด org.apache.beam.runners.dataflow ํŒจํ‚ค์ง€์˜ ๊ธฐ๋ณธ ํŒŒ์ดํ”„๋ผ์ธ ๋กœ๊ทธ ์ˆ˜์ค€์„ ์žฌ์ •์˜ํ•˜๊ณ  ์ด๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด TRACE๋กœ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
    --sdkHarnessLogLevelOverrides='{"org.apache.beam.runners.dataflow":"TRACE"}'
    ์—ฌ๋Ÿฌ ์žฌ์ •์˜๋ฅผ ์ˆ˜ํ–‰ํ•˜๋ ค๋ฉด JSON ๋งต์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
    (--sdkHarnessLogLevelOverrides={"<package/class>":"<level>","<package/class>":"<level>",...})
  • Runner v2 ์—†์ด Apache Beam SDK ๋ฒ„์ „ 2.50.0 ์ดํ•˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํŒŒ์ดํ”„๋ผ์ธ์—์„œ๋Š” defaultSdkHarnessLogLevel ๋ฐ sdkHarnessLogLevelOverrides ํŒŒ์ดํ”„๋ผ์ธ ์˜ต์…˜์ด ์ง€์›๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ด ๊ฒฝ์šฐ --defaultWorkerLogLevel=<level> ๋ฐ --workerLogLevelOverrides={"<package or class>":"<level>"} ํŒŒ์ดํ”„๋ผ์ธ ์˜ต์…˜์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ ์žฌ์ •์˜๋ฅผ ์ˆ˜ํ–‰ํ•˜๋ ค๋ฉด JSON ๋งต์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
    (--workerLogLevelOverrides={"<package/class>":"<level>","<package/class>":"<level>",...})

๋‹ค์Œ ์˜ˆ์‹œ๋Š” ๋ช…๋ น์ค„์—์„œ ์žฌ์ •์˜ํ•  ์ˆ˜ ์žˆ๋Š” ๊ธฐ๋ณธ๊ฐ’๊ณผ ํ•จ๊ป˜ ํŒŒ์ดํ”„๋ผ์ธ ๋กœ๊น… ์˜ต์…˜์„ ํ”„๋กœ๊ทธ๋ž˜๋งคํ‹ฑ ๋ฐฉ์‹์œผ๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

 PipelineOptions options = ...
 SdkHarnessOptions loggingOptions = options.as(SdkHarnessOptions.class);
 // Overrides the default log level on the worker to emit logs at TRACE or higher.
 loggingOptions.setDefaultSdkHarnessLogLevel(LogLevel.TRACE);
 // Overrides the Foo class and "org.apache.beam.runners.dataflow" package to emit logs at WARN or higher.
 loggingOptions.getSdkHarnessLogLevelOverrides()
     .addOverrideForClass(Foo.class, LogLevel.WARN)
     .addOverrideForPackage(Package.getPackage("org.apache.beam.runners.dataflow"), LogLevel.WARN);

Python

Python์šฉ Apache Beam SDK๋ฅผ ํ†ตํ•ด ์ž‘์—…์ž์— ์ƒ์„ฑ๋˜๋Š” ๊ธฐ๋ณธ ๋กœ๊น… ์ˆ˜์ค€์€ INFO์ž…๋‹ˆ๋‹ค. INFO ์ด์ƒ(INFO, WARNING, ERROR, CRITICAL)์˜ ๋ชจ๋“  ๋กœ๊ทธ ๋ฉ”์‹œ์ง€๊ฐ€ ๋ฐฉ์ถœ๋ฉ๋‹ˆ๋‹ค. ๋” ๋‚ฎ์€ ๋กœ๊น… ์ˆ˜์ค€(DEBUG)์„ ์ง€์›ํ•˜๋„๋ก ๋‹ค๋ฅธ ๊ธฐ๋ณธ ๋กœ๊ทธ ์ˆ˜์ค€์„ ์„ค์ •ํ•˜๊ฑฐ๋‚˜ ์ฝ”๋“œ์˜ ์—ฌ๋Ÿฌ ๋ชจ๋“ˆ์— ๋Œ€ํ•ด ์„œ๋กœ ๋‹ค๋ฅธ ๋กœ๊ทธ ์ˆ˜์ค€์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ช…๋ น์ค„ ๋˜๋Š” ํ”„๋กœ๊ทธ๋ž˜๋งคํ‹ฑ ๋ฐฉ์‹์œผ๋กœ ์ž‘์—…์ž ๋กœ๊ทธ ์ˆ˜์ค€์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ๋„๋ก ํŒŒ์ดํ”„๋ผ์ธ ์˜ต์…˜ ๋‘ ๊ฐœ๊ฐ€ ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค.

  • --default_sdk_harness_log_level=<level>: ์ด ์˜ต์…˜์„ ์‚ฌ์šฉํ•˜๋ฉด ๋ชจ๋“  ๋กœ๊ฑฐ๋ฅผ ์ง€์ •๋œ ๊ธฐ๋ณธ ์ˆ˜์ค€์œผ๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ๋‹ค์Œ ๋ช…๋ น์ค„ ์˜ต์…˜์€ ๊ธฐ๋ณธ Dataflow INFO ๋กœ๊ทธ ์ˆ˜์ค€์„ ์žฌ์ •์˜ํ•˜๊ณ  ์ด๋ฅผ DEBUG๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
    --default_sdk_harness_log_level=DEBUG
  • --sdk_harness_log_level_overrides={\"<module>\":\"<level>\"}: ์ด ์˜ต์…˜์„ ์‚ฌ์šฉํ•˜์—ฌ ์ง€์ •๋œ ๋ชจ๋“ˆ์˜ ๋กœ๊น… ์ˆ˜์ค€์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด apache_beam.runners.dataflow ๋ชจ๋“ˆ์˜ ๊ธฐ๋ณธ ํŒŒ์ดํ”„๋ผ์ธ ๋กœ๊ทธ ์ˆ˜์ค€์„ ์žฌ์ •์˜ํ•˜๊ณ  ์ด๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด DEBUG๋กœ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
    --sdk_harness_log_level_overrides={\"apache_beam.runners.dataflow\":\"DEBUG\"}
    ์—ฌ๋Ÿฌ ์žฌ์ •์˜๋ฅผ ์ˆ˜ํ–‰ํ•˜๋ ค๋ฉด JSON ๋งต์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
    (--sdk_harness_log_level_overrides={\"<module>\":\"<level>\",\"<module>\":\"<level>\",...})

๋‹ค์Œ ์˜ˆ์‹œ์—์„œ๋Š” WorkerOptions ํด๋ž˜์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ช…๋ น์ค„์—์„œ ์žฌ์ •์˜ํ•  ์ˆ˜ ์žˆ๋Š” ํŒŒ์ดํ”„๋ผ์ธ ๋กœ๊น… ์˜ต์…˜์„ ํ”„๋กœ๊ทธ๋ž˜๋งคํ‹ฑ ๋ฐฉ์‹์œผ๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

  from apache_beam.options.pipeline_options import PipelineOptions, WorkerOptions

  pipeline_args = [
    '--project=PROJECT_NAME',
    '--job_name=JOB_NAME',
    '--staging_location=gs://STORAGE_BUCKET/staging/',
    '--temp_location=gs://STORAGE_BUCKET/tmp/',
    '--region=DATAFLOW_REGION',
    '--runner=DataflowRunner'
  ]

  pipeline_options = PipelineOptions(pipeline_args)
  worker_options = pipeline_options.view_as(WorkerOptions)
  worker_options.default_sdk_harness_log_level = 'WARNING'

  # Note: In Apache Beam SDK 2.42.0 and earlier versions, use ['{"apache_beam.runners.dataflow":"WARNING"}']
  worker_options.sdk_harness_log_level_overrides = {"apache_beam.runners.dataflow":"WARNING"}

  # Pass in pipeline options during pipeline creation.
  with beam.Pipeline(options=pipeline_options) as pipeline:

๋‹ค์Œ์„ ๋ฐ”๊ฟ‰๋‹ˆ๋‹ค.

  • PROJECT_NAME: ํ”„๋กœ์ ํŠธ ์ด๋ฆ„
  • JOB_NAME: ์ž‘์—…์˜ ์ด๋ฆ„
  • STORAGE_BUCKET: Cloud Storage ์ด๋ฆ„
  • DATAFLOW_REGION: Dataflow ์ž‘์—…์„ ๋ฐฐํฌํ•  ๋ฆฌ์ „

    --region ํ”Œ๋ž˜๊ทธ๋Š” ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์„œ๋ฒ„, ๋กœ์ปฌ ํด๋ผ์ด์–ธํŠธ ๋˜๋Š” ํ™˜๊ฒฝ ๋ณ€์ˆ˜์— ์„ค์ •๋œ ๊ธฐ๋ณธ ๋ฆฌ์ „์„ ์žฌ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.

Go

์ด ๊ธฐ๋Šฅ์€ Go์šฉ Apache Beam SDK์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.

์‹œ์ž‘๋œ BigQuery ์ž‘์—… ๋กœ๊ทธ ๋ณด๊ธฐ

Dataflow ํŒŒ์ดํ”„๋ผ์ธ์—์„œ BigQuery๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๋‹ค์–‘ํ•œ ์ž‘์—…์„ ๋Œ€์‹  ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด BigQuery ์ž‘์—…์ด ์‹คํ–‰๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ž‘์—…์—๋Š” ๋ฐ์ดํ„ฐ ๋กœ๋“œ, ๋ฐ์ดํ„ฐ ๋‚ด๋ณด๋‚ด๊ธฐ ๋“ฑ์ด ํฌํ•จ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Dataflow ๋ชจ๋‹ˆํ„ฐ๋ง ์ธํ„ฐํŽ˜์ด์Šค์—๋Š” ๋กœ๊ทธ ํŒจ๋„์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์ด๋Ÿฌํ•œ BigQuery ์ž‘์—…์— ๋Œ€ํ•œ ์ถ”๊ฐ€ ์ •๋ณด๊ฐ€ ๋ฌธ์ œ ํ•ด๊ฒฐ ๋ฐ ๋ชจ๋‹ˆํ„ฐ๋ง์„ ์œ„ํ•ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

๋กœ๊ทธ ํŒจ๋„์— ํ‘œ์‹œ๋œ BigQuery ์ž‘์—… ์ •๋ณด๋Š” BigQuery ์‹œ์Šคํ…œ ํ…Œ์ด๋ธ”์— ์ €์žฅ๋˜๊ณ  ๋กœ๋“œ๋ฉ๋‹ˆ๋‹ค. ๊ธฐ๋ณธ BigQuery ํ…Œ์ด๋ธ”์„ ์ฟผ๋ฆฌํ•  ๋•Œ ์ฒญ๊ตฌ ๋น„์šฉ์ด ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

BigQuery ์ž‘์—… ์„ธ๋ถ€์ •๋ณด ๋ณด๊ธฐ

BigQuery ์ž‘์—… ์ •๋ณด๋ฅผ ๋ณด๋ ค๋ฉด ํŒŒ์ดํ”„๋ผ์ธ์—์„œ Apache Beam 2.24.0 ์ด์ƒ์„ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

BigQuery ์ž‘์—…์„ ๋‚˜์—ดํ•˜๋ ค๋ฉด BigQuery ์ž‘์—… ํƒญ์„ ์—ด๊ณ  BigQuery ์ž‘์—…์˜ ์œ„์น˜๋ฅผ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ BigQuery ์ž‘์—… ๋กœ๋“œ๋ฅผ ํด๋ฆญํ•˜๊ณ  ๋Œ€ํ™”์ƒ์ž๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. ์ฟผ๋ฆฌ๊ฐ€ ์™„๋ฃŒ๋˜๋ฉด ์ž‘์—… ๋ชฉ๋ก์ด ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

BigQuery ์ž‘์—… ์ •๋ณด ํ…Œ์ด๋ธ”์˜ BigQuery ์ž‘์—… ๋กœ๋“œ ๋ฒ„ํŠผ

์ž‘์—… ID, ์œ ํ˜•, ๊ธฐ๊ฐ„ ๋“ฑ ๊ฐ ์ž‘์—…์— ๋Œ€ํ•œ ๊ธฐ๋ณธ ์ •๋ณด๊ฐ€ ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค.

ํ˜„์žฌ ํŒŒ์ดํ”„๋ผ์ธ ์ž‘์—… ์‹คํ–‰ ์ค‘์— ์‹คํ–‰๋œ BigQuery ์ž‘์—…์„ ๋ณด์—ฌ์ฃผ๋Š” ํ…Œ์ด๋ธ”์ž…๋‹ˆ๋‹ค.

ํŠน์ • ์ž‘์—…์— ๋Œ€ํ•œ ์ƒ์„ธ ์ •๋ณด๋ฅผ ๋ณด๋ ค๋ฉด ์ถ”๊ฐ€ ์ •๋ณด ์—ด์—์„œ ๋ช…๋ น์ค„์„ ํด๋ฆญํ•˜์„ธ์š”.

๋ช…๋ น์ค„์˜ ๋ชจ๋‹ฌ ์ฐฝ์—์„œ bq jobs describe ๋ช…๋ น์–ด๋ฅผ ๋ณต์‚ฌํ•˜์—ฌ ๋กœ์ปฌ๋กœ ๋˜๋Š” Cloud Shell์—์„œ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.

gcloud alpha bq jobs describe BIGQUERY_JOB_ID

bq jobs describe ๋ช…๋ น์–ด๋Š” JobStatistics๋ฅผ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ธฐ๋Šฅ์€ ์†๋„๊ฐ€ ๋А๋ฆฌ๊ฑฐ๋‚˜ ์ค‘๋‹จ๋œ BigQuery ์ž‘์—… ์ง„๋‹จ ์‹œ ์œ ์šฉํ•œ ์ถ”๊ฐ€ ์„ธ๋ถ€์ •๋ณด๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

๋˜๋Š” BigQueryIO๋ฅผ SQL ์ฟผ๋ฆฌ์™€ ํ•จ๊ป˜ ์‚ฌ์šฉํ•˜๋ฉด ์ฟผ๋ฆฌ ์ž‘์—…์ด ์‹คํ–‰๋ฉ๋‹ˆ๋‹ค. ์ž‘์—…์—์„œ ์‚ฌ์šฉํ•˜๋Š” SQL ์ฟผ๋ฆฌ๋ฅผ ๋ณด๋ ค๋ฉด ์ถ”๊ฐ€ ์ •๋ณด ์—ด์—์„œ ์ฟผ๋ฆฌ ๋ณด๊ธฐ๋ฅผ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.