๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ k-์ต๋ช…์„ฑ ๊ณ„์‚ฐ

K-์ต๋ช…์„ฑ์€ ๋ ˆ์ฝ”๋“œ์˜ ์žฌ์‹๋ณ„์„ฑ์„ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฐ์ดํ„ฐ์„ธํŠธ์˜ ์†์„ฑ์ž…๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ์žˆ๋Š” ๊ฐ ๊ฐœ์ธ์˜ ์œ ์‚ฌ ์‹๋ณ„์ž๊ฐ€ ๋™์ผํ•œ ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ์žˆ๋Š” ์ตœ์†Œ k โ€“ 1๋ช…์˜ ๋‹ค๋ฅธ ์‚ฌ๋žŒ๊ณผ ๋™์ผํ•œ ๊ฒฝ์šฐ ํ•ด๋‹น ๋ฐ์ดํ„ฐ ์„ธํŠธ๋Š” k-์ต๋ช…์„ฑ์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ ํ•˜๋‚˜ ์ด์ƒ์˜ ์—ด ๋˜๋Š” ํ•„๋“œ๋ฅผ ๊ธฐ์ค€์œผ๋กœ k-์ต๋ช…์„ฑ ๊ฐ’์„ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์ฃผ์ œ์—์„œ๋Š” Sensitive Data Protection์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ k-์ต๋ช…์„ฑ ๊ฐ’์„ ๊ณ„์‚ฐํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ๊ณ„์† ์ง„ํ–‰ํ•˜๊ธฐ ์ „์— k-์ต๋ช…์„ฑ ๋˜๋Š” ์ผ๋ฐ˜ ์œ„ํ—˜ ๋ถ„์„์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์œ„ํ—˜ ๋ถ„์„ ๊ฐœ๋… ์ฃผ์ œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

์‹œ์ž‘ํ•˜๊ธฐ ์ „์—

๊ณ„์†ํ•˜๊ธฐ ์ „์— ๋‹ค์Œ ์ž‘์—…์„ ์™„๋ฃŒํ–ˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”.

  1. Google ๊ณ„์ •์œผ๋กœ ๋กœ๊ทธ์ธํ•ฉ๋‹ˆ๋‹ค.
  2. Google Cloud ์ฝ˜์†”์˜ ํ”„๋กœ์ ํŠธ ์„ ํƒ๊ธฐ ํŽ˜์ด์ง€์—์„œ Google Cloud ํ”„๋กœ์ ํŠธ๋ฅผ ์„ ํƒํ•˜๊ฑฐ๋‚˜ ๋งŒ๋“ญ๋‹ˆ๋‹ค.
  3. ํ”„๋กœ์ ํŠธ ์„ ํƒ๊ธฐ๋กœ ์ด๋™
  4. Google Cloud ํ”„๋กœ์ ํŠธ์— ๊ฒฐ์ œ๊ฐ€ ์‚ฌ์šฉ ์„ค์ •๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. ํ”„๋กœ์ ํŠธ์— ๊ฒฐ์ œ๊ฐ€ ์‚ฌ์šฉ ์„ค์ •๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณด์„ธ์š”.
  5. Sensitive Data Protection์„ ์‚ฌ์šฉ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
  6. Sensitive Data Protection ์‚ฌ์šฉ ์„ค์ •

  7. ๋ถ„์„ํ•  BigQuery ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค. Sensitive Data Protection์€ BigQuery ํ…Œ์ด๋ธ”์„ ์Šค์บ”ํ•˜์—ฌ k-์ต๋ช…์„ฑ ์ธก์ •ํ•ญ๋ชฉ์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
  8. ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ์‹๋ณ„์ž(ํ•ด๋‹น๋˜๋Š” ๊ฒฝ์šฐ)์™€ ํ•˜๋‚˜ ์ด์ƒ์˜ ์œ ์‚ฌ ์‹๋ณ„์ž๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์œ„ํ—˜ ๋ถ„์„ ์šฉ์–ด ๋ฐ ๊ธฐ๋ฒ•์„ ์ฐธ์กฐํ•˜์„ธ์š”.

k-์ต๋ช…์„ฑ ๊ณ„์‚ฐ

Sensitive Data Protection์€ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์ด ์‹คํ–‰๋  ๋•Œ๋งˆ๋‹ค ์œ„ํ—˜ ๋ถ„์„์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ๋จผ์ €Google Cloud ์ฝ˜์†”์„ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ DLP API ์š”์ฒญ์„ ์ „์†กํ•˜๊ฑฐ๋‚˜ Sensitive Data Protection ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ž‘์—…์„ ๋งŒ๋“ค์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

์ฝ˜์†”

  1. Google Cloud ์ฝ˜์†”์—์„œ ์œ„ํ—˜ ๋ถ„์„ ๋งŒ๋“ค๊ธฐ ํŽ˜์ด์ง€๋กœ ์ด๋™ํ•ฉ๋‹ˆ๋‹ค.

    ์œ„ํ—˜ ๋ถ„์„ ๋งŒ๋“ค๊ธฐ๋กœ ์ด๋™

  2. ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ์„ ํƒ ์„น์…˜์—์„œ ํ…Œ์ด๋ธ”์ด ํฌํ•จ๋œ ํ”„๋กœ์ ํŠธ์˜ ํ”„๋กœ์ ํŠธ ID, ํ…Œ์ด๋ธ”์˜ ๋ฐ์ดํ„ฐ์„ธํŠธ ID, ํ…Œ์ด๋ธ” ์ด๋ฆ„์„ ์ž…๋ ฅํ•˜์—ฌ ์Šค์บ”ํ•  BigQuery ํ…Œ์ด๋ธ”์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.

  3. ๊ณ„์‚ฐํ•  ๊ฐœ์ธ์ •๋ณด ๋ณดํ˜ธ ์ธก์ •ํ•ญ๋ชฉ์—์„œ k-์ต๋ช…์„ฑ์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.

  4. ์ž‘์—… ID ์„น์…˜์—์„œ ์ž‘์—…์— ์ปค์Šคํ…€ ์‹๋ณ„์ž๋ฅผ ์ œ๊ณตํ•˜๊ณ  Sensitive Data Protection์ด ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•  ๋ฆฌ์†Œ์Šค ์œ„์น˜๋ฅผ ์„ ํƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์™„๋ฃŒ๋˜์—ˆ์œผ๋ฉด ๊ณ„์†์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

  5. ํ•„๋“œ ์ •์˜ ์„น์…˜์—์„œ k-์ต๋ช…์„ฑ ์œ„ํ—˜ ์ž‘์—…์˜ ์‹๋ณ„์ž ๋ฐ ์œ ์‚ฌ ์‹๋ณ„์ž๋ฅผ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค. Sensitive Data Protection์€ ์ด์ „ ๋‹จ๊ณ„์—์„œ ์ง€์ •ํ•œ BigQuery ํ…Œ์ด๋ธ”์˜ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ์— ์•ก์„ธ์Šคํ•˜๊ณ  ํ•„๋“œ ๋ชฉ๋ก์„ ์ฑ„์šฐ๋ ค๊ณ  ์‹œ๋„ํ•ฉ๋‹ˆ๋‹ค.

    1. ์ ์ ˆํ•œ ์ฒดํฌ๋ฐ•์Šค๋ฅผ ์„ ํƒํ•˜์—ฌ ํ•„๋“œ๋ฅผ ์‹๋ณ„์ž(ID) ๋˜๋Š” ์œ ์‚ฌ ์‹๋ณ„์ž(QI)๋กœ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค. 0๊ฐœ ๋˜๋Š” 1๊ฐœ์˜ ์‹๋ณ„์ž์™€ 1๊ฐœ ์ด์ƒ์˜ ์œ ์‚ฌ ์‹๋ณ„์ž๋ฅผ ์„ ํƒํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
    2. Sensitive Data Protection์—์„œ ํ•„๋“œ๋ฅผ ์ฑ„์šธ ์ˆ˜ ์—†์œผ๋ฉด ํ•„๋“œ ์ด๋ฆ„ ์ž…๋ ฅ์„ ํด๋ฆญํ•˜์—ฌ ํ•„๋“œ๋ฅผ ํ•˜๋‚˜ ์ด์ƒ ์ง์ ‘ ์ž…๋ ฅํ•˜๊ณ  ๊ฐ ํ•„๋“œ๋ฅผ ์‹๋ณ„์ž๋‚˜ ์œ ์‚ฌ ์‹๋ณ„์ž๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์™„๋ฃŒ๋˜์—ˆ์œผ๋ฉด ๊ณ„์†์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.
  6. ์ž‘์—… ์ถ”๊ฐ€ ์„น์…˜์—์„œ ์œ„ํ—˜ ์ž‘์—…์ด ์™„๋ฃŒ๋˜๋ฉด ์ˆ˜ํ–‰ํ•  ์ž‘์—…(์„ ํƒ์‚ฌํ•ญ)์„ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์˜ต์…˜์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

    • BigQuery์— ์ €์žฅ: ์œ„ํ—˜ ๋ถ„์„ ์Šค์บ”์˜ ๊ฒฐ๊ณผ๋ฅผ BigQuery ํ…Œ์ด๋ธ”์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
    • Pub/Sub์— ๊ฒŒ์‹œ: Pub/Sub ์ฃผ์ œ์— ์•Œ๋ฆผ์„ ๊ฒŒ์‹œํ•ฉ๋‹ˆ๋‹ค.

    • ์ด๋ฉ”์ผ๋กœ ์•Œ๋ฆผ: ๊ฒฐ๊ณผ๊ฐ€ ํฌํ•จ๋œ ์ด๋ฉ”์ผ์„ ์ „์†กํ•ฉ๋‹ˆ๋‹ค. ์™„๋ฃŒ๋˜๋ฉด ๋งŒ๋“ค๊ธฐ๋ฅผ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

k-์ต๋ช…์„ฑ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์ด ์ฆ‰์‹œ ์‹œ์ž‘๋ฉ๋‹ˆ๋‹ค.

C#

Sensitive Data Protection์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ Sensitive Data Protection ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.


using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Cloud.PubSub.V1;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using static Google.Cloud.Dlp.V2.Action.Types;
using static Google.Cloud.Dlp.V2.PrivacyMetric.Types;

public class RiskAnalysisCreateKAnonymity
{
    public static AnalyzeDataSourceRiskDetails.Types.KAnonymityResult KAnonymity(
        string callingProjectId,
        string tableProjectId,
        string datasetId,
        string tableId,
        string topicId,
        string subscriptionId,
        IEnumerable<FieldId> quasiIds)
    {
        var dlp = DlpServiceClient.Create();

        // Construct + submit the job
        var KAnonymityConfig = new KAnonymityConfig
        {
            QuasiIds = { quasiIds }
        };

        var config = new RiskAnalysisJobConfig
        {
            PrivacyMetric = new PrivacyMetric
            {
                KAnonymityConfig = KAnonymityConfig
            },
            SourceTable = new BigQueryTable
            {
                ProjectId = tableProjectId,
                DatasetId = datasetId,
                TableId = tableId
            },
            Actions =
            {
                new Google.Cloud.Dlp.V2.Action
                {
                    PubSub = new PublishToPubSub
                    {
                        Topic = $"projects/{callingProjectId}/topics/{topicId}"
                    }
                }
            }
        };

        var submittedJob = dlp.CreateDlpJob(
            new CreateDlpJobRequest
            {
                ParentAsProjectName = new ProjectName(callingProjectId),
                RiskJob = config
            });

        // Listen to pub/sub for the job
        var subscriptionName = new SubscriptionName(callingProjectId, subscriptionId);
        var subscriber = SubscriberClient.CreateAsync(
            subscriptionName).Result;

        // SimpleSubscriber runs your message handle function on multiple
        // threads to maximize throughput.
        var done = new ManualResetEventSlim(false);
        subscriber.StartAsync((PubsubMessage message, CancellationToken cancel) =>
        {
            if (message.Attributes["DlpJobName"] == submittedJob.Name)
            {
                Thread.Sleep(500); // Wait for DLP API results to become consistent
                done.Set();
                return Task.FromResult(SubscriberClient.Reply.Ack);
            }
            else
            {
                return Task.FromResult(SubscriberClient.Reply.Nack);
            }
        });

        done.Wait(TimeSpan.FromMinutes(10)); // 10 minute timeout; may not work for large jobs
        subscriber.StopAsync(CancellationToken.None).Wait();

        // Process results
        var resultJob = dlp.GetDlpJob(new GetDlpJobRequest
        {
            DlpJobName = DlpJobName.Parse(submittedJob.Name)
        });

        var result = resultJob.RiskDetails.KAnonymityResult;

        for (var bucketIdx = 0; bucketIdx < result.EquivalenceClassHistogramBuckets.Count; bucketIdx++)
        {
            var bucket = result.EquivalenceClassHistogramBuckets[bucketIdx];
            Console.WriteLine($"Bucket {bucketIdx}");
            Console.WriteLine($"  Bucket size range: [{bucket.EquivalenceClassSizeLowerBound}, {bucket.EquivalenceClassSizeUpperBound}].");
            Console.WriteLine($"  {bucket.BucketSize} unique value(s) total.");

            foreach (var bucketValue in bucket.BucketValues)
            {
                // 'UnpackValue(x)' is a prettier version of 'x.toString()'
                Console.WriteLine($"    Quasi-ID values: [{String.Join(',', bucketValue.QuasiIdsValues.Select(x => UnpackValue(x)))}]");
                Console.WriteLine($"    Class size: {bucketValue.EquivalenceClassSize}");
            }
        }

        return result;
    }

    public static string UnpackValue(Value protoValue)
    {
        var jsonValue = JsonConvert.DeserializeObject<Dictionary<string, object>>(protoValue.ToString());
        return jsonValue.Values.ElementAt(0).ToString();
    }
}

Go

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.

import (
	"context"
	"fmt"
	"io"
	"strings"
	"time"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
	"cloud.google.com/go/pubsub"
)

// riskKAnonymity computes the risk of the given columns using K Anonymity.
func riskKAnonymity(w io.Writer, projectID, dataProject, pubSubTopic, pubSubSub, datasetID, tableID string, columnNames ...string) error {
	// projectID := "my-project-id"
	// dataProject := "bigquery-public-data"
	// pubSubTopic := "dlp-risk-sample-topic"
	// pubSubSub := "dlp-risk-sample-sub"
	// datasetID := "nhtsa_traffic_fatalities"
	// tableID := "accident_2015"
	// columnNames := "state_number" "county"
	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %w", err)
	}

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pubsubClient, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		return err
	}
	defer pubsubClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	// Create the Topic if it doesn't exist.
	t := pubsubClient.Topic(pubSubTopic)
	topicExists, err := t.Exists(ctx)
	if err != nil {
		return err
	}
	if !topicExists {
		if t, err = pubsubClient.CreateTopic(ctx, pubSubTopic); err != nil {
			return err
		}
	}

	// Create the Subscription if it doesn't exist.
	s := pubsubClient.Subscription(pubSubSub)
	subExists, err := s.Exists(ctx)
	if err != nil {
		return err
	}
	if !subExists {
		if s, err = pubsubClient.CreateSubscription(ctx, pubSubSub, pubsub.SubscriptionConfig{Topic: t}); err != nil {
			return err
		}
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + projectID + "/topics/" + pubSubTopic

	// Build the QuasiID slice.
	var q []*dlppb.FieldId
	for _, c := range columnNames {
		q = append(q, &dlppb.FieldId{Name: c})
	}

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Job: &dlppb.CreateDlpJobRequest_RiskJob{
			RiskJob: &dlppb.RiskAnalysisJobConfig{
				// PrivacyMetric configures what to compute.
				PrivacyMetric: &dlppb.PrivacyMetric{
					Type: &dlppb.PrivacyMetric_KAnonymityConfig_{
						KAnonymityConfig: &dlppb.PrivacyMetric_KAnonymityConfig{
							QuasiIds: q,
						},
					},
				},
				// SourceTable describes where to find the data.
				SourceTable: &dlppb.BigQueryTable{
					ProjectId: dataProject,
					DatasetId: datasetID,
					TableId:   tableID,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the risk job.
	j, err := client.CreateDlpJob(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDlpJob: %w", err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the risk job to finish by waiting for a PubSub message.
	// This only waits for 10 minutes. For long jobs, consider using a truly
	// asynchronous execution model such as Cloud Functions.
	ctx, cancel := context.WithTimeout(ctx, 10*time.Minute)
	defer cancel()
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		time.Sleep(500 * time.Millisecond)
		j, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			fmt.Fprintf(w, "GetDlpJob: %v", err)
			return
		}
		h := j.GetRiskDetails().GetKAnonymityResult().GetEquivalenceClassHistogramBuckets()
		for i, b := range h {
			fmt.Fprintf(w, "Histogram bucket %v\n", i)
			fmt.Fprintf(w, "  Size range: [%v,%v]\n", b.GetEquivalenceClassSizeLowerBound(), b.GetEquivalenceClassSizeUpperBound())
			fmt.Fprintf(w, "  %v unique values total\n", b.GetBucketSize())
			for _, v := range b.GetBucketValues() {
				var qvs []string
				for _, qv := range v.GetQuasiIdsValues() {
					qvs = append(qvs, qv.String())
				}
				fmt.Fprintf(w, "    QuasiID values: %s\n", strings.Join(qvs, ", "))
				fmt.Fprintf(w, "    Class size: %v\n", v.GetEquivalenceClassSize())
			}
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		return fmt.Errorf("Receive: %w", err)
	}
	return nil
}

Java

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.


import com.google.api.core.SettableApiFuture;
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.Action.PublishToPubSub;
import com.google.privacy.dlp.v2.AnalyzeDataSourceRiskDetails.KAnonymityResult;
import com.google.privacy.dlp.v2.AnalyzeDataSourceRiskDetails.KAnonymityResult.KAnonymityEquivalenceClass;
import com.google.privacy.dlp.v2.AnalyzeDataSourceRiskDetails.KAnonymityResult.KAnonymityHistogramBucket;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.GetDlpJobRequest;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrivacyMetric;
import com.google.privacy.dlp.v2.PrivacyMetric.KAnonymityConfig;
import com.google.privacy.dlp.v2.RiskAnalysisJobConfig;
import com.google.privacy.dlp.v2.Value;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.ProjectTopicName;
import com.google.pubsub.v1.PubsubMessage;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;

@SuppressWarnings("checkstyle:AbbreviationAsWordInName")
class RiskAnalysisKAnonymity {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String datasetId = "your-bigquery-dataset-id";
    String tableId = "your-bigquery-table-id";
    String topicId = "pub-sub-topic";
    String subscriptionId = "pub-sub-subscription";
    calculateKAnonymity(projectId, datasetId, tableId, topicId, subscriptionId);
  }

  public static void calculateKAnonymity(
      String projectId, String datasetId, String tableId, String topicId, String subscriptionId)
      throws ExecutionException, InterruptedException, IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

      // Specify the BigQuery table to analyze
      BigQueryTable bigQueryTable =
          BigQueryTable.newBuilder()
              .setProjectId(projectId)
              .setDatasetId(datasetId)
              .setTableId(tableId)
              .build();

      // These values represent the column names of quasi-identifiers to analyze
      List<String> quasiIds = Arrays.asList("Age", "Mystery");

      // Configure the privacy metric for the job
      List<FieldId> quasiIdFields =
          quasiIds.stream()
              .map(columnName -> FieldId.newBuilder().setName(columnName).build())
              .collect(Collectors.toList());
      KAnonymityConfig kanonymityConfig =
          KAnonymityConfig.newBuilder().addAllQuasiIds(quasiIdFields).build();
      PrivacyMetric privacyMetric =
          PrivacyMetric.newBuilder().setKAnonymityConfig(kanonymityConfig).build();

      // Create action to publish job status notifications over Google Cloud Pub/Sub
      ProjectTopicName topicName = ProjectTopicName.of(projectId, topicId);
      PublishToPubSub publishToPubSub =
          PublishToPubSub.newBuilder().setTopic(topicName.toString()).build();
      Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

      // Configure the risk analysis job to perform
      RiskAnalysisJobConfig riskAnalysisJobConfig =
          RiskAnalysisJobConfig.newBuilder()
              .setSourceTable(bigQueryTable)
              .setPrivacyMetric(privacyMetric)
              .addActions(action)
              .build();

      // Build the request to be sent by the client
      CreateDlpJobRequest createDlpJobRequest =
          CreateDlpJobRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setRiskJob(riskAnalysisJobConfig)
              .build();

      // Send the request to the API using the client
      DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

      // Set up a Pub/Sub subscriber to listen on the job completion status
      final SettableApiFuture<Boolean> done = SettableApiFuture.create();

      ProjectSubscriptionName subscriptionName =
          ProjectSubscriptionName.of(projectId, subscriptionId);

      MessageReceiver messageHandler =
          (PubsubMessage pubsubMessage, AckReplyConsumer ackReplyConsumer) -> {
            handleMessage(dlpJob, done, pubsubMessage, ackReplyConsumer);
          };
      Subscriber subscriber = Subscriber.newBuilder(subscriptionName, messageHandler).build();
      subscriber.startAsync();

      // Wait for job completion semi-synchronously
      // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
      try {
        done.get(15, TimeUnit.MINUTES);
      } catch (TimeoutException e) {
        System.out.println("Job was not completed after 15 minutes.");
        return;
      } finally {
        subscriber.stopAsync();
        subscriber.awaitTerminated();
      }

      // Build a request to get the completed job
      GetDlpJobRequest getDlpJobRequest =
          GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build();

      // Retrieve completed job status
      DlpJob completedJob = dlpServiceClient.getDlpJob(getDlpJobRequest);
      System.out.println("Job status: " + completedJob.getState());
      System.out.println("Job name: " + dlpJob.getName());

      // Get the result and parse through and process the information
      KAnonymityResult kanonymityResult = completedJob.getRiskDetails().getKAnonymityResult();
      List<KAnonymityHistogramBucket> histogramBucketList =
          kanonymityResult.getEquivalenceClassHistogramBucketsList();
      for (KAnonymityHistogramBucket result : histogramBucketList) {
        System.out.printf(
            "Bucket size range: [%d, %d]\n",
            result.getEquivalenceClassSizeLowerBound(), result.getEquivalenceClassSizeUpperBound());

        for (KAnonymityEquivalenceClass bucket : result.getBucketValuesList()) {
          List<String> quasiIdValues =
              bucket.getQuasiIdsValuesList().stream()
                  .map(Value::toString)
                  .collect(Collectors.toList());

          System.out.println("\tQuasi-ID values: " + String.join(", ", quasiIdValues));
          System.out.println("\tClass size: " + bucket.getEquivalenceClassSize());
        }
      }
    }
  }

  // handleMessage injects the job and settableFuture into the message reciever interface
  private static void handleMessage(
      DlpJob job,
      SettableApiFuture<Boolean> done,
      PubsubMessage pubsubMessage,
      AckReplyConsumer ackReplyConsumer) {
    String messageAttribute = pubsubMessage.getAttributesMap().get("DlpJobName");
    if (job.getName().equals(messageAttribute)) {
      done.set(true);
      ackReplyConsumer.ack();
    } else {
      ackReplyConsumer.nack();
    }
  }
}

Node.js

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const projectId = 'my-project';

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const tableProjectId = 'my-project';

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// A set of columns that form a composite key ('quasi-identifiers')
// const quasiIds = [{ name: 'age' }, { name: 'city' }];
async function kAnonymityAnalysis() {
  const sourceTable = {
    projectId: tableProjectId,
    datasetId: datasetId,
    tableId: tableId,
  };
  // Construct request for creating a risk analysis job

  const request = {
    parent: `projects/${projectId}/locations/global`,
    riskJob: {
      privacyMetric: {
        kAnonymityConfig: {
          quasiIds: quasiIds,
        },
      },
      sourceTable: sourceTable,
      actions: [
        {
          pubSub: {
            topic: `projects/${projectId}/topics/${topicId}`,
          },
        },
      ],
    },
  };

  // Create helper function for unpacking values
  const getValue = obj => obj[Object.keys(obj)[0]];

  // Run risk analysis job
  const [topicResponse] = await pubsub.topic(topicId).get();
  const subscription = await topicResponse.subscription(subscriptionId);
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  console.log(`Job created. Job name: ${jobName}`);
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
  setTimeout(() => {
    console.log(' Waiting for DLP job to fully complete');
  }, 500);
  const [job] = await dlp.getDlpJob({name: jobName});
  const histogramBuckets =
    job.riskDetails.kAnonymityResult.equivalenceClassHistogramBuckets;

  histogramBuckets.forEach((histogramBucket, histogramBucketIdx) => {
    console.log(`Bucket ${histogramBucketIdx}:`);
    console.log(
      `  Bucket size range: [${histogramBucket.equivalenceClassSizeLowerBound}, ${histogramBucket.equivalenceClassSizeUpperBound}]`
    );

    histogramBucket.bucketValues.forEach(valueBucket => {
      const quasiIdValues = valueBucket.quasiIdsValues
        .map(getValue)
        .join(', ');
      console.log(`  Quasi-ID values: {${quasiIdValues}}`);
      console.log(`  Class size: ${valueBucket.equivalenceClassSize}`);
    });
  });
}
await kAnonymityAnalysis();

PHP

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.

use Google\Cloud\Dlp\V2\RiskAnalysisJobConfig;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CreateDlpJobRequest;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\GetDlpJobRequest;
use Google\Cloud\Dlp\V2\PrivacyMetric;
use Google\Cloud\Dlp\V2\PrivacyMetric\KAnonymityConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Computes the k-anonymity of a column set in a Google BigQuery table.
 *
 * @param string    $callingProjectId  The project ID to run the API call under
 * @param string    $dataProjectId     The project ID containing the target Datastore
 * @param string    $topicId           The name of the Pub/Sub topic to notify once the job completes
 * @param string    $subscriptionId    The name of the Pub/Sub subscription to use when listening for job
 * @param string    $datasetId         The ID of the dataset to inspect
 * @param string    $tableId           The ID of the table to inspect
 * @param string[]  $quasiIdNames      Array columns that form a composite key (quasi-identifiers)
 */
function k_anonymity(
    string $callingProjectId,
    string $dataProjectId,
    string $topicId,
    string $subscriptionId,
    string $datasetId,
    string $tableId,
    array $quasiIdNames
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // Construct risk analysis config
    $quasiIds = array_map(
        function ($id) {
            return (new FieldId())->setName($id);
        },
        $quasiIdNames
    );

    $statsConfig = (new KAnonymityConfig())
        ->setQuasiIds($quasiIds);

    $privacyMetric = (new PrivacyMetric())
        ->setKAnonymityConfig($statsConfig);

    // Construct items to be analyzed
    $bigqueryTable = (new BigQueryTable())
        ->setProjectId($dataProjectId)
        ->setDatasetId($datasetId)
        ->setTableId($tableId);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct risk analysis job config to run
    $riskJob = (new RiskAnalysisJobConfig())
        ->setPrivacyMetric($privacyMetric)
        ->setSourceTable($bigqueryTable)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = "projects/$callingProjectId/locations/global";
    $createDlpJobRequest = (new CreateDlpJobRequest())
        ->setParent($parent)
        ->setRiskJob($riskJob);
    $job = $dlp->createDlpJob($createDlpJobRequest);

    // Poll Pub/Sub using exponential backoff until job finishes
    // Consider using an asynchronous execution model such as Cloud Functions
    $attempt = 1;
    $startTime = time();
    do {
        foreach ($subscription->pull() as $message) {
            if (
                isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()
            ) {
                $subscription->acknowledge($message);
                // Get the updated job. Loop to avoid race condition with DLP API.
                do {
                    $getDlpJobRequest = (new GetDlpJobRequest())
                        ->setName($job->getName());
                    $job = $dlp->getDlpJob($getDlpJobRequest);
                } while ($job->getState() == JobState::RUNNING);
                break 2; // break from parent do while
            }
        }
        print('Waiting for job to complete' . PHP_EOL);
        // Exponential backoff with max delay of 60 seconds
        sleep(min(60, pow(2, ++$attempt)));
    } while (time() - $startTime < 600); // 10 minute timeout

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));
    switch ($job->getState()) {
        case JobState::DONE:
            $histBuckets = $job->getRiskDetails()->getKAnonymityResult()->getEquivalenceClassHistogramBuckets();

            foreach ($histBuckets as $bucketIndex => $histBucket) {
                // Print bucket stats
                printf('Bucket %s:' . PHP_EOL, $bucketIndex);
                printf(
                    '  Bucket size range: [%s, %s]' . PHP_EOL,
                    $histBucket->getEquivalenceClassSizeLowerBound(),
                    $histBucket->getEquivalenceClassSizeUpperBound()
                );

                // Print bucket values
                foreach ($histBucket->getBucketValues() as $percent => $valueBucket) {
                    // Pretty-print quasi-ID values
                    print('  Quasi-ID values:' . PHP_EOL);
                    foreach ($valueBucket->getQuasiIdsValues() as $index => $value) {
                        print('    ' . $value->serializeToJsonString() . PHP_EOL);
                    }
                    printf(
                        '  Class size: %s' . PHP_EOL,
                        $valueBucket->getEquivalenceClassSize()
                    );
                }
            }

            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        case JobState::PENDING:
            print('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);
            break;
        default:
            print('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

Python

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.


import concurrent.futures

from typing import List

import google.cloud.dlp
from google.cloud.dlp_v2 import types
import google.cloud.pubsub


def k_anonymity_analysis(
    project: str,
    table_project_id: str,
    dataset_id: str,
    table_id: str,
    topic_id: str,
    subscription_id: str,
    quasi_ids: List[str],
    timeout: int = 300,
) -> None:
    """Uses the Data Loss Prevention API to compute the k-anonymity of a
        column set in a Google BigQuery table.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        table_project_id: The Google Cloud project id where the BigQuery table
            is stored.
        dataset_id: The id of the dataset to inspect.
        table_id: The id of the table to inspect.
        topic_id: The name of the Pub/Sub topic to notify once the job
            completes.
        subscription_id: The name of the Pub/Sub subscription to use when
            listening for job completion notifications.
        quasi_ids: A set of columns that form a composite key.
        timeout: The number of seconds to wait for a response from the API.

    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Create helper function for unpacking values
    def get_values(obj: types.Value) -> int:
        return int(obj.integer_value)

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
    parent = f"projects/{project}/locations/global"

    # Location info of the BigQuery table.
    source_table = {
        "project_id": table_project_id,
        "dataset_id": dataset_id,
        "table_id": table_id,
    }

    # Convert quasi id list to Protobuf type
    def map_fields(field: str) -> dict:
        return {"name": field}

    quasi_ids = map(map_fields, quasi_ids)

    # Tell the API where to send a notification when the job is complete.
    actions = [{"pub_sub": {"topic": topic}}]

    # Configure risk analysis job
    # Give the name of the numeric column to compute risk metrics for
    risk_job = {
        "privacy_metric": {"k_anonymity_config": {"quasi_ids": quasi_ids}},
        "source_table": source_table,
        "actions": actions,
    }

    # Call API to start risk analysis job
    operation = dlp.create_dlp_job(request={"parent": parent, "risk_job": risk_job})

    def callback(message: google.cloud.pubsub_v1.subscriber.message.Message) -> None:
        if message.attributes["DlpJobName"] == operation.name:
            # This is the message we're looking for, so acknowledge it.
            message.ack()

            # Now that the job is done, fetch the results and print them.
            job = dlp.get_dlp_job(request={"name": operation.name})
            print(f"Job name: {job.name}")
            histogram_buckets = (
                job.risk_details.k_anonymity_result.equivalence_class_histogram_buckets
            )
            # Print bucket stats
            for i, bucket in enumerate(histogram_buckets):
                print(f"Bucket {i}:")
                if bucket.equivalence_class_size_lower_bound:
                    print(
                        "   Bucket size range: [{}, {}]".format(
                            bucket.equivalence_class_size_lower_bound,
                            bucket.equivalence_class_size_upper_bound,
                        )
                    )
                    for value_bucket in bucket.bucket_values:
                        print(
                            "   Quasi-ID values: {}".format(
                                map(get_values, value_bucket.quasi_ids_values)
                            )
                        )
                        print(
                            "   Class size: {}".format(
                                value_bucket.equivalence_class_size
                            )
                        )
            subscription.set_result(None)
        else:
            # This is not the message we're looking for.
            message.drop()

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(project, subscription_id)
    subscription = subscriber.subscribe(subscription_path, callback)

    try:
        subscription.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        print(
            "No event received before the timeout. Please verify that the "
            "subscription provided is subscribed to the topic provided."
        )
        subscription.close()

REST

์ƒˆ๋กœ์šด ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์„ ์‹คํ–‰ํ•˜์—ฌ k-์ต๋ช…์„ฑ์„ ๊ณ„์‚ฐํ•˜๋ ค๋ฉด projects.dlpJobs ๋ฆฌ์†Œ์Šค์— ์š”์ฒญ์„ ๋ณด๋ƒ…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ PROJECT_ID๋Š” ํ”„๋กœ์ ํŠธ ์‹๋ณ„์ž๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.

https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs

์š”์ฒญ์—๋Š” ๋‹ค์Œ ์š”์†Œ๋กœ ๊ตฌ์„ฑ๋œ RiskAnalysisJobConfig ๊ฐ์ฒด๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

  • PrivacyMetric ๊ฐ์ฒด. ์—ฌ๊ธฐ์—์„œ KAnonymityConfig ๊ฐ์ฒด๋ฅผ ํฌํ•จํ•˜์—ฌ k-์ต๋ช…์„ฑ์„ ๊ณ„์‚ฐํ•˜๋„๋ก ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.

  • BigQueryTable ๊ฐ์ฒด. ๋‹ค์Œ์„ ๋ชจ๋‘ ํฌํ•จํ•˜์—ฌ ์Šค์บ”ํ•  BigQuery ํ…Œ์ด๋ธ”์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.

    • projectId: ํ…Œ์ด๋ธ”์ด ํฌํ•จ๋œ ํ”„๋กœ์ ํŠธ์˜ ํ”„๋กœ์ ํŠธ ID
    • datasetId: ํ…Œ์ด๋ธ”์˜ ๋ฐ์ดํ„ฐ ์„ธํŠธ ID
    • tableId: ํ…Œ์ด๋ธ”์˜ ์ด๋ฆ„
  • ์ž‘์—… ์™„๋ฃŒ ์‹œ ์‹คํ–‰ํ•  ์ž‘์—…์„ ๋‚˜ํƒ€๋‚ด๋Š” ํ•˜๋‚˜ ์ด์ƒ์˜ Action ๊ฐ์ฒด ์ง‘ํ•ฉ(์ฃผ์–ด์ง„ ์ˆœ์„œ์— ๋”ฐ๋ฆ„). ๊ฐ Action ๊ฐ์ฒด๋Š” ๋‹ค์Œ ์ž‘์—… ์ค‘ ํ•˜๋‚˜๋ฅผ ํฌํ•จํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

    • SaveFindings ๊ฐ์ฒด: ์œ„ํ—˜ ๋ถ„์„ ์Šค์บ”์˜ ๊ฒฐ๊ณผ๋ฅผ BigQuery ํ…Œ์ด๋ธ”์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
    • PublishToPubSub ๊ฐ์ฒด: Pub/Sub ์ฃผ์ œ์— ์•Œ๋ฆผ์„ ๊ฒŒ์‹œํ•ฉ๋‹ˆ๋‹ค.

    • JobNotificationEmails ๊ฐ์ฒด: ๊ฒฐ๊ณผ๊ฐ€ ํฌํ•จ๋œ ์ด๋ฉ”์ผ์„ ๋ณด๋ƒ…๋‹ˆ๋‹ค.

    KAnonymityConfig ๊ฐ์ฒด ๋‚ด์—์„œ ๋‹ค์Œ์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.

    • quasiIds[]: k-์ต๋ช…์„ฑ์„ ๊ณ„์‚ฐํ•  ๋•Œ ์Šค์บ”ํ•˜๊ณ  ์‚ฌ์šฉํ•  ํ•˜๋‚˜ ์ด์ƒ์˜ ์œ ์‚ฌ ์‹๋ณ„์ž(FieldId ๊ฐ์ฒด). ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์œ ์‚ฌ ์‹๋ณ„์ž๋ฅผ ์ง€์ •ํ•˜๋Š” ๊ฒฝ์šฐ ํ•˜๋‚˜์˜ ๋ณตํ•ฉ ํ‚ค๋กœ ๊ฐ„์ฃผ๋ฉ๋‹ˆ๋‹ค. ๊ตฌ์กฐ์ฒด ๋ฐ ๋ฐ˜๋ณต ๋ฐ์ดํ„ฐ ์œ ํ˜•์€ ์ง€์›๋˜์ง€ ์•Š์ง€๋งŒ ์ค‘์ฒฉ ํ•„๋“œ๋Š” ๊ทธ ์ž์ฒด๊ฐ€ ๊ตฌ์กฐ์ฒด์ด๊ฑฐ๋‚˜ ๋ฐ˜๋ณต ํ•„๋“œ์— ์ค‘์ฒฉ๋˜์ง€ ์•Š๋Š” ํ•œ ์ง€์›๋ฉ๋‹ˆ๋‹ค.
    • entityId: ์„ ํƒ์‚ฌํ•ญ์ธ ์‹๋ณ„์ž ๊ฐ’. ์„ค์ •๋˜๋ฉด k-์ต๋ช…์„ฑ์„ ๊ณ„์‚ฐํ•  ๋•Œ ๊ฐ๊ฐ์˜ ๊ณ ์œ ํ•œ entityId์— ํ•ด๋‹นํ•˜๋Š” ๋ชจ๋“  ํ–‰์„ ๊ทธ๋ฃนํ™”ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ entityId๋Š” ๊ณ ๊ฐ ID ๋˜๋Š” ์‚ฌ์šฉ์ž ID์™€ ๊ฐ™์ด ์ˆœ ์‚ฌ์šฉ์ž๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ์—ด์ž…๋‹ˆ๋‹ค. entityId๊ฐ€ ์œ ์‚ฌ ์‹๋ณ„์ž ๊ฐ’์ด ์„œ๋กœ ๋‹ค๋ฅธ ์—ฌ๋Ÿฌ ํ–‰์— ๋‚˜ํƒ€๋‚˜๋Š” ๊ฒฝ์šฐ ์ด๋Ÿฌํ•œ ํ–‰์€ ์กฐ์ธ์„ ํ†ตํ•ด ํ•ด๋‹น ํ•ญ๋ชฉ์„ ์œ„ํ•œ ์œ ์‚ฌ ์‹๋ณ„์ž๋กœ ์‚ฌ์šฉ๋˜๋Š” ๋ฉ€ํ‹ฐ ์„ธํŠธ๋ฅผ ํ˜•์„ฑํ•ฉ๋‹ˆ๋‹ค. ํ•ญ๋ชฉ ID์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์œ„ํ—˜ ๋ถ„์„ ๊ฐœ๋… ํ•ญ๋ชฉ์˜ ํ•ญ๋ชฉ ID ๋ฐ k-์ต๋ช…์„ฑ ๊ณ„์‚ฐ์„ ์ฐธ์กฐํ•˜์„ธ์š”.

DLP API์— ์š”์ฒญ์„ ์ „์†กํ•˜๋ฉด ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์ด ์‹œ์ž‘๋ฉ๋‹ˆ๋‹ค.

์™„๋ฃŒ๋œ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—… ๋‚˜์—ด

ํ˜„์žฌ ํ”„๋กœ์ ํŠธ์—์„œ ์‹คํ–‰๋œ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์˜ ๋ชฉ๋ก์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Console

Google Cloud ์ฝ˜์†”์—์„œ ์‹คํ–‰ ์ค‘์ด๊ณ  ์ด์ „์— ์‹คํ–‰๋œ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์„ ๋‚˜์—ดํ•˜๋ ค๋ฉด ๋‹ค์Œ์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

  1. Google Cloud ์ฝ˜์†”์—์„œ Sensitive Data Protection์„ ์—ฝ๋‹ˆ๋‹ค.

    ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ๋กœ ์ด๋™

  2. ํŽ˜์ด์ง€ ์ƒ๋‹จ์˜ ์ž‘์—… ๋ฐ ์ž‘์—… ํŠธ๋ฆฌ๊ฑฐ ํƒญ์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

  3. ์œ„ํ—˜ ์ž‘์—… ํƒญ์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

์œ„ํ—˜ ์ž‘์—… ๋ชฉ๋ก์ด ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

ํ”„๋กœํ† ์ฝœ

์‹คํ–‰ ์ค‘์ด๊ณ  ์ด์ „์— ์‹คํ–‰๋œ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์„ ๋‚˜์—ดํ•˜๋ ค๋ฉด projects.dlpJobs ๋ฆฌ์†Œ์Šค์— GET ์š”์ฒญ์„ ๋ณด๋ƒ…๋‹ˆ๋‹ค. ์ž‘์—… ์œ ํ˜• ํ•„ํ„ฐ(?type=RISK_ANALYSIS_JOB)๋ฅผ ์ถ”๊ฐ€ํ•˜๋ฉด ์‘๋‹ต์˜ ๋ฒ”์œ„๋ฅผ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์œผ๋กœ ์ขํž™๋‹ˆ๋‹ค.

https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs?type=RISK_ANALYSIS_JOB

์ˆ˜์‹ ํ•˜๋Š” ์‘๋‹ต์—๋Š” ํ˜„์žฌ ๋ฐ ์ด์ „ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์˜ JSON ํ‘œํ˜„์ด ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

k-์ต๋ช…์„ฑ ์ž‘์—… ๊ฒฐ๊ณผ ๋ณด๊ธฐ

Google Cloud ์ฝ˜์†”์˜ Sensitive Data Protection์—๋Š” ์™„๋ฃŒ๋œ k-์ต๋ช…์„ฑ ์ž‘์—…์˜ ์‹œ๊ฐํ™” ๊ธฐ๋Šฅ์ด ๊ธฐ๋ณธ ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค. ์ด์ „ ์„น์…˜์˜ ์•ˆ๋‚ด๋ฅผ ๋”ฐ๋ฅธ ํ›„ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—… ๋ชฉ๋ก์—์„œ ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•  ์ž‘์—…์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค. ์ž‘์—…์ด ์„ฑ๊ณต์ ์œผ๋กœ ์‹คํ–‰๋˜์—ˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•˜๋ฉด ์œ„ํ—˜ ๋ถ„์„ ์„ธ๋ถ€์ •๋ณด ํŽ˜์ด์ง€ ์ƒ๋‹จ์— ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

ํŽ˜์ด์ง€ ์ƒ๋‹จ์— ์ž‘์—… ID๋ฅผ ํฌํ•จํ•œ k-์ต๋ช…์„ฑ ์œ„ํ—˜ ์ž‘์—…์— ๋Œ€ํ•œ ์ •๋ณด๊ฐ€ ๋‚˜์™€ ์žˆ์œผ๋ฉฐ ์ปจํ…Œ์ด๋„ˆ์—๋Š” ๋ฆฌ์†Œ์Šค ์œ„์น˜๊ฐ€ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

k-์ต๋ช…์„ฑ ๊ณ„์‚ฐ ๊ฒฐ๊ณผ๋ฅผ ๋ณด๋ ค๋ฉด K-์ต๋ช…์„ฑ ํƒญ์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค. ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์˜ ๊ตฌ์„ฑ์„ ๋ณด๋ ค๋ฉด ๊ตฌ์„ฑ ํƒญ์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

K-์ต๋ช…์„ฑ ํƒญ์—๋Š” ๋จผ์ € ํ•ญ๋ชฉ ID(์žˆ๋Š” ๊ฒฝ์šฐ)์™€ k-์ต๋ช…์„ฑ์„ ๊ณ„์‚ฐํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ์œ ์‚ฌ ์‹๋ณ„์ž๊ฐ€ ๋‚˜์—ด๋ฉ๋‹ˆ๋‹ค.

์œ„ํ—˜ ์ฐจํŠธ

์žฌ์‹๋ณ„ ์œ„ํ—˜ ์ฐจํŠธ์˜ y์ถ•์€ ๊ณ ์œ  ํ–‰๊ณผ ๊ณ ์œ  ์œ ์‚ฌ ์‹๋ณ„์ž ์กฐํ•ฉ์— ๋Œ€ํ•œ ๋ฐ์ดํ„ฐ ์†์‹ค์˜ ์ž ์žฌ์  ๋น„์œจ์ด๋ฉฐ x์ถ•์€ k-์ต๋ช…์„ฑ ๊ฐ’์ž…๋‹ˆ๋‹ค. ์ฐจํŠธ์˜ ์ƒ‰์ƒ๋„ ์œ„ํ—˜ ๊ฐ€๋Šฅ์„ฑ๋„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ํŒŒ๋ž€์ƒ‰์ด ์–ด๋‘์šธ์ˆ˜๋ก ์œ„ํ—˜์ด ๋†’์œผ๋ฉฐ ๋ฐ์„ ์ˆ˜๋ก ์œ„ํ—˜์ด ์ ์Šต๋‹ˆ๋‹ค.

k-์ต๋ช…์„ฑ ๊ฐ’์ด ๋†’์œผ๋ฉด ์žฌ์‹๋ณ„ ์œ„ํ—˜์ด ์ค„์–ด๋“ญ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ k-์ต๋ช…์„ฑ ๊ฐ’์„ ๋†’์ด๋ ค๋ฉด ์ด ํ–‰์˜ ๋†’์€ ๋น„์œจ๊ณผ ๋†’์€ ๊ณ ์œ  ์œ ์‚ฌ ์‹๋ณ„์ž ์กฐํ•ฉ์„ ์‚ญ์ œํ•ด์•ผ ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ ์ด๋Š” ๋ฐ์ดํ„ฐ์˜ ์œ ์šฉ์„ฑ์„ ๋‚ฎ์ถฅ๋‹ˆ๋‹ค. ํŠน์ • k-์ต๋ช…์„ฑ ๊ฐ’์˜ ํŠน์ • ์ž ์žฌ์  ๋ฐฑ๋ถ„์œจ ์†์‹ค ๊ฐ’์„ ๋ณด๋ ค๋ฉด ์ฐจํŠธ ์œ„๋กœ ๋งˆ์šฐ์Šค ์ปค์„œ๋ฅผ ๊ฐ€์ ธ๊ฐ€์„ธ์š”. ์Šคํฌ๋ฆฐ์ƒท๊ณผ ๊ฐ™์ด ์ฐจํŠธ์— ๋„์›€๋ง์ด ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

ํŠน์ • k-์ต๋ช…์„ฑ ๊ฐ’์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์„ ๋ณด๋ ค๋ฉด ํ•ด๋‹น ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋ฅผ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค. ์ฐจํŠธ ์•„๋ž˜์— ์ž์„ธํ•œ ์„ค๋ช…์ด ํ‘œ์‹œ๋˜๊ณ  ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ ํ…Œ์ด๋ธ”์ด ํŽ˜์ด์ง€ ์•„๋ž˜์ชฝ์— ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

์œ„ํ—˜ ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ ํ‘œ

์œ„ํ—˜ ์ž‘์—… ๊ฒฐ๊ณผ ํŽ˜์ด์ง€์˜ ๋‘ ๋ฒˆ์งธ ๊ตฌ์„ฑ์š”์†Œ๋Š” ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ ํ…Œ์ด๋ธ”์ž…๋‹ˆ๋‹ค. ์ง€์ •๋œ ๋Œ€์ƒ k-์ต๋ช…์„ฑ ๊ฐ’์— ๋Œ€ํ•œ ์œ ์‚ฌ ์‹๋ณ„์ž ์กฐํ•ฉ์„ ํ‘œ์‹œํ•ฉ๋‹ˆ๋‹ค.

ํ…Œ์ด๋ธ”์˜ ์ฒซ ๋ฒˆ์งธ ์—ด์—๋Š” k-์ต๋ช…์„ฑ ๊ฐ’์ด ๋‚˜์—ด๋ฉ๋‹ˆ๋‹ค. k-์ต๋ช…์„ฑ ๊ฐ’์„ ํด๋ฆญํ•˜๋ฉด ๊ฐ’์„ ์–ป๊ธฐ ์œ„ํ•ด ์‚ญ์ œ๋˜์–ด์•ผ ํ•  ํ•ด๋‹น ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ๊ฐ€ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

๋‘ ๋ฒˆ์งธ ์—ด์€ ๊ณ ์œ ํ•œ ํ–‰ ๋ฐ ์œ ์‚ฌ ์‹๋ณ„์ž ์กฐํ•ฉ์˜ ์ž ์žฌ์  ๋ฐ์ดํ„ฐ ์†์‹ค๊ณผ ์ตœ์†Œ k ๋ ˆ์ฝ”๋“œ๊ฐ€ ์žˆ๋Š” ๊ทธ๋ฃน ์ˆ˜์™€ ์ด ๋ ˆ์ฝ”๋“œ ์ˆ˜๋ฅผ ํ‘œ์‹œํ•ฉ๋‹ˆ๋‹ค.

๋งˆ์ง€๋ง‰ ์—ด์—๋Š” ์œ ์‚ฌ ์‹๋ณ„์ž ์กฐํ•ฉ์„ ๊ณต์œ ํ•˜๋Š” ๊ทธ๋ฃน์˜ ์ƒ˜ํ”Œ๊ณผ ํ•ด๋‹น ์กฐํ•ฉ์— ์กด์žฌํ•˜๋Š” ๋ ˆ์ฝ”๋“œ ์ˆ˜๊ฐ€ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

REST๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ž‘์—… ์„ธ๋ถ€์ •๋ณด ๊ฒ€์ƒ‰

REST API๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ k-์ต๋ช…์„ฑ ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์˜ ๊ฒฐ๊ณผ๋ฅผ ๊ฒ€์ƒ‰ํ•˜๋ ค๋ฉด ๋‹ค์Œ GET ์š”์ฒญ์„ projects.dlpJobs ๋ฆฌ์†Œ์Šค์— ๋ณด๋ƒ…๋‹ˆ๋‹ค. PROJECT_ID๋ฅผ ํ”„๋กœ์ ํŠธ ID๋กœ ๋ฐ”๊พธ๊ณ  JOB_ID๋ฅผ ๊ฒฐ๊ณผ๋ฅผ ๊ฐ€์ ธ์˜ฌ ์ž‘์—… ์‹๋ณ„์ž๋กœ ๋ฐ”๊ฟ‰๋‹ˆ๋‹ค. ์ž‘์—… ID๋Š” ์ž‘์—… ์‹œ์ž‘ ์‹œ ๋ฐ˜ํ™˜๋˜์—ˆ์œผ๋ฉฐ ๋ชจ๋“  ์ž‘์—…์„ ๋‚˜์—ดํ•˜์—ฌ ๊ฒ€์ƒ‰ํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

GET https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID

์š”์ฒญ์€ ์ž‘์—… ์ธ์Šคํ„ด์Šค๊ฐ€ ํฌํ•จ๋œ JSON ๊ฐ์ฒด๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ๋ถ„์„ ๊ฒฐ๊ณผ๋Š” AnalyzeDataSourceRiskDetails ๊ฐ์ฒด์˜ "riskDetails" ํ‚ค ๋‚ด์— ์žˆ์Šต๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ DlpJob ๋ฆฌ์†Œ์Šค์˜ API ์ฐธ์กฐ๋ฅผ ํ™•์ธํ•˜์„ธ์š”.

์ฝ”๋“œ ์ƒ˜ํ”Œ: ํ•ญ๋ชฉ ID๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ k-์ต๋ช…์„ฑ ๊ณ„์‚ฐ

์ด ์˜ˆ์‹œ์—์„œ๋Š” ์—”ํ‹ฐํ‹ฐ ID๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ k-์ต๋ช…์„ฑ์„ ๊ณ„์‚ฐํ•˜๋Š” ์œ„ํ—˜ ๋ถ„์„ ์ž‘์—…์„ ๋งŒ๋“ญ๋‹ˆ๋‹ค.

ํ•ญ๋ชฉ ID์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ํ•ญ๋ชฉ ID ๋ฐ k-์ต๋ช…์„ฑ ๊ณ„์‚ฐ์„ ์ฐธ๊ณ ํ•˜์„ธ์š”.

C#

Sensitive Data Protection์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ Sensitive Data Protection ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.


using System;
using System.Collections.Generic;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Newtonsoft.Json;

public class CalculateKAnonymityOnDataset
{
    public static DlpJob CalculateKAnonymitty(
        string projectId,
        string datasetId,
        string sourceTableId,
        string outputTableId)
    {
        // Construct the dlp client.
        var dlp = DlpServiceClient.Create();

        // Construct the k-anonymity config by setting the EntityId as user_id column
        // and two quasi-identifiers columns.
        var kAnonymity = new PrivacyMetric.Types.KAnonymityConfig
        {
            EntityId = new EntityId
            {
                Field = new FieldId { Name = "Name" }
            },
            QuasiIds =
            {
                new FieldId { Name = "Age" },
                new FieldId { Name = "Mystery" }
            }
        };

        // Construct risk analysis job config by providing the source table, privacy metric
        // and action to save the findings to a BigQuery table.
        var riskJob = new RiskAnalysisJobConfig
        {
            SourceTable = new BigQueryTable
            {
                ProjectId = projectId,
                DatasetId = datasetId,
                TableId = sourceTableId,
            },
            PrivacyMetric = new PrivacyMetric
            {
                KAnonymityConfig = kAnonymity,
            },
            Actions =
            {
                new Google.Cloud.Dlp.V2.Action
                {
                    SaveFindings = new Google.Cloud.Dlp.V2.Action.Types.SaveFindings
                    {
                        OutputConfig = new OutputStorageConfig
                        {
                            Table = new BigQueryTable
                            {
                                ProjectId = projectId,
                                DatasetId = datasetId,
                                TableId = outputTableId
                            }
                        }
                    }
                }
            }
        };

        // Construct the request by providing RiskJob object created above.
        var request = new CreateDlpJobRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            RiskJob = riskJob
        };

        // Send the job request.
        DlpJob response = dlp.CreateDlpJob(request);

        Console.WriteLine($"Job created successfully. Job name: ${response.Name}");

        return response;
    }
}

Go

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.

import (
	"context"
	"fmt"
	"io"
	"strings"
	"time"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// Uses the Data Loss Prevention API to compute the k-anonymity of a
// column set in a Google BigQuery table.
func calculateKAnonymityWithEntityId(w io.Writer, projectID, datasetId, tableId string, columnNames ...string) error {
	// projectID := "your-project-id"
	// datasetId := "your-bigquery-dataset-id"
	// tableId := "your-bigquery-table-id"
	// columnNames := "age" "job_title"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the BigQuery table to analyze
	bigQueryTable := &dlppb.BigQueryTable{
		ProjectId: "bigquery-public-data",
		DatasetId: "samples",
		TableId:   "wikipedia",
	}

	// Configure the privacy metric for the job
	// Build the QuasiID slice.
	var q []*dlppb.FieldId
	for _, c := range columnNames {
		q = append(q, &dlppb.FieldId{Name: c})
	}

	entityId := &dlppb.EntityId{
		Field: &dlppb.FieldId{
			Name: "id",
		},
	}

	kAnonymityConfig := &dlppb.PrivacyMetric_KAnonymityConfig{
		QuasiIds: q,
		EntityId: entityId,
	}

	privacyMetric := &dlppb.PrivacyMetric{
		Type: &dlppb.PrivacyMetric_KAnonymityConfig_{
			KAnonymityConfig: kAnonymityConfig,
		},
	}

	// Specify the bigquery table to store the findings.
	// The "test_results" table in the given BigQuery dataset will be created if it doesn't
	// already exist.
	outputbigQueryTable := &dlppb.BigQueryTable{
		ProjectId: projectID,
		DatasetId: datasetId,
		TableId:   tableId,
	}

	// Create action to publish job status notifications to BigQuery table.
	outputStorageConfig := &dlppb.OutputStorageConfig{
		Type: &dlppb.OutputStorageConfig_Table{
			Table: outputbigQueryTable,
		},
	}

	findings := &dlppb.Action_SaveFindings{
		OutputConfig: outputStorageConfig,
	}

	action := &dlppb.Action{
		Action: &dlppb.Action_SaveFindings_{
			SaveFindings: findings,
		},
	}

	// Configure the risk analysis job to perform
	riskAnalysisJobConfig := &dlppb.RiskAnalysisJobConfig{
		PrivacyMetric: privacyMetric,
		SourceTable:   bigQueryTable,
		Actions: []*dlppb.Action{
			action,
		},
	}

	// Build the request to be sent by the client
	req := &dlppb.CreateDlpJobRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Job: &dlppb.CreateDlpJobRequest_RiskJob{
			RiskJob: riskAnalysisJobConfig,
		},
	}

	// Send the request to the API using the client
	dlpJob, err := client.CreateDlpJob(ctx, req)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "Created job: %v\n", dlpJob.GetName())

	// Build a request to get the completed job
	getDlpJobReq := &dlppb.GetDlpJobRequest{
		Name: dlpJob.Name,
	}

	timeout := 15 * time.Minute
	startTime := time.Now()

	var completedJob *dlppb.DlpJob

	// Wait for job completion
	for time.Since(startTime) <= timeout {
		completedJob, err = client.GetDlpJob(ctx, getDlpJobReq)
		if err != nil {
			return err
		}

		if completedJob.GetState() == dlppb.DlpJob_DONE {
			break
		}

		time.Sleep(30 * time.Second)

	}

	if completedJob.GetState() != dlppb.DlpJob_DONE {
		fmt.Println("Job did not complete within 15 minutes.")
	}

	// Retrieve completed job status
	fmt.Fprintf(w, "Job status: %v", completedJob.State)
	fmt.Fprintf(w, "Job name: %v", dlpJob.Name)

	// Get the result and parse through and process the information
	kanonymityResult := completedJob.GetRiskDetails().GetKAnonymityResult()

	for _, result := range kanonymityResult.GetEquivalenceClassHistogramBuckets() {
		fmt.Fprintf(w, "Bucket size range: [%d, %d]\n", result.GetEquivalenceClassSizeLowerBound(), result.GetEquivalenceClassSizeLowerBound())

		for _, bucket := range result.GetBucketValues() {
			quasiIdValues := []string{}
			for _, v := range bucket.GetQuasiIdsValues() {
				quasiIdValues = append(quasiIdValues, v.GetStringValue())
			}
			fmt.Fprintf(w, "\tQuasi-ID values: %s", strings.Join(quasiIdValues, ","))
			fmt.Fprintf(w, "\tClass size: %d", bucket.EquivalenceClassSize)
		}
	}

	return nil

}

Java

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.Action.SaveFindings;
import com.google.privacy.dlp.v2.AnalyzeDataSourceRiskDetails.KAnonymityResult;
import com.google.privacy.dlp.v2.AnalyzeDataSourceRiskDetails.KAnonymityResult.KAnonymityEquivalenceClass;
import com.google.privacy.dlp.v2.AnalyzeDataSourceRiskDetails.KAnonymityResult.KAnonymityHistogramBucket;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.EntityId;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.GetDlpJobRequest;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.OutputStorageConfig;
import com.google.privacy.dlp.v2.PrivacyMetric;
import com.google.privacy.dlp.v2.PrivacyMetric.KAnonymityConfig;
import com.google.privacy.dlp.v2.RiskAnalysisJobConfig;
import com.google.privacy.dlp.v2.Value;
import java.io.IOException;
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

@SuppressWarnings("checkstyle:AbbreviationAsWordInName")
public class RiskAnalysisKAnonymityWithEntityId {

  public static void main(String[] args) throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The BigQuery dataset id to be used and the reference table name to be inspected.
    String datasetId = "your-bigquery-dataset-id";
    String tableId = "your-bigquery-table-id";
    calculateKAnonymityWithEntityId(projectId, datasetId, tableId);
  }

  // Uses the Data Loss Prevention API to compute the k-anonymity of a column set in a Google
  // BigQuery table.
  public static void calculateKAnonymityWithEntityId(
      String projectId, String datasetId, String tableId) throws IOException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

      // Specify the BigQuery table to analyze
      BigQueryTable bigQueryTable =
          BigQueryTable.newBuilder()
              .setProjectId(projectId)
              .setDatasetId(datasetId)
              .setTableId(tableId)
              .build();

      // These values represent the column names of quasi-identifiers to analyze
      List<String> quasiIds = Arrays.asList("Age", "Mystery");

      // Create a list of FieldId objects based on the provided list of column names.
      List<FieldId> quasiIdFields =
          quasiIds.stream()
              .map(columnName -> FieldId.newBuilder().setName(columnName).build())
              .collect(Collectors.toList());

      // Specify the unique identifier in the source table for the k-anonymity analysis.
      FieldId uniqueIdField = FieldId.newBuilder().setName("Name").build();
      EntityId entityId = EntityId.newBuilder().setField(uniqueIdField).build();
      KAnonymityConfig kanonymityConfig = KAnonymityConfig.newBuilder()
              .addAllQuasiIds(quasiIdFields)
              .setEntityId(entityId)
              .build();

      // Configure the privacy metric to compute for re-identification risk analysis.
      PrivacyMetric privacyMetric =
          PrivacyMetric.newBuilder().setKAnonymityConfig(kanonymityConfig).build();

      // Specify the bigquery table to store the findings.
      // The "test_results" table in the given BigQuery dataset will be created if it doesn't
      // already exist.
      BigQueryTable outputbigQueryTable =
          BigQueryTable.newBuilder()
              .setProjectId(projectId)
              .setDatasetId(datasetId)
              .setTableId("test_results")
              .build();

      // Create action to publish job status notifications to BigQuery table.
      OutputStorageConfig outputStorageConfig =
          OutputStorageConfig.newBuilder().setTable(outputbigQueryTable).build();
      SaveFindings findings =
          SaveFindings.newBuilder().setOutputConfig(outputStorageConfig).build();
      Action action = Action.newBuilder().setSaveFindings(findings).build();

      // Configure the risk analysis job to perform
      RiskAnalysisJobConfig riskAnalysisJobConfig =
          RiskAnalysisJobConfig.newBuilder()
              .setSourceTable(bigQueryTable)
              .setPrivacyMetric(privacyMetric)
              .addActions(action)
              .build();

      // Build the request to be sent by the client
      CreateDlpJobRequest createDlpJobRequest =
          CreateDlpJobRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setRiskJob(riskAnalysisJobConfig)
              .build();

      // Send the request to the API using the client
      DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

      // Build a request to get the completed job
      GetDlpJobRequest getDlpJobRequest =
          GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build();

      DlpJob completedJob = null;
      // Wait for job completion
      try {
        Duration timeout = Duration.ofMinutes(15);
        long startTime = System.currentTimeMillis();
        do {
          completedJob = dlpServiceClient.getDlpJob(getDlpJobRequest);
          TimeUnit.SECONDS.sleep(30);
        } while (completedJob.getState() != DlpJob.JobState.DONE
            && System.currentTimeMillis() - startTime <= timeout.toMillis());
      } catch (InterruptedException e) {
        System.out.println("Job did not complete within 15 minutes.");
      }

      // Retrieve completed job status
      System.out.println("Job status: " + completedJob.getState());
      System.out.println("Job name: " + dlpJob.getName());

      // Get the result and parse through and process the information
      KAnonymityResult kanonymityResult = completedJob.getRiskDetails().getKAnonymityResult();
      for (KAnonymityHistogramBucket result :
          kanonymityResult.getEquivalenceClassHistogramBucketsList()) {
        System.out.printf(
            "Bucket size range: [%d, %d]\n",
            result.getEquivalenceClassSizeLowerBound(), result.getEquivalenceClassSizeUpperBound());

        for (KAnonymityEquivalenceClass bucket : result.getBucketValuesList()) {
          List<String> quasiIdValues =
              bucket.getQuasiIdsValuesList().stream()
                  .map(Value::toString)
                  .collect(Collectors.toList());

          System.out.println("\tQuasi-ID values: " + String.join(", ", quasiIdValues));
          System.out.println("\tClass size: " + bucket.getEquivalenceClassSize());
        }
      }
    }
  }
}

Node.js

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = "your-project-id";

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const sourceTableId = 'my_source_table';

// The ID of the table where outputs are stored
// const outputTableId = 'my_output_table';

async function kAnonymityWithEntityIds() {
  // Specify the BigQuery table to analyze.
  const sourceTable = {
    projectId: projectId,
    datasetId: datasetId,
    tableId: sourceTableId,
  };

  // Specify the unique identifier in the source table for the k-anonymity analysis.
  const uniqueIdField = {name: 'Name'};

  // These values represent the column names of quasi-identifiers to analyze
  const quasiIds = [{name: 'Age'}, {name: 'Mystery'}];

  // Configure the privacy metric to compute for re-identification risk analysis.
  const privacyMetric = {
    kAnonymityConfig: {
      entityId: {
        field: uniqueIdField,
      },
      quasiIds: quasiIds,
    },
  };
  // Create action to publish job status notifications to BigQuery table.
  const action = [
    {
      saveFindings: {
        outputConfig: {
          table: {
            projectId: projectId,
            datasetId: datasetId,
            tableId: outputTableId,
          },
        },
      },
    },
  ];

  // Configure the risk analysis job to perform.
  const riskAnalysisJob = {
    sourceTable: sourceTable,
    privacyMetric: privacyMetric,
    actions: action,
  };
  // Combine configurations into a request for the service.
  const createDlpJobRequest = {
    parent: `projects/${projectId}/locations/global`,
    riskJob: riskAnalysisJob,
  };

  // Send the request and receive response from the service
  const [createdDlpJob] = await dlp.createDlpJob(createDlpJobRequest);
  const jobName = createdDlpJob.name;

  // Waiting for a maximum of 15 minutes for the job to get complete.
  let job;
  let numOfAttempts = 30;
  while (numOfAttempts > 0) {
    // Fetch DLP Job status
    [job] = await dlp.getDlpJob({name: jobName});

    // Check if the job has completed.
    if (job.state === 'DONE') {
      break;
    }
    if (job.state === 'FAILED') {
      console.log('Job Failed, Please check the configuration.');
      return;
    }
    // Sleep for a short duration before checking the job status again.
    await new Promise(resolve => {
      setTimeout(() => resolve(), 30000);
    });
    numOfAttempts -= 1;
  }

  // Create helper function for unpacking values
  const getValue = obj => obj[Object.keys(obj)[0]];

  // Print out the results.
  const histogramBuckets =
    job.riskDetails.kAnonymityResult.equivalenceClassHistogramBuckets;

  histogramBuckets.forEach((histogramBucket, histogramBucketIdx) => {
    console.log(`Bucket ${histogramBucketIdx}:`);
    console.log(
      `  Bucket size range: [${histogramBucket.equivalenceClassSizeLowerBound}, ${histogramBucket.equivalenceClassSizeUpperBound}]`
    );

    histogramBucket.bucketValues.forEach(valueBucket => {
      const quasiIdValues = valueBucket.quasiIdsValues
        .map(getValue)
        .join(', ');
      console.log(`  Quasi-ID values: {${quasiIdValues}}`);
      console.log(`  Class size: ${valueBucket.equivalenceClassSize}`);
    });
  });
}
await kAnonymityWithEntityIds();

PHP

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.

use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\SaveFindings;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CreateDlpJobRequest;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\EntityId;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\GetDlpJobRequest;
use Google\Cloud\Dlp\V2\OutputStorageConfig;
use Google\Cloud\Dlp\V2\PrivacyMetric;
use Google\Cloud\Dlp\V2\PrivacyMetric\KAnonymityConfig;
use Google\Cloud\Dlp\V2\RiskAnalysisJobConfig;

/**
 * Computes the k-anonymity of a column set in a Google BigQuery table with entity id.
 *
 * @param string    $callingProjectId  The project ID to run the API call under.
 * @param string    $datasetId         The ID of the dataset to inspect.
 * @param string    $tableId           The ID of the table to inspect.
 * @param string[]  $quasiIdNames      Array columns that form a composite key (quasi-identifiers).
 */

function k_anonymity_with_entity_id(
    // TODO(developer): Replace sample parameters before running the code.
    string $callingProjectId,
    string $datasetId,
    string $tableId,
    array  $quasiIdNames
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // Specify the BigQuery table to analyze.
    $bigqueryTable = (new BigQueryTable())
        ->setProjectId($callingProjectId)
        ->setDatasetId($datasetId)
        ->setTableId($tableId);

    // Create a list of FieldId objects based on the provided list of column names.
    $quasiIds = array_map(
        function ($id) {
            return (new FieldId())
                ->setName($id);
        },
        $quasiIdNames
    );

    // Specify the unique identifier in the source table for the k-anonymity analysis.
    $statsConfig = (new KAnonymityConfig())
        ->setEntityId((new EntityId())
            ->setField((new FieldId())
                ->setName('Name')))
        ->setQuasiIds($quasiIds);

    // Configure the privacy metric to compute for re-identification risk analysis.
    $privacyMetric = (new PrivacyMetric())
        ->setKAnonymityConfig($statsConfig);

    // Specify the bigquery table to store the findings.
    // The "test_results" table in the given BigQuery dataset will be created if it doesn't
    // already exist.
    $outBigqueryTable = (new BigQueryTable())
        ->setProjectId($callingProjectId)
        ->setDatasetId($datasetId)
        ->setTableId('test_results');

    $outputStorageConfig = (new OutputStorageConfig())
        ->setTable($outBigqueryTable);

    $findings = (new SaveFindings())
        ->setOutputConfig($outputStorageConfig);

    $action = (new Action())
        ->setSaveFindings($findings);

    // Construct risk analysis job config to run.
    $riskJob = (new RiskAnalysisJobConfig())
        ->setPrivacyMetric($privacyMetric)
        ->setSourceTable($bigqueryTable)
        ->setActions([$action]);

    // Submit request.
    $parent = "projects/$callingProjectId/locations/global";
    $createDlpJobRequest = (new CreateDlpJobRequest())
        ->setParent($parent)
        ->setRiskJob($riskJob);
    $job = $dlp->createDlpJob($createDlpJobRequest);

    $numOfAttempts = 10;
    do {
        printf('Waiting for job to complete' . PHP_EOL);
        sleep(10);
        $getDlpJobRequest = (new GetDlpJobRequest())
            ->setName($job->getName());
        $job = $dlp->getDlpJob($getDlpJobRequest);
        if ($job->getState() == JobState::DONE) {
            break;
        }
        $numOfAttempts--;
    } while ($numOfAttempts > 0);

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));
    switch ($job->getState()) {
        case JobState::DONE:
            $histBuckets = $job->getRiskDetails()->getKAnonymityResult()->getEquivalenceClassHistogramBuckets();

            foreach ($histBuckets as $bucketIndex => $histBucket) {
                // Print bucket stats.
                printf('Bucket %s:' . PHP_EOL, $bucketIndex);
                printf(
                    '  Bucket size range: [%s, %s]' . PHP_EOL,
                    $histBucket->getEquivalenceClassSizeLowerBound(),
                    $histBucket->getEquivalenceClassSizeUpperBound()
                );

                // Print bucket values.
                foreach ($histBucket->getBucketValues() as $percent => $valueBucket) {
                    // Pretty-print quasi-ID values.
                    printf('  Quasi-ID values:' . PHP_EOL);
                    foreach ($valueBucket->getQuasiIdsValues() as $index => $value) {
                        print('    ' . $value->serializeToJsonString() . PHP_EOL);
                    }
                    printf(
                        '  Class size: %s' . PHP_EOL,
                        $valueBucket->getEquivalenceClassSize()
                    );
                }
            }

            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        case JobState::PENDING:
            printf('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);
            break;
        default:
            printf('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

Python

๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ์˜ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋ฏผ๊ฐํ•œ ์ •๋ณด ๋ณดํ˜ธ ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Sensitive Data Protection์— ์ธ์ฆํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ธฐ๋ณธ ์‚ฌ์šฉ์ž ์ธ์ฆ ์ •๋ณด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋กœ์ปฌ ๊ฐœ๋ฐœ ํ™˜๊ฒฝ์˜ ์ธ์ฆ ์„ค์ •์„ ์ฐธ์กฐํ•˜์„ธ์š”.

import time
from typing import List

import google.cloud.dlp_v2
from google.cloud.dlp_v2 import types


def k_anonymity_with_entity_id(
    project: str,
    source_table_project_id: str,
    source_dataset_id: str,
    source_table_id: str,
    entity_id: str,
    quasi_ids: List[str],
    output_table_project_id: str,
    output_dataset_id: str,
    output_table_id: str,
) -> None:
    """Uses the Data Loss Prevention API to compute the k-anonymity using entity_id
        of a column set in a Google BigQuery table.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        source_table_project_id: The Google Cloud project id where the BigQuery table
            is stored.
        source_dataset_id: The id of the dataset to inspect.
        source_table_id: The id of the table to inspect.
        entity_id: The column name of the table that enables accurately determining k-anonymity
         in the common scenario wherein several rows of dataset correspond to the same sensitive
         information.
        quasi_ids: A set of columns that form a composite key.
        output_table_project_id: The Google Cloud project id where the output BigQuery table
            is stored.
        output_dataset_id: The id of the output BigQuery dataset.
        output_table_id: The id of the output BigQuery table.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Location info of the source BigQuery table.
    source_table = {
        "project_id": source_table_project_id,
        "dataset_id": source_dataset_id,
        "table_id": source_table_id,
    }

    # Specify the bigquery table to store the findings.
    # The output_table_id in the given BigQuery dataset will be created if it doesn't
    # already exist.
    dest_table = {
        "project_id": output_table_project_id,
        "dataset_id": output_dataset_id,
        "table_id": output_table_id,
    }

    # Convert quasi id list to Protobuf type
    def map_fields(field: str) -> dict:
        return {"name": field}

    #  Configure column names of quasi-identifiers to analyze
    quasi_ids = map(map_fields, quasi_ids)

    # Tell the API where to send a notification when the job is complete.
    actions = [{"save_findings": {"output_config": {"table": dest_table}}}]

    # Configure the privacy metric to compute for re-identification risk analysis.
    # Specify the unique identifier in the source table for the k-anonymity analysis.
    privacy_metric = {
        "k_anonymity_config": {
            "entity_id": {"field": {"name": entity_id}},
            "quasi_ids": quasi_ids,
        }
    }

    # Configure risk analysis job.
    risk_job = {
        "privacy_metric": privacy_metric,
        "source_table": source_table,
        "actions": actions,
    }

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call API to start risk analysis job.
    response = dlp.create_dlp_job(
        request={
            "parent": parent,
            "risk_job": risk_job,
        }
    )
    job_name = response.name
    print(f"Inspection Job started : {job_name}")

    # Waiting for a maximum of 15 minutes for the job to be completed.
    job = dlp.get_dlp_job(request={"name": job_name})
    no_of_attempts = 30
    while no_of_attempts > 0:
        # Check if the job has completed
        if job.state == google.cloud.dlp_v2.DlpJob.JobState.DONE:
            break
        if job.state == google.cloud.dlp_v2.DlpJob.JobState.FAILED:
            print("Job Failed, Please check the configuration.")
            return

        # Sleep for a short duration before checking the job status again
        time.sleep(30)
        no_of_attempts -= 1

        # Get the DLP job status
        job = dlp.get_dlp_job(request={"name": job_name})

    if job.state != google.cloud.dlp_v2.DlpJob.JobState.DONE:
        print("Job did not complete within 15 minutes.")
        return

    # Create helper function for unpacking values
    def get_values(obj: types.Value) -> str:
        return str(obj.string_value)

    # Print out the results.
    print(f"Job name: {job.name}")
    histogram_buckets = (
        job.risk_details.k_anonymity_result.equivalence_class_histogram_buckets
    )
    # Print bucket stats
    for i, bucket in enumerate(histogram_buckets):
        print(f"Bucket {i}:")
        if bucket.equivalence_class_size_lower_bound:
            print(
                f"Bucket size range: [{bucket.equivalence_class_size_lower_bound}, "
                f"{bucket.equivalence_class_size_upper_bound}]"
            )
            for value_bucket in bucket.bucket_values:
                print(
                    f"Quasi-ID values: {get_values(value_bucket.quasi_ids_values[0])}"
                )
                print(f"Class size: {value_bucket.equivalence_class_size}")
        else:
            print("No findings.")

๋‹ค์Œ ๋‹จ๊ณ„

  • ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ l-๋‹ค์–‘์„ฑ ๊ฐ’์„ ๊ณ„์‚ฐํ•˜๋Š” ๋ฐฉ๋ฒ• ์•Œ์•„๋ณด๊ธฐ
  • ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ k-๋งต ๊ฐ’์„ ๊ณ„์‚ฐํ•˜๋Š” ๋ฐฉ๋ฒ• ์•Œ์•„๋ณด๊ธฐ
  • ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ ฮด-์กด์žฌ ๊ฐ’์„ ๊ณ„์‚ฐํ•˜๋Š” ๋ฐฉ๋ฒ• ์•Œ์•„๋ณด๊ธฐ