Create a dataset for training classification and regression models

์ด ํŽ˜์ด์ง€์—์„œ๋Š” ํ…Œ์ด๋ธ” ํ˜•์‹ ๋ฐ์ดํ„ฐ์—์„œ Vertex AI ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋งŒ๋“ค์–ด ํ•™์Šต ๋ถ„๋ฅ˜ ๋ฐ ํšŒ๊ท€ ๋ชจ๋ธ์„ ์‹œ์ž‘ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. Google Cloud ์ฝ˜์†” ๋˜๋Š” Vertex AI API๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Before you begin

ํ…Œ์ด๋ธ” ํ˜•์‹ ๋ฐ์ดํ„ฐ์—์„œ Vertex AI ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋งŒ๋“ค๋ ค๋ฉด ๋จผ์ € ๋ฐ์ดํ„ฐ๋ฅผ ์ค€๋น„ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋‹ค์Œ์„ ์ฐธ๊ณ ํ•˜์„ธ์š”.

๋นˆ ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋งŒ๋“ค๊ณ  ์ค€๋น„๋œ ๋ฐ์ดํ„ฐ ์—ฐ๊ฒฐ

๋ถ„๋ฅ˜ ๋˜๋Š” ํšŒ๊ท€์šฉ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์„ ๋งŒ๋“ค๋ ค๋ฉด ๋จผ์ € ํ•™์Šต์‹œํ‚ฌ ๋ฐ์ดํ„ฐ์˜ ๋Œ€ํ‘œ ์ปฌ๋ ‰์…˜์ด ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. Google Cloud ์ฝ˜์†” ๋˜๋Š” API๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ค€๋น„๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ์—ฐ๊ฒฐํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ๋ฅผ ์—ฐ๊ฒฐํ•œ ํ›„ ์ ์ ˆํžˆ ์ˆ˜์ •ํ•˜์—ฌ ๋ชจ๋ธ ํ•™์Šต์„ ์‹œ์ž‘ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Google Cloud console

  1. In the Google Cloud console, in the Vertex AI section, go to the Datasets page.

    ๋ฐ์ดํ„ฐ ์„ธํŠธ ํŽ˜์ด์ง€๋กœ ์ด๋™

  2. Click Create to open the create dataset details page.
  3. Modify the Dataset name field to create a descriptive dataset display name.
  4. Select the Tabular tab.
  5. Select the Regression/classification objective.
  6. Select a region from the Region drop-down list.
  7. If you want to use customer-managed encryption keys (CMEK) with your dataset, open Advanced options and provide your key. (Preview)
  8. Click Create to create your empty dataset, and advance to the Source tab.
  9. Select one of the following options, based on your data source.

    ์ปดํ“จํ„ฐ์˜ CSV ํŒŒ์ผ

    1. ์ปดํ“จํ„ฐ์—์„œ CSV ํŒŒ์ผ ์—…๋กœ๋“œ๋ฅผ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.
    2. ํŒŒ์ผ ์„ ํƒ์„ ํด๋ฆญํ•˜๊ณ  Cloud Storage ๋ฒ„ํ‚ท์— ์—…๋กœ๋“œํ•  ๋ชจ๋“  ๋กœ์ปฌ ํŒŒ์ผ์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.
    3. Cloud Storage ๊ฒฝ๋กœ ์„ ํƒ ์„น์…˜์—์„œ Cloud Storage ๋ฒ„ํ‚ท ๊ฒฝ๋กœ๋ฅผ ์ž…๋ ฅํ•˜๊ฑฐ๋‚˜ ํƒ์ƒ‰์„ ํด๋ฆญํ•˜์—ฌ ๋ฒ„ํ‚ท ์œ„์น˜๋ฅผ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.

    Cloud Storage์˜ CSV ํŒŒ์ผ

    1. Click Select CSV files from Cloud Storage.
    2. In the Select CSV files from Cloud Storage section, enter the path to the Cloud Storage bucket, or click Browse to choose the location of the CSV files.

    BigQuery์˜ ํ…Œ์ด๋ธ” ๋˜๋Š” ๋ทฐ

    1. BigQuery์—์„œ ํ…Œ์ด๋ธ” ๋˜๋Š” ๋ทฐ ์„ ํƒ์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.
    2. ์ž…๋ ฅ ํŒŒ์ผ์˜ ํ”„๋กœ์ ํŠธ, ๋ฐ์ดํ„ฐ ์„ธํŠธ, ํ…Œ์ด๋ธ” ID๋ฅผ ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค.
  10. ๊ณ„์†์„ ํด๋ฆญํ•ฉ๋‹ˆ๋‹ค.

    ๋ฐ์ดํ„ฐ ์†Œ์Šค๊ฐ€ ๋ฐ์ดํ„ฐ ์„ธํŠธ์™€ ์—ฐ๊ฒฐ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

API

๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋งŒ๋“ค๋ฉด ํ•ด๋‹น ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋ฐ์ดํ„ฐ ์†Œ์Šค์™€๋„ ์—ฐ๊ฒฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋งŒ๋“œ๋Š” ๋ฐ ํ•„์š”ํ•œ ์ฝ”๋“œ๋Š” ํ•™์Šต ๋ฐ์ดํ„ฐ๊ฐ€ Cloud Storage ๋˜๋Š” BigQuery์— ์žˆ๋Š”์ง€์— ๋”ฐ๋ผ ๋‹ค๋ฆ…๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์†Œ์Šค๊ฐ€ ๋‹ค๋ฅธ ํ”„๋กœ์ ํŠธ์— ์žˆ๋Š” ๊ฒฝ์šฐ ํ•„์ˆ˜ ๊ถŒํ•œ ์„ค์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

Cloud Storage์—์„œ ๋ฐ์ดํ„ฐ๊ฐ€ ํฌํ•จ๋œ ๋ฐ์ดํ„ฐ ์„ธํŠธ ๋งŒ๋“ค๊ธฐ

REST

Use the datasets.create method to create a dataset.

์š”์ฒญ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ์ „์— ๋‹ค์Œ์„ ๋ฐ”๊ฟ‰๋‹ˆ๋‹ค.

  • LOCATION: Region where the dataset will be stored. This must be a region that supports dataset resources, for example, us-central1.
  • PROJECT: Your project ID.
  • DATASET_NAME: Display name for the dataset.
  • METADATA_SCHEMA_URI: The URI to the schema file for your objective: gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml
  • URI: Paths (URIs) to the Cloud Storage buckets containing the training data. There can be more than one. Each URI has the form:
    gs://GCSprojectId/bucketName/fileName
    
  • PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets

Request JSON body:

{
  "display_name": "DATASET_NAME",
  "metadata_schema_uri": "METADATA_SCHEMA_URI",
  "metadata": {
    "input_config": {
      "gcs_source": {
        "uri": ["URI1", "URI2", ...]
      }
    }
  }
}
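As a sketch of how these placeholders fit together, the following Python snippet assembles the same request body and checks that each source URI uses the gs:// scheme. The helper name and the example values (`my-dataset`, `gs://my-bucket/train.csv`) are hypothetical; only the standard library is used.

```python
import json


def build_gcs_dataset_body(display_name, schema_uri, gcs_uris):
    """Build the datasets.create request body for Cloud Storage sources."""
    for uri in gcs_uris:
        # Each Cloud Storage source URI must use the gs:// scheme.
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri}")
    return {
        "display_name": display_name,
        "metadata_schema_uri": schema_uri,
        "metadata": {"input_config": {"gcs_source": {"uri": list(gcs_uris)}}},
    }


body = build_gcs_dataset_body(
    "my-dataset",  # hypothetical display name
    "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml",
    ["gs://my-bucket/train.csv"],  # hypothetical training file
)
print(json.dumps(body, indent=2))
```

The printed document can then be saved as request.json for the curl or PowerShell commands below.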

์š”์ฒญ์„ ๋ณด๋‚ด๋ ค๋ฉด ๋‹ค์Œ ์˜ต์…˜ ์ค‘ ํ•˜๋‚˜๋ฅผ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.

curl

์š”์ฒญ ๋ณธ๋ฌธ์„ request.json ํŒŒ์ผ์— ์ €์žฅํ•˜๊ณ  ๋‹ค์Œ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets"

PowerShell

์š”์ฒญ ๋ณธ๋ฌธ์„ request.json ํŒŒ์ผ์— ์ €์žฅํ•˜๊ณ  ๋‹ค์Œ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets" | Select-Object -Expand Content

๋‹ค์Œ๊ณผ ๋น„์Šทํ•œ JSON ์‘๋‹ต์ด ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateDatasetOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-07-07T21:27:35.964882Z",
      "updateTime": "2020-07-07T21:27:35.964882Z"
    }
  }
}
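The response's `name` field interleaves resource types and IDs, so the dataset ID and operation ID can be recovered by splitting on `/`. A minimal sketch (hypothetical helper name, standard library only):

```python
def parse_operation_name(name: str) -> dict:
    """Split a create-dataset operation name into its resource IDs.

    Expected segments: projects/<n>/locations/<loc>/datasets/<id>/operations/<op>
    """
    parts = name.split("/")
    # Pair each resource type (even index) with the ID that follows it.
    return dict(zip(parts[0::2], parts[1::2]))


ids = parse_operation_name(
    "projects/12345/locations/us-central1/datasets/678/operations/910"
)
print(ids["datasets"], ids["operations"])  # → 678 910
```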

Java

์ด ์ƒ˜ํ”Œ์„ ์‚ฌ์šฉํ•ด ๋ณด๊ธฐ ์ „์— Vertex AI ๋น ๋ฅธ ์‹œ์ž‘: ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์‚ฌ์šฉ์˜ Java ์„ค์ • ์•ˆ๋‚ด๋ฅผ ๋”ฐ๋ฅด์„ธ์š”. ์ž์„ธํ•œ ๋‚ด์šฉ์€ Vertex AI Java API ์ฐธ๊ณ  ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateDatasetOperationMetadata;
import com.google.cloud.aiplatform.v1.Dataset;
import com.google.cloud.aiplatform.v1.DatasetServiceClient;
import com.google.cloud.aiplatform.v1.DatasetServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateDatasetTabularGcsSample {

  public static void main(String[] args)
      throws InterruptedException, ExecutionException, TimeoutException, IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String datasetDisplayName = "YOUR_DATASET_DISPLAY_NAME";
    String gcsSourceUri = "gs://YOUR_GCS_SOURCE_BUCKET/path_to_your_gcs_table/file.csv";
    createDatasetTableGcs(project, datasetDisplayName, gcsSourceUri);
  }

  static void createDatasetTableGcs(String project, String datasetDisplayName, String gcsSourceUri)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    DatasetServiceSettings settings =
        DatasetServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DatasetServiceClient datasetServiceClient = DatasetServiceClient.create(settings)) {
      String location = "us-central1";
      String metadataSchemaUri =
          "gs://google-cloud-aiplatform/schema/dataset/metadata/tables_1.0.0.yaml";
      LocationName locationName = LocationName.of(project, location);

      String jsonString =
          "{\"input_config\": {\"gcs_source\": {\"uri\": [\"" + gcsSourceUri + "\"]}}}";
      Value.Builder metaData = Value.newBuilder();
      JsonFormat.parser().merge(jsonString, metaData);

      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(datasetDisplayName)
              .setMetadataSchemaUri(metadataSchemaUri)
              .setMetadata(metaData)
              .build();

      OperationFuture<Dataset, CreateDatasetOperationMetadata> datasetFuture =
          datasetServiceClient.createDatasetAsync(locationName, dataset);
      System.out.format("Operation name: %s\n", datasetFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Dataset datasetResponse = datasetFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Dataset Table GCS sample");
      System.out.format("Name: %s\n", datasetResponse.getName());
      System.out.format("Display Name: %s\n", datasetResponse.getDisplayName());
      System.out.format("Metadata Schema Uri: %s\n", datasetResponse.getMetadataSchemaUri());
      System.out.format("Metadata: %s\n", datasetResponse.getMetadata());
    }
  }
}

Node.js

์ด ์ƒ˜ํ”Œ์„ ์‚ฌ์šฉํ•ด ๋ณด๊ธฐ ์ „์— Vertex AI ๋น ๋ฅธ ์‹œ์ž‘: ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์‚ฌ์šฉ์˜ Node.js ์„ค์ • ์•ˆ๋‚ด๋ฅผ ๋”ฐ๋ฅด์„ธ์š”. ์ž์„ธํ•œ ๋‚ด์šฉ์€ Vertex AI Node.js API ์ฐธ๊ณ  ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const datasetDisplayName = 'YOUR_DATASET_DISPLAY_NAME';
// const gcsSourceUri = 'YOUR_GCS_SOURCE_URI';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Dataset Service Client library
const {DatasetServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const datasetServiceClient = new DatasetServiceClient(clientOptions);

async function createDatasetTabularGcs() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const metadata = {
    structValue: {
      fields: {
        inputConfig: {
          structValue: {
            fields: {
              gcsSource: {
                structValue: {
                  fields: {
                    uri: {
                      listValue: {
                        values: [{stringValue: gcsSourceUri}],
                      },
                    },
                  },
                },
              },
            },
          },
        },
      },
    },
  };
  // Configure the dataset resource
  const dataset = {
    displayName: datasetDisplayName,
    metadataSchemaUri:
      'gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml',
    metadata: metadata,
  };
  const request = {
    parent,
    dataset,
  };

  // Create dataset request
  const [response] = await datasetServiceClient.createDataset(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create dataset tabular gcs response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tMetadata schema uri : ${result.metadataSchemaUri}`);
  console.log(`\tMetadata : ${JSON.stringify(result.metadata)}`);
}
createDatasetTabularGcs();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import List, Union

from google.cloud import aiplatform


def create_and_import_dataset_tabular_gcs_sample(
    display_name: str,
    project: str,
    location: str,
    gcs_source: Union[str, List[str]],
):

    aiplatform.init(project=project, location=location)

    dataset = aiplatform.TabularDataset.create(
        display_name=display_name,
        gcs_source=gcs_source,
    )

    dataset.wait()

    print(f'\tDataset: "{dataset.display_name}"')
    print(f'\tname: "{dataset.resource_name}"')

BigQuery์—์„œ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š” ๋ฐ์ดํ„ฐ ์„ธํŠธ ๋งŒ๋“ค๊ธฐ

REST

Use the datasets.create method to create a dataset.

์š”์ฒญ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ์ „์— ๋‹ค์Œ์„ ๋ฐ”๊ฟ‰๋‹ˆ๋‹ค.

  • LOCATION: Region where the dataset will be stored. This must be a region that supports dataset resources, for example, us-central1.
  • PROJECT: Your project ID.
  • DATASET_NAME: Display name for the dataset.
  • METADATA_SCHEMA_URI: The URI to the schema file for your objective: gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml
  • URI: Path to the BigQuery table containing the training data, in the following format:
    bq://bqprojectId.bqDatasetId.bqTableId
    
  • PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets

Request JSON body:

{
  "display_name": "DATASET_NAME",
  "metadata_schema_uri": "METADATA_SCHEMA_URI",
  "metadata": {
    "input_config": {
      "bigquery_source": {
        "uri": "URI"
      }
    }
  }
}
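Analogously to the Cloud Storage case, the BigQuery request body can be assembled locally before sending it. This sketch uses a hypothetical helper and example values, assumes a simple bq://projectId.bqDatasetId.bqTableId shape (project IDs containing dots are not handled), and relies only on the standard library.

```python
import json


def build_bigquery_dataset_body(display_name, schema_uri, bq_uri):
    """Build the datasets.create request body for a BigQuery source."""
    # A BigQuery source URI has the form bq://projectId.bqDatasetId.bqTableId.
    if not bq_uri.startswith("bq://") or bq_uri.count(".") != 2:
        raise ValueError(f"not a BigQuery table URI: {bq_uri}")
    return {
        "display_name": display_name,
        "metadata_schema_uri": schema_uri,
        "metadata": {"input_config": {"bigquery_source": {"uri": bq_uri}}},
    }


body = build_bigquery_dataset_body(
    "my-dataset",  # hypothetical display name
    "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml",
    "bq://my-project.my_dataset.my_table",  # hypothetical table
)
print(json.dumps(body, indent=2))
```

Note that unlike the Cloud Storage case, `uri` is a single string here, not a list.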

์š”์ฒญ์„ ๋ณด๋‚ด๋ ค๋ฉด ๋‹ค์Œ ์˜ต์…˜ ์ค‘ ํ•˜๋‚˜๋ฅผ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.

curl

์š”์ฒญ ๋ณธ๋ฌธ์„ request.json ํŒŒ์ผ์— ์ €์žฅํ•˜๊ณ  ๋‹ค์Œ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets"

PowerShell

์š”์ฒญ ๋ณธ๋ฌธ์„ request.json ํŒŒ์ผ์— ์ €์žฅํ•˜๊ณ  ๋‹ค์Œ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets" | Select-Object -Expand Content

๋‹ค์Œ๊ณผ ๋น„์Šทํ•œ JSON ์‘๋‹ต์ด ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateDatasetOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-07-07T21:27:35.964882Z",
      "updateTime": "2020-07-07T21:27:35.964882Z"
    }
  }
}

Java

์ด ์ƒ˜ํ”Œ์„ ์‚ฌ์šฉํ•ด ๋ณด๊ธฐ ์ „์— Vertex AI ๋น ๋ฅธ ์‹œ์ž‘: ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์‚ฌ์šฉ์˜ Java ์„ค์ • ์•ˆ๋‚ด๋ฅผ ๋”ฐ๋ฅด์„ธ์š”. ์ž์„ธํ•œ ๋‚ด์šฉ์€ Vertex AI Java API ์ฐธ๊ณ  ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateDatasetOperationMetadata;
import com.google.cloud.aiplatform.v1.Dataset;
import com.google.cloud.aiplatform.v1.DatasetServiceClient;
import com.google.cloud.aiplatform.v1.DatasetServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateDatasetTabularBigquerySample {

  public static void main(String[] args)
      throws InterruptedException, ExecutionException, TimeoutException, IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String bigqueryDisplayName = "YOUR_DATASET_DISPLAY_NAME";
    String bigqueryUri =
        "bq://YOUR_GOOGLE_CLOUD_PROJECT_ID.BIGQUERY_DATASET_ID.BIGQUERY_TABLE_OR_VIEW_ID";
    createDatasetTableBigquery(project, bigqueryDisplayName, bigqueryUri);
  }

  static void createDatasetTableBigquery(
      String project, String bigqueryDisplayName, String bigqueryUri)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    DatasetServiceSettings settings =
        DatasetServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DatasetServiceClient datasetServiceClient = DatasetServiceClient.create(settings)) {
      String location = "us-central1";
      String metadataSchemaUri =
          "gs://google-cloud-aiplatform/schema/dataset/metadata/tables_1.0.0.yaml";
      LocationName locationName = LocationName.of(project, location);

      String jsonString =
          "{\"input_config\": {\"bigquery_source\": {\"uri\": \"" + bigqueryUri + "\"}}}";
      Value.Builder metaData = Value.newBuilder();
      JsonFormat.parser().merge(jsonString, metaData);

      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(bigqueryDisplayName)
              .setMetadataSchemaUri(metadataSchemaUri)
              .setMetadata(metaData)
              .build();

      OperationFuture<Dataset, CreateDatasetOperationMetadata> datasetFuture =
          datasetServiceClient.createDatasetAsync(locationName, dataset);
      System.out.format("Operation name: %s\n", datasetFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Dataset datasetResponse = datasetFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Dataset Table Bigquery sample");
      System.out.format("Name: %s\n", datasetResponse.getName());
      System.out.format("Display Name: %s\n", datasetResponse.getDisplayName());
      System.out.format("Metadata Schema Uri: %s\n", datasetResponse.getMetadataSchemaUri());
      System.out.format("Metadata: %s\n", datasetResponse.getMetadata());
    }
  }
}

Node.js

์ด ์ƒ˜ํ”Œ์„ ์‚ฌ์šฉํ•ด ๋ณด๊ธฐ ์ „์— Vertex AI ๋น ๋ฅธ ์‹œ์ž‘: ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์‚ฌ์šฉ์˜ Node.js ์„ค์ • ์•ˆ๋‚ด๋ฅผ ๋”ฐ๋ฅด์„ธ์š”. ์ž์„ธํ•œ ๋‚ด์šฉ์€ Vertex AI Node.js API ์ฐธ๊ณ  ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const datasetDisplayName = 'YOUR_DATASET_DISPLAY_NAME';
// const bigquerySourceUri = 'YOUR_BIGQUERY_SOURCE_URI';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Dataset Service Client library
const {DatasetServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const datasetServiceClient = new DatasetServiceClient(clientOptions);

async function createDatasetTabularBigquery() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const metadata = {
    structValue: {
      fields: {
        inputConfig: {
          structValue: {
            fields: {
              bigquerySource: {
                structValue: {
                  fields: {
                    uri: {
                      listValue: {
                        values: [{stringValue: bigquerySourceUri}],
                      },
                    },
                  },
                },
              },
            },
          },
        },
      },
    },
  };
  // Configure the dataset resource
  const dataset = {
    displayName: datasetDisplayName,
    metadataSchemaUri:
      'gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml',
    metadata: metadata,
  };
  const request = {
    parent,
    dataset,
  };

  // Create dataset request
  const [response] = await datasetServiceClient.createDataset(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create dataset tabular bigquery response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tMetadata schema uri : ${result.metadataSchemaUri}`);
  console.log(`\tMetadata : ${JSON.stringify(result.metadata)}`);
}
createDatasetTabularBigquery();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform


def create_and_import_dataset_tabular_bigquery_sample(
    display_name: str,
    project: str,
    location: str,
    bq_source: str,
):

    aiplatform.init(project=project, location=location)

    dataset = aiplatform.TabularDataset.create(
        display_name=display_name,
        bq_source=bq_source,
    )

    dataset.wait()

    print(f'\tDataset: "{dataset.display_name}"')
    print(f'\tname: "{dataset.resource_name}"')

์ž‘์—… ์ƒํƒœ ๊ฐ€์ ธ์˜ค๊ธฐ

์ผ๋ถ€ ์š”์ฒญ์€ ์™„๋ฃŒํ•˜๋Š” ๋ฐ ์‹œ๊ฐ„์ด ๊ฑธ๋ฆฌ๋Š” ์žฅ๊ธฐ ์‹คํ–‰ ์ž‘์—…์„ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์š”์ฒญ์€ ์ž‘์—… ์ƒํƒœ๋ฅผ ๋ณด๊ฑฐ๋‚˜ ์ž‘์—…์„ ์ทจ์†Œํ•˜๋Š” ๋ฐ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์ž‘์—… ์ด๋ฆ„์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. Vertex AI๋Š” ์žฅ๊ธฐ ์‹คํ–‰ ์ž‘์—…์„ ํ˜ธ์ถœํ•˜๋Š” ๋„์šฐ๋ฏธ ๋ฉ”์„œ๋“œ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์žฅ๊ธฐ ์‹คํ–‰ ์ž‘์—… ๋‹ค๋ฃจ๊ธฐ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

๋‹ค์Œ ๋‹จ๊ณ„