Ingest data with Cloud Data Fusion
Cloud Data Fusion provides a Dataplex Universal Catalog Sink plugin
for ingesting data into any of the assets that Dataplex Universal Catalog supports.
Before you begin
If you don't have a Cloud Data Fusion instance, create one. This plugin
is available in instances that run Cloud Data Fusion version 6.6 or
later. For more information, see
Create a Cloud Data Fusion public instance.
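If you work from the command line, an instance can also be created with the gcloud CLI. This is a minimal sketch, assuming the beta gcloud data-fusion commands are installed; the instance name and region are placeholders.

```bash
# Hypothetical example: create a Cloud Data Fusion instance in us-central1.
# The instance name and region are placeholders; choose values for your project.
gcloud beta data-fusion instances create my-datafusion-instance \
    --location=us-central1
```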
The BigQuery dataset or Cloud Storage bucket
where data is ingested must be part of a Dataplex Universal Catalog lake.
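As a rough sketch of this prerequisite, the following gcloud commands attach an existing bucket and dataset to a zone in a lake. The lake, zone, bucket, dataset, and project names are placeholders, and the flags assume the current gcloud dataplex asset syntax.

```bash
# Hypothetical example: register an existing Cloud Storage bucket and
# BigQuery dataset as assets in a Dataplex Universal Catalog lake zone.
gcloud dataplex assets create my-bucket-asset \
    --location=us-central1 --lake=my-lake --zone=my-zone \
    --resource-type=STORAGE_BUCKET \
    --resource-name=projects/my-project/buckets/my-bucket

gcloud dataplex assets create my-dataset-asset \
    --location=us-central1 --lake=my-lake --zone=my-zone \
    --resource-type=BIGQUERY_DATASET \
    --resource-name=projects/my-project/datasets/my_dataset
```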
For data to be read from Cloud Storage entities,
Dataproc Metastore must be attached to the lake.
CSV data in Cloud Storage entities isn't supported.
In the Dataplex Universal Catalog project, enable Private Google Access on
the subnetwork, which is typically set to default, or set
internal_ip_only to false.
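For example, Private Google Access can be enabled on the default subnetwork with gcloud; the project ID and region below are placeholders.

```bash
# Enable Private Google Access on the default subnetwork in the
# Dataplex Universal Catalog project (project ID and region are placeholders).
gcloud compute networks subnets update default \
    --project=my-dataplex-project \
    --region=us-central1 \
    --enable-private-ip-google-access
```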
Required roles
To get the permissions that you need, ask your administrator to grant the
following IAM roles to the Dataproc service agent and the Cloud Data Fusion
service agent (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com):
Dataplex Developer (roles/dataplex.developer)
Dataplex Data Reader (roles/dataplex.dataReader)
Dataproc Metastore Metadata User (roles/metastore.metadataUser)
Cloud Dataplex Service Agent (roles/dataplex.serviceAgent)
Dataplex Metadata Reader (roles/dataplex.metadataReader)
For more information about granting roles, see Manage access to projects,
folders, and organizations. You might also be able to get the required
permissions through custom roles or other predefined roles.
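As an illustration, the bindings can be added with gcloud. The project ID and project number are placeholders; run the same loop again with the Dataproc service agent's email address as the member.

```bash
# Grant the required roles to the Cloud Data Fusion service agent.
# PROJECT_ID and CUSTOMER_PROJECT_NUMBER are placeholders.
SERVICE_AGENT="service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com"

for role in roles/dataplex.developer \
            roles/dataplex.dataReader \
            roles/metastore.metadataUser \
            roles/dataplex.serviceAgent \
            roles/dataplex.metadataReader; do
  gcloud projects add-iam-policy-binding PROJECT_ID \
      --member="serviceAccount:${SERVICE_AGENT}" \
      --role="${role}"
done
```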
Add the plugin to your pipeline
In the Google Cloud console, go to the Cloud Data Fusion Instances page. This
page lets you manage your instances.
To open your instance, click View instance.
Go to the Studio page, expand the Sink menu, and click Dataplex.
Configure the plugin
After you add this plugin to your pipeline on the Studio page, click the
Dataplex Universal Catalog sink to configure and save its properties.
For more information about configurations, see the
Dataplex Sink reference.
Optional: Get started with a sample pipeline
Sample pipelines are available, including an SAP source to
Dataplex Universal Catalog sink pipeline and a Dataplex Universal Catalog source to
BigQuery sink pipeline.
To use a sample pipeline, open your instance in the Cloud Data Fusion UI,
click Hub > Pipelines, and select one of the
Dataplex Universal Catalog pipelines. A dialog opens to help you create the
pipeline.
Run your pipeline
After deploying the pipeline, open your pipeline on the Cloud Data Fusion
Studio page.
Click Configure > Resources.
Optional: Change the Executor CPU and Memory based on the overall
data size and the number of transformations used in your pipeline.
Click Save.
To start the data pipeline, click Run.
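If you want to script this step, a deployed batch pipeline can also be started through the Cloud Data Fusion (CDAP) REST API. The following sketch assumes an instance named my-datafusion-instance and a deployed pipeline named my-dataplex-pipeline in the default namespace; both names are placeholders.

```bash
# Optional scripted alternative: start a deployed batch pipeline through the
# CDAP REST API. Instance, location, and pipeline names are placeholders.
CDAP_ENDPOINT=$(gcloud beta data-fusion instances describe my-datafusion-instance \
    --location=us-central1 --format="value(apiEndpoint)")

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "${CDAP_ENDPOINT}/v3/namespaces/default/apps/my-dataplex-pipeline/workflows/DataPipelineWorkflow/start"
```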
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-29 UTC."],[[["\u003cp\u003eCloud Data Fusion's Dataplex Sink plugin enables data ingestion into Dataplex-supported assets from version 6.6 or later.\u003c/p\u003e\n"],["\u003cp\u003eUsing the plugin requires the BigQuery dataset or Cloud Storage bucket to be part of a Dataplex lake, with Dataproc Metastore attached for Cloud Storage data.\u003c/p\u003e\n"],["\u003cp\u003eSpecific IAM roles, including Dataplex Developer and Dataplex Data Reader, are needed on the Dataproc and Cloud Data Fusion service agents to manage permissions.\u003c/p\u003e\n"],["\u003cp\u003eThe Dataplex plugin can be added to your Cloud Data Fusion pipeline via the Studio page, where it can be configured and its properties saved.\u003c/p\u003e\n"],["\u003cp\u003eSample pipelines are available, including SAP to Dataplex and Dataplex to BigQuery, which can be accessed through the Cloud Data Fusion UI.\u003c/p\u003e\n"]]],[],null,["# Ingest data with Cloud Data Fusion\n\n[Cloud Data Fusion](/data-fusion) provides a Dataplex Universal Catalog Sink plugin\nfor ingesting data to any of the Dataplex Universal Catalog supported assets.\n\nBefore you begin\n----------------\n\n- If you don't have a Cloud Data Fusion instance, create one. This plugin is available in instances that run in Cloud Data Fusion version 6.6 or later. For more information, see [Create a Cloud Data Fusion public instance](/data-fusion/docs/how-to/create-instance).\n- The BigQuery dataset or Cloud Storage bucket where data is ingested must be part of a Dataplex Universal Catalog lake.\n- For data to be read from Cloud Storage entities, Dataproc Metastore must be attached to the lake.\n- CSV data in Cloud Storage entities isn't supported.\n- In the Dataplex Universal Catalog project, enable Private Google Access on the subnetwork, which is typically set to `default`, or set `internal_ip_only` to `false`.\n\n### Required roles\n\n\nTo get the permissions that\nyou need to manage roles,\n\nask your administrator to grant you the\nfollowing IAM roles on the Dataproc service agent and the Cloud Data Fusion service agent (`service-`\u003cvar translate=\"no\"\u003eCUSTOMER_PROJECT_NUMBER\u003c/var\u003e`@gcp-sa-datafusion.iam.gserviceaccount.com`):\n\n- [Dataplex Developer](/iam/docs/roles-permissions/dataplex#dataplex.developer) (`roles/dataplex.developer`)\n- [Dataplex Data Reader](/iam/docs/roles-permissions/dataplex#dataplex.dataReader) (`roles/dataplex.dataReader`)\n- [Dataproc Metastore Metadata User](/iam/docs/roles-permissions/metastore#metastore.metadataUser) (`roles/metastore.metadataUser`)\n- [Cloud Dataplex Service Agent](/iam/docs/roles-permissions/dataplex#dataplex.serviceAgent) (`roles/dataplex.serviceAgent`)\n- [Dataplex Metadata Reader](/iam/docs/roles-permissions/dataplex#dataplex.metadataReader) (`roles/dataplex.metadataReader`)\n\n\nFor more information about granting roles, see [Manage access to projects, folders, and organizations](/iam/docs/granting-changing-revoking-access).\n\n\nYou might also be able to get\nthe required permissions through [custom\nroles](/iam/docs/creating-custom-roles) or other 
What's next
Process data with Cloud Data Fusion using the Dataplex Universal Catalog
Source plugin.