The Cloud Storage Avro to Bigtable template is a pipeline that reads data from Avro files in a Cloud Storage bucket and writes the data to a Bigtable table. You can use the template to copy data from Cloud Storage to Bigtable.
Pipeline requirements
- The Bigtable table must exist and must have the same column families as the data exported in the Avro files.
- The input Avro files must exist in a Cloud Storage bucket before you run the pipeline.
- Bigtable expects the input Avro files to follow a specific schema.
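The column-family requirement can be checked locally before launching. The sketch below is a minimal example that assumes each exported Avro record has already been decoded into a Python dict with a row `key` and a list of `cells`, where each cell names its column `family`. These field names are an assumption about the record layout, not something this page specifies, and the helper `missing_column_families` is hypothetical; the list of families in the target table would come from your own tooling.

```python
from typing import Any, Iterable, Mapping


def missing_column_families(
    rows: Iterable[Mapping[str, Any]], table_families: Iterable[str]
) -> list[str]:
    """Return column families used by the rows that the target table lacks.

    Assumed (hypothetical) row layout: {"key": bytes, "cells": [cell, ...]},
    where every cell dict carries a "family" string.
    """
    existing = set(table_families)
    used = {cell["family"] for row in rows for cell in row.get("cells", [])}
    return sorted(used - existing)


# Example rows in the assumed layout (values are illustrative only).
rows = [
    {
        "key": b"user#1",
        "cells": [
            {"family": "profile", "qualifier": b"name", "timestamp": 0, "value": b"Ana"},
            {"family": "stats", "qualifier": b"visits", "timestamp": 0, "value": b"3"},
        ],
    },
]
print(missing_column_families(rows, ["profile"]))  # ['stats']
```

A non-empty result means the import would reference a column family the table does not have, so the family should be created before the job runs.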
Template parameters
Required parameters
- bigtableProjectId: The ID of the Google Cloud project that contains the Bigtable instance that you want to write data to.
- bigtableInstanceId: The ID of the Bigtable instance that contains the table.
- bigtableTableId: The ID of the Bigtable table to import into.
- inputFilePattern: The Cloud Storage path pattern where the data is located, for example, gs://<BUCKET_NAME>/FOLDER/PREFIX*.
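As a pre-launch sanity check, the required parameters can be validated in a few lines. This is a hypothetical helper, not part of the template: it only checks that each required parameter is non-empty and that inputFilePattern looks like a Cloud Storage path. Dataflow still performs its own validation when the job starts.

```python
# Parameter names copied from the template's required-parameter list.
REQUIRED_PARAMS = (
    "bigtableProjectId",
    "bigtableInstanceId",
    "bigtableTableId",
    "inputFilePattern",
)


def validate_params(params: dict[str, str]) -> list[str]:
    """Return human-readable problems with params; an empty list means OK.

    Hypothetical pre-launch helper; it does not replace Dataflow's checks.
    """
    errors = [
        f"missing required parameter: {name}"
        for name in REQUIRED_PARAMS
        if not params.get(name)
    ]
    pattern = params.get("inputFilePattern", "")
    if pattern and not pattern.startswith("gs://"):
        errors.append("inputFilePattern must be a Cloud Storage path (gs://...)")
    return errors


# Example values are illustrative only.
params = {
    "bigtableProjectId": "my-project",
    "bigtableInstanceId": "my-instance",
    "bigtableTableId": "my-table",
    "inputFilePattern": "gs://mybucket/somefolder/prefix*",
}
print(validate_params(params))  # []
```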
Optional parameters
- splitLargeRows: The flag for enabling splitting of large rows into multiple MutateRows requests. Note that when a large row is split across multiple API calls, updates to the row are not atomic.
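To illustrate why splitting is not atomic, the following conceptual sketch divides one row's mutations into chunks that would each be sent as a separate request, then simulates a failure on the second request. It does not reproduce the template's actual splitting logic or Bigtable's real per-request limits: `max_per_request`, the tuple mutation format, and both helper functions are hypothetical.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical mutation model: (family, qualifier, value).
Mutation = tuple[str, str, int]


def split_row_mutations(mutations: Sequence[T], max_per_request: int) -> list[list[T]]:
    """Split one row's mutations into consecutive chunks, one chunk per request."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    return [
        list(mutations[i : i + max_per_request])
        for i in range(0, len(mutations), max_per_request)
    ]


def apply_chunks(
    chunks: list[list[Mutation]], fail_at: Optional[int] = None
) -> dict[tuple[str, str], int]:
    """Apply chunks as separate 'requests', stopping when request `fail_at` fails.

    Chunks applied before the failure stay applied, so the row ends up
    partially updated: the update to the row as a whole is not atomic.
    """
    row: dict[tuple[str, str], int] = {}
    for n, chunk in enumerate(chunks):
        if n == fail_at:
            break  # simulated failure of this request
        for family, qualifier, value in chunk:
            row[(family, qualifier)] = value
    return row


mutations = [("cf", f"q{i}", i) for i in range(5)]
chunks = split_row_mutations(mutations, max_per_request=2)
print(len(chunks))                           # 3
print(len(apply_chunks(chunks, fail_at=1)))  # 2
```

Because the first chunk lands before the simulated failure, the row keeps only part of its intended updates, which is the behavior the splitLargeRows note warns about.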
Run the template
Console
- Go to the Dataflow Create job from template page.
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1. For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the Avro Files on Cloud Storage to Cloud Bigtable template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
gcloud
In your shell or terminal, run the template:
gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/GCS_Avro_to_Cloud_Bigtable \
    --region REGION_NAME \
    --parameters \
bigtableProjectId=BIGTABLE_PROJECT_ID,\
bigtableInstanceId=INSTANCE_ID,\
bigtableTableId=TABLE_ID,\
inputFilePattern=INPUT_FILE_PATTERN
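Replace the following:
- JOB_NAME: a unique job name of your choice
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest, to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, for example 2023-09-12-00_RC00, to use a specific version of the template, which is nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- REGION_NAME: the region where you want to deploy your Dataflow job, for example, us-central1
- BIGTABLE_PROJECT_ID: the ID of the Google Cloud project that contains the Bigtable instance you want to write data to
- INSTANCE_ID: the ID of the Bigtable instance that contains the table
- TABLE_ID: the ID of the Bigtable table to import into
- INPUT_FILE_PATTERN: the Cloud Storage path pattern where the data is located, for example, gs://mybucket/somefolder/prefix*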
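If you launch jobs from a script, the gcloud command can also be assembled programmatically. The sketch below assumes the gcloud CLI is installed and authenticated; the helper names and the example job, project, instance, and table values are hypothetical. Passing an argument list to `subprocess.run` sidesteps shell line-continuation rules, and the template path follows the gs://dataflow-templates-REGION_NAME/VERSION/GCS_Avro_to_Cloud_Bigtable layout used in the gcloud command.

```python
import shlex
import subprocess

TEMPLATE_NAME = "GCS_Avro_to_Cloud_Bigtable"


def template_gcs_path(region: str, version: str = "latest") -> str:
    """Cloud Storage location of the template for a region and version."""
    return f"gs://dataflow-templates-{region}/{version}/{TEMPLATE_NAME}"


def build_gcloud_command(
    job_name: str, region: str, params: dict[str, str], version: str = "latest"
) -> list[str]:
    """Build the argument list for `gcloud dataflow jobs run`."""
    for key, value in params.items():
        # --parameters is comma-separated; this sketch does not implement
        # gcloud's alternate-delimiter escaping, so it rejects commas instead.
        if "," in value:
            raise ValueError(f"{key} contains a comma")
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        "--gcs-location", template_gcs_path(region, version),
        "--region", region,
        "--parameters", ",".join(f"{k}={v}" for k, v in params.items()),
    ]


def launch(cmd: list[str]) -> None:
    """Run the command; requires an installed, authenticated gcloud CLI."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_gcloud_command(
        "avro-to-bigtable-example",
        "us-central1",
        {
            "bigtableProjectId": "my-project",
            "bigtableInstanceId": "my-instance",
            "bigtableTableId": "my-table",
            "inputFilePattern": "gs://mybucket/somefolder/prefix*",
        },
    )
    print(shlex.join(cmd))  # inspect first; call launch(cmd) to actually run it
```

Printing with `shlex.join` before launching makes it easy to compare the generated command against the manual one.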
API
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/GCS_Avro_to_Cloud_Bigtable
{
   "jobName": "JOB_NAME",
   "parameters": {
       "bigtableProjectId": "BIGTABLE_PROJECT_ID",
       "bigtableInstanceId": "INSTANCE_ID",
       "bigtableTableId": "TABLE_ID",
       "inputFilePattern": "INPUT_FILE_PATTERN"
   },
   "environment": { "zone": "us-central1-f" }
}
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest, to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
  - the version name, for example 2023-09-12-00_RC00, to use a specific version of the template, which is nested in the respective dated parent folder in the bucket: gs://dataflow-templates-LOCATION/
- LOCATION: the region where you want to deploy your Dataflow job, for example, us-central1
- BIGTABLE_PROJECT_ID: the ID of the Google Cloud project that contains the Bigtable instance you want to write data to
- INSTANCE_ID: the ID of the Bigtable instance that contains the table
- TABLE_ID: the ID of the Bigtable table to import into
- INPUT_FILE_PATTERN: the Cloud Storage path pattern where the data is located, for example, gs://mybucket/somefolder/prefix*
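The same launch request can be built with the Python standard library. This is a minimal sketch: it assumes you already have an OAuth 2.0 access token (for example, from `gcloud auth print-access-token`), and the function name and example values are hypothetical. Serializing the body with `json.dumps` guarantees valid JSON, and the optional `zone` argument mirrors the `environment.zone` field in the example request.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

TEMPLATE_NAME = "GCS_Avro_to_Cloud_Bigtable"


def build_launch_request(
    project_id: str,
    location: str,
    job_name: str,
    params: dict[str, str],
    access_token: str,
    version: str = "latest",
    zone: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a projects.templates.launch POST request."""
    gcs_path = f"gs://dataflow-templates-{location}/{version}/{TEMPLATE_NAME}"
    # urlencode percent-encodes the gs:// path for use in the query string.
    query = urllib.parse.urlencode({"gcsPath": gcs_path})
    url = (
        "https://dataflow.googleapis.com/v1b3/"
        f"projects/{project_id}/locations/{location}/templates:launch?{query}"
    )
    body: dict = {"jobName": job_name, "parameters": dict(params)}
    if zone:
        body["environment"] = {"zone": zone}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # always valid JSON, no trailing commas
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and read the JSON response body; building and sending are kept separate so the request can be inspected first.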
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.