View maintenance notifications
==============================

Last updated 2025-09-04 (UTC).

**Note:** Only TPU v6e supports upcoming maintenance notifications.

A host maintenance event occurs when Google Cloud needs to perform maintenance or repair work on your TPU. Google sends a notification for upcoming host maintenance before the maintenance is performed. When the maintenance window opens, Google Cloud automatically performs maintenance on your instance. By monitoring your instance's upcoming maintenance windows, you can proactively prepare your workloads so that the maintenance causes minimal disruption.

Cloud TPU lets you view maintenance notifications by using the Google Cloud CLI and by querying the metadata server. You can also view upcoming maintenance events in Cloud Logging.
For information about viewing maintenance notifications for TPUs in GKE, see [Manage GKE node disruption for GPUs and TPUs](/kubernetes-engine/docs/concepts/handle-disruption-gpu-tpu).

Maintenance notification fields
-------------------------------

Maintenance notifications contain the following fields:

- `windowStartTime`: The start of the time window in which maintenance will occur.
- `windowEndTime`: The end of the time window in which maintenance will occur.
- `latestWindowStartTime`: The latest time to which the start of the maintenance window can be moved.
- `type`: The type of maintenance that will be performed:
    - `SCHEDULED`: You receive seven days' notice of the maintenance.
    - `UNSCHEDULED`: Critical updates for which you receive less notice than for scheduled maintenance events.
- `canReschedule`: Whether you can manually start maintenance on this VM during the notification period:
    - `true`: You can manually start maintenance during the notification period.
    - `false`: You can't manually start maintenance on this VM. This value is typically observed while the VM is actively undergoing maintenance.
- `maintenanceStatus`: The status of the current maintenance operation:
    - `ONGOING`: The maintenance operation is underway.
    - `PENDING`: The maintenance operation is scheduled but has not started yet.

If there is no maintenance notification, the response looks similar to the following:

    { "error": "no notifications have been received yet, try again later" }

### Maintenance status behaviors

When managing maintenance events, check the values of `canReschedule` and `maintenanceStatus`.
Together, these fields indicate whether you can manually start a maintenance event:

- **`canReschedule=true` and `maintenanceStatus=PENDING`**: You can manually start the maintenance event for the instance before the scheduled start time.
- **`canReschedule=false` and `maintenanceStatus=ONGOING`**: The maintenance is underway and can't be rescheduled.
- **`canReschedule=false` and `maintenanceStatus=PENDING`**: Your instance doesn't support manually triggered maintenance events.

View maintenance notifications
------------------------------

You can view maintenance notifications by:

- Calling the Cloud TPU API using the Google Cloud CLI
- Querying the metadata server on your VM
- Checking Cloud Logging

### Check TPUs for a maintenance notification

#### gcloud

Use the [`gcloud alpha compute tpus tpu-vm describe`](/sdk/gcloud/reference/alpha/compute/tpus/describe) command to view maintenance notifications:

```bash
gcloud alpha compute tpus tpu-vm describe TPU_NAME \
    --zone=ZONE
```

Replace `TPU_NAME` with the name of your TPU and `ZONE` with the zone that the TPU is in.

If there is an upcoming maintenance event, the output contains a section similar to the following:

```yaml
upcomingMaintenance:
  canReschedule: true
  latestWindowStartTime: "2025-12-01T19:00:00Z"
  maintenanceStatus: PENDING
  type: SCHEDULED
  windowEndTime: "2025-12-01T22:00:00Z"
  windowStartTime: "2025-12-01T19:00:00Z"
```

In this response:

- The maintenance is scheduled to start at the date and time shown in `windowStartTime`.
- `canReschedule` is `true` and `maintenanceStatus` is `PENDING`.
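If you check for maintenance from a script, you could read both fields with a gcloud `value()` format projection and map each combination from the *Maintenance status behaviors* section to an action. The following is a minimal sketch under those assumptions: the helper names `maintenance_action` and `check_tpu_maintenance` and the labels they print are hypothetical, and the sketch assumes that the `gcloud alpha` component is installed and authenticated.

```bash
#!/usr/bin/env bash
# Sketch only: maintenance_action and check_tpu_maintenance are hypothetical
# helpers, and the labels they print are illustrative, not gcloud output.

# maintenance_action CAN_RESCHEDULE MAINTENANCE_STATUS
# Maps the two fields to the cases listed under "Maintenance status behaviors".
maintenance_action() {
  local can_reschedule status
  # Normalize case, because output formats may print booleans as True or true.
  can_reschedule=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  status=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')

  case "${can_reschedule}:${status}" in
    ":")             echo "no-maintenance" ;;  # both fields empty
    "true:PENDING")  echo "manual-start-allowed" ;;
    "false:ONGOING") echo "in-progress" ;;
    "false:PENDING") echo "manual-start-unsupported" ;;
    *)               echo "unknown" ;;
  esac
}

# check_tpu_maintenance TPU_NAME ZONE
# Reads both fields from the describe output and classifies them.
check_tpu_maintenance() {
  local fields can_reschedule status
  # value() prints the projected fields on one tab-separated line.
  if ! fields=$(gcloud alpha compute tpus tpu-vm describe "$1" --zone="$2" \
      --format="value(upcomingMaintenance.canReschedule,upcomingMaintenance.maintenanceStatus)"); then
    return 1
  fi
  IFS=$'\t' read -r can_reschedule status <<< "$fields"
  maintenance_action "$can_reschedule" "$status"
}
```

For example, `check_tpu_maintenance TPU_NAME ZONE` should print `manual-start-allowed` for a TPU whose output matches the example response above. Keeping the mapping in its own function lets you test it without calling gcloud.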

For this example response, the combination of `canReschedule: true` and `maintenanceStatus: PENDING` means that you can manually start the scheduled maintenance event before the time shown in `latestWindowStartTime`.

#### Metadata server

From a TPU VM, query the metadata server to see the next maintenance event:

```bash
curl "http://metadata.google.internal/computeMetadata/v1/instance/upcoming-maintenance?alt=json" \
    -H "Metadata-Flavor: Google"
```

If there is an upcoming maintenance event, the response contains a section similar to the following:

```json
Upcoming maintenance: {
  "can_reschedule" : "true",
  "latest_window_start_time" : "2024-06-12T16:00:01+00:00",
  "maintenance_status" : "PENDING",
  "type" : "SCHEDULED",
  "window_end_time" : "2024-06-12T20:00:00+00:00",
  "window_start_time" : "2024-06-12T16:00:00+00:00"
}
```

Because the upcoming maintenance notification is the same for all VMs in a slice, you can query the metadata server from any TPU VM in the slice.

For more information about VM metadata, see [About VM metadata](/compute/docs/metadata/overview) in the Compute Engine documentation.

### Check Cloud Logging for a maintenance notification

When maintenance is scheduled for your Cloud TPU, Cloud Logging records a system event log entry for it with the `methodName` `compute.instances.upcomingMaintenance`. To view logs for upcoming maintenance events:

1. In the Google Cloud console navigation menu, go to the **Logs Explorer** page:

   [Go to Logs Explorer](https://console.cloud.google.com/logs)

2.
   Use the following search query to view any TPUs that have an upcoming maintenance event scheduled:

   `"compute.instances.upcomingMaintenance"`

   Cloud TPU logs upcoming maintenance events in Cloud Logging per individual VM instance, for example, `t1v-n-5bdca789-w-0`.

#### Examples of maintenance notification logs

A maintenance event notification appears in Logs Explorer with values similar to the following:

- `methodName`: `"compute.instances.upcomingMaintenance"`
- `metadata`:
    - `maintenanceStatus`: `"PENDING"`
    - `windowStartTime`: `"2024-07-23T20:00:00Z"`

The following is an example of a complete log entry for an upcoming maintenance event:

    {
      "protoPayload": {
        "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
        "status": {
          "message": "Maintenance is scheduled for this instance. Review the maintenance schedule by describing the VM with gcloud CLI or querying the http://metadata.google.internal/computeMetadata/v1/instance/upcoming-maintenance metadata key."
        },
        "serviceName": "compute.googleapis.com",
        "methodName": "compute.instances.upcomingMaintenance",
        "resourceName": "projects/cloud-tpu-multipod-dev/zones/europe-west4-b/instances/t1v-n-9472280f-w-0",
        "request": {
          "@type": "type.googleapis.com/compute.instances.upcomingMaintenance"
        },
        "metadata": {
          "type": "SCHEDULED",
          "windowStartTime": "2024-11-15T04:00:00Z",
          "canReschedule": true,
          "latestWindowStartTime": "2024-11-15T04:00:01Z",
          "windowEndTime": "2024-11-15T08:00:00Z",
          "maintenanceStatus": "PENDING"
        }
      },
      "logName": "projects/cloud-tpu-multipod-dev/logs/cloudaudit.googleapis.com%2Fsystem_event",
      "operation": {
        "id": "systemevent-1731038451389-6265ecbfcd453-5127b81e-f40b8149",
        "producer": "compute.instances.upcomingMaintenance",
        "first": true,
        "last": true
      },
      "receiveTimestamp": "2024-11-08T04:00:54.457835088Z"
    }

When the
maintenance event starts, a new informational event appears in the logs with values similar to the following:

- `methodName`: `"compute.instances.upcomingMaintenance"`
- `metadata`:
    - `maintenanceStatus`: `"ONGOING"`
    - `windowStartTime`: `"2024-07-23T20:00:00Z"`

When the maintenance event ends, a new informational event appears in the audit logs with values similar to the following:

- `methodName`: `"compute.instances.upcomingMaintenance"`
- `status`:
    - `message`: `"Maintenance window has completed for this instance. All maintenance notifications on the instance have been removed."`

What's next
-----------

- [Prepare for maintenance events](/tpu/docs/maintenance-events)
- [Manually start a host maintenance event](/tpu/docs/manually-start-maintenance)