Host maintenance events typically occur once every two weeks, but might occasionally run more frequently.
This document describes how you can minimize disruptions to your workloads during a maintenance event.
Receive advance notice before maintenance events
You can monitor the maintenance schedule for your virtual machine (VM) instance and prepare your workloads to transition through the system restart.
To receive advance notice of host events, monitor the /computeMetadata/v1/instance/maintenance-event metadata value.
If the request to the metadata server returns NONE, then the VM isn't scheduled to stop. For example, run the following command from within a VM:
curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google"
NONE
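The check on the maintenance-event metadata value can also be scripted so that an application notices a scheduled stop without manual polling. The following is a minimal sketch, not an official client: it assumes the standard metadata server hostname metadata.google.internal and the Metadata-Flavor: Google header, and the names fetch_maintenance_event, wait_for_stop_notice, and the 30-second polling interval are hypothetical choices made for illustration.

```python
import time
import urllib.request

# Metadata path from this document; the hostname is the standard
# Compute Engine metadata server address.
METADATA_URL = (
    "http://metadata.google.internal"
    "/computeMetadata/v1/instance/maintenance-event"
)
# Value that indicates the VM is scheduled to stop.
STOP_EVENT = "TERMINATE_ON_HOST_MAINTENANCE"


def fetch_maintenance_event(url=METADATA_URL, timeout=5.0):
    """Return the current maintenance-event value, for example 'NONE'."""
    request = urllib.request.Request(
        url, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def wait_for_stop_notice(
    fetch=fetch_maintenance_event, poll_seconds=30.0, sleep=time.sleep
):
    """Poll until the metadata value reports a scheduled stop.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without a metadata server (an assumption made for testability).
    """
    while True:
        event = fetch()
        if event == STOP_EVENT:
            return event
        sleep(poll_seconds)
```

On a VM, calling wait_for_stop_notice() blocks until a stop is scheduled, at which point the application can begin saving its work. The metadata server also supports blocking requests through the wait_for_change query parameter, which could replace the fixed polling interval used here.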
If the metadata server returns TERMINATE_ON_HOST_MAINTENANCE, then your VM is scheduled to stop. Compute Engine gives GPU VMs a 1-hour notice before stopping, while normal VMs receive only a 60-second notice. Configure your application to transition through the maintenance event. For example, you might use one of the following techniques:
Configure your application to temporarily move work in progress to a Cloud Storage bucket, then retrieve that data after the VM restarts.
Write data to a secondary Persistent Disk. When the VM automatically restarts, the Persistent Disk can be reattached and your application can resume work.
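The Persistent Disk technique can be sketched as a small checkpoint helper that saves work in progress before the VM stops and reloads it after the restart. This is an illustrative sketch under stated assumptions: the checkpoint directory is assumed to be a path on the mounted secondary Persistent Disk, the application state is assumed to be JSON-serializable, and the names save_checkpoint, load_checkpoint, and checkpoint.json are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(state, checkpoint_dir, name="checkpoint.json"):
    """Write `state` as JSON into `checkpoint_dir` and return the file path.

    The data goes to a temporary file first and is then renamed into
    place, so a stop in the middle of a write does not leave a
    half-written checkpoint under the final name.
    """
    os.makedirs(checkpoint_dir, exist_ok=True)
    final_path = os.path.join(checkpoint_dir, name)
    fd, tmp_path = tempfile.mkstemp(dir=checkpoint_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to the disk
        os.replace(tmp_path, final_path)  # rename within the same directory
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


def load_checkpoint(checkpoint_dir, name="checkpoint.json", default=None):
    """Return the saved state, or `default` if no checkpoint exists."""
    path = os.path.join(checkpoint_dir, name)
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default
```

An application might call save_checkpoint when the metadata server reports TERMINATE_ON_HOST_MAINTENANCE, and call load_checkpoint at startup to resume. For the Cloud Storage technique, the same checkpoint file could instead be uploaded to a bucket before the stop and downloaded after the restart.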
Last updated: 2025-09-04 (UTC)