Monitor your Ray cluster on Vertex AI
This page covers how to view the tracking logs associated with your Ray clusters and how to monitor Ray on Vertex AI metrics. It also provides guidance for debugging Ray clusters.
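View logs
When you run tasks on your Ray cluster on Vertex AI, tracking logs are automatically generated and stored in both Cloud Logging and the open source Ray dashboard. This section describes how to access the generated logs through the Google Cloud console.
Before you begin, read the Ray on Vertex AI overview and set up all the prerequisite tools you need.
Ray OSS dashboard
You can view the open source Ray log files through the Ray OSS dashboard:
In the Google Cloud console, go to the Ray on Vertex AI page.
In the row for the cluster you created, select the more actions menu.
Select the Ray OSS dashboard link. The dashboard opens in another tab.
Navigate to the Logs view in the menu in the top right corner.
Click each node to see the log files associated with that node.
Cloud Logging console
In the Google Cloud console, go to the Logs Explorer page. If you use the search bar to find this page, select the result whose subheading is Logging.
Select an existing Google Cloud project, folder, or organization.
To display all Ray logs, enter the following query into the query-editor field, and then click Run query:
resource.labels.task_name="ray-cluster-logs"
To narrow down the logs to a specific Ray cluster, add the following line to the query, and then click Run query:
labels."ml.googleapis.com/ray_cluster_id"=CLUSTER_NAME
Replace CLUSTER_NAME with the name of your Ray cluster. To find it, go to Vertex AI > Ray on Vertex AI in the Google Cloud console, where the cluster names are listed for each region.
To narrow down the logs further to a specific log file such as raylet.out, click the name of the log under Log fields > Log name.
You can group similar log entries together:
In Query results, click a log entry to expand it.
In jsonPayload, click the tailed_path value. A drop-down menu appears.
Click Show matching entries.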
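The Logs Explorer query for Ray logs can also be assembled programmatically, for example to paste into the query editor or pass to a logging client. The following is a minimal sketch, not an official helper: the function name ray_log_filter is hypothetical, and quoting the cluster name is an assumption made here so that names containing hyphens are read as a single string. Lines in a Logging query are combined with AND.

```python
from typing import Optional


def ray_log_filter(cluster_name: Optional[str] = None) -> str:
    """Build a Logs Explorer query for Ray on Vertex AI logs.

    With no cluster name, the query matches all Ray logs. With a
    cluster name, a second line narrows the query to that cluster.
    """
    lines = ['resource.labels.task_name="ray-cluster-logs"']
    if cluster_name:
        # Quoting the value is an assumption of this sketch; it keeps
        # names with hyphens or other punctuation as one string.
        lines.append(
            f'labels."ml.googleapis.com/ray_cluster_id"="{cluster_name}"'
        )
    # Each line of a Logging query is implicitly joined with AND.
    return "\n".join(lines)


if __name__ == "__main__":
    # "my-ray-cluster" is a placeholder cluster name.
    print(ray_log_filter("my-ray-cluster"))
```

Calling ray_log_filter() with no argument returns only the first line, which corresponds to displaying all Ray logs.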
Disable logs
By default, Ray on Vertex AI Cloud Logging is enabled.
To disable the export of Ray logs to Cloud Logging, use the following Vertex AI SDK for Python command:
vertex_ray.create_ray_cluster(..., enable_logging=False, ...)
You can still view the Ray log files on the Ray dashboard even if the Ray on Vertex AI Cloud Logging feature is disabled.
Monitor metrics
You can view the Ray on Vertex AI metrics in different ways using Google Cloud Monitoring (GCM).
Alternatively, you can export the metrics from GCM to your own Grafana server.
Note: See Google Cloud Managed Service for Prometheus (GMP) for pricing and data storage information.
Monitor metrics in GCM
There are two ways to view the Ray on Vertex AI metrics in GCM:
Use the direct view under Metrics Explorer.
Import the Grafana dashboard.
Metrics Explorer
To use the direct view under Metrics Explorer, follow these steps:
Go to the Google Cloud Monitoring console.
Under Explore, select Metrics explorer.
Under Active Resources, select Prometheus Target. Active Metric Categories appears.
Select Ray. A list of metrics appears.
Select the metrics you want to monitor. For example, choose the CPU utilization percentage as the monitored metric.
Select a filter. For example, select cluster, and use the cluster ID to monitor the metric for a specific cluster only. To locate your cluster ID, follow these steps:
In the Google Cloud console, go to the Ray page.
Make sure you're in the project that contains your Ray cluster.
Under Name, a list of cluster IDs appears.
Select the Aggregation method to view the metrics. For example, you can choose to view unaggregated metrics, which show the CPU utilization of each Ray process.
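The metric, filter, and aggregation choices above map onto a query expression. The following is a rough sketch that builds a PromQL string for the same selection, on the assumption that you use the PromQL option of Metrics Explorer. The metric name ray_node_cpu_utilization is the name open source Ray uses for node CPU utilization, and the cluster label mirrors the cluster filter described above; both are assumptions here, so check them against the metric list shown for your project. The function name ray_cpu_promql is hypothetical.

```python
from typing import Optional


def ray_cpu_promql(cluster_id: str, aggregate_by: Optional[str] = None) -> str:
    """Return a PromQL expression for Ray CPU utilization on one cluster."""
    # Assumed metric name (from open source Ray) and assumed label name.
    selector = f'ray_node_cpu_utilization{{cluster="{cluster_id}"}}'
    if aggregate_by is None:
        # Unaggregated: every matching time series is returned as-is.
        return selector
    # Aggregated: average all series that share the given label value.
    return f"avg by ({aggregate_by}) ({selector})"


if __name__ == "__main__":
    # "my-cluster-id" is a placeholder cluster ID.
    print(ray_cpu_promql("my-cluster-id"))
    print(ray_cpu_promql("my-cluster-id", aggregate_by="cluster"))
```

Leaving aggregate_by unset corresponds to the unaggregated view; passing a label name collapses the series into one average per label value.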
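GCM dashboard
To import a Grafana dashboard for Ray on Vertex AI, follow the Cloud Monitoring guidelines in Import your own Grafana dashboard.
All you need is a Grafana dashboard JSON file. OSS Ray supports this manual setup by providing the default Grafana dashboard JSON file.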
Monitor metrics from user-owned Grafana
If you already have a Grafana server running, you can export all the Prometheus metrics of your Ray cluster on Vertex AI to that server. To do so, follow the GMP Query using Grafana guidance. This lets you add a new Grafana data source to your existing Grafana server and use the data source syncer to sync the new Grafana Prometheus data source to the Ray on Vertex AI metrics.
You must configure and authenticate the newly added Grafana data source using the data source syncer. Follow the steps in Configure and authenticate the Grafana data source.
Once synced, you can create and add any dashboard you need based on the Ray on Vertex AI metrics.
By default, Ray on Vertex AI metrics collection is enabled.
To disable it, use the following Vertex AI SDK for Python command:
vertex_ray.create_ray_cluster(..., enable_metrics_collection=False, ...)
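The enable_logging and enable_metrics_collection flags can be passed together when a cluster is created. The following is a minimal sketch under stated assumptions, not a complete cluster configuration: the helper names observability_options and create_cluster_without_telemetry are hypothetical, the import path vertex_ray assumes the SDK was installed with the ray extra, and every other cluster setting is left at the SDK default.

```python
from typing import Any, Dict


def observability_options(
    export_logs: bool = True, collect_metrics: bool = True
) -> Dict[str, Any]:
    """Return the keyword arguments that control log export and metrics."""
    return {
        "enable_logging": export_logs,
        "enable_metrics_collection": collect_metrics,
    }


def create_cluster_without_telemetry(project: str, location: str):
    """Create a Ray cluster with log export and metrics collection off."""
    # Imported lazily so the helper above works without the SDK installed.
    # Assumes: pip install "google-cloud-aiplatform[ray]"
    from google.cloud import aiplatform
    import vertex_ray

    aiplatform.init(project=project, location=location)
    # Only the two observability flags are set; all other arguments
    # fall back to the SDK defaults in this sketch.
    return vertex_ray.create_ray_cluster(
        **observability_options(export_logs=False, collect_metrics=False)
    )
```

Keeping the flags in a small helper makes it easy to toggle one without the other, for example observability_options(export_logs=False) to stop the Cloud Logging export while metrics collection stays on.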
Debug Ray clusters
To debug Ray clusters, use the Head node interactive shell.
Note: Use the interactive shell only for debugging or for other advanced operations that aren't supported in other ways. It isn't recommended for normal operations such as running workloads.
To access the Head node interactive shell from the Google Cloud console, do the following:
In the Google Cloud console, go to the Ray on Vertex AI page.
Make sure you're in the correct project.
Select the cluster you want to examine. The Basic info section appears.
In the Access links section, click the link for Head node interactive shell. The head node interactive shell appears.
Follow the instructions outlined in Monitor and debug training with an interactive shell.
What's next
Delete a Ray cluster
Last updated 2025-09-02 UTC.