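Monitor resource utilization: Use resource utilization metrics to identify Google Cloud resources that are underutilized. For example, use metrics like CPU and memory utilization to identify idle VM resources. For Google Kubernetes Engine (GKE), you can view a detailed breakdown of costs and cost-related optimization metrics. For Google Cloud VMware Engine, review resource utilization to optimize CUDs, storage consumption, and ESXi right-sizing.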
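As a minimal sketch of identifying idle VMs from CPU and memory utilization, the following Python example flags VMs whose average utilization stays below chosen thresholds. The `VmUtilization` type, the `find_idle_vms` function, the threshold values, and the sample data are all hypothetical and used only for illustration. In practice, the samples would come from your monitoring system, and the thresholds would depend on your workloads.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VmUtilization:
    """Utilization samples for one VM, as fractions between 0.0 and 1.0."""
    name: str
    cpu_samples: list[float]
    memory_samples: list[float]


def find_idle_vms(vms, cpu_threshold=0.05, memory_threshold=0.20):
    """Return names of VMs whose average CPU and memory utilization
    are both below the given (assumed) thresholds."""
    idle = []
    for vm in vms:
        if not vm.cpu_samples or not vm.memory_samples:
            continue  # No data: don't classify the VM either way.
        cpu_avg = mean(vm.cpu_samples)
        memory_avg = mean(vm.memory_samples)
        if cpu_avg < cpu_threshold and memory_avg < memory_threshold:
            idle.append(vm.name)
    return idle


# Hypothetical sample data for three VMs.
fleet = [
    VmUtilization("batch-worker-1", [0.02, 0.03, 0.01], [0.10, 0.12, 0.11]),
    VmUtilization("web-frontend-1", [0.45, 0.60, 0.52], [0.70, 0.68, 0.72]),
    VmUtilization("cache-1", [0.03, 0.02, 0.04], [0.80, 0.82, 0.81]),
]

print(find_idle_vms(fleet))  # ['batch-worker-1']
```

Requiring both metrics to be low is a deliberate choice in this sketch: a VM like `cache-1` uses little CPU but a lot of memory, so it is not treated as idle.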
Last updated (UTC): 2024-09-25.

- Continuously monitoring and analyzing your cloud environment is essential for optimizing costs and adapting to changing business needs. This includes focusing on key performance indicators that affect end users and align with business goals.
- Observability tools are critical for identifying underutilized resources and bottlenecks, and tools like Active Assist provide actionable recommendations for reducing costs, improving performance, and increasing sustainability.
- Balancing detailed data collection for troubleshooting with cost considerations requires using sampling and aggregation techniques, and regularly reviewing monitoring configurations to avoid excessive data storage.
- Tailoring data collection to specific roles, such as developers and IT administrators, and defining role-specific data retention policies can reduce unnecessary storage costs and improve data relevance.
- Implementing smart alerting that prioritizes issues affecting customers, tunes for temporary problems, and uses notification channels effectively helps to ensure timely issue resolution without overwhelming teams.

# Optimize continuously

This principle in the cost optimization pillar of the [Google Cloud Well-Architected Framework](/architecture/framework)
provides recommendations to help you optimize the cost of your cloud deployments
based on constantly changing and evolving business goals.

As your business grows and evolves, your cloud workloads need to adapt to changes
in resource requirements and usage patterns. To derive maximum value from your
cloud spending, you must maintain cost-efficiency while continuing to support
business objectives. This requires a proactive and adaptive approach that focuses
on continuous improvement and optimization.

Principle overview
------------------

To optimize cost continuously, you must proactively monitor and analyze your
cloud environment and make suitable adjustments to meet current requirements.
Focus your monitoring efforts on key performance indicators (KPIs) that directly
affect your end users' experience, align with your business goals, and provide
insights for continuous improvement. This approach lets you identify and address
inefficiencies, adapt to changing needs, and continuously align cloud spending
with strategic business goals. To balance comprehensive observability with cost
effectiveness, understand the costs and benefits of monitoring resource usage
and use appropriate process-improvement and optimization strategies.

Recommendations
---------------

To effectively monitor your Google Cloud environment and optimize cost
continuously, consider the following recommendations.

### Focus on business-relevant metrics

Effective monitoring starts with identifying the metrics that are most important
for your business and customers. These metrics include the following:

- **User experience metrics**: Latency, error rates, throughput, and customer satisfaction metrics are useful for understanding your end users' experience when using your applications.
- **Business outcome metrics**: Revenue, customer growth, and engagement can be correlated with resource usage to identify opportunities for cost optimization.
- **[DevOps Research & Assessment (DORA)](https://dora.dev) metrics**: Metrics like deployment frequency, lead time for changes, change failure rate, and time to restore provide insights into the efficiency and reliability of your software delivery process. By improving these metrics, you can increase productivity, reduce downtime, and optimize cost.
- **[Site Reliability Engineering (SRE)](https://sre.google) metrics**: Error budgets help teams to quantify and manage the acceptable level of service disruption. By establishing clear expectations for reliability, error budgets empower teams to innovate and deploy changes more confidently, knowing their safety margin. This proactive approach promotes a balance between innovation and stability, helping prevent excessive operational costs associated with major outages or prolonged downtime.

### Use observability for resource optimization

The following are recommendations to use observability to identify resource
bottlenecks and underutilized resources in your cloud deployments:

- **Monitor resource utilization**: Use resource utilization metrics to identify Google Cloud resources that are underutilized. For example, use metrics like CPU and memory utilization to identify [idle VM resources](/monitoring/agent/process-metrics#view_performance_metrics_for_top_resource-consuming_vms). For Google Kubernetes Engine (GKE), you can view a detailed [breakdown of costs](/kubernetes-engine/docs/how-to/cost-allocations) and [cost-related optimization metrics](/kubernetes-engine/docs/how-to/cost-optimization-metrics). For Google Cloud VMware Engine, [review resource utilization](https://cloud.google.com/blog/topics/cost-management/cost-optimization-of-google-cloud-vmware-engine-deployments) to optimize CUDs, storage consumption, and ESXi right-sizing.
- **Use cloud recommendations**: [Active Assist](/solutions/active-assist) is a portfolio of intelligent tools that help you optimize your cloud operations. These tools provide actionable recommendations to reduce costs, increase performance, improve security, and even make sustainability-focused decisions. For example, [VM rightsizing insights](/compute/docs/instance-groups/apply-machine-type-recommendations-managed-instance-groups) can help to optimize resource allocation and avoid unnecessary spending.
- **Correlate resource utilization with performance**: Analyze the relationship between resource utilization and application performance to determine whether you can downgrade to less expensive resources without affecting the user experience.

### Balance troubleshooting needs with cost

Detailed observability data can help with diagnosing and troubleshooting issues.
However, storing excessive amounts of observability data or exporting unnecessary
data to external monitoring tools can lead to unnecessary costs. For efficient
troubleshooting, consider the following recommendations:

- **Collect sufficient data for troubleshooting**: Ensure that your monitoring solution captures enough data to efficiently diagnose and resolve issues when they arise. This data might include logs, traces, and metrics at various levels of granularity.
- **Use sampling and aggregation**: Balance the need for detailed data with cost considerations by using sampling and aggregation techniques. This approach lets you collect representative data without incurring excessive storage costs.
- **Understand the pricing models of your monitoring tools and services**: Evaluate different monitoring solutions and choose options that align with your project's specific needs, budget, and usage patterns. Consider factors like data volume, retention requirements, and the required features when making your selection.
- **Regularly review your monitoring configuration**: Avoid collecting excessive data by removing unnecessary metrics or logs.

### Tailor data collection to roles and set role-specific retention policies

Consider the specific data needs of different roles. For example, developers
might primarily need access to traces and application-level logs, whereas IT
administrators might focus on system logs and infrastructure metrics. By tailoring
data collection, you can reduce unnecessary storage costs and avoid overwhelming
users with irrelevant information.

Additionally, you can define retention policies based on the needs of each role
and any regulatory requirements. For example, developers might need access to
detailed logs for a shorter period, while financial analysts might require
longer-term data.

### Consider regulatory and compliance requirements

In certain industries, regulatory requirements mandate data retention. To avoid
legal and financial risks, you need to ensure that your monitoring and data
retention practices help you adhere to relevant regulations. At the same time,
you need to maintain cost efficiency. Consider the following recommendations:

- Determine the specific data retention requirements for your industry or region, and ensure that your monitoring strategy meets those requirements.
- Implement appropriate data archival and retrieval mechanisms to meet audit and compliance needs while minimizing storage costs.

### Implement smart alerting

Alerting helps to detect and resolve issues in a timely manner. However, a
balance is necessary between an approach that keeps you informed and one that
overwhelms you with notifications. By designing intelligent alerting systems,
you can prioritize critical issues that have higher business impact. Consider
the following recommendations:

- **Prioritize issues that affect customers**: Design alerts that trigger rapidly for issues that directly affect the customer experience, like website outages, slow response times, or transaction failures.
- **Tune for temporary problems**: Use appropriate thresholds and delay mechanisms to avoid unnecessary alerts for temporary problems or self-healing system issues that don't affect customers.
- **Customize alert severity**: Ensure that the most urgent issues receive immediate attention by differentiating between critical and noncritical alerts.
- **Use notification channels wisely**: Choose appropriate channels for alert notifications (email, SMS, or paging) based on the severity and urgency of the alerts.
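
The alerting recommendations above (thresholds, delay mechanisms, severity levels, and severity-based notification channels) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Google Cloud API: the `AlertRule` type, the `evaluate` function, the `CHANNEL_BY_SEVERITY` mapping, and the example values are all hypothetical. The sketch fires an alert only when a metric stays above its threshold for a number of consecutive samples, which filters out short-lived spikes.

```python
from dataclasses import dataclass

# Hypothetical mapping from alert severity to notification channel.
CHANNEL_BY_SEVERITY = {"critical": "paging", "warning": "email"}


@dataclass
class AlertRule:
    name: str
    threshold: float        # The metric value must exceed this to breach.
    sustained_samples: int  # Consecutive breaching samples required to fire.
    severity: str           # "critical" or "warning" in this sketch.


def evaluate(rule, samples):
    """Return the notification channel if the rule fires, otherwise None."""
    consecutive = 0
    for value in samples:
        # Reset the counter whenever the metric recovers.
        consecutive = consecutive + 1 if value > rule.threshold else 0
        if consecutive >= rule.sustained_samples:
            return CHANNEL_BY_SEVERITY.get(rule.severity, "email")
    return None


# Hypothetical rule: error rate above 5% for three samples in a row.
rule = AlertRule("checkout-error-rate", threshold=0.05,
                 sustained_samples=3, severity="critical")

print(evaluate(rule, [0.01, 0.09, 0.02, 0.01]))  # None (a brief spike)
print(evaluate(rule, [0.06, 0.07, 0.08, 0.02]))  # paging (sustained breach)
```

A larger `sustained_samples` value reduces noise from self-healing issues but delays detection, so customer-facing rules would typically use a smaller value than rules for background systems.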