# Take advantage of elasticity

Last updated: 2024-12-06 (UTC).

- Elasticity allows systems to adjust resources dynamically based on workload changes, enabling independent scaling of different components for improved performance and cost efficiency.
- Systems should be designed to scale both horizontally, by adding replica nodes, and vertically, by increasing capacity, memory, and storage, in response to increases and decreases in load.
- Planning for peak load periods involves manually scaling systems ahead of known high-traffic events, and using autoscaling for unexpected surges, triggered by metrics like CPU utilization.
- Predictive autoscaling, available in Compute Engine, forecasts future load based on historical trends and adapts rapidly to recent changes in load, improving responsiveness.
- Serverless architectures built on services such as Cloud Run, BigQuery, and Spanner are inherently elastic, with instant autoscaling that can scale down to zero resources; Autopilot mode in GKE provides automation and scalability by default.

This principle in the performance optimization pillar of the
[Google Cloud Well-Architected Framework](/architecture/framework)
provides recommendations to help you incorporate elasticity, which is the ability
to adjust resources dynamically based on changes in workload requirements.

Elasticity allows different components of
a system to scale independently.
This targeted scaling can help improve performance and
cost efficiency by allocating resources precisely where they're needed, without
overprovisioning or underprovisioning your resources.

Principle overview
------------------

The performance requirements of a system directly influence when and how the
system scales vertically or horizontally. You need to evaluate the system's
capacity and determine the load that the system is expected to handle at baseline.
Then, determine how you want the system to respond to increases and decreases
in load.

When the load increases, the system must scale out horizontally, scale up
vertically, or both. For horizontal scaling, add replica nodes to ensure that
the system has sufficient overall capacity to meet the increased demand. For
vertical scaling, replace the application's existing components with components
that have more capacity, memory, and storage.

When the load decreases, the system must scale down (horizontally, vertically,
or both).

Define the *circumstances* in which the system scales up or scales down. Plan to
manually scale up systems ahead of known periods of high traffic, and use tools
like autoscaling to respond to unanticipated increases or decreases in the load.

Recommendations
---------------

To take advantage of elasticity, consider the recommendations in the following
sections.

### Plan for peak load periods

You need to plan an efficient scaling path for known events, such as expected
periods of increased customer demand.

Consider scaling up your system ahead of known periods of high traffic. For
example, if you're a retail organization, you expect demand to increase during
seasonal sales. We recommend that you manually scale up or scale out your systems
before those sales, so that your system can immediately handle the increased
load, or adjust existing limits in advance.
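Sizing that pre-scaled capacity can start from a simple back-of-the-envelope calculation: the expected peak request rate, the measured throughput of one replica, and a safety headroom. A minimal sketch, where all of the figures are hypothetical:

```python
import math

def replicas_for_peak(peak_rps: float, rps_per_replica: float,
                      headroom: float = 0.25, min_replicas: int = 1) -> int:
    """Estimate how many replicas to pre-provision for a known peak.

    headroom adds spare capacity (0.25 = 25%) so that the fleet is not
    already at its limit when the peak arrives.
    """
    required = peak_rps * (1 + headroom) / rps_per_replica
    return max(min_replicas, math.ceil(required))

# Hypothetical figures: 12,000 requests/s at peak, 500 rps per replica.
print(replicas_for_peak(12_000, 500))  # 30
```

The same arithmetic applies whether you then resize a managed instance group manually or raise the upper bound of an autoscaling configuration.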
Otherwise, the system might take several minutes to
add resources in response to real-time changes. Your application's capacity
might not increase quickly enough, and some users might experience delays.

For unknown or unexpected events, such as a sudden surge in demand or traffic,
you can use autoscaling features to trigger elastic scaling based on metrics.
These metrics can include CPU utilization, load balancer serving capacity,
latency, and even custom metrics that you define in
[Cloud Monitoring](/monitoring/docs/monitoring-overview).

For example, consider an application that runs on a
[Compute Engine](/compute/docs/overview)
managed instance group (MIG). This application has a requirement that each
instance performs optimally until the average CPU utilization reaches 75%. In
this example, you might define an
[autoscaling policy](/compute/docs/autoscaler#autoscaling_policy)
that creates more instances when the CPU utilization reaches the threshold.
These newly created instances help absorb the load, which helps keep the average
CPU utilization at an optimal rate until the maximum number of instances
that you've configured for the MIG is reached. When demand decreases, the
autoscaling policy removes the instances that are no longer needed.

Plan
[resource slot reservations in BigQuery](/bigquery/docs/reservations-intro#reservations),
or adjust the limits for autoscaling configurations in Spanner by using the
[managed autoscaler](/spanner/docs/managed-autoscaler).

### Use predictive scaling

If your system components include Compute Engine, evaluate whether
[predictive autoscaling](/compute/docs/autoscaler#predictive_mode)
is suitable for your workload.
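As a baseline for comparison, the reactive utilization-based policy from the MIG example above can be modeled as repeatedly recomputing a recommended group size from the observed utilization. This is a simplified illustration, not Compute Engine's actual algorithm; the 75% target comes from the example, and the size bounds are hypothetical:

```python
import math

def recommended_size(current_instances: int, avg_cpu_utilization: float,
                     target_utilization: float = 0.75,
                     min_instances: int = 1, max_instances: int = 10) -> int:
    """Simplified model of a reactive, utilization-based policy: resize the
    group so that average utilization moves back toward the target, clamped
    to the configured minimum and maximum group size."""
    size = math.ceil(current_instances * avg_cpu_utilization / target_utilization)
    return min(max_instances, max(min_instances, size))

# Average CPU climbs to 90% across 4 instances: recommend 5 instances.
print(recommended_size(4, 0.90))  # 5
# Demand drops and CPU falls to 30%: recommend scaling in to 2 instances.
print(recommended_size(4, 0.30))  # 2
```

A purely reactive policy like this only responds after utilization has already moved, which is the gap that predictive autoscaling addresses.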
Predictive autoscaling forecasts the future load
based on the historical trends of your metrics, such as CPU utilization.
Forecasts are recomputed every few minutes, so the autoscaler rapidly adapts its
forecast to recent changes in load. Without predictive autoscaling, an
autoscaler can only scale a group reactively, based on observed real-time changes
in load. Predictive autoscaling works with both real-time data and
historical data to respond to both the current and the forecasted load.

### Implement serverless architectures

Consider implementing a serverless architecture with serverless services that
are inherently elastic, such as the following:

- [Cloud Run](/run/docs)
- [Cloud Run functions](/functions/docs)
- [BigQuery](/bigquery/docs)
- [Spanner](/spanner/docs)
- [Eventarc](/eventarc/docs)
- [Workflows](/workflows/docs)
- [Pub/Sub](/pubsub/docs)

Unlike autoscaling in services that require fine-tuned scaling rules (for
example, Compute Engine), serverless autoscaling is instant and can
scale down to zero resources.

### Use Autopilot mode for Kubernetes

For complex applications that require greater control over Kubernetes, consider
[Autopilot mode in Google Kubernetes Engine (GKE)](/kubernetes-engine/docs/concepts/autopilot-overview).
Autopilot mode provides automation and scalability by default.
GKE automatically scales nodes and resources based on
traffic. GKE manages nodes, creates new nodes for your applications, and
configures automatic upgrades and repairs.
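To make the reactive-versus-predictive distinction from the predictive scaling section above concrete: a reactive signal acts only on the latest observation, while a predictive signal provisions for the load it expects next. The sketch below uses a toy one-step linear extrapolation and hypothetical utilization samples; it is not the forecasting model that Compute Engine actually uses:

```python
def reactive_signal(history: list[float]) -> float:
    """A reactive autoscaler acts only on the most recent observation."""
    return history[-1]

def predictive_signal(history: list[float]) -> float:
    """Toy predictive signal: one-step linear extrapolation of the most
    recent trend, clamped at zero. Illustration only."""
    if len(history) < 2:
        return history[-1]
    return max(0.0, history[-1] + (history[-1] - history[-2]))

# Hypothetical CPU utilization samples (%) climbing before a traffic peak.
samples = [55.0, 60.0, 65.0]
print(reactive_signal(samples))    # 65.0 -- sized for the current load only
print(predictive_signal(samples))  # 70.0 -- provisions ahead of the ramp
```

On a steady ramp, the predictive signal leads the reactive one, which is why predictive mode can have capacity ready before the load actually arrives.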