Last updated (UTC): 2025-09-02.

TPU v5e
=======

This document describes the architecture and supported configurations of
Cloud TPU v5e.

TPU v5e supports single-host and multi-host training and single-host inference.
Multi-host inference is supported using [Sax](https://github.com/google/saxml).
For more information, see [Cloud TPU inference](/tpu/docs/tpu-inference).

System architecture
-------------------

Each v5e chip contains one TensorCore. Each TensorCore has four matrix-multiply
units (MXUs), a vector unit, and a scalar unit.

The following diagram illustrates a TPU v5e chip.

The following table shows the key chip specifications and their values for v5e.

The following table shows Pod specifications and their values for v5e.

Configurations
--------------

Cloud TPU v5e is a combined training and inference (serving) product. To
differentiate between a training and an inference environment, use the
`AcceleratorType` parameter with the TPU API or the `--machine-type` flag [when
creating a Google Kubernetes Engine (GKE) node
pool](/kubernetes-engine/docs/how-to/tpus#create-node-pool).

Training jobs are optimized for throughput and availability, while serving jobs
are optimized for latency.
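As a sketch of the GKE path mentioned above: the cluster and node pool names below are hypothetical, and the `ct5lp-hightpu-*` machine types (`1t`, `4t`, and `8t` correspond to 1-, 4-, and 8-chip v5e VMs) should be verified against the current GKE documentation. The command is echoed rather than executed.

```shell
# Hypothetical names; verify machine types against current GKE docs.
CLUSTER=my-cluster
NODE_POOL=v5e-serving-pool
MACHINE_TYPE=ct5lp-hightpu-8t   # assumed: 8-chip v5e VM for single-host serving

# Echo the node-pool creation command instead of running it.
echo gcloud container node-pools create "$NODE_POOL" \
  --cluster="$CLUSTER" \
  --zone=us-central1-a \
  --machine-type="$MACHINE_TYPE" \
  --num-nodes=1
```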
A training job run on TPUs provisioned for serving
could have lower availability; similarly, a serving job run on TPUs
provisioned for training could have higher latency.

You use `AcceleratorType` to specify the number of TensorCores you want to use
(for v5e, one TensorCore per chip). You specify the `AcceleratorType` when
creating a TPU using the gcloud CLI or the
[Google Cloud console](https://console.cloud.google.com/). The value you
specify for `AcceleratorType` is a string with the format
`v$VERSION_NUMBER-$CHIP_COUNT`.

The following 2D slice shapes are supported for v5e:

### VM types

Each TPU VM in a v5e TPU slice contains 1, 4, or 8 chips. In 4-chip and smaller
slices, all TPU chips share the same non-uniform memory access (NUMA) node.

For 8-chip v5e TPU VMs, CPU-TPU communication is more efficient within a NUMA
partition. For example, in the following figure, `CPU0-Chip0` communication is
faster than `CPU0-Chip4` communication.

The following table shows a comparison of TPU v5e VM types:

### Cloud TPU v5e types for serving

Single-host serving is supported for up to 8 v5e chips. The following
configurations are supported: 1x1, 2x2, and 2x4 slices, with 1, 4, and
8 chips respectively.

To provision TPUs for a serving job, use one of the following accelerator types
in your CLI or API TPU creation request:

The following command creates a v5e TPU slice with 8 v5e chips for serving:

```bash
$ gcloud compute tpus tpu-vm create your-tpu-name \
  --zone=us-central1-a \
  --accelerator-type=v5litepod-8 \
  --version=v2-alpha-tpuv5-lite
```

For more information about managing TPUs, see [Manage TPUs](/tpu/docs/managing-tpus-tpu-vm).
For more information about the system architecture of Cloud TPU, see
[System architecture](/tpu/docs/system-architecture).

Serving on more than 8 v5e chips, also called multi-host serving, is supported
using [Sax](https://github.com/google/saxml).
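Accelerator types such as `v5litepod-8` follow the `v$VERSION_NUMBER-$CHIP_COUNT` format described earlier. As a minimal illustration (a hypothetical helper, not part of any Google API), the string can be split into its version prefix and chip count:

```python
def parse_accelerator_type(accel_type: str) -> tuple[str, int]:
    """Split an AcceleratorType string of the form
    v$VERSION_NUMBER-$CHIP_COUNT (e.g. 'v5litepod-8') into its
    version prefix and chip count."""
    version, chip_count = accel_type.rsplit("-", 1)
    return version, int(chip_count)

# Single-host v5e serving slices use 1, 4, or 8 chips.
print(parse_accelerator_type("v5litepod-8"))    # ('v5litepod', 8)
print(parse_accelerator_type("v5litepod-256"))  # ('v5litepod', 256)
```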
For more information, see
[Cloud TPU inference](/tpu/docs/tpu-inference).

### Cloud TPU v5e types for training

Training is supported for up to 256 chips.

To provision TPUs for a v5e training job, use one of the following accelerator
types in your CLI or API TPU creation request:

The following command creates a v5e TPU slice with 256 v5e chips for training:

```bash
$ gcloud compute tpus tpu-vm create your-tpu-name \
  --zone=us-east5-a \
  --accelerator-type=v5litepod-256 \
  --version=v2-alpha-tpuv5-lite
```

For more information about managing TPUs, see [Manage TPUs](/tpu/docs/managing-tpus-tpu-vm).
For more information about the system architecture of Cloud TPU, see
[System architecture](/tpu/docs/system-architecture).
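v5e slices are laid out as 2D chip meshes. As a rough illustration only (a hypothetical helper, not a Cloud TPU API; consult the supported slice shapes listed earlier for what is actually available), the following sketch computes the most square 2D factorization of a chip count:

```python
import math

def squarest_2d_topology(chips: int) -> tuple[int, int]:
    """Return the most square x*y factorization of a chip count,
    illustrating how 2D slice shapes relate to chip counts.
    Illustrative only; real slices use the documented shapes."""
    x = math.isqrt(chips)
    while chips % x:  # walk down to the nearest divisor
        x -= 1
    return x, chips // x

print(squarest_2d_topology(256))  # (16, 16)
print(squarest_2d_topology(8))    # (2, 4)
```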