# Run PyTorch code on TPU slices

Last updated 2025-09-04 UTC.

Before running the commands in this document, make sure you have followed the
instructions in [Set up an account and Cloud TPU project](/tpu/docs/setup-gcp-account).

After you have your PyTorch code running on a single TPU VM, you can scale up
your code by running it on a [TPU slice](/tpu/docs/system-architecture-tpu-vm#slices).
A TPU slice is a set of TPU boards connected to each other over dedicated
high-speed network connections. This document is an introduction to running
PyTorch code on TPU slices.

Create a Cloud TPU slice
------------------------

1. Define some environment variables to make the commands easier to use.

   ```bash
   export PROJECT_ID=your-project-id
   export TPU_NAME=your-tpu-name
   export ZONE=europe-west4-b
   export ACCELERATOR_TYPE=v5p-32
   export RUNTIME_VERSION=v2-alpha-tpuv5
   ```

   #### Environment variable descriptions

   - `PROJECT_ID`: your Google Cloud project ID.
   - `TPU_NAME`: the name of the TPU VM to create.
   - `ZONE`: the zone in which to create the TPU VM.
   - `ACCELERATOR_TYPE`: the version and size of the Cloud TPU slice to create.
   - `RUNTIME_VERSION`: the Cloud TPU software version.

2. Create your TPU VM by running the following command:

   ```bash
   $ gcloud compute tpus tpu-vm create ${TPU_NAME} \
     --zone=${ZONE} \
     --project=${PROJECT_ID} \
     --accelerator-type=${ACCELERATOR_TYPE} \
     --version=${RUNTIME_VERSION}
   ```

Install PyTorch/XLA on your slice
---------------------------------

After creating the TPU slice, you must install PyTorch/XLA on all hosts in the
TPU slice. 
You can do this using the `gcloud compute tpus tpu-vm ssh` command with
the `--worker=all` and `--command` parameters.

If the following commands fail due to an SSH connection error, it might be
because the TPU VMs don't have external IP addresses. To access a TPU VM without
an external IP address, follow the instructions in [Connect to a TPU VM without
a public IP address](/tpu/docs/tpu-iap).

1. Install PyTorch/XLA on all TPU VM workers:

   ```bash
   gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
     --zone=${ZONE} \
     --project=${PROJECT_ID} \
     --worker=all \
     --command="pip install torch~=2.5.0 torch_xla[tpu]~=2.5.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html"
   ```

2. Clone the PyTorch/XLA repository on all TPU VM workers:

   ```bash
   gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
     --zone=${ZONE} \
     --project=${PROJECT_ID} \
     --worker=all \
     --command="git clone https://github.com/pytorch/xla.git"
   ```

Run a training script on your TPU slice
---------------------------------------

Run the training script on all workers. The training script uses a Single Program
Multiple Data (SPMD) sharding strategy. For more information on SPMD, see the
[PyTorch/XLA SPMD User Guide](https://pytorch.org/xla/release/r2.4/spmd.html).

```bash
gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
  --zone=${ZONE} \
  --project=${PROJECT_ID} \
  --worker=all \
  --command="PJRT_DEVICE=TPU python3 ~/xla/test/spmd/test_train_spmd_imagenet.py \
  --fake_data \
  --model=resnet50 \
  --num_epochs=1 2>&1 | tee ~/logs.txt"
```

The training takes about 15 minutes. When it completes, you should see a message
similar to the following:

```
Epoch 1 test end 23:49:15, Accuracy=100.00
     10.164.0.11 [0] Max Accuracy: 100.00%
```

Clean up
--------

When you are done with your TPU VM, follow these steps to clean up your resources.

1. 
Disconnect from the Cloud TPU instance, if you have not already
   done so:

   ```bash
   (vm)$ exit
   ```

   Your prompt should now be `username@projectname`, showing you are in
   Cloud Shell.

2. Delete your Cloud TPU resources:

   ```bash
   $ gcloud compute tpus tpu-vm delete ${TPU_NAME} \
     --zone=${ZONE} \
     --project=${PROJECT_ID}
   ```

3. Verify that the resources have been deleted. The deletion might take several
   minutes. The output of the following command shouldn't include any of the
   resources created in this tutorial:

   ```bash
   $ gcloud compute tpus tpu-vm list --zone=${ZONE}
   ```
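Because deletion can take several minutes, re-running the `list` command by hand until the TPU disappears is tedious. One way to automate the check is a small polling loop. `wait_for_tpu_deletion` below is a hypothetical helper (not a gcloud feature), and it assumes the `TPU_NAME` and `ZONE` variables defined at the start of this guide:

```bash
# Hypothetical helper (not part of gcloud): poll the TPU list for ${ZONE}
# every 10 seconds until ${TPU_NAME} no longer appears in it.
wait_for_tpu_deletion() {
  while gcloud compute tpus tpu-vm list --zone="${ZONE}" --format="value(name)" \
      | grep -qx "${TPU_NAME}"; do
    echo "Waiting for ${TPU_NAME} to be deleted..."
    sleep 10
  done
  echo "${TPU_NAME} is no longer listed."
}
```

`--format="value(name)"` prints one TPU name per line, so `grep -qx` matches only an exact name rather than a substring of another TPU's name.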
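Finally, note that most multi-worker steps in this guide share the same shape: `gcloud compute tpus tpu-vm ssh` with `--worker=all` and a `--command` string. If you find yourself running many such commands, the pattern can be factored into a shell function. `run_on_all_workers` is a hypothetical convenience wrapper (not a gcloud feature), and it assumes the `TPU_NAME`, `ZONE`, and `PROJECT_ID` variables defined at the start of this guide:

```bash
# Hypothetical wrapper (not part of gcloud): run one shell command on every
# worker in the slice, using the environment variables defined earlier.
run_on_all_workers() {
  gcloud compute tpus tpu-vm ssh "${TPU_NAME}" \
    --zone="${ZONE}" \
    --project="${PROJECT_ID}" \
    --worker=all \
    --command="$1"
}

# Example: report the installed torch version on every worker.
# run_on_all_workers "python3 -c 'import torch; print(torch.__version__)'"
```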