# Use dedicated public endpoints for online inference

A *dedicated public endpoint* is a public endpoint for online inference. It
offers the following benefits:

- **Dedicated networking**: When you send an inference request to a dedicated public endpoint, it is isolated from other users' traffic.
- **Optimized network latency**
- **Larger payload support**: Up to 10 MB.
- **Longer request timeouts**: Configurable up to 1 hour.
- **Generative AI-ready**: Streaming and gRPC are supported. The inference timeout is configurable up to 1 hour.

For these reasons, dedicated public endpoints are recommended as a best
practice for serving Vertex AI online inferences.

> **Note:** Tuned Gemini models can only be deployed to shared public endpoints.

To learn more, see
[Choose an endpoint type](/vertex-ai/docs/predictions/choose-endpoint-type).

Create a dedicated public endpoint and deploy a model to it
-----------------------------------------------------------

You can create a dedicated endpoint and deploy a model to it by using the
Google Cloud console. For details, see
[Deploy a model by using the Google Cloud console](/vertex-ai/docs/predictions/deploy-model-console).

You can also create a dedicated public endpoint and deploy a model to it by
using the Vertex AI API as follows:

1. [Create a dedicated public endpoint](/vertex-ai/docs/predictions/create-public-endpoint). You can configure the inference timeout and request-response logging settings when you create the endpoint.
2. [Deploy the model by using the Vertex AI API](/vertex-ai/docs/predictions/deploy-model-api).

Get online inferences from a dedicated public endpoint
------------------------------------------------------

Dedicated endpoints support both the HTTP and gRPC communication protocols. For
gRPC requests, you must include the `x-vertex-ai-endpoint-id` header so that
the request can be routed to the correct endpoint. The following APIs are
supported:

- Predict
- RawPredict
- StreamRawPredict
- Chat Completion (Model Garden only)

You can send online inference requests to a dedicated public endpoint by using
the Vertex AI SDK for Python. For details, see
[Send an online inference request to a dedicated public endpoint](/vertex-ai/docs/predictions/get-online-predictions#inference-request-dedicated).

Tutorial
--------

To learn more, run the "Vertex AI Model Garden - Gemma (Deployment)" notebook
in one of the following environments:

- [Open in Colab](https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_deployment_on_vertex.ipynb)
- [Open in Colab Enterprise](https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fcommunity%2Fmodel_garden%2Fmodel_garden_gemma_deployment_on_vertex.ipynb)
- [Open in Vertex AI Workbench](https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fcommunity%2Fmodel_garden%2Fmodel_garden_gemma_deployment_on_vertex.ipynb)
- [View on GitHub](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_deployment_on_vertex.ipynb)

Limitations
-----------

- Deployment of tuned Gemini models isn't supported.
- VPC Service Controls isn't supported. Use a Private Service Connect endpoint instead.

What's next
-----------

- Learn about Vertex AI online inference [endpoint types](/vertex-ai/docs/predictions/choose-endpoint-type).

*Last updated 2025-08-29 UTC.*
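To make the routing described above concrete, the following is a minimal sketch of how a client might address a dedicated public endpoint. It assumes the dedicated-DNS naming scheme `ENDPOINT_ID.REGION-PROJECT_NUMBER.prediction.vertexai.goog`; the authoritative hostname for your endpoint is the one returned in the endpoint's `dedicatedEndpointDns` field, so treat the format below as an illustration, not a contract. The endpoint and project numbers are placeholder values.

```python
# Sketch: constructing request targets for a dedicated public endpoint.
# The DNS scheme below is an assumption; read the endpoint's
# `dedicatedEndpointDns` field for the authoritative hostname.


def dedicated_endpoint_host(endpoint_id: str, region: str, project_number: str) -> str:
    """Hostname a dedicated public endpoint is assumed to expose."""
    return f"{endpoint_id}.{region}-{project_number}.prediction.vertexai.goog"


def predict_url(endpoint_id: str, region: str, project_number: str) -> str:
    """HTTP URL for the Predict API on a dedicated endpoint."""
    host = dedicated_endpoint_host(endpoint_id, region, project_number)
    return (
        f"https://{host}/v1/projects/{project_number}"
        f"/locations/{region}/endpoints/{endpoint_id}:predict"
    )


def grpc_metadata(endpoint_id: str) -> list:
    # gRPC requests must carry the endpoint ID in the
    # x-vertex-ai-endpoint-id header so they can be routed correctly.
    return [("x-vertex-ai-endpoint-id", endpoint_id)]


if __name__ == "__main__":
    # Placeholder endpoint ID and project number.
    print(predict_url("1234567890", "us-central1", "111111111111"))
    print(grpc_metadata("1234567890"))
```

When you use the Vertex AI SDK for Python instead of raw HTTP or gRPC, recent SDK versions accept a `use_dedicated_endpoint=True` argument on `Endpoint.predict()` to route the call over the dedicated DNS; availability depends on your SDK version, so verify it in the SDK reference before relying on it.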