Version 2.3 is a lightweight image that contains only core components,
reducing exposure to Common Vulnerabilities and Exposures (CVEs). For higher
security compliance requirements, use image version 2.3 or later when
creating a Dataproc cluster.
If you choose to install optional components when creating a
Dataproc cluster with a 2.3 image, they are downloaded and
installed during cluster creation, which can increase cluster startup time.
To avoid this delay, you can create a custom image with the optional
components pre-installed by running generate_custom_image.py with the
--optional-components flag.
Note: You must specify the optional components that you want to install when
you create the cluster. The following example shows the Google Cloud CLI
command for creating a cluster with optional components:

    gcloud dataproc clusters create CLUSTER_NAME \
        --optional-components=COMPONENT_NAME \
        ... other flags
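As a sketch, a custom image build with optional components pre-installed might look like the following. The image name, version, zone, bucket, and component choices are placeholders, and the empty customization script is used here only because the tool requires one:

```shell
# Clone the Dataproc custom-images tooling.
git clone https://github.com/GoogleCloudDataproc/custom-images.git
cd custom-images

# The tool requires a customization script; it can be empty if you
# only need the optional components baked into the image.
touch empty-init.sh

# Build a 2.3-based custom image with Flink and Docker pre-installed.
# All values below are placeholders for your own project settings,
# and the sub-minor version shown is only an example.
python generate_custom_image.py \
    --image-name my-dataproc-2-3-flink-docker \
    --dataproc-version 2.3.6-debian12 \
    --optional-components FLINK,DOCKER \
    --customization-script empty-init.sh \
    --zone us-central1-a \
    --gcs-bucket gs://my-staging-bucket
```

You can then pass the resulting image to cluster creation with the `--image` flag instead of `--image-version`.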
Notes:
The following are the optional components in 2.3 images:
Apache Flink
Apache Hive WebHCat
Apache Hudi
Apache Iceberg
Apache Pig
Delta Lake
Docker
JupyterLab Notebook
Ranger
Solr
Zeppelin Notebook
Zookeeper
yarn.nodemanager.recovery.enabled and HDFS Audit Logging
are enabled by default in 2.3 images.
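If you need different behavior, these defaults can be overridden with cluster properties at creation time. The following is an illustrative sketch (the cluster name and the choice to disable recovery are hypothetical):

```shell
# Hypothetical example: turn off NodeManager recovery on a 2.3 cluster
# by overriding the default at cluster-creation time. The "yarn:" prefix
# routes the property into yarn-site.xml.
gcloud dataproc clusters create my-cluster \
    --image-version 2.3-debian12 \
    --properties yarn:yarn.nodemanager.recovery.enabled=false
```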
micromamba is installed as part of the Python installation, replacing the
conda installation used in previous image versions.
2.3.x-*-arm images support only the pre-installed components and a limited
set of optional components. The other 2.3 optional components and all
initialization actions aren't supported.
Docker and Zeppelin installation issues:
Installation fails if the cluster has no public internet access. As a
workaround, create a cluster that uses a custom image with optional
components pre-installed. You can do this by running
generate_custom_image.py
with the
--optional-components flag.
Installation can fail if the cluster is pinned to an older sub-minor image
version: packages are installed on demand from public OSS repositories, and a
package might not be available upstream to support the installation.
As a workaround, create a cluster that uses a custom image with the optional
components pre-installed, by running
generate_custom_image.py
with the
--optional-components flag.
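To illustrate the pinning distinction, the commands below are a sketch; the cluster names, component choice, and sub-minor version are placeholders, and the sub-minor versions actually available change over time:

```shell
# Tracks the latest 2.3 sub-minor release; optional-component packages
# are most likely to still be available upstream.
gcloud dataproc clusters create unpinned-cluster \
    --image-version 2.3-debian12 \
    --optional-components DOCKER

# Pinned to a specific (placeholder) sub-minor version; on-demand
# package installs can fail if upstream repositories have moved on.
gcloud dataproc clusters create pinned-cluster \
    --image-version 2.3.0-debian12 \
    --optional-components DOCKER
```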
Image version 2.3 machine learning (ML) components
The Dataproc 2.3-ml-ubuntu image extends the 2.3 base image
with ML-specific software. It supports 2.3 image optional components and other
2.3 features, and adds the component versions listed in the following sections.
GPU-specific libraries
For Dataproc jobs that use GPU VMs,
the following NVIDIA driver and libraries are available in the
2.3-ml-ubuntu image. You can use them to accomplish the following
tasks:
Accelerate Spark batch workloads with the NVIDIA Spark RAPIDS library
Train machine learning workloads
Run distributed batch inference using Spark
XGBoost libraries
The following Maven package versions are available in the 2.3-ml-ubuntu image
to let you use XGBoost with Spark in Java or Scala.
Note: You cannot use distributed Spark XGBoost on a Dataproc job that has
autoscaling enabled (the default behavior), because new nodes that start
during elastic scaling cannot receive new tasks and remain idle. To use
XGBoost with a batch workload, you can set the
spark.dynamicAllocation.enabled=false property on a Dataproc job to disable
dynamic allocation.
Python libraries
The 2.3-ml-ubuntu image contains the following libraries, which support
different stages in the ML lifecycle.
R libraries
The following R library versions are included in the 2.3-ml-ubuntu image.
Last updated 2025-09-04 UTC.
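A job submission that disables dynamic allocation for a Spark XGBoost workload might look like the following sketch; the cluster name, jar path, and class name are hypothetical placeholders:

```shell
# Hypothetical Spark XGBoost training job; disabling dynamic allocation
# keeps executors from being added or removed while training runs.
gcloud dataproc jobs submit spark \
    --cluster my-cluster \
    --class com.example.XGBoostTrain \
    --jars gs://my-bucket/xgboost-train.jar \
    --properties spark.dynamicAllocation.enabled=false
```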