Getting started

1: Introduction to Kubernetes operators
2: Bootstrapping and samples
3: Patterns and best practices

1 - Introduction to Kubernetes operators

What are Kubernetes Operators?

Kubernetes operators are software extensions that manage both cluster and non-cluster resources on behalf of Kubernetes. The Java Operator SDK (JOSDK) makes it easy to implement Kubernetes operators in Java: its APIs are designed to feel natural to Java developers, and the framework handles common problems so you can focus on your business logic.

Why Use Java Operator SDK?

JOSDK provides several key advantages:

Java-native APIs that feel familiar to Java developers
Automatic handling of common operator challenges (caching, event handling, retries)
Production-ready features like observability, metrics, and error handling
Simplified development so you can focus on business logic instead of Kubernetes complexities

Learning Resources

2 - Bootstrapping and samples

Creating a New Operator Project

Using the Maven Plugin

The simplest way to start a new operator project is to use the provided Maven plugin, which generates a complete project skeleton:

```
mvn io.javaoperatorsdk:bootstrapper:[version]:create \
  -DprojectGroupId=org.acme \
  -DprojectArtifactId=getting-started
```

This command creates a new Maven project with:

A basic operator implementation
Maven configuration with required dependencies
Generated CustomResourceDefinition (CRD)

Building Your Project

Build the generated project with Maven:

```
mvn clean install
```

The build process automatically generates the CustomResourceDefinition YAML file that you’ll need to apply to your Kubernetes cluster.

Exploring Sample Operators

The sample-operators directory contains real-world examples demonstrating different JOSDK features and patterns:

Available Samples

webpage

Purpose: Creates NGINX webservers from Custom Resources containing HTML code
Key Features: Multiple implementation approaches using both low-level APIs and higher-level abstractions
Good for: Understanding basic operator concepts and API usage patterns

mysql-schema

Purpose: Manages database schemas in MySQL instances
Key Features: Demonstrates managing non-Kubernetes resources (external systems)
Good for: Learning how to integrate with external services and manage state outside Kubernetes

tomcat

Purpose: Manages Tomcat instances and web applications
Key Features: Multiple controllers managing related custom resources
Good for: Understanding complex operators with multiple resource types and relationships

Running the Samples

Prerequisites

The easiest way to try the samples is with a local Kubernetes cluster.

Step-by-Step Instructions

Apply the CustomResourceDefinition:
```
kubectl apply -f target/classes/META-INF/fabric8/[resource-name]-v1.yml
```

Run the operator:
```
mvn exec:java -Dexec.mainClass="your.main.ClassName"
```
Or run your main class directly from your IDE.

Create custom resources. The operator automatically detects and reconciles custom resources as you create them:
```
kubectl apply -f examples/sample-resource.yaml
```

Detailed Examples

For comprehensive setup instructions and examples, see:

MySQL Schema sample README
Individual sample directories for specific setup requirements

Next Steps

After exploring the samples:

Review the patterns and best practices guide
Learn about implementing reconcilers
Explore dependent resources and workflows for advanced use cases

3 - Patterns and best practices

This document describes patterns and best practices for building and running operators, and how to implement them using the Java Operator SDK (JOSDK).

See also the best practices in the Operator SDK documentation.

Implementing a Reconciler

Always Reconcile All Resources

Reconciliation can be triggered by events from multiple sources. It might be tempting to check the events and only reconcile the related resource or subset of resources that the controller manages. However, this is considered an anti-pattern for operators.

Why this is problematic:

Kubernetes’ distributed nature makes it difficult to guarantee that every event is received
If your operator misses some events and doesn’t reconcile the complete state, it might operate with incorrect assumptions about the cluster state

Instead, always reconcile all resources, regardless of which event triggered the reconciliation.

JOSDK keeps this efficient by caching resources to avoid unnecessary requests to the Kubernetes API server, and by triggering your reconciler only when needed.

Because there is broad industry consensus on this topic, JOSDK no longer gives Reconciler implementations access to the triggering event, starting with version 2.

Event Sources and Caching

During reconciliation, best practice is to reconcile all dependent resources managed by the controller. This means comparing the desired state with the actual cluster state.

The Challenge: Reading the actual state directly from the Kubernetes API Server every time would create significant load.

The Solution: Create a watch for dependent resources and cache their latest state using the Informer pattern. JOSDK wraps informers in the InformerEventSource class, an EventSource implementation that integrates them with the framework’s eventing system.

How it works:

An event triggers reconciliation only once the affected resource’s latest state is already in the cache
Reconciler implementations compare the desired state with the cached observed state
If a resource isn’t in the cache, it needs to be created
If the actual state doesn’t match the desired state, the resource needs to be updated
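The steps above can be sketched in plain Java. This is a minimal model under stated assumptions, not JOSDK code: a `Map` stands in for the informer cache, and `DependentState`, `ApiClient`, and `CachedReconciler` are hypothetical names used only for illustration. Note that `reconcile` receives no event; it compares every desired dependent with the cache on each run.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified representation of a dependent resource (name + data).
record DependentState(String name, Map<String, String> data) {}

// Hypothetical write-side client; in a real operator these would be API server calls.
interface ApiClient {
  void create(DependentState desired);

  void update(DependentState desired);
}

final class CachedReconciler {
  // Stand-in for an informer cache: latest observed state, keyed by resource name.
  private final Map<String, DependentState> cache;
  private final ApiClient client;

  CachedReconciler(Map<String, DependentState> cache, ApiClient client) {
    this.cache = cache;
    this.client = client;
  }

  // Reconciles ALL desired dependents; no triggering event is passed in.
  void reconcile(List<DependentState> desiredDependents) {
    for (DependentState desired : desiredDependents) {
      // Read the observed state from the cache, not from the API server.
      DependentState observed = cache.get(desired.name());
      if (observed == null) {
        client.create(desired); // not cached yet: create it
      } else if (!observed.equals(desired)) {
        client.update(desired); // drifted from the desired state: update it
      }
      // otherwise the resource already matches; nothing to do
    }
  }
}
```

Once the cache reflects those writes, running `reconcile` again in this model issues no calls at all, which is what keeps repeated triggers cheap.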

Idempotency

Since all resources should be reconciled when your Reconciler is triggered, and reconciliations can be triggered multiple times for any given resource (especially with retry policies), it’s crucial that Reconciler implementations be idempotent.

Idempotency means: The same observed state should always result in exactly the same outcome.

Key implications:

Operators should generally operate in a stateless fashion
Since operators usually manage declarative resources, ensuring idempotency is typically straightforward
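To make the distinction concrete, here is a minimal sketch with hypothetical names (`Deployment`, `ReplicaReconciler`). The first method derives its result from the desired spec alone, so re-running it is harmless; the second changes the value relative to what is currently there, so every retry or duplicate trigger produces a different outcome.

```java
// Hypothetical, simplified dependent resource.
record Deployment(String name, int replicas) {}

final class ReplicaReconciler {
  // Idempotent: the result depends only on the desired spec, so running it
  // once or ten times against the same observed state yields the same Deployment.
  static Deployment reconcileIdempotent(int desiredReplicas, Deployment observed) {
    return new Deployment(observed.name(), desiredReplicas);
  }

  // NOT idempotent (anti-pattern): each run adds to the current value, so a
  // retried or duplicate reconciliation keeps scaling the Deployment up.
  static Deployment reconcileNonIdempotent(int extraReplicas, Deployment observed) {
    return new Deployment(observed.name(), observed.replicas() + extraReplicas);
  }
}
```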

Synchronous vs Asynchronous Resource Handling

Sometimes your reconciliation logic needs to wait for resources to reach their desired state (e.g., waiting for a Pod to become ready). You can approach this either synchronously or asynchronously.

Asynchronous Approach (Recommended)

Exit the reconciliation logic as soon as the Reconciler determines it cannot complete at this point. This frees the thread to process other events.

Requirements: Set up adequate event sources to monitor state changes of all resources the operator waits for. When state changes occur, the Reconciler is triggered again and can finish processing.

Synchronous Approach

Periodically poll resources’ state until they reach the desired state. If done within the reconcile method, this blocks the current thread for potentially long periods.

Recommendation: Use the asynchronous approach for better resource utilization.
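A minimal sketch of the asynchronous approach, using hypothetical types (`PodStatus`, `Outcome`, `AsyncReconciler`) and a `Map` standing in for an informer-backed cache. The reconciler checks readiness and returns immediately when the Pod is not ready; it never sleeps or polls, and instead assumes an event source watching the Pod will trigger it again.

```java
import java.util.Map;

// Hypothetical cached view of a Pod's readiness.
record PodStatus(String name, boolean ready) {}

enum Outcome { WAITING_FOR_EVENT, COMPLETED }

final class AsyncReconciler {
  // Stand-in for an informer-backed cache of Pods, keyed by name.
  private final Map<String, PodStatus> podCache;
  private int followUpStepsRun = 0;

  AsyncReconciler(Map<String, PodStatus> podCache) {
    this.podCache = podCache;
  }

  Outcome reconcile(String podName) {
    PodStatus pod = podCache.get(podName);
    if (pod == null || !pod.ready()) {
      // Cannot finish yet: return immediately instead of sleeping or polling.
      // An event source watching the Pod is expected to trigger reconcile again.
      return Outcome.WAITING_FOR_EVENT;
    }
    followUpStepsRun++; // hypothetical work that requires a ready Pod
    return Outcome.COMPLETED;
  }

  int followUpStepsRun() {
    return followUpStepsRun;
  }
}
```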

Why Use Automatic Retries?

Automatic retries are enabled by default and configurable. While you can deactivate this feature, we advise against it.

Why retries are important:

Transient network errors: Common in Kubernetes’ distributed environment, and easily resolved with retries
Resource conflicts: When multiple actors modify resources simultaneously, conflicts can be resolved by reconciling again
Transparency: When a retry succeeds, the transient failure is handled without any error-handling code in your reconciler
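To illustrate why retries hide transient failures from the reconciler, here is a generic exponential-backoff retry loop. It is a sketch of the general technique, not JOSDK’s retry implementation or its default settings; the class name, its parameters, and the injectable `sleeper` (used so delays can be observed instead of actually waiting) are assumptions.

```java
import java.time.Duration;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class BackoffRetry {
  private final Duration initialInterval;
  private final double multiplier;
  private final int maxAttempts;

  BackoffRetry(Duration initialInterval, double multiplier, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    this.initialInterval = initialInterval;
    this.multiplier = multiplier;
    this.maxAttempts = maxAttempts;
  }

  // Delay after the given failed attempt: initial, initial*m, initial*m^2, ...
  Duration delayAfterFailedAttempt(int attempt) {
    double millis = initialInterval.toMillis() * Math.pow(multiplier, attempt - 1);
    return Duration.ofMillis(Math.round(millis));
  }

  // Runs the action, retrying on RuntimeException until it succeeds or attempts run out.
  <T> T execute(Supplier<T> action, Consumer<Duration> sleeper) {
    RuntimeException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.get();
      } catch (RuntimeException e) {
        lastFailure = e;
        if (attempt < maxAttempts) {
          sleeper.accept(delayAfterFailedAttempt(attempt)); // back off before retrying
        }
      }
    }
    throw lastFailure; // all attempts failed: surface the last error
  }
}
```

With an initial interval of 100 ms and a multiplier of 2, an action that fails twice and then succeeds returns normally after backing off 100 ms and then 200 ms; the caller never sees the two failures.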

Managing State

Thanks to Kubernetes resources’ declarative nature, operators dealing only with Kubernetes resources can operate statelessly. They don’t need to maintain resource state information since it should be possible to rebuild the complete resource state from its representation.

When State Management Becomes Necessary

This stateless approach typically breaks down when dealing with external resources. You might need to record state about them (for example, a generated identifier or credential) for use in future reconciliations.

Anti-pattern: Putting state in the primary resource’s status sub-resource

Becomes difficult to manage with large amounts of state
Violates best practice: status should represent actual resource state, while spec represents desired state

Recommended approach: Store state in separate resources designed for this purpose:

Kubernetes Secret or ConfigMap
Dedicated Custom Resource with validated structure
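As an illustration, consider an operator that creates a database user with a generated password (similar in spirit to the mysql-schema sample, though this is not that sample’s code). The password cannot be re-derived from the spec, so it is stored in a Secret and reused on every later reconciliation. In this sketch a `Map` stands in for Secret data, and `SecretStore` and `SchemaReconciler` are hypothetical names.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for reading/writing Kubernetes Secret data, keyed by Secret name.
final class SecretStore {
  private final Map<String, Map<String, String>> secrets = new HashMap<>();

  Map<String, String> get(String secretName) {
    return secrets.get(secretName);
  }

  void put(String secretName, Map<String, String> data) {
    secrets.put(secretName, Map.copyOf(data));
  }
}

final class SchemaReconciler {
  private final SecretStore store;
  private final SecureRandom random = new SecureRandom();

  SchemaReconciler(SecretStore store) {
    this.store = store;
  }

  // Returns the schema user's password, generating and persisting it only once.
  String reconcile(String schemaName) {
    String secretName = schemaName + "-credentials";
    Map<String, String> existing = store.get(secretName);
    if (existing != null && existing.containsKey("password")) {
      return existing.get("password"); // state survives across reconciliations
    }
    byte[] bytes = new byte[16];
    random.nextBytes(bytes);
    String password = Base64.getEncoder().encodeToString(bytes);
    // A real operator would create the database user here, then persist the password.
    store.put(secretName, Map.of("password", password));
    return password;
  }
}
```

Because the state lives in the Secret rather than in the reconciler or the primary resource’s status, a fresh `SchemaReconciler` (for example, after an operator restart) returns the same password.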

Handling Informer Errors and Cache Sync Timeouts

You can configure whether the operator should stop when informer errors occur on startup.

Default Behavior

By default, if there’s a startup error (e.g., the informer lacks permissions to list target resources for primary or secondary resources), the operator stops immediately.

Alternative Configuration

Set stopOnInformerErrorDuringStartup to false to start the operator even when some informers fail to start. In this case:

The operator keeps retrying the connection with exponential backoff
This applies both to startup failures and to runtime problems
The operator only stops for fatal errors (currently when a resource cannot be deserialized)

Use case: When watching multiple namespaces, it’s better to start the operator so it can handle other namespaces while resolving permission issues in specific namespaces.

Cache Sync Timeout Impact

The stopOnInformerErrorDuringStartup setting affects cache sync timeout behavior:

If true: Operator stops on cache sync timeout
If false: After timeout, the controller starts reconciling resources even if some event source caches haven’t synced yet
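The effect of the flag can be summarized as a small decision table. The following is a plain-Java model of the behavior described above, not JOSDK code; `StartupPolicy` and `Action` are hypothetical names.

```java
enum Action { STOP_OPERATOR, RETRY_WITH_BACKOFF, START_RECONCILING }

final class StartupPolicy {
  private final boolean stopOnInformerErrorDuringStartup;

  StartupPolicy(boolean stopOnInformerErrorDuringStartup) {
    this.stopOnInformerErrorDuringStartup = stopOnInformerErrorDuringStartup;
  }

  // An informer fails during startup (e.g. missing list permission in one namespace).
  Action onInformerStartupError() {
    return stopOnInformerErrorDuringStartup ? Action.STOP_OPERATOR : Action.RETRY_WITH_BACKOFF;
  }

  // Some event source caches have not synced when the cache sync timeout elapses.
  Action onCacheSyncTimeout() {
    return stopOnInformerErrorDuringStartup ? Action.STOP_OPERATOR : Action.START_RECONCILING;
  }
}
```

With `new StartupPolicy(false)`, a permission problem in one watched namespace leads to retries for that informer rather than a shutdown, so the other namespaces keep being reconciled.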

Graceful Shutdown

To give the reconciler enough time to finish processing ongoing events before the operator shuts down, set an appropriate reconciliationTerminationTimeout using ConfigurationServiceOverrider, for example through the Operator constructor that accepts an overrider callback:

```java
import io.javaoperatorsdk.operator.Operator;
import java.time.Duration;

// Allow in-flight reconciliations up to 5 seconds to complete during shutdown
final var operator = new Operator(
    overrider -> overrider.withReconciliationTerminationTimeout(Duration.ofSeconds(5)));
```
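The idea behind a termination timeout matches the standard JDK `ExecutorService` shutdown pattern: stop accepting new work, then wait a bounded time for in-flight tasks. The sketch below uses only the JDK to show that pattern; it is an analogy for illustration, and `GracefulShutdown` is a hypothetical name, not how JOSDK implements reconciliationTerminationTimeout.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class GracefulShutdown {
  // Stops accepting new work, then waits up to `timeout` for running tasks.
  // Returns true if everything finished in time; otherwise interrupts the rest.
  static boolean shutdown(ExecutorService executor, Duration timeout) {
    executor.shutdown(); // no new tasks are accepted; queued and running ones continue
    try {
      if (executor.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
        return true; // in-flight work completed within the timeout
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
    executor.shutdownNow(); // timeout elapsed (or interrupted): interrupt remaining tasks
    return false;
  }
}
```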