operator for managing cassandra clusters

by pantheon-systems

v0.0.0-20181027044235-735de0001e9d · License: MIT

Cassandra Operator

This repository contains the cassandra cluster kubernetes operator. The operator consists of the CustomResourceDefinition (CRD) and a Kubernetes Controller. It is a work in progress.

The image is stored at:


We are using the 0.0.7 branch of the operator-sdk.

This sdk is under heavy development.

Current Capabilities

  • Create a single node empty cluster
  • Create a multi-node empty cluster
  • Scale up or down by a single node. This does not call the cassandra lifecycle at this time.
  • Add ExternalSeeds to CRD to setup multi-dc
  • Delete a cluster that has been created with the operator. Persistent Volumes (data disks) are retained and must be manually deleted. The system does not currently decommission the cluster before deleting it.

Deploying the Operator

The operator comes in two parts. The Custom Resource Definition must be created first and is a per-cluster task. The operator service runs in a kube pod and needs to be deployed to each cluster as well. The pod will run in the kube-system namespace.

CustomResourceDefinition (CRD)

kubectl -n kube-system --context <cluster> create -f deploy/crd.yaml

Operator Service

The operator must be deployed on any k8s cluster that is expected to run cassandra cluster resources. Rename ./deploy/operator.yaml.example to ./deploy/operator.yaml. Set the docker image tag for the operator service and run the following commands:

KUBE_CONTEXT=gke_pantheon-internal_us-central1-a_cluster-02 make deploy
KUBE_CONTEXT=gke_pantheon-internal_us-central1-b_cluster-01 make deploy

This will create a running container of the cassandra operator image on the cluster and register the CRD as well.

NOTE: You can specify the release (image tag) using the $VERSION and $UNIQUE_TAG environment variables. Our images are tagged with v<version>-<unique-tag> (e.g. v0.0.1-20e37818-e3e2-4675-ab10-aa065045f753), where the unique tag is either a git commit hash or a Circle CI workflow id.
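The tag format described in the note can be sketched in Go as follows. This is only an illustration, not the project's build logic: the function name imageTag is hypothetical, and the sketch assumes $VERSION holds the version without the leading "v".

```go
package main

import (
	"fmt"
	"os"
)

// imageTag builds a tag of the form v<version>-<unique-tag>.
// Assumption: version is given without the leading "v".
func imageTag(version, uniqueTag string) string {
	return fmt.Sprintf("v%s-%s", version, uniqueTag)
}

func main() {
	// VERSION and UNIQUE_TAG are the variables named in the note above.
	version := os.Getenv("VERSION")
	uniqueTag := os.Getenv("UNIQUE_TAG")
	if version == "" || uniqueTag == "" {
		fmt.Println("set VERSION and UNIQUE_TAG to compute an image tag")
		return
	}
	fmt.Println(imageTag(version, uniqueTag))
}
```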


The cassandra operator can automatically manage repair jobs. To enable this feature you must set the values for the v1alpha1.RepairPolicy:

apiVersion: ""
kind: "CassandraCluster"
  name: "example-application"
  size: 1 # ring size
    schedule: "22 6 * * 0,4"
    image: ""

NOTE: The schedule is specified in cron format. See Wikipedia.
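To make the example schedule easier to read, the following Go sketch splits a five-field cron expression into named parts. The helper cronFields is hypothetical and is not part of the operator. It only labels the fields and does not validate ranges.

```go
package main

import (
	"fmt"
	"strings"
)

// cronFields splits a standard five-field cron expression into named parts.
func cronFields(expr string) (map[string]string, error) {
	names := []string{"minute", "hour", "day-of-month", "month", "day-of-week"}
	parts := strings.Fields(expr)
	if len(parts) != len(names) {
		return nil, fmt.Errorf("expected %d fields, got %d", len(names), len(parts))
	}
	out := make(map[string]string, len(names))
	for i, n := range names {
		out[n] = parts[i]
	}
	return out, nil
}

func main() {
	f, err := cronFields("22 6 * * 0,4")
	if err != nil {
		fmt.Println("invalid schedule:", err)
		return
	}
	// Minute 22 of hour 6, on days 0 and 4 of the week (Sunday and Thursday).
	fmt.Println(f["minute"], f["hour"], f["day-of-week"])
}
```

Read this way, the sample schedule runs at 06:22 on Sundays and Thursdays.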

Multi-DC Deployment

When externalSeeds is set in the v1alphaCassandraCluster.ClusterSpec section of the custom resource, the new cluster is created as a second datacenter of the cluster to which the external seeds belong. The comma-separated list of external seeds is appended to the seed list created for the new ring, and auto-bootstrap is disabled for the new node. Currently only single-node creation of the second datacenter is supported. The workaround is to scale up the new datacenter after initial creation, once nodetool rebuild -- <name of other dc> has completed.
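The seed handling described above can be sketched roughly as follows. This is not the operator's code: seedConfig is a hypothetical helper, the addresses are made up, and the sketch assumes auto-bootstrap stays enabled when no external seeds are given.

```go
package main

import (
	"fmt"
	"strings"
)

// seedConfig appends the comma-separated external seeds to the local seed
// list and reports whether auto-bootstrap should remain enabled.
// Assumption: auto-bootstrap is only disabled when external seeds are present.
func seedConfig(localSeeds []string, externalSeeds string) (string, bool) {
	seeds := append([]string{}, localSeeds...)
	for _, s := range strings.Split(externalSeeds, ",") {
		if s = strings.TrimSpace(s); s != "" {
			seeds = append(seeds, s)
		}
	}
	autoBootstrap := len(seeds) == len(localSeeds)
	return strings.Join(seeds, ","), autoBootstrap
}

func main() {
	// Hypothetical addresses, for illustration only.
	seeds, autoBootstrap := seedConfig([]string{"10.0.0.1"}, "10.1.0.1, 10.1.0.2")
	fmt.Println("seeds:", seeds)
	fmt.Println("auto-bootstrap:", autoBootstrap)
}
```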

If the other datacenter is not in the same physical network as the new ring being constructed, in the yaml set:

enablePublicPodServices: true

The v1alpha1.cassandracluster Custom Resource

In the ./deploy directory you will find a sample.yaml file which contains a sample cassandra cluster setup.

API Documentation

The v1alpha1.CassandraCluster API documentation is located at: GoDoc Coming Soon

JVM Agents

The cassandra operator uses JMX agents to get information from the JVM about how cassandra is running. They are also used in some areas instead of nodetool, and they feed the metrics system for cassandra. There are two supported options for how this can be configured:

Jolokia Agent Attached

apiVersion: ""
kind: "CassandraCluster"
  name: "example-application"
    jvmAgent: "agent"
    jvmAgentConfigName: "<configmap name goes here>"

See Jolokia documentation here

Telegraf Agent Sidecar

apiVersion: ""
kind: "CassandraCluster"
  name: "example-application"
    jvmAgent: "sidecar"
    jvmAgentConfigName: "<configmap name goes here>"

Telegraf Configuration

Create a configmap whose key is telegraf.conf and whose value is the contents of the Telegraf configuration file.

See Telegraf config documentation here

The Cassandra Docker Image

The image provided to the operator for the cassandra image (which can be specified in the CRD) should meet the following requirements:

Exposed Ports

The following ports are expected to be exposed by the cassandra container:

  • 7000 - intra-node communication
  • 7001 - intra-node tls based communication
  • 7199 - JMX port
  • 9042 - CQL
  • 9160 - Thrift
  • 8778 - Metrics

Environment Variables

The operator will pass configuration options to cassandra on startup through environment variables. These should be used to populate values in the cassandra.yaml file:

  • CASSANDRA_DC: Name of datacenter, if not set lets snitch set the DC name
  • POD_NAMESPACE: From the downward API passing in the namespace of the pod (metadata.namespace)
  • POD_IP: From the downward API passing in the pod private IP address (status.podIP)
  • CASSANDRA_CLUSTER_NAME: Name of the cluster
  • SERVICE_NAME: Name of the public service used as the LB for CQL/Thrift access
  • CASSANDRA_ALLOCATE_TOKENS_FOR_KEYSPACE: Name of the keyspace to create on startup (defaults to cluster name)
  • CASSANDRA_MAX_HEAP: Maximum heap size for the JVM
  • CASSANDRA_MIN_HEAP: Minimum heap size for the JVM
  • CASSANDRA_SEEDS: Comma-separated seed list for the ring
  • CASSANDRA_AUTO_BOOTSTRAP: Boolean if the node should auto-bootstrap from the rest of the cluster on startup
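As an illustration of how an image entrypoint might turn these variables into cassandra.yaml settings, here is a minimal Go sketch. It is not taken from the operator or from any particular image. The helper names and the fallback defaults are assumptions, and only a few of the variables are shown.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// getenv returns the value of key, or def when the variable is unset or empty.
func getenv(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// renderSettings builds a cassandra.yaml fragment from a lookup function.
// The fallback defaults below are assumptions made for this sketch.
func renderSettings(get func(key, def string) string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "cluster_name: %q\n", get("CASSANDRA_CLUSTER_NAME", "Test Cluster"))
	fmt.Fprintf(&b, "listen_address: %s\n", get("POD_IP", "127.0.0.1"))
	fmt.Fprintf(&b, "auto_bootstrap: %s\n", get("CASSANDRA_AUTO_BOOTSTRAP", "true"))
	b.WriteString("seed_provider:\n")
	b.WriteString("  - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n")
	b.WriteString("    parameters:\n")
	fmt.Fprintf(&b, "      - seeds: %q\n", get("CASSANDRA_SEEDS", "127.0.0.1"))
	return b.String()
}

func main() {
	// Reads the real environment; inside a pod the operator would set these.
	fmt.Print(renderSettings(getenv))
}
```

Passing the lookup function in as a parameter keeps the rendering logic testable without touching the process environment.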


The certificates that cassandra uses should be in a secret called test-cluster-cassandra-certs, where test-cluster is the name of the cluster specified in the CRD. The certificates in the secret will be mounted into the container as a volume at the /keystore mount path.


Depending on which JVM agent you choose you will need to provide a configuration. The configuration should be stored as a configmap resource in kube. The default name for the configmap is test-cluster-prometheus-jvm-agent-config where test-cluster is the cluster name.

If JvmAgent is set to sidecar, the telegraf sidecar container has this configmap mounted at the /telegraf-config mount point. If JvmAgent is left at its default or set to jvm, the jolokia agent is used and the configmap is mounted in the primary cassandra container at the /jvm-agent mount point.
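The naming and mounting conventions above can be summarized in a short Go sketch. All function and type names here are hypothetical. Treating every value other than sidecar as the jolokia case is an assumption, since the text only names the default and one other value.

```go
package main

import "fmt"

// agentMount describes where the agent configmap ends up.
type agentMount struct {
	Container string
	MountPath string
}

// certsSecretName returns the expected secret name for a cluster's certificates.
func certsSecretName(cluster string) string {
	return cluster + "-cassandra-certs"
}

// defaultAgentConfigName returns the default configmap name for the JVM agent.
func defaultAgentConfigName(cluster string) string {
	return cluster + "-prometheus-jvm-agent-config"
}

// agentMountFor picks the container and mount path for a jvmAgent value.
// Assumption: anything other than "sidecar" uses the jolokia path.
func agentMountFor(jvmAgent string) agentMount {
	if jvmAgent == "sidecar" {
		return agentMount{Container: "telegraf sidecar", MountPath: "/telegraf-config"}
	}
	return agentMount{Container: "cassandra", MountPath: "/jvm-agent"}
}

func main() {
	fmt.Println(certsSecretName("test-cluster"))
	fmt.Println(defaultAgentConfigName("test-cluster"))
	fmt.Println(agentMountFor("sidecar").MountPath)
}
```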

Version Taint

Developers can run multiple operators in a single kubernetes cluster without interfering with one another by using the version-taint command line option of the operator executable. Use --version-taint=<something unique here> to enable it. This flags your clusters with a version tag and sets your operator to operate only on your clusters.
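The filtering idea behind the version taint can be sketched as below. This is an illustration under stated assumptions, not the operator's implementation. The annotation key example.com/operator-version is a placeholder invented for this sketch, and the rule that an operator without a taint only manages untagged clusters is also an assumption.

```go
package main

import (
	"flag"
	"fmt"
)

// versionAnnotation is a HYPOTHETICAL annotation key used only in this sketch.
const versionAnnotation = "example.com/operator-version"

// shouldManage reports whether an operator started with versionTaint should
// act on a cluster carrying the given annotations. A missing annotation
// reads as the empty string, so untagged clusters match an untainted operator.
func shouldManage(annotations map[string]string, versionTaint string) bool {
	return annotations[versionAnnotation] == versionTaint
}

func main() {
	taint := flag.String("version-taint", "", "only manage clusters tagged with this value")
	flag.Parse()

	// Hypothetical clusters, for illustration only.
	clusters := map[string]map[string]string{
		"alice-ring":  {versionAnnotation: "alice-dev"},
		"shared-ring": {},
	}
	for name, annotations := range clusters {
		fmt.Printf("%s managed=%v\n", name, shouldManage(annotations, *taint))
	}
}
```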

The annotation for the managing operator version is

Feature Flag

Feature flags have been implemented using annotations on the objects that they toggle features on.

Available Feature Flags

  • disable-pod-finalizer disables finalizers on the pods representing cassandra nodes (added to corev1.Pod)
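A check for an annotation-based feature flag might look like the sketch below. The helper featureEnabled is hypothetical. The exact annotation key (it may carry a prefix) and the value that turns a flag on are not given above, so the bare key and the value "true" are assumptions.

```go
package main

import "fmt"

// featureEnabled reports whether the given flag is switched on in an
// object's annotations. Assumption: a flag is on when its value is "true".
func featureEnabled(annotations map[string]string, flag string) bool {
	return annotations[flag] == "true"
}

func main() {
	// Hypothetical pod annotations, for illustration only.
	podAnnotations := map[string]string{"disable-pod-finalizer": "true"}
	if featureEnabled(podAnnotations, "disable-pod-finalizer") {
		fmt.Println("skipping finalizer for this pod")
	}
}
```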
