New to KubeDB? Please start here.
Elasticsearch Hot-Warm-Cold Cluster
Hot-warm-cold architectures are common for time series data such as logs or metrics, and they suit other use cases as well. For example, assume Elasticsearch is being used to aggregate log files from multiple systems. Logs from today are actively being indexed, and this week’s logs are the most heavily searched (hot). Last week’s logs may be searched, but not as much as the current week’s logs (warm). Last month’s logs may or may not be searched often, but are good to keep around just in case (cold).
Before You Begin
First, you need a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. If you do not already have a cluster, you can create one by using kind.
Now, install the KubeDB operator in your cluster following the steps here.
To keep things isolated, this tutorial uses a separate namespace called demo.
$ kubectl create namespace demo
namespace/demo created
$ kubectl get namespace
NAME STATUS AGE
demo Active 14s
Note: The YAML files used in this tutorial are stored in the GitHub repository kubedb/docs.
Find Available StorageClass
We have to provide a StorageClass in the Elasticsearch CR specification. Check the StorageClasses available in your cluster using the following command:
$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
standard (default) rancher.io/local-path Delete WaitForFirstConsumer false 10m
linode-block-storage linodebs.csi.linode.com Delete Immediate true 10m
linode-block-storage-retain (default) linodebs.csi.linode.com Retain Immediate true 10m
Here, we use linode-block-storage as the StorageClass in this demo.
Create Elasticsearch Hot-Warm-Cold Cluster
We are going to create an Elasticsearch Hot-Warm-Cold cluster in topology mode. Our cluster will consist of 2 master nodes, 2 ingest nodes, 1 data content node, 3 data hot nodes, 2 data warm nodes, and 2 data cold nodes. Here, we are using Elasticsearch version xpack-8.11.1 of the ElasticStack distribution for this demo. To learn more about the Elasticsearch CR, visit here.
apiVersion: kubedb.com/v1
kind: Elasticsearch
metadata:
  name: es-cluster
  namespace: demo
spec:
  enableSSL: true
  version: xpack-8.11.1
  topology:
    master:
      replicas: 2
      storage:
        resources:
          requests:
            storage: 1Gi
        storageClassName: "linode-block-storage"
    ingest:
      replicas: 2
      storage:
        resources:
          requests:
            storage: 1Gi
        storageClassName: "linode-block-storage"
    dataContent:
      replicas: 1
      storage:
        resources:
          requests:
            storage: 5Gi
        storageClassName: "linode-block-storage"
    dataHot:
      replicas: 3
      storage:
        resources:
          requests:
            storage: 3Gi
        storageClassName: "linode-block-storage"
    dataWarm:
      replicas: 2
      storage:
        resources:
          requests:
            storage: 5Gi
        storageClassName: "linode-block-storage"
    dataCold:
      replicas: 2
      storage:
        resources:
          requests:
            storage: 5Gi
        storageClassName: "linode-block-storage"
Here,
- spec.version - is the name of the ElasticsearchVersion CR. Here, we are using Elasticsearch version xpack-8.11.1 of the ElasticStack distribution.
- spec.enableSSL - specifies whether the HTTP layer is secured with certificates or not.
- spec.storageType - specifies the type of storage that will be used for the Elasticsearch database. It can be Durable or Ephemeral. The default value of this field is Durable. If Ephemeral is used, then KubeDB will create the Elasticsearch database using an EmptyDir volume. In this case, you don’t have to specify the spec.storage field. This is useful for testing purposes.
- spec.topology - specifies the node-specific properties for the Elasticsearch cluster.
  - topology.master - specifies the properties of master nodes.
    - master.replicas - specifies the number of master nodes.
    - master.storage - specifies the master node storage information that is passed to the PetSet.
  - topology.dataContent - specifies the properties of data content nodes.
    - dataContent.replicas - specifies the number of data content nodes.
    - dataContent.storage - specifies the data content node storage information that is passed to the PetSet.
  - topology.ingest - specifies the properties of ingest nodes.
    - ingest.replicas - specifies the number of ingest nodes.
    - ingest.storage - specifies the ingest node storage information that is passed to the PetSet.
  - topology.dataHot - specifies the properties of dataHot nodes.
    - dataHot.replicas - specifies the number of dataHot nodes.
    - dataHot.storage - specifies the dataHot node storage information that is passed to the PetSet.
  - topology.dataWarm - specifies the properties of dataWarm nodes.
    - dataWarm.replicas - specifies the number of dataWarm nodes.
    - dataWarm.storage - specifies the dataWarm node storage information that is passed to the PetSet.
  - topology.dataCold - specifies the properties of dataCold nodes.
    - dataCold.replicas - specifies the number of dataCold nodes.
    - dataCold.storage - specifies the dataCold node storage information that is passed to the PetSet.
Here, we use linode-block-storage as storage for every node. However, it is recommended to give the fastest storage to the dataHot nodes, then the dataWarm nodes, and finally the dataCold nodes.
Let’s deploy the above example by the following command:
$ kubectl apply -f https://github.com/kubedb/docs/raw/v2024.11.18/docs/guides/elasticsearch/clustering/topology-cluster/hot-warm-cold-cluster/yamls/es-cluster.yaml
elasticsearch.kubedb.com/es-cluster created
KubeDB will create the necessary resources to deploy the Elasticsearch cluster according to the above specification. Let’s wait until the database is ready to use:
$ watch kubectl get elasticsearch -n demo
NAME VERSION STATUS AGE
es-cluster xpack-8.11.1 Ready 2m48s
Here, Elasticsearch is in the Ready state. It means the database is ready to accept connections.
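In a script, it can be more convenient to block until the object is Ready than to watch the output by hand. The following is a minimal sketch, assuming kubectl v1.23 or newer (where `kubectl wait` accepts `--for=jsonpath`) and assuming the phase is exposed at `.status.phase`, as the describe output in this tutorial suggests. The function name `wait_for_es_ready` and the 10 minute timeout are illustrative choices, not part of KubeDB.

```shell
# Hypothetical helper: block until the Elasticsearch object reports phase Ready.
# $1 = object name, $2 = namespace (defaults to demo).
# Returns non-zero if the phase is not Ready within the timeout.
wait_for_es_ready() {
  kubectl wait "elasticsearch/$1" -n "${2:-demo}" \
    --for=jsonpath='{.status.phase}'=Ready --timeout=10m
}

# Usage (needs access to the cluster):
#   wait_for_es_ready es-cluster demo
```

If the timeout expires, describing the object is the next step for finding out which condition is holding things up.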
Describe the Elasticsearch object to observe the progress if something goes wrong or the status is not changing for a long period of time:
$ kubectl describe elasticsearch -n demo es-cluster
Name: es-cluster
Namespace: demo
Labels: <none>
Annotations: <none>
API Version: kubedb.com/v1
Kind: Elasticsearch
Metadata:
Creation Timestamp: 2022-03-14T06:33:20Z
Finalizers:
kubedb.com
Generation: 2
Resource Version: 20467655
UID: 236fd414-9d94-4fce-93d3-7891fcf7f6a4
Spec:
Auth Secret:
Name: es-cluster-elastic-cred
Enable SSL: true
Heap Size Percentage: 50
Kernel Settings:
Privileged: true
Sysctls:
Name: vm.max_map_count
Value: 262144
Pod Template:
Controller:
Metadata:
Spec:
Resources:
Service Account Name: es-cluster
Storage Type: Durable
Termination Policy: Delete
Tls:
Certificates:
Alias: ca
Private Key:
Encoding: PKCS8
Secret Name: es-cluster-ca-cert
Subject:
Organizations:
kubedb
Alias: transport
Private Key:
Encoding: PKCS8
Secret Name: es-cluster-transport-cert
Subject:
Organizations:
kubedb
Alias: http
Private Key:
Encoding: PKCS8
Secret Name: es-cluster-http-cert
Subject:
Organizations:
kubedb
Alias: archiver
Private Key:
Encoding: PKCS8
Secret Name: es-cluster-archiver-cert
Subject:
Organizations:
kubedb
Topology:
Data Cold:
Replicas: 2
Resources:
Limits:
Memory: 1Gi
Requests:
Cpu: 500m
Memory: 1Gi
Storage:
Resources:
Requests:
Storage: 5Gi
Storage Class Name: linode-block-storage
Suffix: data-cold
Data Content:
Replicas: 1
Resources:
Limits:
Memory: 1Gi
Requests:
Cpu: 500m
Memory: 1Gi
Storage:
Resources:
Requests:
Storage: 5Gi
Storage Class Name: linode-block-storage
Suffix: data-content
Data Hot:
Replicas: 3
Resources:
Limits:
Memory: 1Gi
Requests:
Cpu: 500m
Memory: 1Gi
Storage:
Resources:
Requests:
Storage: 3Gi
Storage Class Name: linode-block-storage
Suffix: data-hot
Data Warm:
Replicas: 2
Resources:
Limits:
Memory: 1Gi
Requests:
Cpu: 500m
Memory: 1Gi
Storage:
Resources:
Requests:
Storage: 5Gi
Storage Class Name: linode-block-storage
Suffix: data-warm
Ingest:
Replicas: 2
Resources:
Limits:
Memory: 1Gi
Requests:
Cpu: 500m
Memory: 1Gi
Storage:
Resources:
Requests:
Storage: 1Gi
Storage Class Name: linode-block-storage
Suffix: ingest
Master:
Replicas: 2
Resources:
Limits:
Memory: 1Gi
Requests:
Cpu: 500m
Memory: 1Gi
Storage:
Resources:
Requests:
Storage: 1Gi
Storage Class Name: linode-block-storage
Suffix: master
Version: xpack-8.11.1
Status:
Conditions:
Last Transition Time: 2022-03-14T06:33:20Z
Message: The KubeDB operator has started the provisioning of Elasticsearch: demo/es-cluster
Reason: DatabaseProvisioningStartedSuccessfully
Status: True
Type: ProvisioningStarted
Last Transition Time: 2022-03-14T06:34:55Z
Message: All desired replicas are ready.
Reason: AllReplicasReady
Status: True
Type: ReplicaReady
Last Transition Time: 2022-03-14T06:35:17Z
Message: The Elasticsearch: demo/es-cluster is accepting client requests.
Observed Generation: 2
Reason: DatabaseAcceptingConnectionRequest
Status: True
Type: AcceptingConnection
Last Transition Time: 2022-03-14T06:35:27Z
Message: The Elasticsearch: demo/es-cluster is ready.
Observed Generation: 2
Reason: ReadinessCheckSucceeded
Status: True
Type: Ready
Last Transition Time: 2022-03-14T06:35:28Z
Message: The Elasticsearch: demo/es-cluster is successfully provisioned.
Observed Generation: 2
Reason: DatabaseSuccessfullyProvisioned
Status: True
Type: Provisioned
Observed Generation: 2
Phase: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Successful 3m29s KubeDB Operator Successfully created governing service
Normal Successful 3m29s KubeDB Operator Successfully created Service
Normal Successful 3m29s KubeDB Operator Successfully created Service
Normal Successful 3m27s KubeDB Operator Successfully created Elasticsearch
Normal Successful 3m26s KubeDB Operator Successfully created appbinding
Normal Successful 3m26s KubeDB Operator Successfully governing service
Here, in Status.Conditions:
- Conditions.Status is True for the Condition.Type: ProvisioningStarted, which means database provisioning has been started successfully.
- Conditions.Status is True for the Condition.Type: ReplicaReady, which specifies that all replicas are ready in the cluster.
- Conditions.Status is True for the Condition.Type: AcceptingConnection, which means the database is accepting connection requests.
- Conditions.Status is True for the Condition.Type: Ready, which means the database is ready to use.
- Conditions.Status is True for the Condition.Type: Provisioned, which specifies that the database has been successfully provisioned.
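The same conditions can be read without scanning the whole describe output. The sketch below assumes the conditions live under `.status.conditions` with `type` and `status` fields, as the output above indicates. The function names `es_conditions` and `all_conditions_true` are hypothetical helpers introduced for illustration.

```shell
# Hypothetical helper: print one "<type> <status>" line per condition.
# $1 = object name, $2 = namespace (defaults to demo).
es_conditions() {
  kubectl get elasticsearch "$1" -n "${2:-demo}" \
    -o jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{"\n"}{end}'
}

# Hypothetical helper: read "<type> <status>" lines on stdin, report every
# condition whose status is not True, and exit non-zero if any was found.
all_conditions_true() {
  awk '$2 != "True" { print "not ready: " $1; bad = 1 } END { exit bad + 0 }'
}

# Usage (needs access to the cluster):
#   es_conditions es-cluster demo | all_conditions_true
```

Splitting the query from the check keeps the check usable on saved output as well as on a live cluster.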
KubeDB Operator Generated Resources
Let’s check the Kubernetes resources created by the operator on the deployment of Elasticsearch CRO:
$ kubectl get all,secret,pvc -n demo -l 'app.kubernetes.io/instance=es-cluster'
NAME READY STATUS RESTARTS AGE
pod/es-cluster-data-cold-0 1/1 Running 0 5m46s
pod/es-cluster-data-cold-1 1/1 Running 0 4m51s
pod/es-cluster-data-content-0 1/1 Running 0 5m46s
pod/es-cluster-data-hot-0 1/1 Running 0 5m46s
pod/es-cluster-data-hot-1 1/1 Running 0 5m9s
pod/es-cluster-data-hot-2 1/1 Running 0 4m41s
pod/es-cluster-data-warm-0 1/1 Running 0 5m46s
pod/es-cluster-data-warm-1 1/1 Running 0 4m52s
pod/es-cluster-ingest-0 1/1 Running 0 5m46s
pod/es-cluster-ingest-1 1/1 Running 0 5m14s
pod/es-cluster-master-0 1/1 Running 0 5m46s
pod/es-cluster-master-1 1/1 Running 0 4m50s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/es-cluster ClusterIP 10.128.132.28 <none> 9200/TCP 5m50s
service/es-cluster-master ClusterIP None <none> 9300/TCP 5m50s
service/es-cluster-pods ClusterIP None <none> 9200/TCP 5m50s
NAME READY AGE
petset.apps/es-cluster-data-cold 2/2 5m48s
petset.apps/es-cluster-data-content 1/1 5m48s
petset.apps/es-cluster-data-hot 3/3 5m48s
petset.apps/es-cluster-data-warm 2/2 5m48s
petset.apps/es-cluster-ingest 2/2 5m48s
petset.apps/es-cluster-master 2/2 5m48s
NAME TYPE VERSION AGE
appbinding.appcatalog.appscode.com/es-cluster kubedb.com/elasticsearch 7.16.2 5m49s
NAME TYPE DATA AGE
secret/es-cluster-archiver-cert kubernetes.io/tls 3 5m51s
secret/es-cluster-ca-cert kubernetes.io/tls 2 5m51s
secret/es-cluster-config Opaque 1 5m50s
secret/es-cluster-elastic-cred kubernetes.io/basic-auth 2 5m51s
secret/es-cluster-http-cert kubernetes.io/tls 3 5m51s
secret/es-cluster-transport-cert kubernetes.io/tls 3 5m51s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/data-es-cluster-data-cold-0 Bound pvc-47585d52c11a4a52 10Gi RWO linode-block-storage 5m50s
persistentvolumeclaim/data-es-cluster-data-cold-1 Bound pvc-66aaa122c5774713 10Gi RWO linode-block-storage 4m55s
persistentvolumeclaim/data-es-cluster-data-content-0 Bound pvc-d51361e9352b4e9f 10Gi RWO linode-block-storage 5m50s
persistentvolumeclaim/data-es-cluster-data-hot-0 Bound pvc-3712187a3c6540da 10Gi RWO linode-block-storage 5m50s
persistentvolumeclaim/data-es-cluster-data-hot-1 Bound pvc-2318d4eacb4b453f 10Gi RWO linode-block-storage 5m13s
persistentvolumeclaim/data-es-cluster-data-hot-2 Bound pvc-c309c7058b114578 10Gi RWO linode-block-storage 4m45s
persistentvolumeclaim/data-es-cluster-data-warm-0 Bound pvc-d5950f5b075c4d3f 10Gi RWO linode-block-storage 5m50s
persistentvolumeclaim/data-es-cluster-data-warm-1 Bound pvc-3f6b99d11b1d46ea 10Gi RWO linode-block-storage 4m56s
persistentvolumeclaim/data-es-cluster-ingest-0 Bound pvc-081be753a20a45da 10Gi RWO linode-block-storage 5m50s
persistentvolumeclaim/data-es-cluster-ingest-1 Bound pvc-1bea5a3b5be24817 10Gi RWO linode-block-storage 5m18s
persistentvolumeclaim/data-es-cluster-master-0 Bound pvc-2c49a2ccb4644d6e 10Gi RWO linode-block-storage 5m50s
persistentvolumeclaim/data-es-cluster-master-1 Bound pvc-cb1d970febff498f 10Gi RWO linode-block-storage 4m54s
- PetSet - 6 PetSets are created for the 6 types of Elasticsearch nodes. The PetSets are named after the Elasticsearch instance with the given suffix: {Elasticsearch-Name}-{Suffix}.
- Services - 3 services are generated for each Elasticsearch database.
  - {Elasticsearch-Name} - the client service which is used to connect to the database. It points to the ingest nodes.
  - {Elasticsearch-Name}-master - the master service which is used to connect to the master nodes. It is a headless service.
  - {Elasticsearch-Name}-pods - the node discovery service which is used by the Elasticsearch nodes to communicate with each other. It is a headless service.
- AppBinding - an AppBinding which holds the connection information for the database. It is also named after the Elasticsearch instance.
- Secrets - 3 types of secrets are generated for each Elasticsearch database.
  - {Elasticsearch-Name}-{username}-cred - the auth secrets which hold the username and password for the Elasticsearch users.
  - {Elasticsearch-Name}-{alias}-cert - the certificate secrets which hold tls.crt, tls.key, and ca.crt for configuring the Elasticsearch database.
  - {Elasticsearch-Name}-config - the default configuration secret created by the operator.
Connect with Elasticsearch Database
We will use port forwarding to connect with our Elasticsearch database. Then we will use curl to send HTTP requests that check the cluster health and verify that our Elasticsearch database is working well.
Port-forward the Service
KubeDB creates a few Services to connect with the database. Let’s check the Services using the following command:
$ kubectl get service -n demo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
es-cluster ClusterIP 10.128.132.28 <none> 9200/TCP 10m
es-cluster-dashboard ClusterIP 10.128.99.51 <none> 5601/TCP 10m
es-cluster-master ClusterIP None <none> 9300/TCP 10m
es-cluster-pods ClusterIP None <none> 9200/TCP 10m
Here, we are going to use the es-cluster Service to connect with the database. Now, let’s port-forward the es-cluster Service to port 9200 on the local machine:
$ kubectl port-forward -n demo svc/es-cluster 9200
Forwarding from 127.0.0.1:9200 -> 9200
Forwarding from [::1]:9200 -> 9200
Now, our Elasticsearch cluster is accessible at localhost:9200.
Export the Credentials
KubeDB also creates some Secrets for the database. Let’s check which Secrets have been created by KubeDB for our es-cluster.
$ kubectl get secret -n demo | grep es-cluster
es-cluster-archiver-cert kubernetes.io/tls 3 12m
es-cluster-ca-cert kubernetes.io/tls 2 12m
es-cluster-config Opaque 1 12m
es-cluster-dashboard-ca-cert kubernetes.io/tls 2 12m
es-cluster-dashboard-config Opaque 1 12m
es-cluster-dashboard-kibana-server-cert kubernetes.io/tls 3 12m
es-cluster-elastic-cred kubernetes.io/basic-auth 2 12m
es-cluster-http-cert kubernetes.io/tls 3 12m
es-cluster-token-v97c7 kubernetes.io/service-account-token 3 12m
es-cluster-transport-cert kubernetes.io/tls 3 12m
Now, we can connect to the database with es-cluster-elastic-cred, which contains the admin-level credentials for the database.
Accessing Database Through CLI
To access the database through the CLI, we first need the credentials. Let’s read the username and password from the Secret:
$ kubectl get secret -n demo es-cluster-elastic-cred -o jsonpath='{.data.username}' | base64 -d
elastic
$ kubectl get secret -n demo es-cluster-elastic-cred -o jsonpath='{.data.password}' | base64 -d
YQB)~K6M9U)d_yVu
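To keep the credentials in environment variables of the current shell instead of copying them by hand, the two lookups can be wrapped as shown below. This is a sketch under stated assumptions: the variable names `ES_USER` and `ES_PASSWORD` and the helper names `es_secret_field` and `load_es_credentials` are hypothetical, and the Secret name and namespace are the ones used in this tutorial.

```shell
# Hypothetical helper: read one key (username or password) from the auth
# Secret created for es-cluster and base64-decode it.
es_secret_field() {
  kubectl get secret -n demo es-cluster-elastic-cred \
    -o jsonpath="{.data.$1}" | base64 -d
}

# Hypothetical helper: export both credentials into the current shell.
load_es_credentials() {
  ES_USER="$(es_secret_field username)"
  ES_PASSWORD="$(es_secret_field password)"
  export ES_USER ES_PASSWORD
}

# Usage (needs access to the cluster and the port-forward from above):
#   load_es_credentials
#   curl -XGET -k -u "${ES_USER}:${ES_PASSWORD}" "https://localhost:9200/_cluster/health?pretty"
```

Double quotes around the `-u` argument let the shell expand the variables while still protecting special characters in the password.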
Now, let’s check the health of our Elasticsearch cluster:
# curl -XGET -k -u 'username:password' "https://localhost:9200/_cluster/health?pretty"
$ curl -XGET -k -u 'elastic:YQB)~K6M9U)d_yVu' "https://localhost:9200/_cluster/health?pretty"
{
"cluster_name" : "es-cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 12,
"number_of_data_nodes" : 8,
"active_primary_shards" : 9,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
Verify Node Role
As we have assigned a dedicated role to each type of node, let’s verify them using the following command:
$ curl -XGET -k -u 'elastic:YQB)~K6M9U)d_yVu' "https://localhost:9200/_cat/nodes?v"
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.2.2.30 41 90 3 0.22 0.31 0.34 s - es-cluster-data-content-0
10.2.1.28 70 76 3 0.00 0.03 0.07 h - es-cluster-data-hot-0
10.2.0.28 45 87 4 0.09 0.20 0.26 i - es-cluster-ingest-0
10.2.2.29 33 75 3 0.22 0.31 0.34 w - es-cluster-data-warm-0
10.2.0.29 65 76 3 0.09 0.20 0.26 h - es-cluster-data-hot-1
10.2.0.30 46 75 3 0.09 0.20 0.26 c - es-cluster-data-cold-1
10.2.1.29 56 77 3 0.00 0.03 0.07 m * es-cluster-master-0
10.2.3.50 52 74 3 0.02 0.06 0.11 c - es-cluster-data-cold-0
10.2.2.31 34 75 3 0.22 0.31 0.34 m - es-cluster-master-1
10.2.1.30 21 74 3 0.00 0.03 0.07 w - es-cluster-data-warm-1
10.2.3.49 23 85 3 0.02 0.06 0.11 i - es-cluster-ingest-1
10.2.3.51 72 75 3 0.02 0.06 0.11 h - es-cluster-data-hot-2
- node.role field specifies the dedicated role that we have assigned to each type of node, where h refers to the hot node, w refers to the warm node, c refers to the cold node, i refers to the ingest node, m refers to the master node, and s refers to the content node.
- master field specifies the active master node. Here, we can see a * in the es-cluster-master-0 row, which shows that it is the active master node now.
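The single-letter roles can be expanded into readable names with a small filter. The sketch below is illustrative only: `describe_node_roles` is a hypothetical helper, it expects three whitespace-separated columns (name, node.role, master) on stdin, and it knows only the six letters listed above, so any other letter is printed as unknown.

```shell
# Hypothetical helper: expand _cat/nodes role letters into readable names.
# Input lines: "<name> <node.role> <master>", one node per line.
describe_node_roles() {
  awk '
    BEGIN {
      role["h"] = "hot"; role["w"] = "warm"; role["c"] = "cold"
      role["i"] = "ingest"; role["m"] = "master"; role["s"] = "content"
    }
    {
      names = ""
      n = length($2)
      # A node may carry several role letters; expand each one in order.
      for (k = 1; k <= n; k++) {
        letter = substr($2, k, 1)
        label = (letter in role) ? role[letter] : "unknown(" letter ")"
        names = names (names == "" ? "" : ",") label
      }
      active = ($3 == "*") ? " (active master)" : ""
      print $1 ": " names active
    }
  '
}

# Usage (needs the port-forward and credentials from above; the h= parameter
# selects the columns and omitting v drops the header row):
#   curl -s -k -u 'username:password' \
#     "https://localhost:9200/_cat/nodes?h=name,node.role,master" | describe_node_roles
```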
Cleaning Up
To clean up the Kubernetes resources created by this tutorial, run:
$ kubectl patch -n demo elasticsearch es-cluster -p '{"spec":{"deletionPolicy":"WipeOut"}}' --type="merge"
$ kubectl delete elasticsearch -n demo es-cluster
# Delete namespace
$ kubectl delete namespace demo
Next Steps
- Learn about taking backup of Elasticsearch database using Stash.
- Monitor your Elasticsearch database with KubeDB using out-of-the-box builtin-Prometheus.
- Monitor your Elasticsearch database with KubeDB using out-of-the-box Prometheus operator.
- Detail concepts of Elasticsearch object.
- Use private Docker registry to deploy Elasticsearch with KubeDB.
- Want to hack on KubeDB? Check our contribution guidelines.