Deploy OpenSearch via Kubernetes OpenSearch Operator

In the current landscape of containerized applications and orchestration, the efficient deployment and upkeep of databases like OpenSearch require a versatile and efficient approach. The Kubernetes OpenSearch Operator streamlines the process of configuring, sustaining, and scaling OpenSearch databases within a Kubernetes environment. This guide delves into the fundamentals of installing OpenSearch databases using the OpenSearch Kubernetes Operator, examining its benefits, features, and step-by-step guidelines. By adopting this solution, you can achieve enhanced flexibility and automation in the provisioning and administration of your OpenSearch databases, all while adhering to best practices for containerized infrastructure.

Kubernetes is an open-source platform that streamlines the entire workflow for containerized applications. It lets you deploy, scale, and manage applications, whether they run on a single machine or are distributed across a multi-cloud environment.

Kubernetes streamlines the administration of multiple containers by automating critical functions such as load balancing, dynamic scaling, and ensuring application robustness with automatic recovery mechanisms. When introducing a new version of your application, Kubernetes takes charge of the update process, minimizing downtime and mitigating the risk of errors.

With a simple declarative configuration, you specify the desired state of your application, and Kubernetes works to keep the running system in that state. This lets you concentrate on developing your applications while Kubernetes handles their reliable and efficient operation. It also simplifies provisioning and troubleshooting, so you can tackle the complexity of application deployment with confidence.

Why OpenSearch in Kubernetes

OpenSearch is an open-source, highly scalable search engine built for processing large volumes of data. With features including full-text search, structured search, analytics, and logging, OpenSearch is versatile and applies to a wide range of applications and use cases. It is particularly advantageous for enterprises with substantial real-time data management and search requirements, where it stands out for delivering rapid and precise search results.

OpenSearch facilitates horizontal scaling across multiple nodes, ensuring efficient handling of large data loads while maintaining continuous accessibility. Alongside its distributed design, OpenSearch accommodates a versatile data format, allowing the storage and indexing of diverse data types, such as text, numerical data, and geospatial information, whether structured or unstructured.

Integrating OpenSearch within a Kubernetes environment offers a powerful combination that brings a host of advantages. It allows for the seamless management of OpenSearch clusters at scale, ensuring optimal resource utilization and high availability, all within the robust orchestration framework of Kubernetes. Kubernetes simplifies the deployment and scaling of OpenSearch instances, making it easier to adapt to evolving data demands. Additionally, it provides a unified platform for handling both application and data infrastructure, streamlining operations and reducing complexity. This integration enhances the overall efficiency and resilience of OpenSearch deployments, facilitating real-time data processing and search capabilities within Kubernetes clusters, making it a formidable solution for modern data-driven applications.

Deploying OpenSearch on Kubernetes

Pre-requisites

Before deploying OpenSearch with a Kubernetes OpenSearch operator, we have to set up the environment. This tutorial requires a running Kubernetes cluster and a basic understanding of OpenSearch. Here, we are going to create our Kubernetes cluster using Kind. Additionally, you need Helm installed on your workstation to install the operator into the cluster.
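
Before going further, it can help to confirm that the command-line tools used in this tutorial are available. The following is a minimal sketch: check_tools is a hypothetical helper name, and the Kind cluster name opensearch-demo in the usage comment is an arbitrary choice, not something this tutorial depends on.

```shell
# Report any of the given command-line tools that are missing from PATH.
# Returns 0 when all of them are present and 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (the cluster name is an assumption made for illustration):
#   check_tools kind kubectl helm && kind create cluster --name opensearch-demo
```

If any tool is reported as missing, install it before continuing with the KubeDB installation.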

In this article, we will use the Kubernetes OpenSearch operator KubeDB to deploy OpenSearch on Kubernetes, so KubeDB must be installed in our Kubernetes cluster. KubeDB supports the official Elasticsearch by Elastic and OpenSearch by AWS, as well as other open-source distributions such as SearchGuard and OpenDistro. All of these distributions are supported under the Elasticsearch CR of KubeDB. To set up KubeDB in our Kubernetes cluster, we require a license, and we can get a free enterprise license through the AppsCode License Server. We must provide our Kubernetes cluster ID to obtain a license. Run the following command to get the cluster ID.

$ kubectl get ns kube-system -o jsonpath='{.metadata.uid}'
6c08dcb8-8440-4388-849f-1f2b590b731e

After we provide the necessary data, the license server will email us a “license.txt” file. Run the following command to install KubeDB.

$ helm install kubedb oci://ghcr.io/appscode-charts/kubedb \
  --version v2023.12.11 \
  --namespace kubedb --create-namespace \
  --set-file global.license=/path/to/the/license.txt \
  --wait --burst-limit=10000 --debug

Verify the installation with the following command:

$ kubectl get pods --all-namespaces -l "app.kubernetes.io/instance=kubedb"
NAMESPACE   NAME                                            READY   STATUS    RESTARTS   AGE
kubedb      kubedb-kubedb-autoscaler-8685b5f5f8-kwh9r       1/1     Running   0          2m38s
kubedb      kubedb-kubedb-dashboard-677448dff8-ggrz6        1/1     Running   0          2m38s
kubedb      kubedb-kubedb-ops-manager-f4d869f54-xbtd7       1/1     Running   0          2m38s
kubedb      kubedb-kubedb-provisioner-778795d79-zbn74       1/1     Running   0          2m38s
kubedb      kubedb-kubedb-schema-manager-64f9cc9445-vwfsk   1/1     Running   0          2m38s
kubedb      kubedb-kubedb-webhook-server-85cb5f5fdb-jtpgt   1/1     Running   0          2m38s

Once every pod is in the Running state, we can move on to the next step.
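
Instead of re-running the listing until every pod is ready, the check can be scripted with kubectl wait. This is a sketch under the assumption that all KubeDB pods live in the kubedb namespace, as in the output above; wait_for_kubedb is a hypothetical helper name and the 300s default timeout is an arbitrary choice.

```shell
# Block until every KubeDB pod reports the Ready condition, or fail once
# the timeout expires (the default of 300s is an arbitrary choice).
wait_for_kubedb() {
  local timeout="${1:-300s}"
  kubectl wait pods \
    --namespace kubedb \
    --selector app.kubernetes.io/instance=kubedb \
    --for=condition=Ready \
    --timeout="$timeout"
}

# Example usage:
#   wait_for_kubedb 10m
```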

Create a Namespace

Now we’ll create a new namespace in which we will deploy OpenSearch. To create a namespace, we can use the following command:

$ kubectl create namespace os-demo
namespace/os-demo created

Deploy OpenSearch via Kubernetes OpenSearch operator

We need to create a YAML configuration to deploy OpenSearch on Kubernetes. We will apply the following YAML:

apiVersion: kubedb.com/v1alpha2
kind: Elasticsearch            # KubeDB manages OpenSearch through its Elasticsearch CR
metadata:
  name: os-cluster
  namespace: os-demo
spec:
  enableSSL: true              # serve the REST API over HTTPS
  version: opensearch-2.11.1
  storageType: Durable         # back each node with a PersistentVolumeClaim
  topology:
    # Dedicated master nodes
    master:
      replicas: 2
      storage:
        storageClassName: "standard"
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
    # Data nodes hold the index shards
    data:
      replicas: 2
      storage:
        storageClassName: "standard"
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
    # Ingest nodes
    ingest:
      replicas: 2
      storage:
        storageClassName: "standard"
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  terminationPolicy: WipeOut   # deleting this object also deletes its PVCs and Secrets

You can find the detailed YAML specification in the Kubernetes OpenSearch documentation.

We will save this YAML configuration to os-cluster.yaml and then create the OpenSearch object.

$ kubectl apply -f os-cluster.yaml
elasticsearch.kubedb.com/os-cluster created

If all the above steps were completed correctly and OpenSearch is deployed, you will see that the following objects have been created:

$ kubectl get all -n os-demo
NAME                      READY   STATUS    RESTARTS   AGE
pod/os-cluster-data-0     1/1     Running   0          4m37s
pod/os-cluster-data-1     1/1     Running   0          2m39s
pod/os-cluster-ingest-0   1/1     Running   0          4m47s
pod/os-cluster-ingest-1   1/1     Running   0          2m42s
pod/os-cluster-master-0   1/1     Running   0          4m42s
pod/os-cluster-master-1   1/1     Running   0          2m36s

NAME                        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/os-cluster          ClusterIP   10.96.99.212   <none>        9200/TCP   4m55s
service/os-cluster-master   ClusterIP   None           <none>        9300/TCP   4m55s
service/os-cluster-pods     ClusterIP   None           <none>        9200/TCP   4m55s

NAME                                 READY   AGE
statefulset.apps/os-cluster-data     2/2     4m37s
statefulset.apps/os-cluster-ingest   2/2     4m47s
statefulset.apps/os-cluster-master   2/2     4m42s

NAME                                            TYPE                       VERSION   AGE
appbinding.appcatalog.appscode.com/os-cluster   kubedb.com/elasticsearch   2.8.0     4m37s

NAME                                  VERSION             STATUS   AGE
elasticsearch.kubedb.com/os-cluster   opensearch-2.11.1   Ready    4m55s

We have successfully deployed OpenSearch to Kubernetes via the Kubernetes OpenSearch operator. Now, we will connect to the OpenSearch database, insert some sample data, and verify that our OpenSearch cluster is usable. First, check the database status:

$ kubectl get es -n os-demo os-cluster
NAME         VERSION             STATUS   AGE
os-cluster   opensearch-2.11.1   Ready    4m59s
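
Provisioning can take a few minutes, so the status check can be wrapped in a small polling loop. The sketch below assumes that the STATUS column shown above is backed by the .status.phase field of the Elasticsearch object; wait_for_opensearch and the POLL_INTERVAL variable are hypothetical names introduced for illustration.

```shell
# Poll the Elasticsearch object until its phase is "Ready".
# Assumption: the STATUS column is read from .status.phase.
wait_for_opensearch() {
  local name="$1" namespace="$2" attempts="${3:-60}" i=0 phase=""
  while [ "$i" -lt "$attempts" ]; do
    phase="$(kubectl get elasticsearch.kubedb.com "$name" -n "$namespace" \
      -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "Ready" ]; then
      echo "$name is Ready"
      return 0
    fi
    i=$((i + 1))
    # Seconds between attempts; 5 is an arbitrary default.
    sleep "${POLL_INTERVAL:-5}"
  done
  echo "timed out waiting for $name (last phase: ${phase:-unknown})" >&2
  return 1
}

# Example usage:
#   wait_for_opensearch os-cluster os-demo
```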

Insert sample data to the OpenSearch database

Now, we will create an index in OpenSearch. When the OpenSearch YAML is deployed, the Kubernetes OpenSearch operator creates a Service named after the OpenSearch object itself. Using this Service, we will port-forward to the database from our local workstation and establish a connection. After that, we’ll add some data to OpenSearch.

Port-forward the Service

KubeDB creates a few Services to connect with the database. Let’s see the Services created by KubeDB for our OpenSearch:

$ kubectl get service -n os-demo -l=app.kubernetes.io/instance=os-cluster
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
os-cluster          ClusterIP   10.96.220.157   <none>        9200/TCP   5m
os-cluster-master   ClusterIP   None            <none>        9300/TCP   5m
os-cluster-pods     ClusterIP   None            <none>        9200/TCP   5m

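Here, we are going to use the os-cluster Service to connect with the database. Now, let’s port-forward the os-cluster Service. The command keeps running in the foreground, so leave it open and run the following steps in a second terminal.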

$ kubectl port-forward -n os-demo svc/os-cluster 9200
Forwarding from 127.0.0.1:9200 -> 9200
Forwarding from [::1]:9200 -> 9200

Export the Credentials

The Kubernetes OpenSearch operator creates several Secrets for the database. Let’s list the Secrets for our os-cluster.

$ kubectl get secret -n os-demo -l=app.kubernetes.io/instance=os-cluster
NAME                              TYPE                       DATA   AGE
os-cluster-admin-cert             kubernetes.io/tls          3      5m
os-cluster-admin-cred             kubernetes.io/basic-auth   2      5m
os-cluster-ca-cert                kubernetes.io/tls          2      5m
os-cluster-client-cert            kubernetes.io/tls          3      5m
os-cluster-config                 Opaque                     3      5m
os-cluster-http-cert              kubernetes.io/tls          3      5m
os-cluster-kibanaro-cred          kubernetes.io/basic-auth   2      5m
os-cluster-kibanaserver-cred      kubernetes.io/basic-auth   2      5m
os-cluster-logstash-cred          kubernetes.io/basic-auth   2      5m
os-cluster-readall-cred           kubernetes.io/basic-auth   2      5m
os-cluster-snapshotrestore-cred   kubernetes.io/basic-auth   2      5m
os-cluster-transport-cert         kubernetes.io/tls          3      5m

Now, we can connect to the database with any of these Secrets whose name ends with the suffix cred. Here, we will use os-cluster-admin-cred, which contains the admin-level credentials for the database.

$ kubectl get secret -n os-demo os-cluster-admin-cred -o jsonpath='{.data.username}' | base64 -d
admin
$ kubectl get secret -n os-demo os-cluster-admin-cred -o jsonpath='{.data.password}' | base64 -d
t;gmkX(o!4DuU6XP
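
Because the generated password contains shell-special characters, it can be convenient to read the credentials into variables rather than pasting them into each command. This is a minimal sketch; secret_field, OS_USER, and OS_PASS are hypothetical names chosen for illustration.

```shell
# Read one field of a Secret and base64-decode it.
# Arguments: namespace, Secret name, field name under .data
secret_field() {
  kubectl get secret -n "$1" "$2" -o "jsonpath={.data.$3}" | base64 -d
}

# Example usage (requires the port-forward from the previous step):
#   OS_USER="$(secret_field os-demo os-cluster-admin-cred username)"
#   OS_PASS="$(secret_field os-demo os-cluster-admin-cred password)"
#   curl -k --user "$OS_USER:$OS_PASS" "https://localhost:9200/_cluster/health?pretty"
```

Quoting "$OS_USER:$OS_PASS" keeps characters such as ; and ! in the password from being interpreted by the shell.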

We will now use curl to post some sample data into OpenSearch. The -k flag skips TLS certificate verification, which is acceptable here only because we are testing against the operator’s self-signed certificates.

$ curl -XPOST -k --user 'admin:t;gmkX(o!4DuU6XP' "https://localhost:9200/music/_doc?pretty" -H 'Content-Type: application/json' -d'
{
    "Artist": "Backstreet Boys",
    "Song": "Show Me The Meaning"
}
'
{
  "_index" : "music",
  "_id" : "MRIPuYsBGygDWO9F_G9o",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Now, let’s verify that the index has been created successfully.

$ curl -XGET -k --user 'admin:t;gmkX(o!4DuU6XP' "https://localhost:9200/_cat/indices?v&s=index&pretty"
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .opendistro_security         MtD1G8t7SCKHdRdgESbglw   1   1         10            0    120.8kb         75.4kb
green  open   .opensearch-observability    5miOoG23QQ2tQKJYDlDV1A   1   1          0            0       416b           208b
green  open   kubedb-system                cL0sZYAaTEa7MeE_OYVXcg   1   1          1          270      1.3mb        706.3kb
green  open   music                        7jmr68IFT9S5s0W_2IaP1g   1   1          1            0      9.3kb          4.6kb
green  open   security-auditlog-2023.11.10 EbBSYaTATuaiE7efHLFaKA   1   1         12            0    346.9kb        173.2kb

Also, let’s verify the data in the index:

$ curl -XGET -k --user 'admin:t;gmkX(o!4DuU6XP' "https://localhost:9200/music/_search?pretty"
{
  "took" : 93,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "music",
        "_id" : "MRIPuYsBGygDWO9F_G9o",
        "_score" : 1.0,
        "_source" : {
          "Artist" : "Backstreet Boys",
          "Song" : "Show Me The Meaning"
        }
      }
    ]
  }
}
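
The search above returns every document in the index. To look up specific documents, a match query can be sent to the same endpoint. The following sketch assumes OS_USER and OS_PASS hold the admin credentials and that the port-forward is still running; match_query and OS_URL are hypothetical names, and the field and text values are inserted without JSON escaping, so they must not contain double quotes.

```shell
# Run a full-text "match" query against one field of an index.
# Assumes OS_USER and OS_PASS are set by the caller; OS_URL defaults to
# the port-forwarded address used in this tutorial.
# Note: field and text are not JSON-escaped.
match_query() {
  local index="$1" field="$2" text="$3"
  curl -s -k --user "$OS_USER:$OS_PASS" \
    -H 'Content-Type: application/json' \
    -X GET "${OS_URL:-https://localhost:9200}/$index/_search?pretty" \
    -d "{\"query\": {\"match\": {\"$field\": \"$text\"}}}"
}

# Example usage:
#   match_query music Artist "Backstreet"
```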

We’ve successfully deployed OpenSearch to Kubernetes via the Kubernetes OpenSearch operator KubeDB and inserted some sample data into it.

OpenSearch on Kubernetes: Best Practices

To ensure the robustness and reliability of your application when leveraging OpenSearch on Kubernetes through the Kubernetes OpenSearch operator, there are some best practices that you should follow:

Dashboard Integration: Deploy OpenSearch Dashboards alongside your OpenSearch cluster to access real-time performance insights and efficient data visualization. Secure OpenSearch Dashboards by implementing access controls and encryption. Leverage the dashboard features to monitor the health of your OpenSearch cluster and extract valuable performance insights for your application.
High Availability: Ensure high availability by leveraging OpenSearch’s built-in data replication capabilities. Distribute data across multiple nodes to ensure redundancy and resilience. Implement load balancing to evenly distribute traffic among nodes.
Backup and Recovery: Prioritize backup and recovery by regularly creating data backups using OpenSearch snapshots or other compatible backup tools. Safeguard backups by storing them in separate locations or in cloud storage. Regularly test data restoration procedures and simulate disaster recovery so that they work when they are needed.
Monitoring & Security: Implement a robust monitoring strategy using tools such as Prometheus, Grafana, or OpenSearch’s native monitoring features. Keep a close eye on cluster health and performance metrics to proactively address potential issues. Strengthen security by incorporating Role-Based Access Control (RBAC) and robust authentication mechanisms. Enforce Kubernetes network policies to secure communication between OpenSearch pods and maintain a resilient security posture.
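
As a starting point for the monitoring practice above, cluster health can be checked from a script through the _cat/health API. This is a sketch, not a replacement for a full monitoring stack; check_cluster_green, OS_URL, OS_USER, and OS_PASS are hypothetical names, and the -k flag is kept only because the tutorial uses self-signed certificates.

```shell
# Print the cluster health colour (green, yellow or red) and return
# non-zero unless it is green.
# Assumes OS_USER and OS_PASS are set by the caller.
check_cluster_green() {
  local status
  status="$(curl -s -k --user "$OS_USER:$OS_PASS" \
    "${OS_URL:-https://localhost:9200}/_cat/health?h=status" | tr -d '[:space:]')"
  echo "cluster status: ${status:-unknown}"
  [ "$status" = "green" ]
}

# Example usage:
#   check_cluster_green || echo "cluster needs attention" >&2
```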

Conclusion

OpenSearch is a robust open-source search and analytics engine known for its capability to handle extensive and varied datasets with speed and accuracy. You have now successfully deployed an OpenSearch database on Kubernetes using the Kubernetes OpenSearch operator, a versatile solution suitable for various applications. Additional details can be found in the official OpenSearch documentation. Managing databases, whether they are located on-premises or in cloud environments, demands a substantial understanding and ongoing commitment. KubeDB provides a full support solution to ensure that your database management fulfills performance and uptime requirements. Regardless of whether your database infrastructure is localized on-site, spread across diverse geographical regions, or relies on cloud services or database-as-a-service providers, KubeDB offers indispensable support in managing the complete process within a production-grade environment.
