Deploy Solr using Kubernetes Solr Operator


In a world dominated by data, powerful search capabilities are crucial for businesses looking to improve user experience and drive engagement. Apache Solr, a highly scalable open-source search platform built on Apache Lucene, provides the tools necessary to handle complex search requirements across vast datasets. By deploying Solr on Kubernetes, organizations can use the orchestration power of Kubernetes to create a resilient, scalable, and easily manageable search solution.

Kubernetes has transformed how applications are deployed, managed, and scaled, making it an ideal environment for running Solr. Central to this deployment is the Kubernetes Solr Operator, which simplifies the management of Solr clusters by allowing you to define Solr instances as declarative resources. This operator automates many operational tasks such as scaling, backups, and upgrades, enabling you to focus on your application rather than the underlying infrastructure. In this article, we will explore how to deploy Apache Solr using the Kubernetes Solr Operator, outlining its benefits and providing a step-by-step guide to get you started.

Why Solr in Kubernetes

Apache Solr excels at full-text search, faceted search, and real-time indexing, making it an ideal choice for applications requiring powerful search functionality. Integrating Solr with Kubernetes improves its capabilities in several significant ways.

First, Kubernetes provides the ability to scale Solr instances dynamically based on demand. This is particularly useful for handling varying workloads, such as peak traffic periods or large data ingestion events. With the Kubernetes Horizontal Pod Autoscaler, you can automatically adjust the number of Solr replicas in response to real-time metrics, ensuring optimal performance without manual intervention.

Second, Kubernetes continuously monitors the health of your Solr pods. If a pod fails, Kubernetes automatically restarts or replaces it, maintaining high availability. This self-healing capability is crucial for search applications, where downtime can negatively impact user experience and operational efficiency.
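As an illustrative sketch of the autoscaling idea (not part of this tutorial's KubeDB setup), a HorizontalPodAutoscaler targeting a Solr workload might look like the following. The scaleTargetRef kind and name are assumptions; KubeDB manages Solr pods through its own workload type, so the reference must match how your pods are actually managed:

```yaml
# Hypothetical example: scale a Solr workload on CPU utilization.
# The scaleTargetRef below is an assumption for illustration only;
# adjust it to the workload that actually owns your Solr pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: solr-hpa
  namespace: solr-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: solr-cluster
  minReplicas: 3
  maxReplicas: 9
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

With this in place, the autoscaler adds replicas when average CPU utilization across the Solr pods exceeds 70% and scales back down when load subsides.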

Additionally, Kubernetes enables efficient resource management customized to Solr’s specific needs. You can define resource requests and limits for each Solr pod, ensuring that each instance has the necessary CPU and memory to handle indexing and query workloads effectively. This helps prevent resource contention in your cluster, particularly in environments where multiple services are deployed alongside Solr.
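For instance, resource requests and limits for the Solr pods can be expressed as a fragment of the Solr resource's spec. The podTemplate field path shown here is an assumption based on common KubeDB conventions; verify it against the KubeDB Solr specification for your version:

```yaml
# Sketch of resource requests/limits for a Solr pod.
# The podTemplate path is an assumption; check the KubeDB Solr
# spec for your KubeDB version before using it.
spec:
  podTemplate:
    spec:
      containers:
        - name: solr
          resources:
            requests:
              cpu: "500m"
              memory: "2Gi"
            limits:
              cpu: "1"
              memory: "4Gi"
```

Setting requests lets the scheduler place Solr pods on nodes with enough capacity, while limits cap each pod so a heavy indexing job cannot starve neighboring services.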

Lastly, deploying Solr on Kubernetes simplifies the management of complex configurations. For instance, you can use the Kubernetes Solr Operator, which automates tasks such as scaling, backups, and upgrades. This operator allows you to manage your Solr clusters as declarative resources, streamlining operations and making it easier to implement changes with minimal disruption. Overall, the combination of Solr and Kubernetes creates a powerful, resilient search infrastructure that can scale with your application’s needs.

Deploy Solr on Kubernetes

Prerequisites

To deploy Solr on Kubernetes using the Kubernetes Solr Operator, you'll first need to set up your environment. Begin by ensuring you have a functional Kubernetes cluster; in this guide, we create ours using Kind. A basic understanding of Solr will also be helpful as you work through the deployment. Finally, install Helm, as we will use it to install KubeDB.

This guide uses KubeDB as the Kubernetes Solr operator, so you'll need to install it in your Kubernetes environment. KubeDB requires a license, which you can obtain for free from the AppsCode License Server.

To get a license, you need your Kubernetes cluster ID. Run the following command to retrieve it:

$ kubectl get ns kube-system -o jsonpath='{.metadata.uid}'
6d446615-0doo-3he8-b14f-8y5ec34c451u

After you submit the required information, the license server will email you a license.txt file. Then run the following command to install KubeDB:

$ helm install kubedb oci://ghcr.io/appscode-charts/kubedb \
  --version v2024.8.21 \
  --namespace kubedb --create-namespace \
  --set-file global.license=/path/to/the/license.txt \
  --set global.featureGates.Solr=true \
  --set global.featureGates.ZooKeeper=true \
  --wait --burst-limit=10000 --debug

Verify the installation with the following command:

$ kubectl get pods --all-namespaces -l "app.kubernetes.io/instance=kubedb"
NAMESPACE   NAME                                            READY   STATUS    RESTARTS   AGE
kubedb      kubedb-kubedb-autoscaler-7bf9c48b5c-sk6wq       1/1     Running   0          2m27s
kubedb      kubedb-kubedb-ops-manager-56bbd9b584-9wrmh      1/1     Running   0          2m27s
kubedb      kubedb-kubedb-provisioner-595f6757cd-hmgvx      1/1     Running   0          2m27s
kubedb      kubedb-kubedb-webhook-server-574f8d5767-4gj6p   1/1     Running   0          2m27s
kubedb      kubedb-petset-operator-77b6b9897f-69g2n         1/1     Running   0          2m27s
kubedb      kubedb-petset-webhook-server-75b578785f-wc469   2/2     Running   0          2m27s
kubedb      kubedb-sidekick-c898cff4c-h99wd                 1/1     Running   0          2m27s

Once every pod is in the Running state, we can move on to the next step.

Create a Namespace

To keep resources isolated, we’ll use a separate namespace called solr-demo throughout this tutorial. Run the following command to create the namespace:

$ kubectl create namespace solr-demo
namespace/solr-demo created

Create ZooKeeper Instance

Since KubeDB Solr operates in SolrCloud mode, it requires an external ZooKeeper ensemble to manage replica distribution and configuration.

In this tutorial, we will use KubeDB ZooKeeper. Below is the configuration for the ZooKeeper instance we’ll create:

apiVersion: kubedb.com/v1alpha2
kind: ZooKeeper
metadata:
  name: zookeeper
  namespace: solr-demo
spec:
  version: 3.9.1
  replicas: 3
  adminServerPort: 8080
  storage:
    resources:
      requests:
        storage: "100Mi"
    storageClassName: standard
    accessModes:
      - ReadWriteOnce
  deletionPolicy: "WipeOut"

You can see the detailed YAML specification in the Kubernetes ZooKeeper documentation.

Let’s save this YAML configuration as zookeeper.yaml and create the ZooKeeper CRO defined above:

$ kubectl apply -f zookeeper.yaml 
zookeeper.kubedb.com/zookeeper created

Once the ZooKeeper instance’s STATUS is Ready, we can proceed to deploy Solr in our cluster.

$ kubectl get zookeeper -n solr-demo zookeeper
NAME        TYPE                  VERSION   STATUS   AGE
zookeeper   kubedb.com/v1alpha2   3.9.1     Ready    4m14s

Deploy Solr Cluster using Kubernetes Solr Operator

Here is the YAML of the Solr cluster we are going to deploy:

apiVersion: kubedb.com/v1alpha2
kind: Solr
metadata:
  name: solr-cluster
  namespace: solr-demo
spec:
  version: 9.4.1
  replicas: 3
  zookeeperRef:
    name: zookeeper
    namespace: solr-demo
  storage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 2Gi
    storageClassName: standard
  deletionPolicy: "WipeOut"

You can see the detailed YAML specification in the Kubernetes Solr documentation.

Let’s save this YAML configuration as solr-cluster.yaml and apply it:

$ kubectl apply -f solr-cluster.yaml 
solr.kubedb.com/solr-cluster created

Once the Solr object is deployed, you will see that the following resources have been created:

$ kubectl get all -n solr-demo
NAME                 READY   STATUS    RESTARTS   AGE
pod/solr-cluster-0   1/1     Running   0          2m56s
pod/solr-cluster-1   1/1     Running   0          52s
pod/solr-cluster-2   1/1     Running   0          44s
pod/zookeeper-0      1/1     Running   0          5m6s
pod/zookeeper-1      1/1     Running   0          4m37s
pod/zookeeper-2      1/1     Running   0          4m28s

NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/solr-cluster             ClusterIP   10.96.247.137   <none>        8983/TCP                     2m58s
service/solr-cluster-pods        ClusterIP   None            <none>        8983/TCP                     2m58s
service/zookeeper                ClusterIP   10.96.179.181   <none>        2181/TCP                     5m10s
service/zookeeper-admin-server   ClusterIP   10.96.99.105    <none>        8080/TCP                     5m10s
service/zookeeper-pods           ClusterIP   None            <none>        2181/TCP,2888/TCP,3888/TCP   5m10s

NAME                                              TYPE                   VERSION   AGE
appbinding.appcatalog.appscode.com/solr-cluster   kubedb.com/solr        9.4.1     2m58s
appbinding.appcatalog.appscode.com/zookeeper      kubedb.com/zookeeper   3.9.1     5m10s

NAME                           TYPE                  VERSION   STATUS   AGE
solr.kubedb.com/solr-cluster   kubedb.com/v1alpha2   9.4.1     Ready    2m58s

NAME                             TYPE                  VERSION   STATUS   AGE
zookeeper.kubedb.com/zookeeper   kubedb.com/v1alpha2   3.9.1     Ready    5m10s

Let’s check if the database is ready to use:

$ kubectl get solr -n solr-demo solr-cluster
NAME           TYPE                  VERSION   STATUS   AGE
solr-cluster   kubedb.com/v1alpha2   9.4.1     Ready    3m21s

Connect with Solr Database

We will use port forwarding to connect to our Solr database. Then we will use curl to send HTTP requests and check the cluster status, verifying that our Solr database is working properly.

Port-forward the Service

KubeDB creates a few Services for connecting to the database. List them with the following command:

$ kubectl get service -n solr-demo
NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
solr-cluster             ClusterIP   10.96.247.137   <none>        8983/TCP                     3m53s
solr-cluster-pods        ClusterIP   None            <none>        8983/TCP                     3m53s
zookeeper                ClusterIP   10.96.179.181   <none>        2181/TCP                     6m5s
zookeeper-admin-server   ClusterIP   10.96.99.105    <none>        8080/TCP                     6m5s
zookeeper-pods           ClusterIP   None            <none>        2181/TCP,2888/TCP,3888/TCP   6m5s

To connect to the Solr database, we will use the solr-cluster service. First, we need to port-forward the solr-cluster service to port 8983 on the local machine:

$ kubectl port-forward -n solr-demo svc/solr-cluster 8983
Forwarding from 127.0.0.1:8983 -> 8983
Forwarding from [::1]:8983 -> 8983

Now, the Solr cluster is accessible at localhost:8983.

Export the Credentials

KubeDB creates several Secrets for managing the database. To view the Secrets created for solr-cluster, run the following command:

$ kubectl get secret -n solr-demo | grep solr-cluster
solr-cluster-admin-cred           kubernetes.io/basic-auth   2      4m27s
solr-cluster-auth-config          Opaque                     1      4m27s
solr-cluster-config               Opaque                     1      4m27s
solr-cluster-zk-digest            kubernetes.io/basic-auth   2      4m27s
solr-cluster-zk-digest-readonly   kubernetes.io/basic-auth   2      4m27s

From the above list, the solr-cluster-admin-cred Secret contains the admin-level credentials needed to connect to the database.

Access the Database Through the CLI

To access the database via the CLI, you first need to retrieve the credentials. Use the following commands to obtain the username and password:

$ kubectl get secret -n solr-demo solr-cluster-admin-cred -o jsonpath='{.data.username}' | base64 -d
admin
$ kubectl get secret -n solr-demo solr-cluster-admin-cred -o jsonpath='{.data.password}' | base64 -d
2bJLLUK0!*)Dsnd5

Now, let’s check the health of our Solr cluster.

# curl -XGET -k -u 'username:password' "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS"
$ curl -XGET -k -u 'admin:2bJLLUK0!*)Dsnd5' "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS"

{
  "responseHeader":{
    "status":0,
    "QTime":1
  },
  "cluster":{
    "collections":{
      "kubedb-system":{
        "pullReplicas":"0",
        "configName":"kubedb-system.AUTOCREATED",
        "replicationFactor":1,
        "router":{
          "name":"compositeId"
        },
        "nrtReplicas":1,
        "tlogReplicas":"0",
        "shards":{
          "shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node2":{
                "core":"kubedb-system_shard1_replica_n1",
                "node_name":"solr-cluster-1.solr-cluster-pods.solr-demo:8983_solr",
                "type":"NRT",
                "state":"active",
                "leader":"true",
                "force_set_state":"false",
                "base_url":"http://solr-cluster-1.solr-cluster-pods.solr-demo:8983/solr"
              }
            },
            "health":"GREEN"
          }
        },
        "health":"GREEN",
        "znodeVersion":4
      }
    },
    "live_nodes":["solr-cluster-1.solr-cluster-pods.solr-demo:8983_solr","solr-cluster-2.solr-cluster-pods.solr-demo:8983_solr","solr-cluster-0.solr-cluster-pods.solr-demo:8983_solr"]
  }
}

Insert Sample Data

In this section, we’ll create a collection in Solr and insert some sample data using curl. The -k flag disables certificate verification, which is useful when testing with self-signed certificates.

Execute the following command to create a collection named music in Solr:

$ curl -XPOST -k -u 'admin:2bJLLUK0!*)Dsnd5' "http://localhost:8983/solr/admin/collections?action=CREATE&name=music&numShards=2&replicationFactor=2&wt=xml"

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3712</int>
</lst>
<lst name="success">
  <lst name="solr-cluster-1.solr-cluster-pods.solr-demo:8983_solr">
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">2428</int>
    </lst>
    <str name="core">music_shard1_replica_n2</str>
  </lst>
  <lst name="solr-cluster-2.solr-cluster-pods.solr-demo:8983_solr">
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">2634</int>
    </lst>
    <str name="core">music_shard2_replica_n1</str>
  </lst>
  <lst name="solr-cluster-2.solr-cluster-pods.solr-demo:8983_solr">
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">2869</int>
    </lst>
    <str name="core">music_shard1_replica_n6</str>
  </lst>
  <lst name="solr-cluster-0.solr-cluster-pods.solr-demo:8983_solr">
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">3031</int>
    </lst>
    <str name="core">music_shard2_replica_n4</str>
  </lst>
</lst>
</response>

$ curl -X POST -u 'admin:2bJLLUK0!*)Dsnd5' -H 'Content-Type: application/json' "http://localhost:8983/solr/music/update" --data-binary '[{ "Artist": "John Denver","Song": "Country Roads"}]'
{
  "responseHeader":{
    "rf":2,
    "status":0,
    "QTime":527
  }
}

To verify that the collection has been created successfully, run the following command:

$ curl -X GET -u 'admin:2bJLLUK0!*)Dsnd5' 'http://localhost:8983/solr/admin/collections?action=LIST&wt=json'
{
  "responseHeader":{
    "status":0,
    "QTime":0
  },
  "collections":["kubedb-system","music"]
}

To check the sample data in the music collection, use the following command:

$ curl -X GET -u 'admin:2bJLLUK0!*)Dsnd5' "http://localhost:8983/solr/music/select" -H 'Content-Type: application/json' -d '{"query": "*:*"}'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":87,
    "params":{
      "json":"{\"query\": \"*:*\"}"
    }
  },
  "response":{
    "numFound":1,
    "start":0,
    "maxScore":1.0,
    "numFoundExact":true,
    "docs":[{
      "Artist":["John Denver"],
      "Song":["Country Roads"],
      "id":"798b62b5-adcf-4ed3-b83e-af79efe019f6",
      "_version_":1810618668053168128
    }]
  }
}

We’ve successfully deployed Solr to Kubernetes using the Kubernetes Solr Operator, KubeDB. Additionally, we accessed Solr and inserted some sample data.

Solr on Kubernetes: Best Practices

To maintain the stability of your application when using Solr on Kubernetes, consider the following best practices:

  • Configure Sharding and Replication: Take advantage of Solr’s sharding and replication capabilities to improve performance and availability. Strategically distribute your data across multiple shards to improve query speed and scale horizontally. Ensure that your replication strategy is configured to provide redundancy, reducing the risk of data loss.

  • Monitoring and Health Checks: Set up robust monitoring to gain insights into Solr’s performance and health. Use tools like Prometheus and Grafana to visualize key metrics such as query response times, indexing rates, and resource utilization. Continuously analyze query performance to identify bottlenecks and optimize Solr configurations, such as caching and query rewriting.

  • Disaster Recovery Options: Implement automated backup solutions to regularly capture Solr indexes and configurations. Store backups in a durable storage solution, ensuring quick recovery in the event of a failure. Test your disaster recovery procedures regularly to confirm that you can restore functionality with minimal downtime.

  • Security Configurations: Protect your Solr environment by enforcing strict security protocols. Use network policies to segment Solr pods and restrict access based on roles. Implement TLS encryption for data in transit and utilize Solr’s authentication mechanisms to manage user access. Regularly audit your security configurations to align with industry standards.

  • Utilizing the Kubernetes Solr Operator: Use the Kubernetes Solr Operator to streamline the management of your Solr clusters. The operator automates deployment, scaling, and configuration tasks, significantly reducing administrative overhead. It also provides a declarative approach to resource management, helping you maintain consistency and reliability across your deployments.
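Several of these practices can be applied declaratively. For example, to scale out the cluster created in this tutorial, KubeDB accepts an ops request resource. The sketch below is hedged: the exact field names (type, horizontalScaling.node) are assumptions, so verify them against the KubeDB ops-request documentation for your version:

```yaml
# Hedged sketch: scale the tutorial's cluster from 3 to 5 replicas
# via a KubeDB ops request. Field names are assumptions; consult
# the KubeDB documentation for your version before applying.
apiVersion: ops.kubedb.com/v1alpha1
kind: SolrOpsRequest
metadata:
  name: solr-scale-up
  namespace: solr-demo
spec:
  type: HorizontalScaling
  databaseRef:
    name: solr-cluster
  horizontalScaling:
    node: 5
```

Applying a resource like this lets the operator perform the scaling steps (provisioning storage, joining new nodes to ZooKeeper) instead of doing them by hand.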

Conclusion

Apache Solr, renowned for its powerful search capabilities and scalability, provides an exceptional solution for organizations looking to enhance their search functionality across large datasets. Deploying Solr on Kubernetes using the Kubernetes Solr Operator simplifies the deployment process, merging the strengths of a robust search platform with the flexibility and orchestration capabilities of Kubernetes. This method enhances your search infrastructure with automated provisioning, seamless scaling, and efficient management, ensuring high availability and optimal performance. For more information about Solr, visit the Apache Solr documentation.

By implementing the Kubernetes Solr Operator from KubeDB, you can streamline operations and simplify the management of your Solr clusters. KubeDB automates deployment and scaling, allowing for quick adaptations to changing workload demands while ensuring that your search solution remains resilient and agile. Following best practices in managing your Solr deployment, especially within the dynamic Kubernetes environment, is crucial for achieving greater efficiency and reliability.
