Malay Hazarika
Jul 9, 2025
10 minute read
As your applications grow, so does the flood of logs and telemetry data. Managing and scaling your telemetry infrastructure can quickly become a significant challenge, because it has to stay ahead of your application's growth. For years, the ELK stack (Elasticsearch, Logstash, Kibana) has been a go-to solution.
However, ELK is showing its age. Logstash in particular is a resource hog and inflexible compared to modern alternatives. On top of that, Elasticsearch became a source-available project after its license change, and many companies refuse to use it due to its licensing terms.
This article is the second part of our series on building a powerful, truly open-source observability stack. In Part 1, we replaced the "E" and "L" of the ELK stack with OpenSearch and a lightweight OpenTelemetry Collector for efficient log management. Now, we'll extend that foundation by integrating Jaeger for distributed tracing.
By the end, you'll understand how to build a robust, scalable, and vendor-neutral observability setup that puts you in control of your data and your budget.
OpenTelemetry is a CNCF-graduated project that standardizes the collection of logs, metrics, and traces.
The OTel Collector is a key component, offering incredible flexibility. It has a modular pipeline of receivers, processors, and exporters that lets you collect, transform, and route telemetry data however you need.
Elasticsearch, the heart of the ELK stack, also underwent significant licensing changes when Elastic moved core components to a source-available license. This shift was widely unpopular, raising concerns within the open-source community about vendor lock-in and the long-term openness of the project.
As a direct response, OpenSearch, a community-driven, Apache 2.0 licensed fork, was created to ensure a truly open-source path forward. It has an active community and is backed by AWS, which provides a stable foundation for future development.
Note: Elastic often claims that OpenSearch is less efficient and less performant than Elasticsearch. However, we operate clusters that ingest billions of spans and logs daily at nominal cost with no performance issues, so we can attest that OpenSearch is a solid choice for modern observability needs.
Both OpenSearch Dashboards and Kibana offer a poor UI for tracing, so we use Jaeger instead. Jaeger, another CNCF-graduated project, is purpose-built for trace visualization. Its intuitive UI and powerful analysis features make it one of the top open-source tools for debugging and understanding complex service interactions. We use Jaeger as the frontend for tracing and OpenSearch as the datastore.
A Kubernetes Cluster: While you can run these components in other environments, Kubernetes is highly recommended, especially for observability systems. These systems often grow faster than anticipated, and Kubernetes provides the scalability and manageability you'll need.
The recommended way to deploy and manage OpenSearch on Kubernetes is by using the OpenSearch Operator.
Why the Operator? It significantly simplifies deployment, ongoing management (like version upgrades, config changes, and scaling), and day-to-day operations. In our experience managing multiple large OpenSearch clusters that ingest billions of spans and logs daily, the operator has substantially reduced operational overhead.
Follow these instructions to set up the OpenSearch Operator using Helm: OpenSearch-k8s-operator
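In practice the install boils down to a couple of Helm commands. This is a sketch based on the operator's documented chart repository; verify the repo URL and chart name against the linked instructions:

```shell
# Add the OpenSearch operator chart repo and install it into the observability namespace
helm repo add opensearch-operator https://opensearch-project.github.io/opensearch-k8s-operator/
helm repo update
helm install opensearch-operator opensearch-operator/opensearch-operator \
  --namespace observability --create-namespace
```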
Once you have the operator running, you can deploy an OpenSearch cluster. Following is an example of a 3-node OpenSearch cluster. Read this user guide for more details: Opensearch Operator User Guide.
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: observability-opensearch
  namespace: observability
spec:
  general:
    serviceName: observability-opensearch
    version: 2.17.0
    additionalConfig:
      plugins.query.datasources.encryption.masterkey: "cbdda1e0ab9e45c44f9b56a3" # Change this
  security:
    config:
      adminSecret:
        # This secret contains the admin certificate using common name "admin". Use cert-manager to generate it.
        name: observability-opensearch-admin-cert
      adminCredentialsSecret:
        # This secret contains admin credentials. The keys are "username" and "password".
        name: observability-opensearch-admin-credentials
      securityConfigSecret:
        # This secret contains the security config files.
        # Each key is a filename (e.g. "internal_users.yml") and the value is the file content.
        name: observability-opensearch-security-config-files
    tls:
      transport:
        generate: false
        perNode: false
        secret:
          # Generate this similar to the admin certificate, but with common name "opensearch".
          name: observability-opensearch-node-cert
        caSecret:
          # This is the secret that contains the CA certificate used to create the admin and node certificates.
          name: observability-ca-secret
        nodesDn: ["CN=opensearch"]
        adminDn: ["CN=admin"]
      http:
        generate: false
        secret:
          name: observability-opensearch-node-cert
        caSecret:
          name: observability-ca-secret
  dashboards:
    enable: true
    version: 2.17.0
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
    opensearchCredentialsSecret:
      name: observability-opensearch-admin-credentials
  nodePools:
    - component: master
      replicas: 3
      diskSize: "30Gi"
      resources:
        requests:
          memory: "1.5Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "1500m"
      roles:
        - "cluster_manager"
        - "data"
      persistence:
        pvc:
          storageClass: "storage-class-name" # Change this to your storage class
          accessModes:
            - "ReadWriteOnce"
      env:
        # We supply our own security config, so the demo config must not be installed
        - name: DISABLE_INSTALL_DEMO_CONFIG
          value: "true"
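The admin and node certificate secrets referenced above can be generated with cert-manager. A minimal sketch, assuming you already have a cert-manager Issuer (here hypothetically named observability-ca-issuer) backed by the CA in observability-ca-secret:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: observability-opensearch-admin-cert
  namespace: observability
spec:
  secretName: observability-opensearch-admin-cert
  commonName: admin # Must match adminDn in the cluster spec
  duration: 8760h # 1 year
  privateKey:
    encoding: PKCS8 # OpenSearch expects PKCS#8-encoded keys
  issuerRef:
    name: observability-ca-issuer
    kind: Issuer
```

Create a second Certificate with commonName "opensearch" and secretName observability-opensearch-node-cert for the node certificate.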
Following are the essential configuration files you need to put in the observability-opensearch-security-config-files secret:
internal_users.yml:
_meta:
  type: "internalusers"
  config_version: 2
admin:
  # Change this to your hashed admin password: Use https://bcrypt-generator.com/
  hash: "$2y$12$eW5Z1z3a8b7c9d8e7f8g9u0h1i2j3k4l5m6n7o8p9q0r1s2t3u4v5w6x7y8z"
  reserved: true
  description: "Cluster super user"
  backend_roles:
    - "admin"
---
config.yml:
_meta:
  type: "config"
  config_version: 2
config:
  dynamic:
    http:
      anonymous_auth_enabled: false
    authc:
      basic_internal_auth_domain:
        description: "Authenticate via HTTP Basic against internal users database"
        http_enabled: true
        transport_enabled: true
        order: 4
        http_authenticator:
          type: basic
          challenge: true
        authentication_backend:
          type: intern
      clientcert_auth_domain:
        description: "Authenticate via SSL client certificates"
        http_enabled: false
        transport_enabled: false
        order: 2
        http_authenticator:
          type: clientcert
          config:
            username_attribute: cn # optional; if omitted, the DN becomes the username
          challenge: false
        authentication_backend:
          type: noop
    authz: {}
Data streams are designed to handle continuously generated, append-only time-series data such as logs. We will set up a data stream named logs-stream and make it write to a new index every day. The indices will expire after 30 days. This is a common pattern for log data, allowing you to efficiently manage storage and retention.
Note: You can run the following commands in Dev Tools in OpenSearch Dashboards.
PUT _index_template/logs-stream-template
{
  "index_patterns": ["logs-stream"],
  "data_stream": {},
  "priority": 100
}
PUT _data_stream/logs-stream
PUT _plugins/_ism/policies/logs-lifecycle-policy
{
  "policy": {
    "description": "Rollover indices daily and delete after 30 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_index_age": "1d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "30d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ]
  }
}
POST _plugins/_ism/add/logs-stream
{
  "policy_id": "logs-lifecycle-policy"
}
We will use the Jaeger Operator to deploy Jaeger. Run the following to install it:
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.65.0/jaeger-operator.yaml
Note that you need to install the operator in the same namespace as your observability stack; in our case that is observability. Once it is installed, you can deploy Jaeger.
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: observability-jaeger
  namespace: observability
spec:
  strategy: production
  collector:
    maxReplicas: 2
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
  ui:
    options:
      dependencies:
        menuEnabled: false
      menu:
        - label: "About Jaeger"
          items:
            - label: "Documentation"
              url: "https://www.jaegertracing.io/docs/latest"
  storage:
    type: elasticsearch
    # Create a secret named `observability-jaeger-es-credentials` with the following keys:
    # ES_USERNAME: your admin username, in this case `admin`
    # ES_PASSWORD: <admin password>
    secretName: observability-jaeger-es-credentials
    options:
      es:
        server-urls: https://observability-opensearch.observability.svc.cluster.local:9200
        index-prefix: jaeger
        tls:
          enabled: true
          ca: /tls/ca.crt
    esIndexCleaner:
      enabled: true
      numberOfDays: 30 # Retain traces for 30 days
      schedule: "55 23 * * *" # Run daily at 23:55
  query:
    options:
      prometheus:
        query:
          normalize-calls: true
          normalize-duration: true
  volumeMounts:
    - name: ca-cert
      mountPath: /tls/ca.crt
      subPath: ca.crt
  volumes:
    - name: ca-cert
      secret:
        # This is the CA certificate that signed the OpenSearch TLS certificates
        secretName: observability-ca-secret
        items:
          - key: ca.crt
            path: ca.crt
Read more about Jaeger Operator and its configuration options here: https://github.com/jaegertracing/jaeger-operator/blob/main/docs/api.md
Similarly, for the OpenTelemetry Collector, using the OTel Operator is recommended. Follow these instructions to set it up: OpenTelemetry Operator
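As with the other operators, the setup is a short Helm install. This sketch follows the OpenTelemetry Helm charts repo; note that the operator depends on cert-manager for its webhook certificates:

```shell
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace observability
```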
Once you have the operator running, you can deploy an OpenTelemetry Collector. The collector is the gateway for your observability data: your applications send it logs and traces, which it processes and exports, traces to Jaeger (which stores them in OpenSearch) and logs directly to OpenSearch.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: observability-otel-workers
  namespace: observability
spec:
  mode: deployment
  image: otel/opentelemetry-collector-contrib:0.118.0
  resources:
    requests:
      memory: "100Mi"
      cpu: "500m"
    limits:
      memory: "1Gi"
      cpu: "1000m"
  autoscaler:
    maxReplicas: 2
    minReplicas: 1
    targetCPUUtilization: 90
    targetMemoryUtilization: 90
  volumeMounts:
    - name: ca-cert
      mountPath: /tls/ca.crt
      subPath: ca.crt
  volumes:
    - name: ca-cert
      secret:
        # This is the secret that contains the CA certificate used to create the admin and node certificates.
        secretName: observability-ca-secret
        items:
          - key: ca.crt
            path: ca.crt
  config:
    extensions:
      # Supplies the OpenSearch credentials referenced by the exporter below
      basicauth/os:
        client_auth:
          username: admin
          password: "<admin password>" # Change this
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
          http:
            endpoint: ":4318"
            cors:
              allowed_origins:
                - "http://*"
                - "https://*"
    processors:
      batch:
        send_batch_max_size: 3000
        send_batch_size: 1000
        timeout: 5s
    exporters:
      otlp/jaeger:
        endpoint: "observability-jaeger-collector.observability.svc.cluster.local:4317"
        tls:
          insecure: true
      opensearch/logs:
        # This is the OpenSearch data stream we created earlier.
        logs_index: "logs-stream"
        http:
          endpoint: "https://observability-opensearch.observability.svc.cluster.local:9200"
          auth:
            authenticator: basicauth/os
          tls:
            insecure: false
            ca_file: /tls/ca.crt
    service:
      extensions: [basicauth/os]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/jaeger]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [opensearch/logs]
If your application runs on the same Kubernetes cluster, you can send telemetry directly to the collector service at observability-otel-workers-collector.observability.svc.cluster.local:4317. If your application runs outside the cluster, you can expose the collector using a LoadBalancer or NodePort service. We recommend an internal load balancer if your application runs in the same cloud provider as your Kubernetes cluster.
Here is an example of such a load balancer service:
apiVersion: v1
kind: Service
metadata:
  name: observability-otel-collector-alb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true" # Annotation for an internal load balancer
spec:
  type: LoadBalancer
  ports:
    - name: otlp-grpc
      port: 4317
      nodePort: 32007
      protocol: TCP
    - name: otlp-http
      port: 4318
      nodePort: 32008
      protocol: TCP
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: observability.observability-otel-workers
    app.kubernetes.io/part-of: opentelemetry
You can obtain the load balancer endpoint using kubectl get svc.
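For example, to pull just the hostname of the provisioned load balancer (using the service name defined above):

```shell
kubectl -n observability get svc observability-otel-collector-alb \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```

On clouds that expose an IP address instead of a hostname, read `.ip` instead of `.hostname`.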
To send traces to the OpenTelemetry Collector, you need to instrument your application using OpenTelemetry libraries. Most languages have zero-code instrumentation support, which means you can automatically collect telemetry with minimal code changes.
Check the Zero code instrumentation documentation for your language: https://opentelemetry.io/docs/concepts/instrumentation/zero-code/
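As an illustration for a Python service (package and command names per the OpenTelemetry Python docs; the endpoint assumes the in-cluster collector service from this setup):

```shell
# Install the distro and OTLP exporter, then pull in instrumentations
# matching the libraries your app already uses
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

# Point the SDK at the collector and name the service
export OTEL_SERVICE_NAME="my-service"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://observability-otel-workers-collector.observability.svc.cluster.local:4317"

# Run the app under automatic instrumentation
opentelemetry-instrument python app.py
```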
Out of the box, the Jaeger UI lacks built-in authentication, which is a non-starter for production. You must place it behind an authentication proxy. We recommend using oauth2-proxy integrated with an OIDC provider like Keycloak. This creates a seamless Single Sign-On (SSO) experience for both OpenSearch Dashboards and Jaeger. Other options include placing Jaeger behind a VPN or a reverse proxy with basic authentication.
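A minimal sketch of fronting the Jaeger query service with oauth2-proxy, assuming a Keycloak realm at the issuer URL shown (all names and URLs here are hypothetical placeholders):

```shell
oauth2-proxy \
  --provider=oidc \
  --oidc-issuer-url=https://keycloak.example.com/realms/observability \
  --client-id=jaeger \
  --client-secret=<client secret> \
  --cookie-secret=<random 32-byte secret> \
  --email-domain="*" \
  --upstream=http://observability-jaeger-query.observability.svc.cluster.local:16686 \
  --http-address=0.0.0.0:4180
```

Route external traffic to port 4180; only authenticated requests are proxied through to the Jaeger UI on 16686.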
Moving away from the traditional ELK stack to one powered by OpenTelemetry and OpenSearch gives you a more flexible, efficient, and truly open-source observability solution, and adding Jaeger brings distributed tracing into the mix. This modern stack frees you from vendor lock-in and provides a great developer experience without the runaway costs of proprietary SaaS solutions.
Cheers!
ELK alternative: Modern log management setup with OpenTelemetry and OpenSearch
Part 1 of this series, where we set up log management
Securing OpenSearch with OIDC
This article details how to use Keycloak to secure OpenSearch