Observability strategies to not overload engineering teams — OpenTelemetry Strategy.

OpenTelemetry provides capabilities to democratize observability data and empowers engineering teams.

One of the strategies I mentioned in my first post, Observability strategies to not overload engineering teams, is to leverage OpenTelemetry auto-instrumentation, which helps us achieve observability without requiring engineering effort.

Today, I’ll show you how to collect metrics and traces from a Python service without code changes.

Info

To keep the content cleaner, I will leave out some settings, such as the Prometheus configuration file.

What is OpenTelemetry Auto Instrumentation

OpenTelemetry provides an auto-instrumentation feature that acts as an agent, collecting telemetry data from a non-instrumented application.

This is what most O11y vendors, such as New Relic and Datadog, do to collect telemetry data and push it to their platforms. It is a valuable feature because engineering teams can achieve observability with zero instrumentation effort.
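To make the idea concrete, here is a minimal sketch of the wrapping technique this kind of agent relies on: a function is patched from the outside at startup, so every call is timed and recorded while the application code stays untouched. This is not OpenTelemetry’s actual implementation; auto_instrument, RECORDED_SPANS, and the app namespace are hypothetical names used only for illustration.

```python
import time
from functools import wraps
from types import SimpleNamespace

# Hypothetical in-memory store standing in for a real telemetry exporter.
RECORDED_SPANS = []


def auto_instrument(namespace, func_name):
    """Replace namespace.func_name with a wrapper that records one span per call."""
    original = getattr(namespace, func_name)

    @wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the span even if the wrapped function raises.
            RECORDED_SPANS.append({
                "name": func_name,
                "duration_s": time.perf_counter() - start,
            })

    setattr(namespace, func_name, wrapper)


# The "application": it knows nothing about telemetry.
app = SimpleNamespace(server_request=lambda: "Ok")

# The "agent": patches the application from the outside at startup.
auto_instrument(app, "server_request")

result = app.server_request()
```

After the call, RECORDED_SPANS holds one entry for server_request, even though the handler itself was never edited.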

Demo application

To help us understand how auto-instrumentation works, I’ve created the following simple Python service.

from flask import Flask

app = Flask(__name__)

@app.route("/status")
def server_request():
    return "Ok"

if __name__ == "__main__":
    app.run("0.0.0.0", port=9090)

Docker Image

This Docker image has all the required dependencies including OpenTelemetry auto instrumentation packages.

# syntax=docker/dockerfile:1

FROM python:3.8
WORKDIR /app

RUN pip3 install opentelemetry-distro opentelemetry-exporter-otlp flask
RUN opentelemetry-bootstrap --action=install

COPY ./app/server.py .

CMD opentelemetry-instrument \
    python \
    server.py

Before we move forward, I would like to highlight the CMD entry, where we wrap the python server.py command with the opentelemetry-instrument command. This wrapper is what does the auto-instrumentation work; we don’t need to change anything else on the application side to collect telemetry data.

Service Infrastructure

Using the following configuration, let’s build the Docker image defined above.

The opentelemetry-instrument command accepts flags or environment variables that let you configure protocols and other properties. For more information about the available environment variables, please check this link.

version: '3.8'
services:
  server:
    build:
      context: .
      dockerfile: ./app/Dockerfile
    environment:
      - OTEL_TRACES_EXPORTER=otlp
      - OTEL_SERVICE_NAME=server
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel:4317
    ports:
      - 9091:9090

Above, we’re configuring the traces exporter to use the OTLP format, setting the service name that will appear on the traces to server, and providing the OpenTelemetry Collector endpoint.
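As a rough sketch of how these three values fit together, the snippet below reads them from an environment mapping and splits the endpoint into host and port. read_otel_settings is a hypothetical helper, not part of OpenTelemetry, and the fallback values are my assumptions for illustration.

```python
import os
from urllib.parse import urlparse


def read_otel_settings(env=None):
    """Collect the three settings used in the compose file from an env mapping."""
    env = os.environ if env is None else env
    # Fallback values below are assumptions made for this sketch.
    endpoint = urlparse(env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"))
    return {
        "traces_exporter": env.get("OTEL_TRACES_EXPORTER", "otlp"),
        "service_name": env.get("OTEL_SERVICE_NAME", "unknown_service"),
        "collector_host": endpoint.hostname,
        "collector_port": endpoint.port,
    }


# Same values as the environment section of the server service.
settings = read_otel_settings({
    "OTEL_TRACES_EXPORTER": "otlp",
    "OTEL_SERVICE_NAME": "server",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel:4317",
})
```

Here the host part, otel, is a service name that Docker Compose resolves on its internal network, so the endpoint only works when the collector service uses that name.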

Observability Infrastructure

Now let’s add the building blocks to provide the observability tooling infrastructure.

OpenTelemetry Collector

The OpenTelemetry Collector is the component that receives, processes, and exports the telemetry data produced by the Python application to backends such as Prometheus and Jaeger.

OpenTelemetry Collector Configuration

The configuration below receives the traces produced by the application and processes all spans before exporting them to Jaeger.

receivers:
  otlp:
    protocols:
      grpc:
  
  # Dummy receiver that's never used, because a pipeline is required to have one.
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: "localhost:65535"

processors:
  batch:

  spanmetrics:
    metrics_exporter: prometheus

exporters:
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true

  jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  
  prometheus:
    endpoint: "0.0.0.0:8989"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [spanmetrics, batch]
      exporters: [otlp]
    
    metrics/spanmetrics:
      receivers: [otlp/spanmetrics]
      exporters: [prometheus]

Since all the spans pass through the collector, we can leverage a processor called spanmetrics, which exposes metrics about the number of calls and the latency of every operation in the Prometheus format.

This approach lets us generate metrics from spans and get two different types of telemetry data out of the box.
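The idea of turning spans into metrics can be sketched in a few lines. The snippet below is not the processor’s code; it only shows the kind of aggregation involved, producing a call counter and cumulative histogram buckets per operation. The bucket bounds, the span dictionaries, and aggregate_spans are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical histogram upper bounds in milliseconds, chosen for this sketch.
BUCKET_BOUNDS_MS = [2, 10, 50, 250, 1000]


def aggregate_spans(spans):
    """Derive a call counter and cumulative latency buckets per operation."""
    calls_total = Counter()
    latency_bucket = {}
    for span in spans:
        operation = span["operation"]
        calls_total[operation] += 1
        # One slot per upper bound, plus a final slot for +Inf.
        buckets = latency_bucket.setdefault(
            operation, [0] * (len(BUCKET_BOUNDS_MS) + 1)
        )
        # Index of the first bucket whose upper bound is >= the duration.
        first = bisect_left(BUCKET_BOUNDS_MS, span["duration_ms"])
        # Prometheus-style buckets are cumulative, so bump every larger one too.
        for i in range(first, len(buckets)):
            buckets[i] += 1
    return calls_total, latency_bucket


spans = [
    {"operation": "/status", "duration_ms": 1.2},
    {"operation": "/status", "duration_ms": 7.5},
    {"operation": "/status", "duration_ms": 300.0},
]
calls_total, latency_bucket = aggregate_spans(spans)
```

With these three spans, the counter for /status is 3 and the cumulative buckets are [1, 2, 2, 2, 3, 3], which is enough information to estimate request rates and latency percentiles later.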

Docker-Compose file

Now that we have the OpenTelemetry Collector configuration, we can spin up Jaeger and Prometheus using the following configuration:

version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./conf/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - 9090:9090

  jaeger:
    image: jaegertracing/all-in-one:1.39
    restart: unless-stopped
    command:
      - --collector.otlp.enabled=true
    ports:
      - "16686:16686"
    environment:
      - METRICS_STORAGE_TYPE=prometheus
      - PROMETHEUS_SERVER_URL=http://prometheus:9090
    depends_on:
      - prometheus

  otel:
    image: otel/opentelemetry-collector-contrib
    command:
      - --config=/etc/otel-collector-config.yaml
    volumes:
      - ./conf/otel-collector-config.yaml:/etc/otel-collector-config.yaml
    depends_on:
      - jaeger

  server:
    build:
      context: .
      dockerfile: ./app/Dockerfile
    environment:
      - OTEL_TRACES_EXPORTER=otlp
      - OTEL_SERVICE_NAME=server
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel:4317
    ports:
      - 9091:9090
    depends_on:
      - otel

I would just like to highlight the environment variables in the jaeger service. Since the spanmetrics processor is active on the OpenTelemetry Collector, we can leverage the SPM feature from Jaeger. For more information, please check this link.

The Final Result

It’s time to see the outcome; all of those configurations will support us in collecting telemetry data that will be useful for the entire company to start adopting observability without requiring engineering efforts.

HTTP Load Testing

To see the telemetry data properly, I’ll create a small load on the service using Vegeta.

echo "GET http://localhost:9091/status" | vegeta attack -duration=60s | vegeta report

Jaeger Tracing

In the Tracing view, we can track the flow of requests across the platform and gather useful data that helps teams understand performance issues as well as complex distributed architectures.

Jaeger — Trace View

Jaeger SPM

The Jaeger Service Performance Monitor provides a service-level aggregation, as well as an operation-level aggregation within the service, of Request rates, Error rates, and Durations (P95, P75, and P50), also known as RED metrics.
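To show what each of these numbers means, here is a small sketch that computes request rate, error rate, and duration percentiles directly from a list of spans. Jaeger reads these values from the metrics stored in Prometheus rather than from raw spans, so this is only an illustration; red_metrics, percentile, and the sample spans are hypothetical.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_metrics(spans, window_seconds):
    """Compute Rate, Errors, and Duration figures for one operation."""
    durations = sorted(span["duration_ms"] for span in spans)
    errors = sum(1 for span in spans if span["error"])
    return {
        "request_rate": len(spans) / window_seconds,  # requests per second
        "error_rate": errors / len(spans),            # fraction of failed requests
        "p50_ms": percentile(durations, 50),
        "p75_ms": percentile(durations, 75),
        "p95_ms": percentile(durations, 95),
    }


# 20 hypothetical spans seen over 10 seconds: durations 1..20 ms, one failure.
spans = [{"duration_ms": float(i), "error": i == 20} for i in range(1, 21)]
metrics = red_metrics(spans, window_seconds=10)
```

For this sample, the request rate is 2.0 per second, the error rate is 0.05, and P50, P75, and P95 are 10.0, 15.0, and 19.0 ms.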

Jaeger — Trace View

This tab is filled with the information created by the spanmetrics processor on the OpenTelemetry Collector, which is also available in Prometheus, as we can see below.

Prometheus Metrics

As described above, the spanmetrics processor creates metrics such as calls_total and latency_bucket.
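If you prefer to query these metrics outside the UI, the sketch below builds URLs for the Prometheus instant-query HTTP endpoint. It assumes Prometheus is reachable on localhost:9090, as mapped in the compose file; instant_query_url is a hypothetical helper, and the two PromQL expressions are just examples of what could be asked of calls_total and latency_bucket.

```python
from urllib.parse import urlencode

# Assumption: the host port mapped for Prometheus in the compose file.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(promql, base_url=PROMETHEUS_URL):
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Requests per second over the last minute, across all operations.
request_rate_url = instant_query_url("sum(rate(calls_total[1m]))")

# Estimated 95th percentile latency from the histogram buckets.
p95_latency_url = instant_query_url(
    "histogram_quantile(0.95, sum(rate(latency_bucket[1m])) by (le))"
)
```

Opening either URL with curl or a browser while the stack is running should return a JSON response with the query result.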

Prometheus UI

Conclusion

This is a very simple example, and the main idea is to provide insights into what type of telemetry data could be collected using OpenTelemetry auto instrumentation.

The code is available on my GitHub account; feel free to explore it by running it in your local environment.

Let me know if you’re already using this at your company to collect telemetry data, or if you plan to.

Thanks 😃
