Balancing Act: Cutting Kubernetes Event Storage Costs Without Losing Visibility with OpenTelemetry
- Nicolas Takashi
- Observability, OpenTelemetry, Kubernetes
- November 25, 2023
A few weeks ago I wrote a blog post about efficient Kubernetes event tracking. In it, I explained how to collect Kubernetes events using OpenTelemetry and why I think OpenTelemetry is a great fit for that job.
After I shared that post, I received a lot of feedback from the community, and one of the most common concerns was storage costs: Kubernetes events are stored as logs, and logs are generally expensive to store.
So I decided to write this post to share how you can reduce the storage costs of your Kubernetes events without losing visibility, building on the event-filtering idea from my previous post.
Initial OpenTelemetry Configuration
As a starting point, let’s use the same OpenTelemetry configuration from my previous blog post:
receivers:
  k8sobjects:
    objects:
      - name: event
        mode: watch
        group: events.k8s.io
processors:
  batch:
  filter:
    logs:
      log_record:
        - 'IsMatch(body["reason"], "(Pulling|Pulled)") == true'
exporters:
  otlp:
    endpoint: otelcol:4317
service:
  pipelines:
    logs:
      receivers: [k8sobjects]
      processors: [filter, batch]
      exporters: [otlp]
This configuration works pretty well and already filters out high-volume events such as Pulling and Pulled. The problem is that we lose visibility into them entirely: we want to drop the full event payload, but we still want to know how many events of each discarded type happened.
Enriching Kubernetes Events
First, let's add a new processor to the OpenTelemetry configuration. The transform processor enriches each Kubernetes event log record with the event reason as an attribute, so we can use it later to filter and count the events.
receivers:
  k8sobjects:
    objects:
      - name: event
        mode: watch
        group: events.k8s.io
processors:
  batch:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(cache, body["object"], "upsert")
          - set(attributes["event.reason"], ConvertCase(cache["reason"], "lower"))
exporters:
  otlp:
    endpoint: otelcol:4317
service:
  pipelines:
    logs:
      receivers: [k8sobjects]
      processors: [transform, batch]
      exporters: [otlp]
Info
I have temporarily removed the filter processor from the pipeline; we'll add it back later.
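To see what those two OTTL statements are working with, here is roughly what a single watched event looks like when it reaches the transform processor. The field values are illustrative, but the important part is that the Kubernetes object sits under body["object"], which is what merge_maps copies into the cache before set reads cache["reason"]:
# Illustrative sketch of a log record body produced by the k8sobjects
# receiver in watch mode; values are made up for the example.
body:
  type: ADDED
  object:
    kind: Event
    apiVersion: events.k8s.io/v1
    reason: Pulling              # copied into cache, then lowercased into event.reason
    note: 'Pulling image "nginx:1.25"'
    regarding:
      kind: Pod
      name: nginx-7d9c5b5b5b-abcde
      namespace: default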
Counting Kubernetes Events
Before we filter the events, we need to count them, so we keep visibility into how many events per reason the Kubernetes cluster is producing, even if we don't export all of them to the backend.
Luckily, the OpenTelemetry Collector has a component that can help with that: the count connector. It can count the number of events per reason and expose the result as metrics. After adding it, the configuration looks like this:
Note
If you're not familiar with OpenTelemetry connectors, I recommend reading the OpenTelemetry Connectors documentation.
receivers:
  k8sobjects:
    objects:
      - name: event
        mode: watch
        group: events.k8s.io
connectors:
  count:
    logs:
      k8s.events.count:
        description: "Count the number of events"
        conditions:
          - 'attributes["k8s.resource.name"] == "events"'
        attributes:
          - key: event.reason
            default_value: unknown
processors:
  batch:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(cache, body["object"], "upsert")
          - set(attributes["event.reason"], ConvertCase(cache["reason"], "lower"))
exporters:
  otlp:
    endpoint: otelcol:4317
  prometheus:
    endpoint: localhost:9090
service:
  pipelines:
    logs:
      receivers: [k8sobjects]
      processors: [transform, batch]
      exporters: [count]
    metrics:
      receivers: [count]
      processors: [batch]
      exporters: [prometheus]
Let’s break down this configuration:
- I added a new connectors section that defines a metric named k8s.events.count. It counts the events per event.reason, but only for entries whose k8s.resource.name attribute is set to events.
- Instead of exporting the events to the otlp exporter, the logs pipeline now exports them to the count connector.
- I added a new pipeline named metrics that receives the generated metrics from the count connector, applies the batch processor, and exports them with the prometheus exporter (see the example query after this list).
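With the metrics pipeline in place, you keep visibility through Prometheus instead of raw log storage. As a minimal sketch, assuming the Prometheus exporter publishes the counter as k8s_events_count_total with an event_reason label (the exact names depend on the exporter's normalization settings), a recording rule could look like this:
# Minimal sketch of a Prometheus rule file. The metric and label names
# (k8s_events_count_total, event_reason) assume default name normalization
# by the prometheus exporter and may differ in your setup.
groups:
  - name: kubernetes-events
    rules:
      # Per-reason event rate over the last 5 minutes, which still covers
      # the reasons we later drop from the logs backend (pulling, pulled).
      - record: k8s:events:rate5m
        expr: sum by (event_reason) (rate(k8s_events_count_total[5m]))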
Adding the filter processor back to the pipeline
Now it's time to add the filter processor back, but this time we'll leverage the event.reason attribute to drop the events we don't want to export to the backend. The configuration looks like this:
receivers:
  k8sobjects:
    objects:
      - name: event
        mode: watch
        group: events.k8s.io
connectors:
  count:
    logs:
      k8s.events.count:
        description: "Count the number of events"
        conditions:
          - 'attributes["k8s.resource.name"] == "events"'
        attributes:
          - key: event.reason
            default_value: unknown
processors:
  batch:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(cache, body["object"], "upsert")
          - set(attributes["event.reason"], ConvertCase(cache["reason"], "lower"))
  filter:
    logs:
      log_record:
        - 'IsMatch(attributes["event.reason"], "(pulling|pulled)") == true'
exporters:
  otlp:
    endpoint: otelcol:4317
  prometheus:
    endpoint: localhost:9090
service:
  pipelines:
    logs:
      receivers: [k8sobjects]
      processors: [transform, batch]
      exporters: [count]
    metrics:
      receivers: [count]
      processors: [batch]
      exporters: [prometheus]
You probably noticed that I still didn't add the filter processor to the logs pipeline. That's because it would drop the events before they are counted, and I would lose visibility again. What I want is to filter the events after they have been counted, so I can drop the ones I don't want to export to the backend while keeping visibility into them.
Unfortunately, that's not possible with the current configuration, because any processor in the logs pipeline runs before the count connector, so we need a way to work around this.
Splitting the OpenTelemetry Pipeline
To achieve this, we need to split the OpenTelemetry logs pipeline in two: one pipeline counts the events, and the other filters and exports them to the backend.
This is super simple to do, because the OpenTelemetry Collector provides another connector called forward, which passes data from one pipeline to another. The configuration looks like this:
receivers:
  k8sobjects:
    objects:
      - name: event
        mode: watch
        group: events.k8s.io
connectors:
  forward:
  count:
    logs:
      k8s.events.count:
        description: "Count the number of events"
        conditions:
          - 'attributes["k8s.resource.name"] == "events"'
        attributes:
          - key: event.reason
            default_value: unknown
processors:
  batch:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(cache, body["object"], "upsert")
          - set(attributes["event.reason"], ConvertCase(cache["reason"], "lower"))
  filter:
    logs:
      log_record:
        - 'IsMatch(attributes["event.reason"], "(pulling|pulled)") == true'
exporters:
  otlp:
    endpoint: otelcol:4317
  prometheus:
    endpoint: localhost:9090
service:
  pipelines:
    logs:
      receivers: [k8sobjects]
      processors: [transform, batch]
      exporters: [count, forward]
    logs/filtered:
      receivers: [forward]
      processors: [filter, batch]
      exporters: [otlp]
    metrics:
      receivers: [count]
      processors: [batch]
      exporters: [prometheus]
Ok, let’s break down this configuration:
- The first pipeline, logs, receives the events from the Kubernetes API, applies the transform and batch processors, and exports them to both the count and forward connectors.
- The second pipeline, logs/filtered, receives the events from the forward connector, applies the filter and batch processors, and exports what remains to the otlp exporter.
- The third pipeline, metrics, receives the metrics from the count connector, applies the batch processor, and exports them with the prometheus exporter.
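If you want the counts broken down by more than the event reason, the same transform-and-count pattern extends naturally. Here is a minimal sketch adding a per-namespace dimension; the event.namespace attribute name is my own choice for this example, not something the collector defines:
# Sketch: add a namespace dimension to the event counts.
# The attribute name event.namespace is arbitrary and only used here.
processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(cache, body["object"], "upsert")
          - set(attributes["event.reason"], ConvertCase(cache["reason"], "lower"))
          # Kubernetes events carry their namespace under metadata.namespace
          - set(attributes["event.namespace"], cache["metadata"]["namespace"])
connectors:
  count:
    logs:
      k8s.events.count:
        description: "Count the number of events"
        conditions:
          - 'attributes["k8s.resource.name"] == "events"'
        attributes:
          - key: event.reason
            default_value: unknown
          - key: event.namespace
            default_value: unknown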
Conclusion
This is what I like most about the OpenTelemetry Collector: you can design an observability pipeline that fits your needs, do all the processing you need, and then export the data, avoiding unnecessary costs on the backend.
The strategy I described in this post is a great way to reduce backend costs, and it applies beyond Kubernetes events to other high-volume data sources, such as logs, metrics, and traces.
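For example, the same count-then-forward layout could keep a count of all application log records while only shipping warnings and errors to the backend. The sketch below is illustrative and not from my Kubernetes setup: the otlp receiver, the severity condition, and the app.logs.count metric name are all assumptions for the example.
# Illustrative sketch: count every application log record, but only export
# WARN and above. Receiver, condition, and metric name are examples only.
receivers:
  otlp:
    protocols:
      grpc:
connectors:
  forward:
  count:
    logs:
      app.logs.count:
        description: "Count application log records before filtering"
processors:
  batch:
  filter:
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_WARN'
exporters:
  otlp/backend:
    endpoint: backend:4317
  prometheus:
    endpoint: localhost:9090
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [count, forward]
    logs/filtered:
      receivers: [forward]
      processors: [filter, batch]
      exporters: [otlp/backend]
    metrics:
      receivers: [count]
      processors: [batch]
      exporters: [prometheus]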
I hope you enjoyed this blog post. If you have any questions, please feel free to reach out to me on Twitter. Don't forget to share this post with your friends and colleagues.