Observability strategies to not overload engineering teams — Proxy Strategy.
- Nicolas Takashi
- Observability, Infrastructure
- October 6, 2022
A web proxy is a perfect place to start collecting telemetry data without requiring engineering effort.
One of the strategies I mentioned in my first post, Observability strategies to not overload engineering teams, is collecting telemetry data from proxies, which usually sit in front of your services.
Today, I’ll show you how to collect metrics and traces from an NGINX Proxy using Open Source solutions such as Prometheus and Jaeger.
Info
To keep the content cleaner, I’m going to leave out some settings, such as the Prometheus and NGINX configuration files.
Service Infrastructure
Using the following configuration, let’s build a fake scenario using an echo server to mimic two applications behind an NGINX proxy.
version: '3.8'
services:
  proxy:
    image: nginx
    container_name: proxy
    restart: unless-stopped
    ports:
      - 8080:8080
    volumes:
      - ./conf/nginx/nginx.conf:/etc/nginx/nginx.conf
  checkoutapi:
    image: hashicorp/http-echo
    container_name: checkoutapi
    restart: unless-stopped
    command: -text="Checkout API"
  paymentapi:
    image: hashicorp/http-echo
    container_name: paymentapi
    restart: unless-stopped
    command: -text="Payment API"
Right after bringing up the environment, you will have two different services running and accessible through the following addresses.
- http://localhost:8080/checkouts
- http://localhost:8080/payments
Collecting Metrics
NGINX doesn’t expose metrics in the Prometheus format by default, so we use the NGINX Exporter to collect them.
NGINX Exporter
To enable the NGINX Exporter, we just need to add the following configuration to the docker-compose file.
  proxy_exporter:
    image: nginx/nginx-prometheus-exporter:0.10.0
    container_name: proxy_exporter
    restart: unless-stopped
    command: -nginx.scrape-uri=http://proxy:8080/nginx_status
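The exporter reads the NGINX stub_status page at the URI above and re-exposes the values for Prometheus. A minimal sketch of the matching scrape job in prometheus.yml (one of the files left out of this post) could look like the following. It assumes the exporter keeps its default listen port, 9113, and that Prometheus can resolve the proxy_exporter container name over a shared Docker network; the job name and scrape interval are arbitrary choices for the example.

```yaml
# prometheus.yml (sketch, not the original file from this setup)
global:
  scrape_interval: 15s              # assumed interval

scrape_configs:
  - job_name: nginx_proxy           # hypothetical job name
    static_configs:
      - targets:
          - proxy_exporter:9113     # exporter container and its default port
```

If the two compose files run as separate projects, the containers may not share a network by default, in which case the target name would not resolve until they are attached to a common one.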
If the metrics the exporter provides are insufficient for your use case, check out this incredible article that demonstrates how to use NGINX logs to generate Prometheus metrics.
Collecting Traces
Because this example already uses an NGINX Docker image with the OpenTracing module embedded, we only need to provide the following configuration in jaeger-config.json.
{
  "service_name": "nginx_proxy",
  "disabled": false,
  "reporter": {
    "logSpans": true,
    "localAgentHostPort": "jaeger:6831"
  },
  "sampler": {
    "type": "const",
    "param": "1"
  }
}
This configuration samples every HTTP request made to the NGINX proxy and sends the resulting spans to Jaeger.
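For the tracer to pick this file up, it has to be available inside the proxy container. One way to do that is an extra volume mount on the proxy service, sketched below. Both the host path ./conf/jaeger/ and the container path /etc/jaeger-config.json are illustrative assumptions, not paths taken from the original setup.

```yaml
services:
  proxy:
    volumes:
      - ./conf/nginx/nginx.conf:/etc/nginx/nginx.conf
      # Assumed host and container paths for the tracer configuration
      - ./conf/jaeger/jaeger-config.json:/etc/jaeger-config.json
```

Whatever container path you choose has to match the one that nginx.conf passes to the OpenTracing module’s opentracing_load_tracer directive.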
Observability Infrastructure
To spin up Prometheus and Jaeger, you just need to have the following containers running:
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./conf/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - 9090:9090
  jaeger:
    image: jaegertracing/all-in-one
    restart: unless-stopped
    ports:
      - "16686:16686"
The Final Result
It’s time to see the outcome. With these configurations in place, we can collect telemetry data that helps the entire company start adopting observability without requiring extra engineering effort.
Prometheus
On the metrics side, we can now query the number of HTTP requests handled by the NGINX server, as well as many other metrics about the NGINX proxy.
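As an illustration of such a query, the exporter publishes a counter named nginx_http_requests_total. The sketch below wraps a per-second request rate over that counter in a Prometheus recording rule. The file name, group name, and rule name are made up for the example, and the file would still need to be listed under rule_files in prometheus.yml.

```yaml
# rules.yml (hypothetical file name)
groups:
  - name: nginx_proxy               # hypothetical group name
    rules:
      # Requests per second handled by the proxy, averaged over 5 minutes
      - record: job:nginx_http_requests:rate5m
        expr: sum by (job) (rate(nginx_http_requests_total[5m]))
```

The expression on its own can also be pasted into the Prometheus expression browser on port 9090 to try it out before committing it to a rule.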
Tracing
In the tracing view, we can follow the flow of requests across the platform and gather data that helps teams understand performance issues as well as complex distributed architectures.
Conclusion
This is a very simple example, and the main idea is to show what type of telemetry data can be collected from the proxy layer. If you need more detail and context, you can use log information to generate detailed metrics or add extra metadata to the traces.
The code is available on my GitHub account; feel free to look at it and explore it by running it in your local environment.
Let me know if you’re using this approach at your company to collect telemetry data, or plan to.
Thanks 😃