Open Tracing via Jaeger

Dec 14, 2021

Problem Statement

In a distributed application, where a single request is fulfilled by multiple services, it is difficult to debug when things go wrong. The two common tools for finding the root cause of a problem are logging and metrics. However, logs and metrics alone fail to give us a complete picture of what is happening in a distributed system.

Logging and metrics are not enough to build an observable system. The idea is to bring in distributed tracing so that we get:

1) Distributed transaction monitoring
2) Root cause analysis
3) Performance and latency optimization
4) Service dependency analysis
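
To make the last three items concrete, here is a minimal sketch of how service dependencies and a latency hotspot could be derived from the spans of one trace. The `Span` record, the service names, and the durations are all made up for illustration; this is not Jaeger's data model, and the self-time calculation assumes child spans run one after another.

```python
from collections import namedtuple

# Hypothetical, simplified span record; real tracing systems carry more fields.
Span = namedtuple("Span", "span_id parent_id service operation duration_ms")

# Made-up spans for one request (one trace) crossing three services.
trace = [
    Span("a", None, "gateway", "GET /checkout", 120),
    Span("b", "a", "orders", "create_order", 95),
    Span("c", "b", "payments", "charge_card", 80),
    Span("d", "b", "orders", "save_order", 10),
]

def service_dependencies(spans):
    """Return the set of (caller, callee) service pairs seen in the trace."""
    by_id = {s.span_id: s for s in spans}
    edges = set()
    for s in spans:
        parent = by_id.get(s.parent_id)
        # Only count parent/child pairs that cross a service boundary.
        if parent is not None and parent.service != s.service:
            edges.add((parent.service, s.service))
    return edges

def self_time(spans):
    """Return {span_id: duration minus time spent in direct children}.

    Assumes children run sequentially, so their durations can be summed.
    """
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0) + s.duration_ms
    return {s.span_id: s.duration_ms - child_total.get(s.span_id, 0) for s in spans}

deps = service_dependencies(trace)
times = self_time(trace)
slowest = max(times, key=times.get)  # span where most of the time is actually spent
print(sorted(deps), slowest)
```

In this made-up trace the gateway span has the longest total duration, but most of the time is spent inside the payments call, which is the kind of insight that logs and metrics alone do not readily give.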

Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. It helps bring visibility into systems. — Wikipedia

If something goes wrong during the execution of a flow, debugging is a nightmare. You never know which part of the system failed.

We do have logs and metrics for the services, but logs do not give the complete picture: they are scattered across a number of log files, and it is difficult to link them together into a shared context. Metrics can tell you that a service has a high response time, but they will not help you easily identify the root cause.

As a result, a lot of time is lost in defect triaging and in determining ownership, since services are owned by different teams. This results in a high MTTR, which no one wants.

Solution Approach

Distributed tracing (via Jaeger) comes to the rescue. Distributed tracing has two parts:

Code instrumentation: This involves adding instrumentation code to your application to produce traces.
Data collection and analysis: This involves collecting the trace data and deriving meaning from it. Tools that do this also provide visualizations that make it easy to understand the lifetime of a request.
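
To give a feel for what instrumentation code does, here is a minimal sketch of a tracer written with only the Python standard library. `ToyTracer`, its `start_span` method, and the `handle_request` / `save_order` functions are hypothetical names invented for this example; this is not the Jaeger client API, only an illustration of the idea that each unit of work is wrapped in a span that records its parent, its timing, and a few tags.

```python
import time
import uuid
from contextlib import contextmanager

class ToyTracer:
    """Illustrative in-memory tracer; NOT the Jaeger client API."""

    def __init__(self, service):
        self.service = service
        self.finished = []  # spans that have ended, in finish order
        self._stack = []    # currently active spans, innermost last

    @contextmanager
    def start_span(self, operation):
        parent = self._stack[-1] if self._stack else None
        span = {
            # A child span inherits the trace ID; a root span starts a new trace.
            "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
            "span_id": uuid.uuid4().hex[:16],
            "parent_id": parent["span_id"] if parent else None,
            "service": self.service,
            "operation": operation,
            "tags": {},
        }
        self._stack.append(span)
        start = time.perf_counter()
        try:
            yield span
        except Exception as exc:
            # Record the failure on the span, then let the error propagate.
            span["tags"]["error"] = True
            span["tags"]["error.message"] = str(exc)
            raise
        finally:
            span["duration_ms"] = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.finished.append(span)

tracer = ToyTracer("orders")

def save_order(order_id):
    with tracer.start_span("save_order") as span:
        span["tags"]["order.id"] = order_id  # attach business context to the span

def handle_request(order_id):
    with tracer.start_span("handle_request"):
        save_order(order_id)

handle_request(42)
for span in tracer.finished:
    print(span["operation"], span["parent_id"], round(span["duration_ms"], 3))
```

A real client library would additionally propagate the trace ID across process boundaries (for example in request headers) and report finished spans to a backend instead of keeping them in a list.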

Distributed tracing helps tell the stories of transactions that cross process boundaries.

Jaeger Components

When deploying Jaeger Tracing, you’ll need to address the following components:

Agent is the component co-located with your application to gather the Jaeger trace data locally. It handles the connection and traffic control to the Collector (see below) as well as data enrichment.
Collector is a centralized hub that collects traces from the various agents in the environment and sends them to backend storage. The collector can run validations and enrichment on the spans.
Query retrieves the traces and serves them over a UI.
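
The division of labor between the three components can be sketched as a toy in-memory pipeline. The classes below only borrow the component names; their methods (`report`, `flush`, `submit`, `get_trace`), the `host` enrichment field, and the batch size are assumptions made for illustration and do not reflect Jaeger's actual implementation or wire protocol.

```python
class Collector:
    """Central hub: validates incoming spans and writes them to storage."""

    def __init__(self, storage):
        self.storage = storage  # dict: trace_id -> list of spans

    def submit(self, batch):
        for span in batch:
            # Validation: drop spans that are missing required identifiers.
            if not span.get("trace_id") or not span.get("span_id"):
                continue
            self.storage.setdefault(span["trace_id"], []).append(span)

class Agent:
    """Runs next to the application; buffers spans and forwards them in batches."""

    def __init__(self, collector, host, batch_size=2):
        self.collector = collector
        self.host = host
        self.batch_size = batch_size
        self.buffer = []

    def report(self, span):
        # Enrichment: add local information the application did not supply.
        self.buffer.append(dict(span, host=self.host))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.collector.submit(self.buffer)
            self.buffer = []

class Query:
    """Reads traces back from storage, e.g. on behalf of a UI."""

    def __init__(self, storage):
        self.storage = storage

    def get_trace(self, trace_id):
        return list(self.storage.get(trace_id, []))

storage = {}  # stands in for the backend storage
agent = Agent(Collector(storage), host="app-1")
agent.report({"trace_id": "t1", "span_id": "s1", "operation": "handle_request"})
agent.report({"trace_id": "t1", "span_id": "s2", "operation": "save_order"})
agent.report({"trace_id": "", "span_id": "s3", "operation": "malformed"})
agent.flush()  # push whatever is still buffered

result = Query(storage).get_trace("t1")
print([s["operation"] for s in result])
```

The application only ever talks to the local agent, so batching, validation, and storage details can change without touching application code.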