Observability is the ability to understand what a system is doing from the outputs it produces, usually metrics, logs, and traces.

It matters most when a system is distributed or has many moving parts, because you often cannot infer the full state of the system from a single component.

Common observability signals are:

  • Metrics, which are numeric measurements tracked over time, such as latency, error rate, and CPU usage
  • Logs, which record discrete events
  • Traces, which show how a request moves through the system
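To make the three signals concrete, here is a minimal, stdlib-only Python sketch of a service that records all three for every request it handles. `Telemetry`, `Span`, and `handle_request` are hypothetical names invented for illustration. A production system would normally use a dedicated instrumentation library such as OpenTelemetry rather than an in-memory store like this one.

```python
import json
import time
import uuid
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """One timed unit of work; spans that share a trace_id form one trace."""
    trace_id: str
    name: str
    start: float
    end: float = 0.0


@dataclass
class Telemetry:
    """Hypothetical in-memory store for all three signals (not a real library)."""
    counters: defaultdict = field(default_factory=lambda: defaultdict(int))
    latencies_ms: list = field(default_factory=list)
    log_lines: list = field(default_factory=list)
    spans: list = field(default_factory=list)

    def log(self, trace_id: str, event: str, **attrs) -> None:
        # Log: one discrete event, serialized as a structured JSON line.
        self.log_lines.append(json.dumps({"trace_id": trace_id, "event": event, **attrs}))


def handle_request(tel: Telemetry, fail: bool = False) -> str:
    """Serve one hypothetical request and record metrics, logs, and a span for it."""
    trace_id = uuid.uuid4().hex  # shared by every signal this request emits
    span = Span(trace_id, "handle_request", start=time.perf_counter())
    tel.log(trace_id, "request_started")
    try:
        if fail:
            raise RuntimeError("backend unavailable")
        tel.counters["requests_ok"] += 1  # Metric: success count
    except RuntimeError as exc:
        tel.counters["requests_failed"] += 1  # Metric: error count
        tel.log(trace_id, "request_failed", error=str(exc))
    finally:
        span.end = time.perf_counter()
        tel.spans.append(span)  # Trace: a single span per request in this sketch
        tel.latencies_ms.append((span.end - span.start) * 1000)  # Metric: latency
    return trace_id
```

Because every log line and span carries the same `trace_id`, a failed request spotted in the error-count metric can be matched to its log lines and its span. Calling `handle_request(tel)` once normally and once with `fail=True` leaves one success and one failure in `tel.counters`, two latency samples, and three log lines.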

Why people use it

Observability helps you answer questions such as:

  • Is the system healthy?
  • What changed when latency increased?
  • Where are errors coming from?
  • Which dependency is slowing requests down?
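Each of these questions usually reduces to a query over recorded signals. The Python sketch below assumes a hypothetical span record holding a dependency name, a duration, and an error flag, and answers three of the questions. The schema, the function names, and the 1% threshold in `is_healthy` are illustrative assumptions, not a standard.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SpanRecord:
    """One call to a dependency made while serving a request (hypothetical schema)."""
    trace_id: str
    dependency: str
    duration_ms: float
    error: bool


def error_rate(spans: list[SpanRecord]) -> float:
    """Fraction of spans that failed; 0.0 when there is no data."""
    return sum(s.error for s in spans) / len(spans) if spans else 0.0


def is_healthy(spans: list[SpanRecord], max_error_rate: float = 0.01) -> bool:
    """'Is the system healthy?' reduced to one assumed threshold on error rate."""
    return error_rate(spans) <= max_error_rate


def errors_by_dependency(spans: list[SpanRecord]) -> Counter:
    """'Where are errors coming from?': failed spans counted per dependency."""
    return Counter(s.dependency for s in spans if s.error)


def slowest_dependency(spans: list[SpanRecord]) -> str:
    """'Which dependency is slowing requests down?': highest mean span duration."""
    durations = defaultdict(list)
    for s in spans:
        durations[s.dependency].append(s.duration_ms)
    return max(durations, key=lambda dep: mean(durations[dep]))
```

For example, given two requests that each call `db` and `cache`, where one `db` call takes 340 ms and fails, `error_rate` returns 0.25, `is_healthy` returns `False`, `errors_by_dependency` counts one error for `db`, and `slowest_dependency` returns `"db"`. Real systems would also weight by call volume and compare time windows to answer "what changed", which this sketch leaves out.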

Compare

Observability is the broad concept. It covers the overall ability to inspect and reason about a system from its outputs.

Traceability is narrower. It is about following a specific request, event, or item through a system.
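In its simplest form, traceability works like this: if every service tags its events with the same request ID, one request can be followed by filtering on that ID and ordering the matches by time. The `Event` schema and the `follow_request` helper below are hypothetical, written as a minimal Python sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A log or span event emitted by some service (hypothetical schema)."""
    timestamp: float  # seconds since the epoch
    service: str
    request_id: str
    message: str


def follow_request(events: list[Event], request_id: str) -> list[str]:
    """Return the services one request passed through, in time order.

    Consecutive events from the same service are collapsed into a single hop.
    """
    matching = [ev for ev in events if ev.request_id == request_id]
    path: list[str] = []
    for ev in sorted(matching, key=lambda ev: ev.timestamp):
        if not path or path[-1] != ev.service:
            path.append(ev.service)
    return path
```

Ordering by timestamp assumes the services' clocks agree closely. Distributed tracing systems avoid relying on that by recording explicit parent-child links between spans.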

Observability is often discussed alongside metrics, logging, traceability, monitoring, and distributed tracing.