Observability is the ability to understand what a system is doing internally by looking at its outputs, usually metrics, logs, and traces.
It matters most when a system is distributed or has many moving parts, because you often cannot infer the full state of the system from a single component.
Common observability signals are:
- Metrics, such as latency, error rate, and CPU usage
- Logs, which record discrete events
- Traces, which show how a request moves through the system
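As a rough illustration of how the three signals differ, the sketch below emits all of them for a single request using only the Python standard library. Everything in it is an assumption made for the example: the `checkout` service name, the `handle_request` and `record_span` helpers, and the in-memory `metrics` and `spans` stores. A real system would typically send these signals to dedicated backends instead.

```python
import logging
import time
import uuid
from collections import defaultdict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("checkout")  # hypothetical service name

metrics = defaultdict(list)  # metric name -> list of observed values
spans = []                   # finished trace spans, in completion order


def record_span(trace_id, name, start, end):
    """Store one timed step of a request so its path can be rebuilt later."""
    spans.append({"trace_id": trace_id, "name": name, "duration_s": end - start})


def handle_request(fail=False):
    """Hypothetical request handler that emits a trace, a log, and metrics."""
    trace_id = uuid.uuid4().hex  # one ID shared by every signal for this request
    start = time.perf_counter()

    # Trace: time an inner step (a stand-in for a database or API call).
    step_start = time.perf_counter()
    time.sleep(0.001)
    record_span(trace_id, "load_cart", step_start, time.perf_counter())

    # Log: a discrete event, tagged with the trace ID.
    if fail:
        log.error("payment declined trace_id=%s", trace_id)
    else:
        log.info("order placed trace_id=%s", trace_id)

    end = time.perf_counter()
    record_span(trace_id, "handle_request", start, end)

    # Metrics: numeric measurements aggregated across requests.
    metrics["latency_s"].append(end - start)
    metrics["errors"].append(1 if fail else 0)
    return trace_id


handle_request()
handle_request(fail=True)
error_rate = sum(metrics["errors"]) / len(metrics["errors"])
print(f"error rate: {error_rate:.0%}, spans recorded: {len(spans)}")
```

Tagging the log lines and spans with the same `trace_id` is the design choice that lets the signals be correlated afterwards.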
Why people use it
Observability helps you answer questions such as:
- Is the system healthy?
- What changed when latency increased?
- Where are errors coming from?
- Which dependency is slowing requests down?
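Two of these questions (where errors come from, and which dependency is slow) can be answered by grouping collected data by dependency. The sketch below is one minimal way to do that; the span records, dependency names, and durations are made up purely for illustration and do not come from any real system.

```python
from collections import defaultdict
from statistics import mean

# Made-up span records for illustration:
# (dependency name, duration in milliseconds, whether the call succeeded)
span_records = [
    ("database", 12.0, True),
    ("database", 15.0, True),
    ("payments-api", 240.0, True),
    ("payments-api", 310.0, False),
    ("cache", 1.5, True),
]

durations = defaultdict(list)  # dependency -> list of durations in ms
errors = defaultdict(int)      # dependency -> number of failed calls

for dependency, duration_ms, ok in span_records:
    durations[dependency].append(duration_ms)
    if not ok:
        errors[dependency] += 1

# "Which dependency is slowing requests down?"
slowest = max(durations, key=lambda d: mean(durations[d]))

# "Where are errors coming from?"
error_source = max(errors, key=errors.get)

print(f"slowest dependency: {slowest} ({mean(durations[slowest])} ms on average)")
print(f"most errors from: {error_source} ({errors[error_source]} failed call)")
```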
Compare
Observability is the broad concept. It covers the overall ability to inspect and reason about a system from its outputs.
Traceability is narrower. It is about following a specific request, event, or item through a system.
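To make the narrower scope of traceability concrete, the sketch below follows one request through several services by filtering events on a shared request ID and ordering them by time. The event records, the `request_id` and `ts` field names, and the `trace_request` helper are all hypothetical, chosen only for this example.

```python
# Made-up events from several services, deliberately out of order.
events = [
    {"request_id": "req-1", "ts": 3, "service": "payments", "event": "charge card"},
    {"request_id": "req-2", "ts": 2, "service": "gateway", "event": "received"},
    {"request_id": "req-1", "ts": 1, "service": "gateway", "event": "received"},
    {"request_id": "req-1", "ts": 2, "service": "orders", "event": "create order"},
]


def trace_request(events, request_id):
    """Return the events belonging to one request, ordered by timestamp."""
    matching = [e for e in events if e["request_id"] == request_id]
    return sorted(matching, key=lambda e: e["ts"])


path = [e["service"] for e in trace_request(events, "req-1")]
print(" -> ".join(path))  # gateway -> orders -> payments
```

The broader observability questions would instead look across all requests at once, for example by aggregating durations or error counts over every event.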
Related ideas
Observability is often discussed together with Metrics, Logging, Traceability, monitoring, and distributed tracing.