Debugging Microservices with Distributed Tracing

Distributed tracing connects logs across services using trace IDs. Learn how to implement it, propagate context, and use LogFlow's trace correlation to debug microservice issues in minutes.

LogFlow Team · May 15, 2026 · 12 min read

Microservices create a debugging nightmare. A single user action triggers calls across five services, and when something fails, the error surfaces in service C but originated in service A. Without distributed tracing, you're manually correlating timestamps and guessing.

TL;DR: Add a traceId to every request and propagate it through all service calls. Every log line includes the traceId, so you can reconstruct the full request flow across services with a single search.

What is Distributed Tracing?

A trace is the complete journey of a single request through your system. A span is one unit of work within that trace — a single service handling the request, a database query, or an external API call.

In a traditional monolith, all logs for a request share the same process. In microservices, they're spread across services, containers, and machines. Distributed tracing solves this by assigning a unique traceId at the entry point and passing it through every subsequent call.

Trace IDs vs. Span IDs

  • Trace ID (traceId): Unique identifier for the entire request, shared across all services. Use this to find all logs for a specific user request.
  • Span ID (spanId): Identifies one unit of work within the trace. A trace can have many spans. Use this to measure duration at each step.
  • Parent Span ID: Links child spans to their parent, enabling visualization of the call tree.

For most debugging purposes, you only need traceId. Span IDs become valuable when you want to build a service map or measure which step is slow.
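
To make the parent/child relationship concrete, here is a minimal sketch that rebuilds a call tree from a flat list of span records. The record shape (spanId, parentSpanId, name) and the helpers buildSpanTree and renderSpanTree are assumptions for illustration, not a LogFlow or OpenTelemetry format.

```javascript
// Rebuild a call tree from flat span records.
// Assumed record shape: { spanId, parentSpanId, name }; root spans have no parentSpanId.
function buildSpanTree(spans) {
  const nodes = new Map()
  for (const span of spans) {
    nodes.set(span.spanId, { ...span, children: [] })
  }

  const roots = []
  for (const node of nodes.values()) {
    const parent = node.parentSpanId ? nodes.get(node.parentSpanId) : undefined
    if (parent) {
      parent.children.push(node)
    } else {
      // No parent, or the parent's record is missing: treat the span as a root.
      roots.push(node)
    }
  }
  return roots
}

// Render the tree as indented lines, one per span.
function renderSpanTree(roots, depth = 0) {
  return roots.flatMap((node) => [
    `${'  '.repeat(depth)}${node.name}`,
    ...renderSpanTree(node.children, depth + 1),
  ])
}

// Hypothetical spans for one checkout request.
const spans = [
  { spanId: 'a1', parentSpanId: null, name: 'gateway POST /checkout' },
  { spanId: 'b2', parentSpanId: 'a1', name: 'orders createOrder' },
  { spanId: 'c3', parentSpanId: 'b2', name: 'payments charge' },
  { spanId: 'd4', parentSpanId: 'a1', name: 'inventory reserve' },
]

console.log(renderSpanTree(buildSpanTree(spans)).join('\n'))
```

Spans whose parent record never arrived are promoted to roots instead of being dropped, so a partially collected trace still renders.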

Implementing Trace Propagation

Step 1: Generate at the Entry Point

The API gateway or first service generates the trace ID:

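import { randomUUID } from 'crypto'

// Express middleware: reuse the caller's trace ID if present, otherwise start a new trace.
export function traceMiddleware(req, res, next) {
  // An incoming x-trace-id means an upstream service already started this trace.
  const traceId = req.headers['x-trace-id'] || `tr_${randomUUID().replace(/-/g, '')}`
  // Each service creates its own span ID for the work it does.
  const spanId = randomUUID().replace(/-/g, '').slice(0, 16)

  // Echo the trace ID back so clients can quote it in bug reports.
  res.setHeader('x-trace-id', traceId)
  req.traceId = traceId
  req.spanId = spanId

  next()
}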

Step 2: Log Every Request with the Trace ID

import logger from './lib/logger'

app.use(traceMiddleware)

app.use((req, res, next) => {
  const start = Date.now()
  res.on('finish', () => {
    logger.info('http.request', {
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration: Date.now() - start,
      traceId: req.traceId,
      spanId: req.spanId,
    })
  })
  next()
})

Step 3: Forward the Trace ID to Downstream Services

When service A calls service B, it must forward the trace ID in the request headers:

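async function callPaymentService(orderId, traceId) {
  const response = await fetch(`${PAYMENT_SERVICE_URL}/charge`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-trace-id': traceId,   // propagate!
    },
    body: JSON.stringify({ orderId }),
  })
  // fetch only rejects on network failure, so surface HTTP errors explicitly.
  if (!response.ok) {
    throw new Error(`Payment service responded with ${response.status} (traceId: ${traceId})`)
  }
  return response.json()
}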

Step 4: AsyncLocalStorage for Automatic Propagation

Passing traceId as a parameter to every function is tedious. Use AsyncLocalStorage to make it implicit:

import { AsyncLocalStorage } from 'async_hooks'

const store = new AsyncLocalStorage()

export function runWithTrace(traceId, spanId, fn) {
  return store.run({ traceId, spanId }, fn)
}

export function getTraceContext() {
  return store.getStore() || {}
}

// Logger automatically adds trace context
export function createContextualLogger(base) {
  return {
    info: (msg, data) => base.info(msg, { ...getTraceContext(), ...data }),
    error: (msg, data) => base.error(msg, { ...getTraceContext(), ...data }),
    warn: (msg, data) => base.warn(msg, { ...getTraceContext(), ...data }),
  }
}

Now your logger automatically includes traceId and spanId in every log line, even inside deeply nested function calls.
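
The store only carries context for code that runs inside store.run, so each request has to be wrapped once. A minimal sketch of that wiring follows. It repeats the store so it runs on its own, and it assumes an earlier middleware has already set req.traceId and req.spanId. The names traceContextMiddleware and outboundTraceHeaders are hypothetical.

```javascript
import { AsyncLocalStorage } from 'node:async_hooks'

const store = new AsyncLocalStorage()

// Read the current trace context, or an empty object outside a request.
function getTraceContext() {
  return store.getStore() || {}
}

// Express-style middleware: run the rest of the request inside the store.
// Assumes req.traceId and req.spanId were set by an earlier middleware.
function traceContextMiddleware(req, res, next) {
  store.run({ traceId: req.traceId, spanId: req.spanId }, () => next())
}

// Headers for outbound calls, so callers no longer pass traceId by hand.
function outboundTraceHeaders() {
  const { traceId } = getTraceContext()
  return traceId ? { 'x-trace-id': traceId } : {}
}

// Simulated request: the handler reads the context without receiving it as an argument.
const fakeReq = { traceId: 'tr_abc123def456', spanId: '0123456789abcdef' }
traceContextMiddleware(fakeReq, {}, () => {
  console.log(outboundTraceHeaders())
})
```

In an Express app this middleware would be registered right after the one that assigns the trace ID, and outbound fetch calls could spread outboundTraceHeaders() into their headers object.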

Using OpenTelemetry

If you're already using or planning to use OpenTelemetry, it handles trace propagation automatically using the W3C Trace Context standard (traceparent header). The trace ID from OTel's span becomes your log's trace ID.

import { trace } from '@opentelemetry/api'

function getCurrentTraceId() {
  const span = trace.getActiveSpan()
  if (!span) return undefined
  return span.spanContext().traceId
}

logger.info('Processing order', {
  orderId,
  traceId: getCurrentTraceId(),
})

See our OpenTelemetry tutorial for a complete setup with context propagation.
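
For reference, a version 00 traceparent header has four dash-separated fields: version, a 32-character hex trace ID, a 16-character hex parent span ID, and 2 hex characters of flags. OpenTelemetry parses this for you. The sketch below only shows what the header contains, and parseTraceparent is a hypothetical helper, not part of the OpenTelemetry API.

```javascript
// Parse a W3C Trace Context `traceparent` header (version 00):
//   version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" trace-flags (2 hex)
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(String(header).trim())
  if (!match) return null
  const [, traceId, parentSpanId, flags] = match
  // All-zero IDs are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null
  return {
    traceId,
    parentSpanId,
    // The lowest bit of trace-flags is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  }
}

console.log(parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'))
```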

Trace Correlation in LogFlow

LogFlow stores trace_id as a dedicated column in ClickHouse, making trace searches fast regardless of log volume.

Searching by Trace ID

In the Logs Explorer, use the structured search:

traceId:tr_abc123def456

This returns every log line from every service that handled that request, in chronological order.

Trace View Mode

When you click "View trace" on any log line with a trace_id, LogFlow enters Trace View mode:

  • All logs for the trace, sorted chronologically
  • Duration badges showing relative time (+0ms, +23ms, +156ms)
  • Service and level badges for each log line
  • Full log details on click

This lets you reconstruct the exact sequence of events across all services in seconds.
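
If you want the same relative-time view in your own tooling, the offsets can be derived from the timestamps alone. This is a sketch of the idea, not LogFlow's implementation; the log shape (timestamp, service, message) and the withRelativeOffsets helper are assumptions.

```javascript
// Sort a trace's log lines by time and compute each line's offset from the first.
// Assumed log shape: { timestamp (ISO 8601 string), service, message }.
function withRelativeOffsets(logs) {
  const sorted = [...logs].sort(
    (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp)
  )
  if (sorted.length === 0) return []
  const start = Date.parse(sorted[0].timestamp)
  return sorted.map((log) => ({
    ...log,
    offset: `+${Date.parse(log.timestamp) - start}ms`,
  }))
}

// Hypothetical log lines for one trace, deliberately out of order.
const logs = [
  { timestamp: '2026-05-15T10:00:00.156Z', service: 'payments', message: 'charge failed' },
  { timestamp: '2026-05-15T10:00:00.000Z', service: 'gateway', message: 'http.request' },
  { timestamp: '2026-05-15T10:00:00.023Z', service: 'orders', message: 'order created' },
]

for (const log of withRelativeOffsets(logs)) {
  console.log(log.offset, log.service, log.message)
}
```

Keep in mind that timestamps come from different machines, so small offsets between services can reflect clock skew as much as real latency.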

URL-Based Trace Linking

LogFlow supports ?traceId= URL parameters, so you can link directly to a trace from your incident management tool, Slack alerts, or error tracking system:

https://app.getlogflow.com/dashboard/logs?traceId=tr_abc123def456
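
A small helper can generate these links wherever an alert or error report is created. The sketch below uses the URL from the example above; traceLink is a hypothetical name.

```javascript
// Build a deep link to a trace, using the logs URL from the example above.
const LOGFLOW_LOGS_URL = 'https://app.getlogflow.com/dashboard/logs'

function traceLink(traceId) {
  const url = new URL(LOGFLOW_LOGS_URL)
  url.searchParams.set('traceId', traceId) // takes care of URL encoding
  return url.toString()
}

console.log(traceLink('tr_abc123def456'))
```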

Debugging a Real Incident

Here's how distributed tracing changes the debugging workflow:

Without trace IDs:

  1. User reports "payment failed"
  2. Search logs for "payment" — find 10,000 results
  3. Manually scan timestamps to find the relevant ones
  4. Realize the error is in a different service
  5. Repeat for each service
  6. Total time: 45 minutes

With trace IDs:

  1. User reports "payment failed" — they got a traceId in the error response
  2. Search LogFlow: traceId:tr_abc123def456
  3. See all 23 log lines from 4 services in one view
  4. See the error in the payments service, the root cause in the inventory service
  5. Total time: 3 minutes

Best Practices

Include the trace ID in error responses (at minimum in development and staging). This lets users and support teams give you actionable information:

{
  "error": "Payment processing failed",
  "traceId": "tr_abc123def456",
  "message": "Please include this ID when contacting support"
}
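
One way to produce a response like this is an Express-style error handler. The sketch below is an assumption about how you might structure it: createErrorHandler and its exposeTraceId option are hypothetical, and a stand-in response object is used so the example runs without Express installed.

```javascript
// Express-style error handler that returns the trace ID to the caller.
// `exposeTraceId` is a hypothetical option; you decide in which environments to enable it.
function createErrorHandler({ exposeTraceId, logger = console }) {
  // Express identifies error handlers by their four-argument signature.
  return function errorHandler(err, req, res, next) {
    logger.error('request.failed', { traceId: req.traceId, error: err.message })

    const body = { error: 'Request failed' }
    if (exposeTraceId && req.traceId) {
      body.traceId = req.traceId
      body.message = 'Please include this ID when contacting support'
    }
    res.status(err.status || 500).json(body)
  }
}

// Stand-in for an Express response object.
const fakeRes = {
  status(code) { this.statusCode = code; return this },
  json(payload) { this.payload = payload; return this },
}

createErrorHandler({ exposeTraceId: true })(
  new Error('card declined'),
  { traceId: 'tr_abc123def456' },
  fakeRes,
  () => {}
)
console.log(fakeRes.statusCode, fakeRes.payload)
```

The internal error message goes to the log, not to the client. The client only receives the trace ID needed to look it up.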

Use short, URL-safe trace IDs. The UUID format (550e8400-e29b-41d4-a716-446655440000) works but is verbose. Consider a shorter format like tr_ + 24 hex characters.
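
A minimal sketch of that format, assuming Node's built-in crypto module (newTraceId is a hypothetical name):

```javascript
import { randomBytes } from 'node:crypto'

// 12 random bytes encode to 24 hex characters, prefixed with "tr_".
function newTraceId() {
  return `tr_${randomBytes(12).toString('hex')}`
}

console.log(newTraceId())
```

This gives 96 bits of randomness. If you expect to adopt OpenTelemetry later, note that W3C Trace Context uses 32 hex characters for the trace ID, so a 24-character ID would need to be mapped or replaced at that point.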

Log at service boundaries, not inside every function. Each HTTP request, database query, and external API call should log on entry and exit. Internal function calls generally don't need their own logs unless they're the kind of thing that fails independently.

Set a sampling strategy for high-traffic services. If you process 10,000 requests per second, logging every request is expensive. Sample 1-10% of successful requests and 100% of errors. See our log sampling guide for details.
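
One possible shape for that rule is sketched below. It treats any status of 400 or above as an error, and it hashes the trace ID instead of rolling a random number so that every service reaches the same decision for the same trace. Both choices, along with the shouldLog and fnv1a helpers, are assumptions for illustration.

```javascript
// FNV-1a 32-bit hash of a string, returned as an unsigned integer.
function fnv1a(str) {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Keep every error; keep a fixed fraction of successful requests.
// Hashing the traceId makes the decision deterministic per trace.
function shouldLog({ traceId, status }, successRate = 0.05) {
  if (status >= 400) return true
  // Map the hash to [0, 1) and compare against the sampling rate.
  return fnv1a(traceId) / 0x100000000 < successRate
}

let kept = 0
for (let i = 0; i < 10000; i++) {
  if (shouldLog({ traceId: `tr_${i}`, status: 200 })) kept++
}
console.log(`kept ${kept} of 10000 successful requests`)
console.log('error kept:', shouldLog({ traceId: 'tr_1', status: 500 }))
```

A limitation of this approach: a request that succeeds in one service but fails in another is fully logged only in the service that saw the error, unless the trace was also picked by the sampler.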
