How Log Sampling Can Cut Your Logging Costs by 80%

Not all logs are equal. Drop health check endpoints, sample debug noise, and keep what matters. A practical guide to log sampling strategies that reduce costs without losing visibility.

LogFlow Team | May 1, 2026 | 6 min read

Log management costs scale with volume. A poorly configured application can generate hundreds of gigabytes per month — mostly redundant health check pings and verbose debug output that nobody ever reads.

The fix isn't to log less indiscriminately. It's to log strategically: keep 100% of errors and anomalies, sample or drop the noise.

TL;DR: Health check endpoints, successful high-frequency operations, and debug-level logs in production account for 80-90% of log volume but less than 5% of debugging value. Drop or sample them and most of your cost problem goes away.

Where the Volume Comes From

Before sampling anything, understand what you're actually generating. In a typical web application:

Source                                     % of volume   % of debugging value
Health check endpoints (/health, /ping)    30-40%        ~0%
Successful requests (200 status)           20-30%        5-10%
Debug-level logs                           15-25%        2-5%
Cron job heartbeats                        5-10%         1%
Errors and warnings                        5-15%         80-90%

The math is brutal: you're paying for the first four categories mostly to have the last one.
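To get a breakdown like the one above for your own service, you can tally bytes per category over a sample of log lines. The sketch below is a rough starting point, not a definitive tool: it assumes newline-delimited JSON logs with `level`, `path`, and `status` fields, and the category names and the `volumeByCategory` helper are hypothetical.

```javascript
// Assumed log shape (hypothetical): one JSON object per line with a `level`
// field and, for request logs, `path` and `status` fields.
const HEALTH_PATHS = new Set(['/health', '/healthz', '/ping', '/ready'])

function categorize(entry) {
  if (entry.level === 'error' || entry.level === 'fatal' || entry.level === 'warn') {
    return 'errors_warnings'
  }
  if (entry.level === 'debug') return 'debug'
  if (HEALTH_PATHS.has(entry.path)) return 'health_checks'
  if (entry.status >= 200 && entry.status < 300) return 'successful_requests'
  return 'other'
}

// Takes an iterable of raw log lines and returns bytes and percent per category.
function volumeByCategory(lines) {
  const bytes = {}
  let total = 0
  for (const line of lines) {
    let entry = null
    try {
      entry = JSON.parse(line)
    } catch {
      entry = null // not JSON: counted under 'unparsed' below
    }
    const isObject = entry !== null && typeof entry === 'object'
    const category = isObject ? categorize(entry) : 'unparsed'
    const size = Buffer.byteLength(line, 'utf8')
    bytes[category] = (bytes[category] || 0) + size
    total += size
  }
  const report = {}
  for (const [category, size] of Object.entries(bytes)) {
    // Percent of total volume, rounded to one decimal place
    report[category] = { bytes: size, percent: Math.round((size / total) * 1000) / 10 }
  }
  return report
}
```

Run it over an hour or a day of production logs, split into lines, before deciding what to drop. The categories that dominate are your first candidates.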

Strategy 1: Drop Known-Useless Endpoints

Start with zero-regret drops. You will never need these logs:

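// Exact paths that are never worth logging
const SKIP_PATHS = new Set([
  '/health',
  '/healthz',
  '/ping',
  '/ready',
  '/metrics',
  '/favicon.ico',
])

// Path prefixes: static assets live under these, so an exact match would miss them
const SKIP_PREFIXES = ['/_next/static/']

const shouldSkip = (path) =>
  SKIP_PATHS.has(path) || SKIP_PREFIXES.some((prefix) => path.startsWith(prefix))

app.use((req, res, next) => {
  // Log once the response is complete so the status code is known
  res.on('finish', () => {
    if (shouldSkip(req.path)) return  // don't log
    logger.info('http.request', { method: req.method, path: req.path, status: res.statusCode })
  })
  next()
})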

If your load balancer pings /health every 5 seconds across 10 instances, that's 17,280 pings per instance per day, or about 173,000 pointless log lines per day and more than 5 million per month.

Strategy 2: Always Log Errors, Sample Successes

Errors need 100% capture. You never want to miss a 500 or a caught exception. But successful operations are candidates for sampling:

function shouldLog(level, statusCode) {
  // Always log errors and warnings
  if (level === 'error' || level === 'fatal' || level === 'warn') return true
  
  // Always log 4xx and 5xx responses
  if (statusCode >= 400) return true
  
  // Sample 10% of successful info logs
  if (level === 'info') return Math.random() < 0.10
  
  // Drop debug entirely in production
  if (level === 'debug') return process.env.NODE_ENV !== 'production'
  
  return true
}

A 10% sample of successful requests still gives you enough for latency percentile analysis and capacity planning. You see the shape of traffic without the full volume.
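One thing to keep in mind when analyzing sampled logs: counts must be scaled back up by the sample rate, while percentiles can be read directly from the sample. The sketch below illustrates both. The helper names `estimateTotal` and `percentile` are hypothetical, and the percentile uses the simple nearest-rank method.

```javascript
// Estimate the true number of events from the number that survived sampling.
function estimateTotal(sampledCount, sampleRate) {
  if (!(sampleRate > 0 && sampleRate <= 1)) {
    throw new RangeError('sampleRate must be in (0, 1]')
  }
  return Math.round(sampledCount / sampleRate)
}

// Nearest-rank percentile over the sampled values (for example, latencies in ms).
// Uniform random sampling keeps the distribution's shape, so no scaling is needed.
function percentile(values, p) {
  if (values.length === 0) return undefined
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length))
  return sorted[rank - 1]
}

// Example: 1,200 sampled success logs at a 10% rate
const estimatedRequests = estimateTotal(1200, 0.1) // 12000
```

Tail percentiles such as p99 get noisier as the sample shrinks, so low-traffic services may want a higher rate than 10%.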

Strategy 3: Head-Based vs. Tail-Based Sampling

Head-based sampling decides at the start of a request whether to log it. Simple to implement, but you might discard requests that turn out to be interesting (a slow request that succeeds, for example).

Tail-based sampling buffers logs in memory and decides at the end. If the request took longer than 2 seconds or returned an error, you keep all its logs. Otherwise, discard. More accurate but requires buffering infrastructure.
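As a rough illustration of tail-based sampling inside a single process, the sketch below buffers each request's logs and flushes them only if the request was slow or failed. It assumes Express-style middleware and a base logger with `debug`, `info`, `warn`, and `error` methods. The `tailSampled` name, the `req.log` property, and the choice to treat warnings as a reason to keep are assumptions for this example.

```javascript
const LEVELS = ['debug', 'info', 'warn', 'error']

// Hypothetical middleware factory. `now` is injectable to make the timing testable.
function tailSampled(baseLogger, { slowMs = 2000, now = Date.now } = {}) {
  return function tailSampledMiddleware(req, res, next) {
    const startedAt = now()
    const buffer = []

    // Per-request logger: nothing is written yet, entries are only buffered
    req.log = {}
    for (const level of LEVELS) {
      req.log[level] = (msg, data) => buffer.push({ level, msg, data })
    }

    // Decide at the tail of the request, once status and duration are known
    res.on('finish', () => {
      const durationMs = now() - startedAt
      const failed =
        res.statusCode >= 400 ||
        buffer.some((e) => e.level === 'warn' || e.level === 'error')
      const slow = durationMs > slowMs
      if (!failed && !slow) return // uninteresting request: discard its logs
      for (const { level, msg, data } of buffer) baseLogger[level](msg, data)
    })

    next()
  }
}
```

The trade-off is memory and risk: buffered lines are lost if the process crashes before the response finishes, so some teams write error-level entries immediately and buffer only the lower levels.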

For most teams, head-based sampling is good enough:

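// Create one SampledLogger per request: the keep/drop decision is made once,
// in the constructor, and applies to every info() call on that instance.
class SampledLogger {
  constructor(baseLogger, sampleRate = 0.1) {
    this.base = baseLogger
    this.sampleRate = sampleRate
    this.shouldCapture = Math.random() < sampleRate
  }

  info(msg, data) {
    if (this.shouldCapture) this.base.info(msg, data)
  }

  warn(msg, data) {
    // Warnings are always captured, matching Strategy 2
    this.base.warn(msg, data)
  }

  error(msg, data) {
    // Always capture errors regardless of sample decision
    this.base.error(msg, data)
  }
}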
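Because the class above makes its sampling decision in the constructor, it only behaves as head-based sampling if a new instance is created for each request. A single shared instance would log either every info line or none for the life of the process. The following sketch shows one way to wire that per-request decision into Express-style middleware. The `perRequestSampling` name and the `req.log` property are hypothetical, and the random source is injectable only to make the behavior testable.

```javascript
// Hypothetical middleware factory: one head-based sampling decision per request.
function perRequestSampling(baseLogger, sampleRate = 0.1, random = Math.random) {
  return function perRequestSamplingMiddleware(req, res, next) {
    // Decided once, at the head of the request, and reused for all its info logs
    const keep = random() < sampleRate

    req.log = {
      info: (msg, data) => {
        if (keep) baseLogger.info(msg, data)
      },
      // Warnings and errors bypass the sampling decision entirely
      warn: (msg, data) => baseLogger.warn(msg, data),
      error: (msg, data) => baseLogger.error(msg, data),
    }
    next()
  }
}
```

Handlers then call `req.log.info(...)` instead of a module-level logger, so a sampled request keeps all of its info lines together rather than a random 10% of them.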

The Cost Math

Let's say you're on LogFlow's Growth plan with a 100 GB/month limit.

Your service currently generates:

  • 50 GB from health checks (dropping → 0 GB)
  • 30 GB from successful request logs (10% sample → 3 GB)
  • 10 GB from debug logs (dropping in prod → 0 GB)
  • 10 GB from errors and warnings (keeping 100%)

Before: 100 GB/month
After: 0 + 3 + 0 + 10 = 13 GB/month

An 87% reduction. You could fit that on the Starter plan at $19/month instead of Growth.
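The same arithmetic is easy to rerun with your own numbers. This is a small sketch using the figures from the example above. The `projectVolume` helper and the `keepRate` field are hypothetical names.

```javascript
// Monthly volume per source and the fraction of it you plan to keep
const sources = [
  { name: 'health checks', gb: 50, keepRate: 0 },         // dropped
  { name: 'successful requests', gb: 30, keepRate: 0.1 }, // 10% sample
  { name: 'debug logs', gb: 10, keepRate: 0 },            // dropped in prod
  { name: 'errors and warnings', gb: 10, keepRate: 1 },   // kept at 100%
]

function projectVolume(items) {
  const before = items.reduce((sum, s) => sum + s.gb, 0)
  const rawAfter = items.reduce((sum, s) => sum + s.gb * s.keepRate, 0)
  const after = Math.round(rawAfter * 10) / 10 // round to 0.1 GB
  const reductionPercent = Math.round((1 - after / before) * 100)
  return { before, after, reductionPercent }
}

const projection = projectVolume(sources)
// projection: { before: 100, after: 13, reductionPercent: 87 }
```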

What You Should Never Sample

Some log categories must remain at 100%:

  • Authentication events — every login, logout, failed attempt, and token refresh
  • Payment events — every transaction attempt, success, and failure
  • Error and fatal logs — never drop these
  • Security events — admin actions, permission changes, data exports
  • User-facing errors — anything shown to the user as an error

For these, the cost is justified. The risk of missing a security incident or payment failure far outweighs the storage cost.
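One way to enforce this in code is to check an allowlist before any sampling decision is made, so that must-keep events logged at info level are never subject to the sample rate. The sketch below assumes dot-separated event names such as `auth.login` or `payment.charge`. That naming convention, the prefix list, and the `shouldKeep` helper are assumptions for illustration.

```javascript
// Event-name prefixes that must always be kept (hypothetical naming convention)
const NEVER_SAMPLE_PREFIXES = ['auth.', 'payment.', 'security.']

function shouldKeep(level, event, sampleRate = 0.1, random = Math.random) {
  // Warnings, errors, and fatals are always kept
  if (level === 'warn' || level === 'error' || level === 'fatal') return true

  // Must-keep categories bypass sampling even at info level
  if (NEVER_SAMPLE_PREFIXES.some((prefix) => event.startsWith(prefix))) return true

  // Everything else is subject to the sample rate
  return random() < sampleRate
}
```

Putting the allowlist check ahead of the random draw means a change to the sample rate can never silently start dropping audit-relevant events.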

Implementation with LogFlow

LogFlow doesn't apply sampling on its end — you control what gets sent. The best approach is to sample at the logger level before sending to the API:

import { createLogger } from '@getlogflow/js'

const base = createLogger({
  apiKey: process.env.LOGFLOW_API_KEY,
  service: 'api',
})

export const logger = {
  debug: (msg, data) => {
    if (process.env.NODE_ENV === 'development') base.debug(msg, data)
  },
  info: (msg, data) => {
    if (Math.random() < 0.1) base.info(msg, data)
  },
  warn: (msg, data) => base.warn(msg, data),
  error: (msg, data) => base.error(msg, data),
  fatal: (msg, data) => base.fatal(msg, data),
}

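This keeps your LogFlow plan costs predictable while ensuring complete coverage of warnings, errors, and fatals. Any never-sample event that you log at info level, such as a login or a payment, needs its own path that bypasses the 10% sampler.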

See also: Node.js logging best practices and the LogFlow quickstart for SDK setup.
