Optimizing Latency in Microservices

Peter Lawrey’s talk breaks down how to engineer microservices that consistently respond in microseconds rather than milliseconds.

Key Takeaways

  • Prefer mechanical sympathy: align data structures with CPU cache lines and avoid false sharing.
  • Use binary protocols and preallocated buffers to eliminate GC pressure on hot paths.
  • Treat the network as a critical dependency: pin threads, batch syscalls, and monitor queue depths.
  • Benchmark your service with production-like payload sizes and concurrency; purely synthetic tests often mislead.
  • Instrument the 99.9th percentile latency and trace hop-by-hop to locate jitter.
  • Document predictable fallback behaviour (timeouts, retry budgets) so partial failures do not cascade.
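
The advice on instrumenting the 99.9th percentile can be illustrated with a minimal sketch. It is not taken from the talk: the class name LatencyPercentiles, the method percentilePerMille, and the placeholder handleRequest are hypothetical names introduced here. It records each latency into a preallocated array, so the measurement loop itself does not allocate, and then reads percentiles with the nearest-rank method, using integer arithmetic to avoid floating-point rounding at the tail.

```java
import java.util.Arrays;

// Hypothetical illustration; names and workload are assumptions, not from the talk.
final class LatencyPercentiles {

    /**
     * Nearest-rank percentile over samples sorted in ascending order.
     * perMille is the percentile in tenths of a percent (999 = 99.9th).
     */
    static long percentilePerMille(long[] sortedNanos, int perMille) {
        if (sortedNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (perMille < 1 || perMille > 1000) {
            throw new IllegalArgumentException("perMille must be in 1..1000");
        }
        // rank = ceil(n * perMille / 1000), computed in long to avoid overflow.
        long rank = ((long) sortedNanos.length * perMille + 999) / 1000;
        return sortedNanos[(int) rank - 1];
    }

    /** Placeholder for the code under test; stands in for a real request handler. */
    static long handleRequest(int i) {
        return (i * 31L) ^ (i >>> 3);
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        long[] samples = new long[iterations]; // preallocated: no allocation while measuring
        long sink = 0;                         // consumed below so the JIT cannot drop the work

        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            sink += handleRequest(i);
            samples[i] = System.nanoTime() - start;
        }

        Arrays.sort(samples);
        System.out.printf("p50=%d ns, p99.9=%d ns, max=%d ns (sink=%d)%n",
                percentilePerMille(samples, 500),
                percentilePerMille(samples, 999),
                samples[samples.length - 1],
                sink);
    }
}
```

Reporting the median next to the 99.9th percentile and the maximum makes jitter visible that an average would hide. A real measurement would also need a warm-up phase and a production-like workload, both of which this sketch omits.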