Practical, team-focused operability techniques for distributed systems

Matthew Skelton, Conflux

In this talk, we explore five practical, tried-and-tested, real-world techniques for improving operability in many kinds of software systems, including cloud, serverless, on-premises, and IoT.

  • Logging as a live diagnostics vector with sparse Event IDs
  • Operational checklists and ‘Run Book dialogue sheets’ as a discovery mechanism for teams
  • Endpoint healthchecks as a way to assess runtime dependencies and complexity
  • Correlation IDs beyond simple HTTP calls
  • Lightweight ‘User Personas’ as drivers for operational dashboards
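The abstract names sparse Event IDs and correlation IDs without showing what they look like in code. What follows is a minimal sketch, assuming Python's standard `logging` and `contextvars` modules. It is not the speaker's implementation. The event names and numeric values (`EventId`, 10000, 20100 and so on), as well as the helpers `make_logger` and `log_event`, are hypothetical. The IDs are spaced widely ("sparse") so new events can be inserted later without renumbering. The correlation ID is held in a context variable rather than an HTTP header, so the same approach works for queue consumers or batch jobs.

```python
import contextvars
import logging
import sys
import uuid
from enum import IntEnum


class EventId(IntEnum):
    """Hypothetical sparse event IDs: gaps leave room for future events."""
    APP_STARTED = 10000
    ORDER_RECEIVED = 20000
    PAYMENT_DECLINED = 20100
    DOWNSTREAM_TIMEOUT = 30000


# Correlation ID for the current unit of work (HTTP request, queue message, job).
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID (and a default event ID) to each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        if not hasattr(record, "event_id"):
            record.event_id = 0  # 0 marks "no event ID assigned"
        return True


def make_logger(name: str = "orders", stream=None) -> logging.Logger:
    """Build a logger whose output always includes event and correlation IDs."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s event=%(event_id)d corr=%(correlation_id)s %(message)s"))
    handler.addFilter(CorrelationFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines if the root logger is configured
    return logger


def log_event(logger: logging.Logger, level: int, event: EventId, msg: str, *args) -> None:
    """Log a message tagged with a stable, searchable event ID."""
    logger.log(level, msg, *args, extra={"event_id": int(event)})


if __name__ == "__main__":
    log = make_logger()
    # Set a fresh correlation ID at the start of each unit of work.
    token = correlation_id.set(str(uuid.uuid4()))
    try:
        log_event(log, logging.INFO, EventId.ORDER_RECEIVED, "order %s received", "42")
        log_event(log, logging.WARNING, EventId.PAYMENT_DECLINED, "payment declined for order %s", "42")
    finally:
        correlation_id.reset(token)
```

Because each event ID is a fixed number, operators can search or alert on `event=20100` without relying on message wording that may change between releases.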

Based on our work in many industry sectors, we will share our experience of helping teams to improve the operability of their software systems through these techniques.

Required audience experience

Some experience of building web-scale systems or industrial IoT/embedded systems would be helpful.

Objective of the talk

We will share our experience of helping teams to improve the operability of their software systems. Attendees will learn some practical operability approaches and how teams can expand their understanding and awareness of operability through these simple, team-friendly techniques.

Track 2
Date: May 17, 2018
Time: 1:35 pm - 2:20 pm