Do SLAs, Error Budgets, and Availability Metrics Include Maintenance Windows?

When it comes to service reliability, maintenance windows can be a gray area. Whether you're tracking uptime, setting SLOs, or managing customer expectations through SLAs, the question often comes up:

“Should scheduled maintenance count against our SLA? What about our error budget or availability metrics?”

Let’s unpack how scheduled (and unscheduled) maintenance affects your SLAs, error budgets, and availability calculations — and what best practices look like.


📜 SLA: Do Maintenance Windows Count?

Service Level Agreements (SLAs) are typically contractual commitments made to customers, promising a certain level of service availability (e.g., 99.9% uptime).

✅ Planned Maintenance Is Usually Excluded

Most SLAs exclude scheduled and communicated maintenance windows from downtime calculations. That means if the maintenance was:

  • Planned,

  • Properly communicated in advance (often 24–72 hours), and

  • Done within agreed-upon time windows (e.g., off-peak hours),

…it usually does not count against the SLA.

❌ Unscheduled or Overrun Maintenance May Count

However, if:

  • The maintenance wasn't properly communicated,

  • It ran longer than scheduled,

  • It was done during peak usage without approval,

…it can count as downtime and lead to SLA violations or service credits.
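As a rough illustration of the rules above, the sketch below computes how much of an outage would count as SLA downtime. It is only a sketch under stated assumptions: the `Window` type, the `sla_downtime` function, and the 24-hour minimum notice are hypothetical, and a real contract defines its own terms. It does not model the peak-hours approval condition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Window:
    """A time span with a start and an end (hypothetical helper type)."""
    start: datetime
    end: datetime


def sla_downtime(
    outage: Window,
    announced: Optional[Window],
    notice: timedelta,
    min_notice: timedelta = timedelta(hours=24),  # assumed contract term
) -> timedelta:
    """Return the part of an outage that counts against the SLA.

    Time inside a properly announced window is excluded. Anything outside
    it (an overrun), or the whole outage when notice was too short or no
    window was announced, counts as downtime.
    """
    total = outage.end - outage.start
    if announced is None or notice < min_notice:
        return total
    overlap_start = max(outage.start, announced.start)
    overlap_end = min(outage.end, announced.end)
    excluded = max(overlap_end - overlap_start, timedelta(0))
    return total - excluded


# Announced window 02:00-04:00, but the work actually ran until 04:30.
announced = Window(datetime(2024, 1, 7, 2, 0), datetime(2024, 1, 7, 4, 0))
outage = Window(datetime(2024, 1, 7, 2, 0), datetime(2024, 1, 7, 4, 30))

print(sla_downtime(outage, announced, notice=timedelta(hours=48)))  # 0:30:00
print(sla_downtime(outage, announced, notice=timedelta(hours=2)))   # 2:30:00
```

With 48 hours of notice only the 30-minute overrun counts. With 2 hours of notice the exclusion does not apply and the full outage counts.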


🎯 Error Budgets: Are They Affected by Maintenance?

Error budgets represent the amount of failure or unreliability tolerated over a period, based on an SLO (Service Level Objective). If your SLO is 99.9% uptime per month, your error budget is about 43.2 minutes of allowed downtime, assuming a 30-day month.
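The 43.2-minute figure can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes a 30-day month and a time-based SLO. The function name `error_budget_minutes` is illustrative, not part of any library.

```python
def error_budget_minutes(slo_percent: float, period_days: int = 30) -> float:
    """Allowed downtime in minutes for a time-based SLO over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - slo_percent / 100)


# 30 days = 43,200 minutes; 0.1% of that is the budget.
print(round(error_budget_minutes(99.9), 1))       # 43.2
print(round(error_budget_minutes(99.9, 31), 2))   # 44.64
print(round(error_budget_minutes(99.95), 1))      # 21.6
```

The budget shifts with the length of the period, so it helps to state whether "per month" means a calendar month or a rolling 30-day window.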

🚫 Planned Maintenance Usually Doesn’t Burn Budget

If maintenance is pre-approved and doesn't disrupt users, it’s typically excluded from the error budget, especially in SRE frameworks that prioritize user-perceived reliability.

✅ User-Impacting Events Do Burn Budget

If users are affected — even during scheduled maintenance — some orgs choose to count it against the error budget. The key question is:

“Would a user notice or be blocked?”

If yes, it probably burns error budget. If no, it likely doesn't.
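One way to encode the "would a user notice?" rule is to ignore whether an event was planned and look only at user impact. The sketch below assumes that policy. The `Event` type, its field names, and the sample events are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """A maintenance or incident record (hypothetical shape)."""
    name: str
    minutes: float
    planned: bool
    user_impacting: bool


def budget_burned(events: List[Event]) -> float:
    """Minutes of error budget consumed: only user-impacting time counts,
    whether or not the event was planned."""
    return sum(e.minutes for e in events if e.user_impacting)


events = [
    Event("database failover drill", 20, planned=True, user_impacting=False),
    Event("schema migration (read-only mode)", 10, planned=True, user_impacting=True),
    Event("unplanned API outage", 12, planned=False, user_impacting=True),
]

budget = 43.2  # minutes, from a 99.9% SLO over a 30-day month
burned = budget_burned(events)
print(burned)                     # 22
print(round(budget - burned, 1))  # 21.2
```

Here the planned drill costs nothing because no user was affected, while the planned migration burns budget because users were blocked from writing.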


📈 Availability: Does Maintenance Affect It?

Availability is the actual measured uptime of your service over a specific period, typically expressed as a percentage like 99.95%.

Whether maintenance counts against it depends on how you define availability in your metrics.

🔸 User-Facing Availability

If your availability metric reflects user impact, planned maintenance that’s properly communicated is often excluded.

🔹 System-Level (Strict) Availability

If you're measuring raw service uptime (e.g., pings, monitoring checks), all downtime — including planned maintenance — might be included.
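The difference between the two definitions is easy to see with numbers. The following sketch uses made-up figures (a 30-day period, 10 minutes of unplanned downtime, 60 minutes of planned maintenance). The helper name `availability_percent` is illustrative.

```python
def availability_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Availability as a percentage of the measurement period."""
    return 100 * (period_minutes - downtime_minutes) / period_minutes


period = 30 * 24 * 60   # 43,200 minutes in a 30-day period
unplanned = 10          # minutes of unplanned downtime (example value)
planned = 60            # minutes of planned maintenance (example value)

# Strict: every minute the service was down counts.
strict = availability_percent(period, unplanned + planned)

# User-facing: communicated planned maintenance is excluded from downtime.
user_facing = availability_percent(period, unplanned)

print(round(strict, 3))       # 99.838
print(round(user_facing, 3))  # 99.977
```

Some definitions also remove the maintenance window from the measurement period itself (the denominator). That variant is not shown here, and which one applies depends on how the metric is written down.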


📌 Summary Table

| Maintenance Type | Counts Toward SLA? | Burns Error Budget? | Affects Availability? |
|---|---|---|---|
| Planned & Communicated | ❌ Usually Not | ❌ Usually Not | ❌ Not if defined that way |
| Unplanned or Overrun | ✅ Yes | ✅ Yes | ✅ Yes |
| Poorly Communicated | ✅ Yes | ✅ Yes | ✅ Yes |

🧠 Best Practices

  • 📑 Define everything explicitly: Make sure SLAs, SLOs, and availability metrics clearly state how maintenance is handled.

  • 📣 Communicate proactively: Proper notification is key to excluding maintenance from SLAs and error budgets.

  • 🎯 Focus on user impact: Base decisions on whether users are affected, not just whether systems are up or down.

  • 🤝 Align across teams: Ensure engineering, product, and legal are aligned on how you track and report service health.
