sre.yaml reference

Every field in the sre.yaml schema explained.

Full example

service: payments-api
team: platform-engineering

slos:
  - name: availability
    target: 99.9
    window: 30d
    indicator:
      metric: http_requests_total
      good_filter: 'status!~"5.."'

error_budget:
  burn_rate_alerts:
    - rate: 14.4
      severity: critical
      remediate: scale-up
      notify:
        slack: "#incidents"
        pagerduty: your-routing-key
    - rate: 6.0
      severity: warning
      notify:
        slack: "#sre-warnings"

runbooks:
  scale-up:
    mode: auto
    steps:
      - kubectl scale deploy/payments --replicas=$(( $(kubectl get deploy/payments -o jsonpath='{.spec.replicas}') + 2 ))
      - wait: 60s
      - assert: availability > 99.9

oncall:
  provider: pagerduty
  escalation_minutes: 10
  notify_slack: "#sre-incidents"

dashboards:
  provider: grafana
  auto_generate: true

Field reference

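Field | Type | Required | Description
--- | --- | --- | ---
service | string | yes | Name of your service. Used in notifications and dashboards.
team | string | yes | Team responsible for this service.
slos | list | yes | List of SLO definitions. At least one is required.
slos[].name | string | yes | Name of the SLO.
slos[].target | float | yes | Target percentage, for example 99.9. Must be between 0 and 100.
slos[].window | string | yes | Rolling window for the SLO calculation, for example 30d.
error_budget.burn_rate_alerts | list | no | Burn-rate thresholds that trigger notifications and, optionally, a runbook.
error_budget.burn_rate_alerts[].rate | float | yes | Burn-rate multiplier: how many times faster than sustainable the error budget is being consumed. With a 30d window, 14.4 exhausts the budget in about 2 days, which makes it a typical critical threshold.
error_budget.burn_rate_alerts[].severity | string | yes | Either critical or warning.
error_budget.burn_rate_alerts[].remediate | string | no | Name of the runbook (a key under runbooks) to execute when the alert fires.
runbooks | map | no | Named runbooks with executable steps, keyed by runbook name.
runbooks.<name>.mode | string | no | auto (execute immediately) or semi-auto (post to Slack for approval first).
runbooks.<name>.steps | list | yes | Steps to execute in order: shell commands, wait durations (wait: 60s), or assert conditions (assert: availability > 99.9).
oncall.provider | string | no | On-call provider. Currently only pagerduty is supported.
dashboards.auto_generate | bool | no | Auto-generate a Grafana dashboard from the SLO definitions.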
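The constraints in the field reference can be checked before a file is deployed. The sketch below is a hypothetical validator, not the sre.yaml tool's own: it takes the dict that a YAML loader (for example PyYAML's yaml.safe_load) would return and reports every rule it can check from the table. Two checks go beyond what the table states and are assumptions: that rate must be positive, and that remediate must name a runbook defined in the same file.

```python
from typing import Any

SEVERITIES = {"critical", "warning"}
MODES = {"auto", "semi-auto"}
PROVIDERS = {"pagerduty"}  # the only provider the reference lists


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    errors: list[str] = []

    for key in ("service", "team"):
        if not isinstance(config.get(key), str):
            errors.append(f"{key}: required string")

    slos = config.get("slos")
    if not isinstance(slos, list) or not slos:
        errors.append("slos: at least one SLO is required")
        slos = []
    for i, slo in enumerate(slos):
        for key in ("name", "window"):
            if not isinstance(slo.get(key), str):
                errors.append(f"slos[{i}].{key}: required string")
        target = slo.get("target")
        if not _is_number(target) or not 0 <= target <= 100:
            errors.append(f"slos[{i}].target: must be a number between 0 and 100")

    runbooks = config.get("runbooks") or {}
    for name, book in runbooks.items():
        if "mode" in book and book["mode"] not in MODES:
            errors.append(f"runbooks.{name}.mode: must be auto or semi-auto")
        if not isinstance(book.get("steps"), list):
            errors.append(f"runbooks.{name}.steps: required list")

    alerts = (config.get("error_budget") or {}).get("burn_rate_alerts") or []
    for i, alert in enumerate(alerts):
        path = f"error_budget.burn_rate_alerts[{i}]"
        rate = alert.get("rate")
        # Assumption: a burn rate of zero or less is meaningless.
        if not _is_number(rate) or rate <= 0:
            errors.append(f"{path}.rate: required positive number")
        if alert.get("severity") not in SEVERITIES:
            errors.append(f"{path}.severity: must be critical or warning")
        # Assumption: remediate must refer to a runbook defined in this file.
        if "remediate" in alert and alert["remediate"] not in runbooks:
            errors.append(f"{path}.remediate: unknown runbook {alert['remediate']!r}")

    provider = (config.get("oncall") or {}).get("provider")
    if provider is not None and provider not in PROVIDERS:
        errors.append("oncall.provider: unsupported provider")

    auto = (config.get("dashboards") or {}).get("auto_generate")
    if auto is not None and not isinstance(auto, bool):
        errors.append("dashboards.auto_generate: must be a boolean")

    return errors


example = {
    "service": "payments-api",
    "team": "platform-engineering",
    "slos": [{"name": "availability", "target": 99.9, "window": "30d"}],
    "error_budget": {
        "burn_rate_alerts": [
            {"rate": 14.4, "severity": "critical", "remediate": "scale-up"},
            {"rate": 6.0, "severity": "page"},  # deliberately invalid severity
        ]
    },
    "runbooks": {"scale-up": {"mode": "auto", "steps": ["echo scaling"]}},
}
print(validate(example))
# ['error_budget.burn_rate_alerts[1].severity: must be critical or warning']
```

Fields the table does not describe, such as notify or escalation_minutes, are ignored rather than rejected, so the sketch stays permissive about parts of the schema it does not know.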