Site Reliability / Platform Engineering
Keep production running at scale
SLOs/SLIs, on-call, observability, incident response, capacity planning, and the platform tooling that makes other engineers productive. The senior-level continuation of cloud engineering: high salary, high impact.
Salary range
$105k – $180k
entry → mid-level, US
Time to complete
10 wk
20 wk part-time
Lessons
12
4 phases
Capstone
Yes
real cloud account
Roles this path prepares you for
- Site Reliability Engineer (SRE)
- Platform Engineer
- Senior DevOps Engineer
- Production Engineer
Curriculum
SRE Foundations
Free preview
What SRE is, SLOs/SLIs/error budgets, blameless postmortems.
3 lessons
Observability + On-Call
Locked
Prometheus + Grafana, OpenTelemetry tracing, alerting + on-call, distributed tracing.
5 lessons
Platform Engineering
Locked
Runbooks + toil, chaos engineering, capacity + cost, platform engineering practice.
3 lessons
Career Launch (SRE)
Locked
SRE portfolio, system-design + on-call interviews, resume tuning.
1 lesson
Capstone project + completion certificate
Run an SLO program for a multi-service stack
Deploy a 3-service stack, define SLOs/SLIs, build a dashboard, write runbooks, simulate an incident and run the response.
Deliverables
- SLO dashboard in Grafana with 3+ services
- Runbooks for at least 2 alert types
- Incident timeline document from a simulated outage
- Error budget policy document
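To give a feel for the arithmetic behind an SLO dashboard and an error budget policy, here is a minimal Python sketch for a request-based availability SLO. The function name `error_budget`, its parameters, and the sample numbers are illustrative assumptions, not part of the course material.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget usage for a request-based availability SLO.

    Hypothetical helper for illustration: slo_target is a fraction such as
    0.999, and the counts cover one SLO window (for example, 30 days).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: the fraction of requests that succeeded in the window.
    sli = 1.0 - failed_requests / total_requests
    # Error budget: how many failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Share of the budget already spent (above 1.0 means the SLO is missed).
    consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Made-up numbers: a 99.9% target, 1,000,000 requests, 250 failures.
    print(error_budget(0.999, 1_000_000, 250))
```

With a 99.9% target over 1,000,000 requests, roughly 1,000 failures are tolerated, so 250 failures spend about a quarter of the budget. An error budget policy typically states what the team does as that share approaches 100%.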
Before you start
- Cloud Foundations + DevOps Engineering (or equivalent)
- Comfortable with Kubernetes
What you walk out with
- Run an SLO program
- Lead production incidents
- Build internal developer platforms
- Interview for SRE / Platform Engineer roles
Preview — opening soon
The curriculum is built and the content is being polished. Sign up now and you'll be notified the moment this path opens for enrollment.
Sign up to be notified