How To Land an SRE Role

Mar 4, 2022 • Guide


This post is a work in progress, but it may still be useful as is. Please be patient.

Cloud Certification

If you don't have a cloud certification yet, I would recommend either the AWS Certified Cloud Practitioner or the Microsoft Certified: Azure Fundamentals exam. These are the entry-level certifications for AWS and Azure, respectively. Having one of them can noticeably improve your chances of landing an interview.

Study Material

While studying, I highly recommend creating an account and following along. The major public cloud providers all offer some free services or free credits to get you started.

Operating Systems

No two SRE roles are the same. Most will have you working on Linux systems, while some involve Windows or both.

Scripting Languages

Bash is the command shell and scripting language for the majority of Linux systems, while PowerShell fills that role for the majority of Windows systems.

Beyond scripting, some of the most important operating system concepts to know and be able to troubleshoot are networking, process management, threads and concurrency, I/O management, virtualization, memory management, and file systems.

Networking

This is a very broad topic, but some of the most important concepts are DNS, HTTPS, virtual networks, Network Security Groups, and bastion hosts.

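It is good to be familiar with the OSI model, which serves as a common vocabulary for computer networking. It splits a communication system into seven abstract layers. This is useful when debugging networking issues because it helps you narrow down where a problem lives: a failed DNS lookup (application layer) calls for a different fix than a blocked TCP port (transport layer).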
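
As a rough illustration, here is a small Python sketch that maps a few common protocols to the OSI layer they are usually associated with, and sorts them bottom-up, which is one common order for ruling things out while troubleshooting. The lookup tables and function names are my own, and real protocols don't always fit neatly into a single layer, so treat the mapping as an approximation.

```python
from __future__ import annotations

# The seven OSI layers, numbered from the bottom (1) to the top (7).
OSI_LAYERS = {
    1: "Physical",
    2: "Data Link",
    3: "Network",
    4: "Transport",
    5: "Session",
    6: "Presentation",
    7: "Application",
}

# Hypothetical, simplified mapping: real protocols don't always fit one layer.
PROTOCOL_LAYER = {
    "Ethernet": 2,
    "IP": 3,
    "ICMP": 3,
    "TCP": 4,
    "UDP": 4,
    "DNS": 7,
    "HTTP": 7,
}


def describe(protocol: str) -> str:
    """Return a label such as 'TCP -> Layer 4 (Transport)'."""
    layer = PROTOCOL_LAYER[protocol]
    return f"{protocol} -> Layer {layer} ({OSI_LAYERS[layer]})"


def debugging_order(protocols: list[str]) -> list[str]:
    """Sort protocols bottom-up; protocols on the same layer keep their input order."""
    return sorted(protocols, key=lambda p: PROTOCOL_LAYER[p])


if __name__ == "__main__":
    for proto in debugging_order(["HTTP", "DNS", "TCP", "IP"]):
        print(describe(proto))
```

For example, `debugging_order(["HTTP", "DNS", "TCP", "IP"])` returns `['IP', 'TCP', 'HTTP', 'DNS']`: confirm basic IP reachability (for example with ping) before checking whether the TCP port is open, and only then dig into HTTP or DNS.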

Docker

In reality, this section is about containers in general, not just Docker.

The best thing you can do is install Docker Desktop and find a tutorial like this one to get started.

Once you're somewhat comfortable, take it to the next level by installing Docker on an AWS or Azure instance and running some sort of web application on it. Then see if you can access the web application from its public address; this usually means publishing the container's port and opening that port in the instance's security group (AWS) or Network Security Group (Azure). Doing this starts to tie together everything you've learned so far, and it is the kind of experimenting that will begin to give you the experience needed for a cloud-based role.
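
If you need something to deploy, here is a minimal sketch of a web application using only the Python standard library; any small web app will do, and this one is just an assumption on my part. The key detail is binding to 0.0.0.0 rather than 127.0.0.1, because inside a container localhost only accepts connections from within that container. The `PORT` environment variable is a hypothetical convention used here, not something Docker sets for you.

```python
"""A tiny web application you could run inside a container.

Assumptions: Python 3 standard library only; the port comes from a
hypothetical PORT environment variable (default 8080).
"""
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from my container!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    # Bind to all interfaces by default: 127.0.0.1 inside a container is
    # unreachable from the host, let alone from a public address.
    return HTTPServer((host, port), HelloHandler)


if __name__ == "__main__":
    server = make_server(port=int(os.environ.get("PORT", "8080")))
    server.serve_forever()
```

Once it is packaged into an image, you would publish the port with something like `docker run -p 8080:8080 <your-image>` (the image name is a placeholder) and open port 8080 in the instance's firewall rules before testing from the public address.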

Terraform

Now that you're familiar with cloud providers and containers, let's keep using those Azure/AWS credits and take it one level further.

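The next step is to have Terraform create our infrastructure and instances for us. As a best practice, clicking around in the cloud provider's web console should be kept to a minimum: manual changes are hard to reproduce, review, and track, while infrastructure defined as code can be version-controlled and re-applied.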
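
Terraform configuration is declarative: you describe the infrastructure you want, and `terraform plan` shows what it would create, change, or destroy to get there. The Python sketch below is a toy model of that idea, not Terraform's actual implementation; the resource names and the dictionary-based "state" are hypothetical.

```python
"""Toy model of a declarative "plan" step, loosely inspired by `terraform plan`.

Not Terraform's implementation: resources are hypothetical names mapped to
plain dicts of settings, and "update" stands in for both in-place changes
and replacements.
"""
from __future__ import annotations


def plan(desired: dict[str, dict], current: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (action, resource) pairs that would move `current` to `desired`."""
    actions: list[tuple[str, str]] = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))   # declared but missing
        elif desired[name] != current[name]:
            actions.append(("update", name))   # exists but has drifted
    for name in sorted(current):
        if name not in desired:
            actions.append(("destroy", name))  # exists but no longer declared
    return actions


if __name__ == "__main__":
    desired = {"vm-web": {"size": "small"}, "vnet-main": {"cidr": "10.0.0.0/16"}}
    current = {"vm-web": {"size": "medium"}, "vm-old": {"size": "small"}}
    for action, name in plan(desired, current):
        print(f"{action:8} {name}")
```

With the sample values in the `__main__` block, the plan updates `vm-web`, creates `vnet-main`, and destroys `vm-old`. Once `current` matches `desired`, `plan` returns an empty list, which mirrors why re-applying an unchanged Terraform configuration is safe, and why a manual console change shows up as drift to be corrected.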

Monitoring, Logging, Alerting, Outages

Look into tools like Splunk, Datadog, and the Elastic Stack, which are used for log management and analysis as well as monitoring.
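
To get a feel for what these tools automate, here is a small Python sketch that counts ERROR lines per service and flags services that cross an alert threshold. The log format (timestamp, level, service, message separated by whitespace), the sample lines, and the threshold are all hypothetical; real pipelines usually parse structured logs and evaluate alert rules continuously.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical sample logs in a 'timestamp level service message' format.
SAMPLE_LOGS = [
    "2022-03-04T10:00:01Z INFO checkout request ok",
    "2022-03-04T10:00:02Z ERROR checkout upstream timeout",
    "2022-03-04T10:00:03Z ERROR payments card declined",
    "2022-03-04T10:00:04Z ERROR checkout upstream timeout",
    "garbled",
]


def count_errors_by_service(lines: list[str]) -> Counter[str]:
    """Count ERROR lines per service."""
    counts: Counter[str] = Counter()
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 3:
            continue  # skip malformed lines instead of crashing the analysis
        _timestamp, level, service = parts[:3]
        if level == "ERROR":
            counts[service] += 1
    return counts


def services_to_alert(counts: Counter[str], threshold: int) -> list[str]:
    """Return services whose error count meets or exceeds the threshold."""
    return sorted(service for service, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    errors = count_errors_by_service(SAMPLE_LOGS)
    print(errors.most_common())
    print("alert:", services_to_alert(errors, threshold=2))
```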

SLI/SLO
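
An SLI (service level indicator) is a measurement of how a service is behaving, such as the fraction of requests that succeed. An SLO (service level objective) is a target for that SLI over a time window, such as 99.9% of requests succeeding over 30 days. The gap between the SLO and 100% is the error budget. The Python sketch below works through the arithmetic; the function names and the choice of a request-based availability SLI are my own assumptions.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded (a request-based availability SLI)."""
    if total_requests == 0:
        return 1.0  # policy choice: no traffic counts as meeting the objective
    return good_requests / total_requests


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full downtime the SLO allows over the window."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, good_requests: int, total_requests: int) -> float:
    """Fraction of the request-based error budget still unspent (negative if overspent)."""
    if slo >= 1.0:
        raise ValueError("an SLO of 100% leaves no error budget")
    if total_requests == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_failures = (1.0 - slo) * total_requests
    actual_failures = total_requests - good_requests
    return 1.0 - actual_failures / allowed_failures
```

With a 99.9% SLO over 30 days, the error budget is 0.1% of 43,200 minutes, or about 43.2 minutes of full downtime. If 999,500 of 1,000,000 requests succeed, the SLI is 99.95% and half of the error budget is still left.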

RCA/Post Mortem

There is a great podcast called The Downtime Project, where the hosts analyze outages and the RCAs and postmortems published by major companies.

Interview - How to Prepare

Besides being moderately comfortable with everything mentioned above, there are a few things you can do to better prepare for an SRE interview.

You should read the entire Google SRE Book and the Google SRE Workbook; these are the unofficial SRE bibles.

Sites like LeetCode are great for the coding part of the interview. They also have problem sets on concurrency and race conditions, which often come up in SRE interviews.
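
Concurrency questions usually come down to protecting shared state. Here is a minimal Python sketch of the classic race condition: several threads increment a shared counter, and without a lock the read-modify-write in `count += 1` can interleave between threads and lose updates. The function and its parameters are illustrative, not taken from any specific LeetCode problem.

```python
import threading


def parallel_count(num_threads: int = 8, increments: int = 10_000, use_lock: bool = True) -> int:
    """Have num_threads threads each add 1 to a shared counter `increments` times."""
    count = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal count
        for _ in range(increments):
            if use_lock:
                with lock:  # only one thread at a time runs the read-modify-write
                    count += 1
            else:
                # Unprotected read-modify-write: two threads can read the same
                # value and both write value + 1, losing one of the updates.
                count += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the final count
    return count


if __name__ == "__main__":
    print("with lock:   ", parallel_count(use_lock=True))
    print("without lock:", parallel_count(use_lock=False))
```

With `use_lock=True` the result is always `num_threads * increments`. With `use_lock=False` it may come up short, though how often depends on the Python version and thread timing, which is exactly what makes races hard to reproduce.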

This is a great SRE Discord server. It even has a channel dedicated to interview questions.
