
Site Reliability Manager

Karsun Solutions
Washington, DC (Full Time)
Posted on 6/23/2024; closed on 12/26/2024

What are the responsibilities and job description for the Site Reliability Manager position at Karsun Solutions?

Overview

We are seeking a highly skilled and experienced Site Reliability Manager to join our team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our systems and services, and will lead a team of engineers in designing, implementing, and maintaining robust infrastructure and automation solutions. Candidates must reside in the Washington, DC area and be available to work on site in downtown Washington, DC as required.

Responsibilities

  • Lead a service delivery team of 8-20 people (service support specialists, DevSecOps engineers, and site reliability engineers).
  • Define and implement best practices for infrastructure as code, deployment automation, and monitoring.
  • Collaborate with cross-functional teams to design scalable and fault-tolerant architectures.
  • Develop and maintain service level objectives (SLOs) and key performance indicators (KPIs) to measure system reliability and performance.
  • Conduct post-mortems and root cause analyses for incidents and implement preventive measures to mitigate future incidents.
  • Drive continuous improvement initiatives to enhance the reliability, scalability, and efficiency of our systems and services.
  • Mentor and coach team members to foster a culture of learning and innovation.

Qualifications and Education

Required:

  • Bachelor’s degree in Computer Science, Engineering, or a related field; Master’s degree preferred.
  • 10 years of experience in a similar role, managing a team of site reliability engineers and delivering on the AWS cloud platform.
  • Proven track record of managing high-performance teams.
  • 5 years of experience supporting operations and maintenance for cloud-native applications in production that are fault-tolerant, self-healing, scalable, and highly available.
  • Deep understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes).
  • Strong knowledge of infrastructure as code tools (e.g., Terraform, Ansible, ArgoCD) and CI/CD pipelines.
  • Experience with monitoring, logging, and observability tools such as Datadog, AWS CloudWatch, ELK, Prometheus, and Splunk.
  • Excellent communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams.
  • Strong problem-solving and analytical skills, with keen attention to detail.
  • Certifications such as AWS Certified DevOps Engineer or Google Professional Cloud DevOps Engineer are a plus.
  • Ability to obtain and maintain a Public Trust clearance.

Preferred:

  • Understanding of modern architectures (e.g., microservices, event-driven architecture (EDA)) and caution against overcomplexity and overengineering.
  • Experience with monitoring and metrics platforms (e.g., New Relic, Prometheus, InfluxDB, Grafana, Splunk).
  • Experience designing and operating distributed systems and cloud infrastructure at scale.

Compensation

In accordance with pay transparency guidelines, the proposed salary range for this position is $140,000.00 to $180,000.00. Final salary will be determined based on various factors such as relevant skills, experience, and certifications.

 

 

This job has expired.