What are the responsibilities and job description for the Site Reliability Engineer (Only Locals) position at Xyant Services?
Role: SRE (Only Locals)
Location: Lehi, UT - Hybrid Model
Long-term contract
Common monitoring tools:
Prometheus
Nagios
Zabbix
Datadog (Monitoring)
New Relic (Monitoring)
AWS CloudWatch
Azure Monitor
Grafana (visualization and monitoring)
Common observability tools:
Datadog (Observability/APM)
New Relic APM
Splunk
Elastic Stack (ELK – Elasticsearch, Logstash, Kibana)
Grafana Loki and Grafana Tempo
OpenTelemetry
Jaeger
Dynatrace
Job Description:
What you'll do
• Perform SRE roles including deployment, capacity management, observability, and performance tuning
• Collaborate with our Security Architecture team to define attestation for a variety of workloads spanning multiple compute platforms
• Support engineering teams who will be onboarding to this new service
What you need to succeed
• Proficiency in operating and supporting cloud-based services using infrastructure as code (IaC), such as Terraform
• Proven experience as a service reliability engineer
• Experience with CI/CD processes and source control mechanisms (GitHub)
• Knowledge of federated trust models for identity and security
• Understanding and use of public cloud infrastructure (AWS, Azure, GCP)
• Strong focus on prioritizing customer experience and support
• Ability to communicate clearly and efficiently with customers and leadership
• Experience working with large enterprises with heterogeneous compute platforms
Skills:
Generalist SRE profile
Strong AI-related skillset
Hands-on experience with Terraform
Containerized workloads (Kubernetes)
Experience building and maintaining CI/CD pipelines (GitHub)
Proficient in monitoring and observability tools