What are the responsibilities and job description for the Cloud Platform Operations Manager position at Amano McGann, Inc?
Cloud Platform Operations Manager
Position Overview: The Manager of Cloud Platform Operations will play a critical role in the success of Amano by ensuring the health of the company’s SaaS applications, including their availability, performance, security, and recoverability. This role encompasses technical oversight, team management, and collaboration with cross-functional departments. This is a new role, and the successful candidate will contribute significantly to shaping the company’s cloud strategy and future architecture.
Reports to: Chief Technology Architect
Direct Reports: Site Reliability Engineer (SRE) and Cloud Support Engineer(s).
Key Responsibilities:
- Ensure that the SaaS applications and cloud platforms remain available, responsive and secure.
- With the SRE, optimize monitoring to ensure that we know about inadequate response times or other issues before we hear about them from our customers.
- Optimize the use of contracted Managed Service Provider (currently Rapid Scale) resources for monitoring and incident response and resolution.
- Evaluate, update and implement processes and procedures for managing operational incidents that affect the availability of the SaaS applications; work with the Director of Customer Support to determine the appropriate engagement of members from each team.
- Implement and track metrics to measure and report on the health of SaaS applications and cloud platforms. Ensure that all availability and performance SLAs are met.
- With the Cloud Support Engineer(s), optimize CI/CD pipelines and processes.
- Ensure that only fully tested code that is approved by QA and Product Management is deployed into the production environment.
- Model a culture of intellectual curiosity, problem solving, collaboration, reasonable risk taking, and big thinking in a self-directed environment.
- Hire and mentor team members with the appropriate knowledge, skills, and abilities.
- Manage contract with Managed Service Provider for 24x7 support of the SaaS applications. Ensure that the billing is accurate, and that Amano receives the appropriate value for the fees paid.
- Coordinate with Scrum and Release Managers, Architects and Product Managers to:
  - Plan for and execute code releases to the Production environment, including:
    - Regularly scheduled releases for new features
    - “Hot Fix” releases to address critical defects as needed
    - Updates to AWS services or supporting applications
- Ensure that AMI’s SDLC and other such processes are followed; provide input for improvements that could be made to the existing SDLC.
- Plan, test, and execute Business Continuity/Disaster Recovery processes.
- Maintain the Cloud Platform Operations Team budget. This includes approving invoices, ensuring that the expenses represented in the invoices are appropriately coded for posting, regularly reviewing the team’s budget and reporting budget status monthly to the Vice President of R&D.
- Provide project management oversight for infrastructure updates/projects.
- Working with peer managers, Architects, and cross functional personnel, continuously seek opportunities for improvement in processes, tools, etc.
- Maintain all documentation necessary for the Cloud Platform Operations team to pass SOC 2, PCI and other industry standard audits.
- Develop and implement an on-call schedule for the Cloud Platform Operations team, relying on the contracted Managed Service Provider for off-hours monitoring, initial evaluation and escalation only as needed.
- All other duties as assigned.
Knowledge:
- Deep understanding of SaaS application delivery and support principles and best practices.
- Ongoing growth of knowledge of cloud technology trends, new thinking, new products, etc.
Skills:
- Management:
  - Strong interviewing skills.
  - Planning and monitoring skills to ensure timely completion of work assigned to the Cloud Platform Operations team.
  - Goal setting and monitoring skills to ensure that all team members can meet or exceed their goals.
  - High level estimation skills.
  - Coaching and mentoring skills.
- Technical:
  - Knowledge of Linux Administration.
  - Experience with Git source control.
  - Foundational understanding of security best practices.
  - Exposure to programming languages such as C#, Java, or Python.
  - Experience with scripting languages such as PowerShell, Bash, or Python.
  - Experience with Application Performance Monitoring (APM) tools, such as Datadog or Dynatrace.
  - Knowledge of AWS, Microsoft Azure, GCP or similar cloud platforms. AWS experience preferred.
  - Experience using IaC tools such as Terraform.
Abilities:
- Ability to communicate clearly and concisely verbally and in writing.
- Ability to coach and motivate team members to contribute at an optimal level.
- Ability to assist in dispute resolution, as it occurs, among team members and with other managers and teams.
Attitude:
- Maintains focus on Cloud Platform Operations, R&D, Engineering and Company goals.
- Proactive in their approach to problem-solving.
- Holds their team members and fellow managers accountable for their words and actions.
- Holds themselves accountable for their words and actions.
- Maintains an ongoing desire to learn and stay current regarding cloud operations in general and AWS services and capabilities in particular.
- Accepts and provides feedback graciously and learns from everything they do.
- Reacts with due consideration to obstacles and issues in a timely and appropriate manner.
- Knows and follows company rules described in the Employee Handbook and requires employees to do so as well.
- Keeps colleagues and management informed of significant issues or developments.
Preferred Qualifications:
- 7 years of experience managing cloud infrastructure/environments.
- Experience building and supporting CI/CD pipelines deploying to multiple environments.
- Experience building, testing, maintaining and executing Business Continuity/Disaster Recovery plans.
- Bachelor’s degree or an equivalent combination of education and related work experience.
Salary: $150,000 - $160,000