Site Reliability Engineer resolving Level 3 incidents | Santiago, Chile📍
Job title: Site Reliability Engineer resolving Level 3 incidents | Santiago, Chile📍
Company: Prometeo Talent
Job description:
About us
This company provides a software platform powered by AWS. This platform can be easily customized to help various clients, including financial service providers, debt collection agencies, debt buyers, telecommunication companies, utility companies, and others, achieve better results when working with customers facing financial challenges. Importantly, they prioritize treating customers with respect and dignity while delivering an industry-leading return on investment. With experience in over 60 countries and supporting over 650 different types of financial obligations, they simplify complex situations and create positive experiences for consumers.
Our Proposal
We are looking for a Site Reliability Engineer (SRE) to work in the Cloud Operations area to organize the response to the different serious and high-impact incidents that arise. Your main objective will be to restore customer service by mobilizing and empowering the appropriate tools to evaluate and repair the issues. In addition, you will be responsible for identifying any remedial gaps and driving continuous improvement.
Your responsibilities will be:
- Diagnose and Resolve Level 3 Incidents: Investigate and resolve escalated technical issues related to systems, applications, and cloud infrastructure.
- Monitor System Performance: Monitor the performance and availability of critical systems using monitoring tools. Analyze system logs and performance metrics to identify potential issues and proactively address them before they escalate.
- Implement Security Measures: Collaborate with the security team to implement and enforce security policies, protocols, and procedures.
- Optimize System Performance: Identify opportunities for performance optimization and capacity planning to ensure scalability and efficiency of infrastructure resources.
- Participate in Incident Response: Assist in incident response activities during critical incidents, including analysis, resolution, and post-incident reviews.
- Collaborate with Cross-Functional Teams: Work closely with other IT teams, including network engineers, developers, and support personnel, to address complex technical challenges and improve overall system performance and reliability.
- Document Processes and Procedures.
Requirements
- 7+ years of experience as a Site Reliability Engineer, ideally in the financial sector.
- Experience in Level 3 incident resolution.
- Experience working in AWS environments and infrastructure.
- Experience in cybersecurity practices.
- Experience maintaining Linux systems and troubleshooting incidents on them.
- Experience in using monitoring tools.
- Scripting knowledge in languages such as Bash and Python.
- Knowledge of Site Reliability Engineering and ITIL frameworks.
- Background working in Scrum, Kanban, Agile environments, and with tools such as Jira or Confluence.
- Advanced English proficiency. You will work with teams from all over the world.
What do we offer?
- Working as an employee
- Hybrid work: 3 days in the office in Santiago.
Our values
- We foster trust
- Ownership is our guiding principle
- Collaboration
- Integrity
- Inclusivity
- Commitment to excellence
Expected salary:
Location: Rancagua, Libertador B. O’Higgins – O’Higgins, Aysén
Job date: Wed, 13 Mar 2024 23:12:58 GMT