Monitoring Manager
confidential - Tenochtitlán, Veracruz-Llave
Job Description
We are seeking a highly skilled and proactive Monitoring Manager to lead our observability and monitoring strategy across critical fintech infrastructure. This position plays a key role in maintaining system reliability, availability, and performance through effective monitoring solutions and real-time incident response. This position can be based in Montevideo, Uruguay; El Salvador; Colombia; or Mexico. The ideal candidate combines strong technical expertise with leadership capabilities and thrives in fast-paced, highly regulated environments.

Responsibilities
- Lead a team responsible for real-time monitoring of infrastructure, applications, and services in cloud and on-prem environments.
- Define and track KPIs and SLAs to ensure system performance and compliance with industry standards.
- Select, implement, and manage modern monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk, New Relic, ELK Stack).
- Collaborate closely with DevOps, SRE, and engineering teams to design scalable monitoring solutions.
- Establish effective alerting mechanisms for detecting anomalies, security threats, and performance issues.
- Lead incident response and root cause analysis to reduce recurrence and improve resilience.
- Deliver executive-level reporting on system health, performance trends, and operational risks.
- Implement AI/ML-driven monitoring capabilities to support predictive analysis and maintenance.
- Manage team shifts, performance, and workload allocation, and contribute to hiring and training processes.

Requirements
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- At least 5 years of experience in IT monitoring, infrastructure operations, or SRE, preferably in fintech or mission-critical environments.
- Proficiency with monitoring platforms such as Prometheus, Datadog, Grafana, Splunk, New Relic, Pandora FMS, or ELK Stack.
- Strong scripting skills (e.g., Python, Bash, PowerShell) for automation and customization of monitoring tasks.
- Experience with cloud platforms (AWS, Azure, GCP) and containerized environments (Kubernetes, Docker).
- Familiarity with CI/CD pipelines, Infrastructure-as-Code, and incident management frameworks.
- Understanding of compliance and cybersecurity practices related to observability and data privacy is a plus.
- Advanced English proficiency is required.

Key Competencies
- Strong leadership and team management skills
- Proactive and solution-oriented mindset
- Excellent communication and stakeholder engagement abilities
- High attention to detail and commitment to operational excellence
Created: Thu, 01 Jan 1970