Posted 2 days ago
Engineer – Site Reliability Engineering (SRE)

Company: London Stock Exchange Group (LSEG)
Location: Bengaluru, India
Job Type: Full-Time

Job Overview

We are seeking a Site Reliability Engineer (SRE) to support platform reliability, cloud operations, automation, and observability initiatives. The role involves working closely with engineering, platform, and product teams to ensure applications and services remain reliable, scalable, secure, and highly available. Candidates will contribute to operational excellence through automation, monitoring, incident management, and infrastructure improvements.

Key Responsibilities

Reliability Engineering & Platform Operations
• Support platform reliability and operational activities across cloud environments
• Monitor and maintain service reliability metrics, including SLIs, SLOs, and error budgets
• Participate in incident response, troubleshooting, root cause analysis, and remediation activities
• Develop automation solutions for operational tasks and incident handling
• Improve deployment processes and operational efficiency

CI/CD & Infrastructure Automation
• Maintain and enhance CI/CD pipelines using automation tools
• Contribute to infrastructure-as-code implementations using Terraform and GitHub Actions
• Support automated testing and deployment workflows
• Work with containerized environments and orchestration platforms

Monitoring & Observability
• Build and maintain monitoring dashboards and alerting systems
• Work with observability and logging tools to improve system visibility
• Monitor application performance, infrastructure health, and operational stability
• Support proactive issue detection and reliability improvements

Cloud & Platform Support
• Support cloud infrastructure across Azure and AWS environments
• Work with Kubernetes, Docker, API gateways, and cloud databases
• Assist with system scalability, availability, and performance optimization
• Maintain disaster recovery readiness and operational documentation

Service Management & Compliance
• Follow ITIL-based service management practices
• Manage incidents, service requests, and operational workflows using ServiceNow
• Ensure compliance with internal operational and security standards
• Support audit readiness and governance activities

Collaboration & Knowledge Sharing
• Work closely with engineering and product teams on reliability initiatives
• Share technical knowledge through documentation and collaboration sessions
• Participate in cross-functional and global engineering initiatives
• Contribute to continuous improvement and operational excellence programs

Required Skills
• Bachelor’s degree in Computer Science, Engineering, or a related field
• Understanding of Site Reliability Engineering (SRE) principles
• Experience with Azure and AWS cloud platforms
• Knowledge of Kubernetes, Docker, Terraform, Jenkins, and GitHub Actions
• Familiarity with monitoring and observability tools such as Datadog and OpenTelemetry
• Understanding of API gateways, databases, and cloud infrastructure concepts
• Experience with scripting languages such as Python or Bash
• Knowledge of incident management and operational support processes

Preferred Skills
• Familiarity with Kong API Gateway and Snowflake
• Understanding of Azure SQL, Cosmos DB, and cloud-native architectures
• Exposure to ServiceNow and ITIL processes
• Strong troubleshooting and automation mindset
• Experience with scalable and distributed systems

Ideal Candidate Profile
• Detail-oriented and proactive problem solver
• Strong communication and collaboration skills
• Passionate about automation, cloud technologies, and reliability engineering
• Eager to learn modern DevOps and SRE practices
• Comfortable working in fast-paced and collaborative environments