Site Reliability Engineer

3.00 to 7.00 Years   Bangalore   16 Feb, 2021
Job Location: Bangalore
Education: Not Mentioned
Salary: Not Disclosed
Industry: IT - Software
Functional Area: General / Other Software, Network / System Administration
Employment Type: Full-time

Job Description

A Site Reliability Engineer at Catchpoint is responsible for supporting the systems that run Catchpoint's global monitoring platform. In this role, you will work directly with operations and development teams to build and maintain automation and monitoring, ensuring Catchpoint has a scalable and highly reliable system for our customers.

The role requires an operational mindset and a love of solving problems at a global scale with solutions that maintain high reliability and availability. You'll be exploring and making sense of systems telemetry, logs, passive monitoring, and our own synthetic monitors to create automation that controls, rolls out, and maintains our platform.

Responsibilities:

Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.

Maintain services once they are live by measuring and monitoring availability, latency, and overall system health. Establish performance baselines, and define actions and automation by correlating data from multiple sources.

Design, build, and maintain logging and telemetry systems that are used to manage all services.

Design, code, test, and deliver software to automate manual operational work.

Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents.

Identify application patterns and analytics in support of better service level objectives.

Deploy and maintain systems that run on multiple cloud providers (AWS, GCP, Azure, Alibaba, Tencent, Oracle, IBM) and physical systems around the world.

Be part of an on-call rotation to support production systems.

Desired Skills & Experience:

Strong Linux and/or Windows system administration skills.

Good networking knowledge and experience with Internet Architecture (BGP, peering, DNS).

2 years of incident resolution experience in a large-scale operations environment.

Experience with or knowledge of administering application servers, web servers, and databases.

Hands-on experience with cloud deployment, monitoring, and ops analysis tools such as Prometheus, Elasticsearch, Grafana, Kibana, Splunk, Terraform, Jenkins, etc.

3 years with Python, Bash, PowerShell, C, etc.

Linux experience required; Windows Server desired.

BS degree in Computer Science or related technical field involving coding or equivalent practical experience.

Appreciation of the value of diversity of opinions, approaches, and backgrounds.

Key skills:
java, academics, acp, algorithms, android, windows system administration, web servers, service level, windows server, computer science, production systems, application servers, system administration, dns, gcp, bash, linux, cloud, azure
