Site Reliability Engineer (DevOps) - Product Development

4.00 to 8.00 Years Australia, United Kingdom, United States of America, Pune, Singapore 01 Nov, 2021

Job Location	Australia, United Kingdom, United States of America, Pune, Singapore
Education	Not Mentioned
Salary	Not Disclosed
Industry	Management Consulting / Strategy
Functional Area	General / Other Software
Employment Type	Full-time

Job Description

Summary of the Position

We are looking for engineers who currently work at a product development company, preferably one serving the Machine Learning and Big Data domain, who are based in Pune and can join immediately or within a month. Hands-on experience in network troubleshooting, Python or Bash scripting, and Linux-based systems is mandatory.

About the Organisation:

Headquartered in Singapore, the company has offices in Singapore, Sydney, London, and New York, and serves the marketing needs of organisations in every corner of the globe. It runs a petabyte-scale data platform, with a key focus on finding solutions that can support its Machine Learning product roadmap.

About the Role

In this role, you will work on bleeding-edge hybrid cloud/on-premise infrastructure handling billions of events and terabytes of data a day.

You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.

As part of the role, you will be responsible for researching new technologies; managing a large fleet of active services and their underlying servers; automating the deployment, monitoring, and scaling of components; and optimizing the infrastructure for cost and performance.

Day-to-day responsibilities:
Ensure the operational integrity of the global infrastructure
Design repeatable continuous integration and delivery systems
Test and measure new methods, applications and frameworks
Analyze and leverage various AWS-native features
Support and build out an on-premise data centre footprint
Support other teams and diagnose issues related to our infrastructure
Participate in a 24/7 on-call rotation (no night shifts; on-call support only when required)

Candidate Profile:

Essential Qualifications:
Expert-level administrator of Linux-based systems
Expert-level scripting in Python or Bash
Prior experience managing monitoring platforms such as Prometheus and Grafana, to the extent of writing custom metrics
Prior experience designing alerts with Alertmanager and integrating them with PagerDuty
Prior experience managing large infrastructure deployments using Ansible or an equivalent configuration management tool
Prior experience automating the provisioning and management of hybrid-cloud infrastructure (AWS and on-premise) at scale with Terraform
Flexible working hours and the ability to participate in 24/7 on-call support with other team members
Working knowledge of managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus
Working knowledge of continuous delivery systems and tooling (Jenkins, GitLab, Bitbucket, Docker)
Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)
Experience managing hundreds to thousands of servers globally
Ability to troubleshoot problems in complex systems
Ability to adapt to a rapidly changing environment
Comfortable collaborating with and supporting a diverse team of engineers
Enjoys automating tasks rather than repeating them


Minimum 3 years of experience as a Site Reliability Engineer / DevOps engineer in a product development company

Able to join immediately or within a month

Hands-on experience with AWS infrastructure (ES6, EC2, Lambda, etc.)

Hands-on experience with Python or Bash scripting

Experience using tools such as Prometheus and Grafana

Hands-on experience with alert management systems

Excellent communication skills


