Principal Site Reliability Engineer - Remote

Talentify.io

Worldwide
Full-time
Software Development
node.js

Description

Talentify helps candidates around the world discover the jobs they want and stay focused on them until they can complete a full application on the hiring company's career page or ATS.

Brazil / Engineering / Full-time

Hiring Company is an open source platform for secure collaboration across the entire software development lifecycle. Hundreds of thousands of developers around the globe trust Hiring Company to increase their productivity by bringing together team communication, task and project management, and workflow orchestration into a unified platform for agile software development.

Founded in 2016, Hiring Company’s open source platform powers over 800,000 workspaces worldwide with the support of over 4,000 contributors from across the developer community. The company serves over 800 customers, including the European Parliament, NASA, Nasdaq, Samsung, SAP, the United States Air Force and Wealthfront, and is backed by world-class investors including Battery Ventures, Redpoint, S28 Capital and YC Continuity. To learn more, visit www.Hiring Company.com.

We value high-impact work, ownership, self-awareness and a focus on customer success. If these values match who you are, we hope you'll learn more about working at Hiring Company and apply!

We are looking for an engineer with demonstrated experience in software development and infrastructure using Kubernetes. You will ensure the reliability and scalability of Hiring Company’s new SaaS offering by building tools and deploying infrastructure and automation on Kubernetes.

Here are some of the challenges the SRE team works on:
  • Monitoring Cloud Environments at Scale with Prometheus and Thanos
  • How We Use Sloth to do SLO Monitoring and Alerting with Prometheus
  • Automate EKS Node Rotation for AMI Releases
Responsibilities
  • Build services and tools to ensure the stability of Hiring Company’s SaaS offering
  • Define infrastructure in code with IaC tools like Terraform
  • Write thoughtful and high-quality code in Go
  • Follow our engineering best practices, and ensure alignment with our Leadership Principles
  • Provide technical mentorship for fellow engineers
  • Develop services to handle automatic recovery from incidents and disasters
  • Automate incident or disaster simulations to identify blind spots
  • Set technical vision and innovate to be at the forefront of self-healing SaaS services
  • Implement, maintain and tune monitoring and alerting systems
  • Deploy applications to and manage Kubernetes clusters
  • Participate in our on-call rotation to respond to incidents and resolve problems
Required Background/Skills
  • Bachelor's degree in Computer Science or related fields, or significant professional DevOps or SRE experience
  • 5+ years of previous experience as a developer or SRE with operational responsibilities
  • Proven experience responding to incidents while on call, with superior knowledge of incident response processes
  • Strong skills and in-depth experience working with Kubernetes
  • Strong skills and experience working with infrastructure as code tools, such as Terraform
  • Solid programming skills, with experience in Go or the ability to become proficient in it quickly
  • Familiarity with container systems such as Kubernetes and Docker
  • Familiarity with GitOps and Chaos Engineering
  • Ability and willingness to be on-call
Preferences
  • Experience with distributed application systems using HTTP, WebSockets, RPC, pub/sub, etc. at scale
  • Open source contributions to related projects
  • Knowledge of Grafana and the Prometheus suite
  • Comfortable with GitHub, Jira, Jenkins, CircleCI
  • Experience with WebRTC for real-time communication architectures
  • Experience working in open source communities
Hiring Company is a remote-first company with sta

Job Summary

Job ID: 975
Company: Talentify.io
Location: Worldwide
Job Type: Full-time
Primary Tag: Software Development

To claim this job, send an email to admin@remoteng.com from your work email with the job ID.

More Details


Website: https://www.talentify.io

Job Posted: 3 years ago