Database Systems SRE, ASE Cassandra SRE
Apple
- Location: Onsite (Seattle, Washington)
- Employment: Full-time
- Level: Senior Level
Posted 4 days ago
About the Role
Join Apple's Services Engineering organization as a Cassandra SRE to develop and manage critical database systems at massive scale. You will work with a team of experts on cutting-edge distributed systems that power iCloud and other core Apple services, impacting hundreds of millions of users.
Skills
Apache Cassandra
Site Reliability Engineering
Kubernetes
Java
Go
Python
Distributed Systems
Automation Tooling
Incident Management
Performance Analysis
Cloud Architecture
Multi-datacenter Design
Networking Topologies
Monitoring
Backup Services
Infrastructure Management
Full job details
Apple’s Services Engineering organization (ASE) is seeking experienced database systems engineers to join our Cassandra SRE team. Engineers in ASE Cassandra SRE develop and contribute to software built to manage Apache Cassandra, an open source distributed database powering some of Apple's most critical internet services. You will be joining a team of experts working at the cutting edge of modern database deployment architectures and distributed systems. The team's work is deployed at massive scale, serving millions of queries per second over hundreds of petabytes of data across our data centers worldwide. It also has a big impact, forming the platform upon which iCloud and many other internet services at Apple are built. In ASE, your work will benefit hundreds of millions of users and is critical to the success of some of the most visible current and future Apple features.
Description
The ASE Cassandra SRE team develops applications and tooling that are safe, reliable, scalable, and fast. This work requires an innovative spirit and an extraordinary degree of care and rigor in engineering. Team members contribute to all major components of Cassandra deployment infrastructure, including maintenance automation, the backup service application, monitoring and alerting tooling and dashboards, and deployment architecture, as well as contributing upstream patches to the database focused on stability, performance, and scaling. This role also requires excellent communication, the ability to partner with our Core Storage and Analytics teams, and a high degree of customer focus when engaging with internal platform customers. Because the team is distributed, the ability to work effectively with colleagues based in other locations is also essential; experience in this area is a plus. Prior experience with development or maintenance of distributed databases or storage systems is recommended.
Minimum Qualifications
- BS or MS in Computer Science / related fields or equivalent work experience
- 7+ years in a Site Reliability Engineering Infrastructure focused role
- Support of internet-facing production services and distributed systems via deployments, On Call and Incident Management
- Experience running large scale infrastructure with a heavy reliance on automation tooling
- Excellent troubleshooting and performance deep dive analysis
- Real operational experience managing services at scale on Kubernetes
- Proficient in one or more of the following programming languages: Java, Go (golang), Python
- Operational experience deploying in and running on Datacenter and Cloud architectures (networking topologies, host placement strategies, and failure modes); design of multi-datacenter systems; failure domains; and wide-area networking
- Self motivated, inquisitive with an aptitude to learn new technologies quickly and effectively
Preferred Qualifications
- Support of internet-facing production services and distributed systems via deployments, On Call and Incident Management
- Experience running large scale infrastructure with a heavy reliance on automation tooling
- Excellent troubleshooting and performance deep dive analysis
- Real operational experience managing services at scale on Kubernetes
- Proficient in one or more of the following programming languages: Java, Go (golang), Python
- Operational experience deploying in and running on Datacenter and Cloud architectures (networking topologies, host placement strategies, and failure modes); design of multi-datacenter systems; failure domains; and wide-area networking
- Self motivated, inquisitive with an aptitude to learn new technologies quickly and effectively