Staff AI Platform Engineer
CarParts.com
- Location: Onsite (Long Beach, California)
- Compensation: $166k - $232k/yr
- Employment: Contract
- Level: Senior Level
About the Role
CarParts.com is a leading eCommerce platform for auto care, enabling drivers to find quality parts and schedule maintenance. This role is central to expanding their innovative AI platform, Axle, and its operational AI agent, OpsWhisperer.
What We Do
CarParts.com is the go-to eCommerce platform for auto care and maintenance. We provide drivers with quality parts at competitive prices and enable them to schedule appointments with trusted mechanics directly through our website. Using world-class design principles and the latest technologies, we deliver a fast, intuitive digital experience backed by our company-owned national distribution network.
With over 1,000 employees worldwide, we are scaling rapidly, fueled by our most recent strategic partnership and $35 million investment. This positions us for the next phase of growth as we continue to empower drivers along their journey.
We’ve built Axle, CarParts.com’s domain AI platform and winner of the MACH Alliance Impact Award for Best Multi-Agent Ecosystem, and we’re expanding it. This role is central to that expansion.
Our Culture
At CarParts.com, our culture goes beyond our core values of Safety First, Customer Focused, and Commitment to Excellence. We are a performance-driven, data-focused, and fast-paced team where results matter and winning is expected.
- Hungry & Hardworking: We set ambitious goals, measure progress with clear metrics, and hold ourselves accountable to deliver results.
- Promote from Within: We reward top performers with opportunities for growth and advancement.
- Collaborative & In-Person: We believe the best ideas and fastest execution happen face-to-face.
- High Standards: We move quickly, pay attention to details, and dig deep - whether it’s analyzing contracts, aggregating complex scenarios, or building clear, data-driven presentations.
- No Passengers: We value grit, ownership, and the relentless pursuit of results.
THE OPPORTUNITY
One exceptional engineer. AI as the team.
This is not a standard DevOps posting. We are looking for one unusually capable, AI-native engineer to own our entire platform engineering and SRE function — using autonomous agents, LLM-powered pipelines, and MCP-based tooling as force multipliers to do the work of a team, on-site, in close partnership with our engineering leadership.
You will inherit a mature, fully containerized AWS estate (9 EKS clusters, 27 accounts, 228 Kubernetes nodes), an Akamai CDN layer managing live traffic splits, GitHub Actions + Jenkins CI/CD pipelines for a Webpack 5 micro-frontend monorepo, and an operational AI agent platform, OpsWhisperer, already in production monitoring 25 AWS accounts with a 91% autonomous resolution rate.
Your job is to extend all of it, automate what remains manual, and be the person who makes every deployment, incident, and infrastructure change happen with speed, precision, and intelligence.
SCOPE OF OWNERSHIP
What you’ll own
AWS Multi-Account Infrastructure
- EKS clusters across dedicated AWS accounts
- EC2 worker nodes via Auto Scaling Groups
- SQS pipelines
- AWS Bedrock (Claude) for AI agent workloads
Kubernetes & Containerization
- EKS clusters
- Node group management
- Kops clusters alongside EKS
- Multiple environment tiers with full blast-radius isolation
CI/CD & Release Management
- Multiple Repos
- GitHub Actions workflows + Jenkins pipeline management
- Turbo build system across multiple micro-frontend packages
- Canary release gating and rollback automation
CDN & Traffic Management
- Akamai Property Manager config
- Phased Release Cloudlet for Canary and Production split
- Security, Throttling and Monitoring
- Jenkins-driven cache invalidation
Observability & Incident Response
- Elastic/Kibana
- CloudWatch across all AWS accounts
- Business performance monitoring
- SQS backlog + pipeline health alerting
- On-call ownership with proactive, AI-assisted triage
NON-NEGOTIABLE
The AI-native expectation
This is a role where AI fluency is not a bonus — it is how you do the job. We expect you to build, operate, and improve autonomous agents that handle monitoring, alerting, triage, and routine operational work. You are not just a consumer of AI tools; you are the person who builds them, deploys them into production, and iterates on them based on real operational data.
You will extend OpsWhisperer (AI Platform and Observability agent), contribute to the Axle platform, build MCP servers that give agents new capabilities, and apply LLM-powered reasoning to infrastructure problems that previously required multiple humans. If you’ve never built an agent that runs in production unsupervised, this is not the right role.
WHAT YOU’LL INHERIT & EXTEND
The tech stack
- Cloud & Orchestration: AWS EKS · Kubernetes · Kops · AWS Organizations · Auto Scaling Groups · AWS SQS · AWS Bedrock · CloudWatch
- CDN & Networking: Akamai Property Manager · Phased Release Cloudlet · Fast Purge · Content Protector
- CI/CD & Frontend: GitHub Actions · Jenkins · Turbo (monorepo) · Webpack 5 Module Federation · Canary / Blue-Green Deployments
- AI & Agentic: MCP (Model Context Protocol) · Claude API / AWS Bedrock · Azure Bot Service · Microsoft Entra ID · Operational AI Agents
- Observability & Data: Elastic / Kibana · BlueTriangle · Databricks · Cloudinary · New Relic
- Languages: Node.js / TypeScript · Python · Bash / Shell · SQL · PowerShell
REQUIREMENTS
What we’re looking for
- 10+ years of hands-on DevOps, SRE, or platform engineering experience in production AWS cloud environments
- Deep AWS expertise: EKS, EC2, SQS, CloudWatch, IAM, Organizations, and multi-account architectures
- Strong Kubernetes skills: cluster operations, node group management, workload isolation, taints/tolerations, auto-scaling
- Experience with Akamai or equivalent enterprise CDN — configuration, purge operations, traffic routing rules
- CI/CD ownership: GitHub Actions and/or Jenkins pipeline design, monorepo build systems, release gating
- Production experience building or operating AI agents — LLM integration, autonomous workflow design, prompt engineering
- Proficiency in Node.js and/or Python for automation, tooling, and MCP server development
- Observability stack ownership: Elastic/Kibana, log analysis, alerting design, SLO/SLI instrumentation
- Comfortable owning on-call responsibility for a production e-commerce platform with significant revenue exposure
- Strong written and verbal communication — will interface with engineering leadership and present findings to executives
- Based in or willing to relocate to the Los Angeles / Long Beach area for on-site work
Equal Opportunity Employer
CarParts.com is an equal-opportunity employer. We enthusiastically accept our responsibility to make employment decisions without regard to race, religious creed, color, age, sex, sexual orientation, national origin, religion, marital status, medical condition, physical or mental disability, military service, pregnancy, childbirth and related medical conditions, or any other classification protected by federal, state, and local laws and ordinances. Our management is dedicated to ensuring that we fulfill this policy with respect to hiring, placement, promotion, transfer, demotion, layoff, termination, recruitment advertising, pay, and other forms of compensation, training, and general treatment during employment.
The above-noted job description is not intended to describe, in detail, the multitude of tasks that may be assigned but rather to give the incumbent a general sense of the responsibilities and expectations of his/her position. As the nature of business demands change so, too, may the essential functions of this position.