Dev Ops System Administrator (TS/SCI with Polygraph) #ESF1365
ExpertHiring
- Location: Onsite (Scottsdale, AZ)
- Compensation: $126k - $135k/yr
- Employment: Full-time
- Level: Senior Level
Posted 3 days ago
About the Role
Our client develops mission-critical solutions across various domains, offering career advancement and autonomy. This role involves designing, implementing, and maintaining scalable infrastructure for AI/ML model training and inference.
Skills
DevOps
System Administration
CI/CD
Linux
Docker
Kubernetes
Ansible
Terraform
Infrastructure as Code
GPU Resource Management
AI/ML Infrastructure
TS/SCI with Polygraph
Full job details
Top Reasons to work with our client
#INDEH123
- We develop mission-critical solutions across the land, sea, air, space, and cyber domains.
- Career advancement
- Autonomy in the workplace
- Well-liked management
What you will be doing: Dev Ops System Administrator
- Design, implement, and maintain scalable and robust infrastructure for AI/ML model training and inference.
- Develop and manage CI/CD pipelines for automated building, testing, and deployment of AI applications and machine learning models.
- Administer and optimize Linux-based systems and virtualized environments.
- Manage containerization and orchestration platforms (e.g., Docker, Kubernetes) to deploy and scale ML services.
- Automate infrastructure provisioning, configuration management, and deployment processes using Infrastructure as Code (IaC) tools like Ansible or Terraform.
- Manage and allocate GPU resources efficiently for model training and other high-performance computing tasks.
- Implement and maintain monitoring, logging, and alerting systems to ensure platform health and performance.
- Collaborate with development teams to support their infrastructure needs and troubleshoot issues.
Experience you will need: Dev Ops System Administrator
- Bachelor’s degree in Computer Science or a related field (or equivalent experience) plus a minimum of 8 years of relevant experience, or a Master’s degree plus 6 years of relevant experience.
- Department of Defense TS/SCI with Polygraph security clearance is required at time of hire.
- Advanced understanding of server-based operating systems.
- Strong Linux, container, and AI skills.
- Subject matter expert (SME) with the ability to mentor others on administering the server environment.
- Advanced troubleshooting skills across the server OS, networking, and storage technologies.
- Hands-on experience developing, deploying, and supporting large-scale enterprise server solutions.
- Experience with, or familiarity with, AI/ML models is preferred.