Platform Engineer
Triomics
- Location
- Onsite (New York City)
- Compensation
- $150k - $200k/yr
- Employment
- Full-time
- Level
- Mid Level
About the Role
Triomics is seeking a Platform Engineer to build backend services and manage cloud infrastructure for their platform, which processes millions of clinical documents. This role offers the opportunity to impact the stability and scalability of a growing healthcare technology solution.
Perks
- Remote work
Job Description:
This role spans backend product engineering and infrastructure. You'll build backend services and application features, and also own the cloud infrastructure, deployments, and CI/CD that keep them running in production. The platform processes millions of clinical documents monthly across multi-tenant deployments in both customer and Triomics cloud environments, with GPU infrastructure serving AI extraction models. We need someone who can write application code in the morning and debug a Kubernetes deployment issue in the afternoon.
What Success Looks Like in the First 90 Days
Days 1-30: Map the entire infrastructure and find what's fragile.
Get access to every deployment - AWS, Azure, customer-hosted environments. Understand the full topology: how Kubernetes clusters are configured, how GPU nodes serve models, how document pipelines move data from EHR ingestion to extraction to structured output. Your first job is to understand what is already built, where the sharp edges are, and what breaks when load spikes or a deployment goes sideways. By end of month one, you should have a written map of every production environment, know which deployments are most fragile, and have identified the top 3 infrastructure risks.
Days 30-60: Own production stability and start shipping backend services.
Take ownership of at least one customer deployment end-to-end - monitoring, alerting, incident response. Set up observability that catches pipeline failures and data quality regressions before customers report them (today, customers often find issues first). Simultaneously, pick up a backend product feature - patient data processing, document pipeline improvement, or a platform feature the product team needs. Ship it. The goal is to make sure you can context-switch between infra firefighting and product engineering.
Days 60-90: Standardize deployments and monitor everything.
Document deployment runbooks, automate what's manual, and build CI/CD improvements that make releases safer and faster. You should have a clear plan for what the infrastructure needs to look like to support 2-3x the current customer count without adding headcount proportionally.
Responsibilities
Build and ship backend services that power our product - document pipelines, application logic, and platform features
Own cloud infrastructure and deployment pipelines across both Triomics and customer environments (AWS, Azure)
Manage Kubernetes clusters, containerized services, CI/CD, and release processes, including GPU node management for model serving
Build monitoring, alerting, and observability across production deployments - we process millions of documents and need to catch pipeline failures, data quality regressions, and infrastructure issues before customers do
Debug and resolve production issues end-to-end - from application-layer bugs to infrastructure failures
Work closely with our offshore engineering team, which makes up a significant portion of engineering, on architecture decisions, code reviews, and production stability
Requirements
3+ years as a platform/infrastructure engineer at a startup or growth-stage company
Strong backend engineering: can design, build, and ship production services
Comfortable across the infrastructure stack: cloud (AWS or Azure), Kubernetes, Docker, CI/CD, networking, monitoring
Experience managing production deployments and debugging issues across application and infrastructure layers
Can context-switch between writing product code and doing infra/ops work without treating either as out of scope
Preferred
Experience with data-heavy applications - document processing pipelines, batch and real-time data workflows
Worked with ML/AI systems in production - model serving, GPU infrastructure, pipeline orchestration
Built infrastructure at an early-stage company where you were one of few engineers owning the full stack
Familiarity with building third-party integrations into a product is a plus