Infrastructure Engineer (Hybrid in NYC)
Salary: $140,000 – $200,000
About the Opportunity
We are seeking an experienced Infrastructure Engineer to build, maintain, and optimize the foundational systems that power a modern data, analytics, and AI platform. This role is primarily focused on infrastructure, cloud operations, and DevOps engineering, with ownership of cloud environments, deployment pipelines, orchestration, networking, security, and observability.
The ideal candidate is passionate about building scalable, reliable systems and enjoys working across cloud infrastructure and platform engineering. Candidates interested in expanding into AI and agentic systems will have opportunities to do so, but prior AI experience is not required.
Responsibilities
DevOps & Platform Engineering
- Deploy, configure, and maintain shared platform services as containerized workloads.
- Manage cloud infrastructure, including networking, security, storage, secrets management, identity services, and container registries.
- Design, build, and maintain CI/CD pipelines, release processes, and Git-based workflow standards.
- Configure and support container orchestration environments and service-to-service connectivity.
- Monitor platform health and performance through logging, alerting, tracing, and observability tools.
- Continuously evaluate and implement technologies that improve reliability, scalability, security, and developer productivity.
- Create and maintain technical documentation and architecture diagrams.
AI & Platform Growth Opportunities
Prior AI experience is not required, but candidates interested in AI infrastructure will have opportunities to:
- Support deployment and operation of AI and agent-based workflows.
- Build infrastructure and tooling that enables efficient development, testing, and deployment of AI applications.
- Contribute to AI evaluation, testing, and monitoring frameworks.
- Extend observability and reliability practices into AI execution environments.
Qualifications
Required
- 3+ years of experience in Infrastructure Engineering, DevOps, Platform Engineering, or Site Reliability Engineering (SRE).
- Strong experience with cloud infrastructure and modern container technologies.
- Hands-on experience with:
  - Docker
  - Kubernetes
  - At least one major cloud provider (AWS, Azure, or GCP)
- Experience deploying and managing containerized services in production environments.
- Knowledge of networking, load balancing, security, and service connectivity.
- Experience building and maintaining CI/CD pipelines and Git-based development workflows.
- Familiarity with workflow orchestration tools such as Prefect, Airflow, Dagster, or similar platforms.
- Experience with monitoring, logging, alerting, and observability solutions.
- Strong written and verbal communication skills with the ability to document technical solutions effectively.
Preferred
- Interest in AI, machine learning infrastructure, and emerging agentic systems.
- Familiarity with agent frameworks and protocols such as LangChain or MCP (Model Context Protocol).
- Experience supporting MLOps environments, model deployment, or inference workflows.
- Knowledge of model serving architectures, batch processing, and real-time APIs.
- Experience with ML observability and evaluation platforms such as MLflow, Langfuse, or similar tools.
- Experience managing internal Python package repositories (private PyPI, Artifactory, etc.).
- Willingness to contribute across adjacent engineering disciplines, including data engineering and software development, when needed.
Culture & Success Factors
Successful candidates will:
- Demonstrate a strong bias for action and execution.
- Take ownership of projects and outcomes from start to finish.
- Solve problems proactively and focus on results.
- Remain composed and effective in fast-paced environments.
- Simplify complex technical challenges into practical solutions.
- Thrive in a collaborative, high-performance team environment.