About The Role
Teza is a systematic trading firm building quantitative strategies across multiple asset
classes. We are looking for a DevOps Engineer to own and evolve our infrastructure
platform — the systems that our quants, developers, and traders rely on every day.
Our infrastructure team is at an inflection point. We have a working platform that
supports active trading and research, and we are now investing in making it more robust,
observable, and developer-friendly. The successful candidate will have significant
influence over the direction of our infrastructure — shaping tooling choices, establishing
standards, and building systems that scale with the firm.
Location
Austin, Texas or Yerevan, Armenia
What We're Building Toward
As a team, we are working toward the following goals. You will play a central role in
defining the approach and driving execution:
- **Full-stack observability** across all internal services — unified metrics, centralized logging, distributed tracing, and actionable alerting — so that engineers and traders have clear, real-time visibility into system health and performance.
- **Reliable, self-service compute orchestration** spanning Slurm (HPC/ML workloads), Hadoop (batch data processing), and Airflow (workflow scheduling) — enabling researchers and data engineers to run workloads at scale without infrastructure bottlenecks.
- **Mature secrets management** for trading credentials, API keys, certificates, and service-to-service authentication — with rotation policies, auditing, and tight integration into deployment workflows.
- **Unified release pipelines** that bring consistency to how diverse applications — trading strategies, data pipelines, and real-time trading systems — move from development to production, each with its own build, test, and deployment needs.
- **A well-maintained platform foundation** where shared services — identity management, GitHub Actions runners, VPN, observability tooling — stay current and reliable without disrupting active trading.
- **Strong security posture** across production and research environments — network segmentation, access controls, vulnerability management, and compliance — that evolves alongside the platform rather than being bolted on after the fact.
Requirements
- Experience building and operating internal platform services for development teams (CI/CD, compute, monitoring, developer tooling) — not just consuming them.
- Strong proficiency in Linux systems administration and container orchestration (Docker, Kubernetes, or similar).
- Deep understanding of the software development lifecycle and how infrastructure supports engineering teams — from local development through CI to production deployment.
- Familiarity with ML and data-intensive workflow requirements: GPU scheduling, large dataset access patterns, experiment tracking, and reproducible compute environments.
- Proficiency with at least one major cloud provider (AWS, GCP, or Azure) including networking, IAM, and managed services.
- Experience designing and operating hybrid infrastructure — cloud, on-premises, and colocation environments — with an understanding of the tradeoffs between them.
- Hands-on programming ability in Python or another scripting language, sufficient to build tooling, automation, and infrastructure-as-code — not just run playbooks.
- Solid understanding of core network protocols and services: DNS, LDAP, SMTP, TLS, HTTP, and SSH.
- Practical knowledge of infrastructure security: firewall management, access control models (zero-trust, bastion hosts), vulnerability scanning, patch management, and audit logging.
Nice to Have
- Experience in a trading firm or other environment with strict uptime and latency requirements.
- Familiarity with infrastructure-as-code tools (Terraform, Pulumi, Ansible).
- Experience with log aggregation and SIEM systems.
- Understanding of compliance frameworks relevant to financial services.
Benefits
- Health, vision, and dental insurance
- Flexible sick time policy