DevOps Engineer

Weekday AI (YC W21) • Full-time • Pune, IN • 3d ago

This role is for one of Weekday's clients.

Min Experience: 4 years

Location: Pune (On-site / Warehouse + Office)

Job Type: Full-time

We are seeking a DevOps Engineer with strong systems and release engineering expertise to manage deployment, reliability, and on-premises infrastructure for large-scale distributed systems. This role sits at the intersection of Linux internals, automation, and release management, and is ideal for engineers who are comfortable operating directly at the OS, network, and application layers.

Unlike traditional cloud-focused DevOps roles, this position requires deep hands-on experience with bare-metal Linux environments, containerized workloads, and high-availability systems operating under real-world constraints.

Requirements

Key Responsibilities

Linux Systems & Infrastructure

Operate, tune, and troubleshoot bare-metal Linux servers across CPU, memory, disk, and network layers.
Perform deep OS-level diagnostics using system logs, process inspection, and kernel-level tooling.
Resolve complex production issues without reliance on cloud dashboards or managed abstractions.

Release Engineering & CI/CD

Own end-to-end CI/CD pipelines, including build, release orchestration, staged rollouts, and rollback strategies.
Manage versioning and release lifecycle to ensure safe, repeatable deployments.

Containerization

Build, optimize, and debug Docker images with attention to layering, performance, and reliability.
Integrate containerized services into on-prem environments and troubleshoot runtime issues.

Networking & Reliability

Diagnose and resolve network issues such as latency, packet loss, jitter, and Wi-Fi instability.
Ensure reliable system performance in high-load, real-time, and high-density environments.

Automation & Tooling

Develop automation using Bash and Python for build workflows, log parsing, system utilities, and operational tooling.

Monitoring & Operations

Monitor system throughput, latency, and health using tools such as Prometheus and Grafana.
Design alerts, perform Root Cause Analysis (RCA), and implement preventive improvements.

Required Skills & Experience

Expert-level knowledge of UNIX/Linux systems (Ubuntu Server preferred), including process, memory, and log management.
Strong experience in release engineering, deployment strategies, and rollback planning.
Proficiency with Docker, including image optimization and debugging.
Advanced scripting skills in Bash and Python.
Solid understanding of networking fundamentals (TCP/UDP, routing, VLANs, Wi-Fi performance).
Hands-on experience with monitoring, observability, and log analysis tools (Prometheus, ELK, or similar).

Good to Have

Background in SRE, systems reliability, or build/release engineering.
Experience running Kubernetes in on-prem or edge environments.
Operational experience with databases such as PostgreSQL, MongoDB, or Redis (availability, backups).

What This Role Is Not

Not a cloud-only DevOps role focused solely on AWS/Azure/GCP services.
Not suitable for engineers dependent on dashboards without deep terminal-based troubleshooting skills.

What Success Looks Like

Reliable, well-orchestrated releases with seamless rollback capabilities.
Clear visibility into system behavior through meaningful metrics and logs.
Rapid and accurate root-cause analysis at the OS, network, and application layers.

Key Skills

Linux
Docker
Bash Scripting
Python
Release Engineering
On-Prem Infrastructure
Monitoring & Observability
