Job Description Summary
The Production DevOps Engineer serves as a critical link in the "Middle-Mile" of software delivery for GE Vernova’s Grid Software SaaS products. This role is responsible for ensuring that software moves from development to production environments through a standardized, secure, and highly observable path. You will own the Change Management Process, serving as a primary authority for production deployments to ensure that new SaaS product versions do not compromise the stability of global energy grid operations. This position requires a strong technical background in automation and a disciplined approach to release safety in a 24/7 operational environment.
Works independently and is recognized as a technical leader. The role requires a deep understanding of concurrent software development and its effect on build management and on releasing builds across versions and environments.
Job Description
Roles and Responsibilities
Day 0: Pipeline Implementation & Standardization
- Golden Path Execution: Maintain and improve standardized CI/CD pipelines using GitHub Actions and ArgoCD, ensuring all product teams follow the established "Golden Path" to avoid bespoke, non-standard deployment utilities.
- Policy Enforcement: Implement and manage automated "quality gates" within the delivery pipeline to verify that every release meets security and architectural standards before reaching production.
- Provisioning Support: Assist the SaaS Cloud Engineers in automating the provisioning of highly secure, resilient customer cloud infrastructure.
Day 1: Release Authority & Deployment Management
- Change Control Authority: Review and provide final approval for production deployment requests, ensuring all pre-release criteria—such as performance testing and security scanning—are satisfied.
- Progressive Delivery: Execute advanced rollout strategies, including Canary and Blue/Green deployments on Kubernetes, to minimize the "blast radius" of changes.
- Validation: Perform automated verification and acceptance testing post-deployment to confirm service health and trigger automated rollbacks if necessary.
Day 2: Operational Support & Optimization
- 24/7 Follow-the-Sun Support: Participate in global on-call rotations, ensuring a seamless transition of operational responsibility between time zones through standardized handover protocols.
- Incident & Root Cause Analysis: Support high-severity incident response and participate in blameless Root Cause Analysis (RCA) to identify and fix systemic deployment risks.
- FinOps & Capacity: Track and report on cloud resource consumption for CI/CD infrastructure, assisting in cost-optimization efforts and right-sizing production workloads.
- Manage key deliverables and mentor junior team members.
- Contribute to driving initiatives, such as defining standards and processes, that ensure quality.
- Develop and enhance the test infrastructure and continuous integration framework used across teams.
- Learn new build and release techniques and methodologies, and train the team in them.
- Partner with and provide direction to fellow team members to diagnose bugs and formulate solutions.
Technical Requirements
- CI/CD & GitOps: Hands-on experience with Jenkins, Artifactory, GitHub Actions and ArgoCD for automated software delivery.
- Container Orchestration: Proficiency in managing workloads on Kubernetes, specifically with EKS clusters.
- Automation Tools: Strong skills in Ansible and Terraform for configuration management and infrastructure-as-code.
- Cloud Platform: Solid understanding of AWS cloud services (VPC, IAM, EKS, RDS, S3, MSK, etc.) in a production setting.
- Observability: Experience using Prometheus, Grafana, Splunk, Datadog, or Dynatrace to monitor deployment health and system performance.
Experience & Qualifications
- Professional Background: 5+ years of experience in DevOps, SRE, or Release Engineering roles for cloud-native SaaS applications.
- Overall Experience: 8+ years.
- Operational Discipline: Proven ability to manage production changes and troubleshoot under pressure in a high-stakes environment.
- Compliance Awareness: Familiarity with regulated industries and security frameworks such as NERC CIP, SOC 2, ISO 27001, and IEC 62443 is highly preferred.
- Communication: Strong ability to document technical procedures and communicate clearly with stakeholders during global shift handovers.
Key Performance Indicators (KPIs)
- System Availability: Help maintain 99.99% availability of mission-critical grid SaaS products.
- Customer Onboarding Speed: Contributing toward the 4-hour customer onboarding SLA target.
- Change Failure Rate: Maintaining a low rate of failed production deployments through improved quality gates.
- Mean Time to Recover (MTTR): Ensuring fast restoration of service through automated rollbacks and diligent execution of runbooks.
- Toil Reduction: Automating repetitive manual tasks to ensure at least 50% of time is spent on engineering improvements.
Education Qualification
Bachelor's degree in Computer Science or another STEM (Science, Technology, Engineering, and Math) major, with advanced experience.
Business Acumen
- Strong problem-solving abilities and the ability to clearly articulate specific technical topics or assignments
- Experience in building scalable and highly available distributed systems
- Skilled in breaking down problems and estimating time for development tasks
- Evangelizes how our technology solves customer problems from a technology and business perspective
Leadership
- Demonstrates clarity of thinking to work through limited information and vague problem definitions
- Influences through others; builds direct and "behind the scenes" support for ideas
- Proactively identifies and removes project obstacles or barriers on behalf of the team
- Shares knowledge, power, and credit, establishing trust, credibility, and goodwill
Personal Attributes
- Able to work under minimal supervision
- Excellent communication skills and the ability to interface with senior leadership with confidence and clarity
- Skilled in providing oversight and mentoring team members. Shows ability to effectively delegate work.
- Applies values, business strategy, policies, precedent, and experience to make complex decisions amid ambiguity and uncertain consequences.
Additional Information
Relocation Assistance Provided: Yes