Senior AWS Cloud Platform Engineer / AWS Infrastructure Operations Lead
Location
Open (Hybrid/Remote based on business requirements)
Experience
10+ Years Overall Experience | 5+ Years in AWS Cloud Infrastructure & Platform Operations
Job Summary
We are seeking a highly skilled Senior AWS Cloud Platform Engineer to own and manage the health, security, reliability, compliance, and operational excellence of our AWS environments.
This role is tactical and execution-focused. It is responsible for keeping AWS platforms secure, compliant, highly available, cost-efficient, and operationally mature. The ideal candidate will work closely with Software Engineering, DevOps, Security, Infrastructure, and Architecture teams to maintain and continuously improve the cloud platform.
The successful candidate will act as the custodian of AWS infrastructure, implementing best practices, governance controls, automation, security standards, and cost optimization strategies while ensuring seamless day-to-day cloud operations.
Primary Mission
Keep AWS environments secure, compliant, reliable, scalable, cost-efficient, and operationally stable for the broader organization.
Own the health, governance, maintenance, and continuous improvement of the AWS platform.
Key Responsibilities
AWS Platform Operations & Infrastructure Management
- Manage and maintain AWS infrastructure across multiple environments and accounts.
- Provision and manage AWS accounts, VPCs, subnets, route tables, security groups, IAM roles, EKS clusters, ECS services, RDS databases, and related cloud resources.
- Design, deploy, and maintain cloud infrastructure using Infrastructure as Code (IaC).
- Maintain and enhance Terraform and AWS CloudFormation templates.
- Collaborate with Development, DevOps, Security, and Infrastructure teams to support platform requirements.
- Ensure AWS services perform reliably and remain aligned with organizational standards.
Cloud Governance & Security
- Implement and manage AWS Organizations, Service Control Policies (SCPs), guardrails, and governance frameworks.
- Enforce least-privilege access models across AWS environments.
- Manage IAM policies, roles, federated access, and cross-account access patterns.
- Rotate credentials, enforce MFA policies, and implement security controls.
- Monitor and respond to security findings from AWS Security Hub, GuardDuty, Inspector, and other security tools.
- Ensure logging, monitoring, and auditing capabilities are properly configured and maintained.
- Support compliance initiatives and security audits.
Networking & Connectivity
- Design and manage VPC architectures, subnets, routing configurations, NAT gateways, and network segmentation.
- Implement and maintain VPC peering, Transit Gateway, VPN, Direct Connect, and hybrid connectivity solutions.
- Troubleshoot complex networking and connectivity issues.
- Ensure secure and scalable network architecture across AWS environments.
Kubernetes & Container Platform Management
- Manage and optimize Amazon EKS environments.
- Support containerized workloads running on Kubernetes and ECS.
- Maintain cluster health, security, upgrades, scaling, and operational best practices.
- Collaborate with engineering teams on deployment automation and platform improvements.
Monitoring, Reliability & Incident Management
- Implement monitoring, alerting, observability, and operational dashboards.
- Proactively identify platform risks and reliability concerns.
- Diagnose and resolve complex cloud infrastructure incidents.
- Lead root cause analysis (RCA) and preventive action initiatives.
- Ensure backup, disaster recovery, and business continuity capabilities are regularly tested and maintained.
Cost Optimization & Resource Management
- Review AWS Cost Explorer and usage reports regularly.
- Monitor cloud spend and identify optimization opportunities.
- Right-size compute, storage, and database resources.
- Implement tagging governance and ensure tag accuracy across environments.
- Generate reports and recommendations on cloud cost trends and optimization initiatives.
Documentation & Continuous Improvement
- Maintain detailed operational runbooks, architecture documentation, and platform standards.
- Develop and improve operational procedures and automation.
- Drive continuous improvements in cloud reliability, security, and efficiency.
- Provide technical mentorship and guidance to engineering teams.
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field (preferred).
- 10+ years of IT infrastructure experience, including at least 5 years managing AWS environments.
- Strong hands-on experience operating large-scale AWS cloud platforms.
- Deep expertise in Infrastructure as Code (IaC) and cloud automation.
- Strong networking fundamentals including routing, DNS, firewalls, load balancing, VPNs, and hybrid cloud connectivity.
- Proven experience managing multi-account AWS environments.
- Experience supporting highly available and mission-critical cloud workloads.
- Strong troubleshooting and incident management skills.
- Excellent documentation and communication skills.
- Ability to work independently and collaborate effectively with cross-functional teams.
Technical Skills
AWS Services
- AWS Organizations
- IAM
- VPC
- Transit Gateway
- Route 53
- EC2
- Auto Scaling
- Load Balancers
- S3
- EBS
- RDS
- ECS
- EKS
- CloudTrail
- CloudWatch
- AWS Config
- Security Hub
- GuardDuty
- AWS Backup
- Systems Manager
- KMS
Infrastructure as Code & Automation
- Terraform
- AWS CloudFormation
- AWS SDK
- Ansible
- GitOps practices
Containers & Orchestration
- Kubernetes
- Amazon EKS
- ECS
- Docker
Security & Compliance
- Identity & Access Management
- Zero Trust Principles
- Security Monitoring
- Governance Controls
- Compliance Frameworks
- Audit Readiness
Monitoring & Observability
- CloudWatch
- Logging & Alerting Platforms
- Operational Dashboards
- Incident Response
Certifications
Mandatory
- AWS Certified Solutions Architect – Associate
- AWS Certified Security – Specialty
Strongly Preferred for Senior Candidates
- AWS Certified Solutions Architect – Professional
- AWS Certified DevOps Engineer – Professional
- HashiCorp Terraform Associate
- Certified Kubernetes Administrator (CKA)
Success Metrics
The successful candidate will be measured on:
- AWS platform availability and reliability
- Security compliance and audit readiness
- Reduction in security findings and operational risks
- Infrastructure automation maturity
- Cloud cost optimization outcomes
- Incident response effectiveness
- Platform governance adherence
- Documentation quality and operational excellence
- Stakeholder satisfaction and collaboration effectiveness
Ideal Candidate Profile
A hands-on AWS platform expert who thrives in operational ownership, cloud governance, security, networking, automation, and cost optimization. This individual takes accountability for keeping cloud environments stable, secure, compliant, and efficient while enabling development teams to focus on delivering business value.