Key Responsibilities:
1. Cloud Infrastructure Management:
Manage and optimize cloud infrastructure using AWS, Azure, or GCP.
Oversee cloud services such as EC2, S3, RDS, and Kubernetes (EKS/GKE/AKS).
Ensure high availability and fault tolerance of cloud-hosted environments.
2. Kubernetes Administration:
Administer and manage Kubernetes clusters, including statefulsets, deployments, PVCs, and configmaps.
Monitor and manage Kubernetes workloads and resources.
Optimize Kubernetes configurations for scalability, performance, and cost-efficiency.
3. Database Management:
Manage and administer SQL (Postgres) and NoSQL (MongoDB) databases.
Deploy and operate databases in containerized environments (Kubernetes).
Ensure database reliability, maintain backup strategies, and optimize database performance.
4. Monitoring and Alerting:
Set up and maintain monitoring and alerting systems using tools like Prometheus, Grafana, and CloudWatch.
Ensure system reliability and quick incident resolution through effective alerting and monitoring setups.
5. CI/CD Pipeline Management:
Design, implement, and maintain CI/CD pipelines for automated build, test, and deployment processes.
Collaborate with software engineering teams to integrate the pipelines with development workflows.
6. Scripting and Automation:
Develop automation scripts in Bash and Python for routine tasks, monitoring, and infrastructure management.
Enhance deployment processes and infrastructure automation.
7. Collaboration and Communication:
Work closely with cross-functional teams, including software engineers, cloud architects, and data scientists.
Communicate effectively to ensure smooth collaboration across teams and with external partners.
8. Machine Learning Operations (ML Ops):
Familiarity with ML Ops practices is a plus for collaborating with data science teams.
Experience with machine learning Python libraries (e.g., TensorFlow, Scikit-learn) is an added advantage.
Required Skills:
1. Cloud Services:
Hands-on experience with AWS, Azure, or GCP for managing cloud-based infrastructure.
Familiarity with cloud-native tools and services such as EC2, S3, IAM, RDS, and Kubernetes services (EKS, GKE, AKS).
2. Kubernetes Administration:
Strong experience with Kubernetes administration, especially managing statefulsets, deployments, PVCs, configmaps, and other Kubernetes resources.
Ability to troubleshoot and optimize Kubernetes-based workloads.
3. Database Management:
Solid experience with SQL databases (e.g., PostgreSQL) and NoSQL databases (e.g., MongoDB).
Experience deploying and managing databases on Kubernetes.
4. Scripting and Automation:
Proficient in scripting languages such as Bash and Python for automation of infrastructure management tasks.
Experience with Git-based workflows for source control and CI/CD.
5. Monitoring and Alerting:
Experience with monitoring tools such as Prometheus, Grafana, and CloudWatch to track cloud resources and Kubernetes workloads.
Knowledge of alerting best practices to ensure system reliability and availability.
6. Soft Skills:
Strong communication skills in English, with the ability to collaborate with diverse teams effectively.
Excellent problem-solving skills and attention to detail.
Qualifications:
Education: Bachelor’s degree in Computer Science, Information Technology, or a related field.
Certifications: Certifications in cloud platforms (AWS, Azure, GCP), Kubernetes, or DevOps are a plus.
Preferred Qualifications:
ML/AI Experience: Familiarity with ML Ops concepts and machine learning Python libraries like TensorFlow, Keras, or Scikit-learn will be advantageous.
Containerization Tools: Knowledge of containerization tools (Docker) and Kubernetes management tools (kubectl, Helm).