We're looking for Site Reliability Engineers!
Job Description
- Perform Big Data SRE and engineering activities on multiple open-source Hadoop, HBase, Spark, or Kafka clusters, or on AWS managed clusters.
- Experience with cloud platforms, preferably AWS.
- Collaborate across teams: build and maintain relationships with customer teams, the user community, architects, and engineering teams, and work jointly with them on key deliverables that ensure production scalability and stability.
- Effective root cause analysis of major production incidents and development of learning documentation.
- Automation of repetitive tasks to reduce manual effort and avoid human errors.
- Tune alerting and set up observability to proactively identify issues and performance problems.
- Work closely with L-3 teams to review new use cases and cluster hardening techniques for building robust and reliable platforms.
- Ensure the platform can effectively meet performance and SLA requirements.
- This is a hybrid position. Expectation of days in office will be confirmed by your Hiring Manager.
Basic Qualifications
- Proficiency working as a Hadoop, Kafka, or AWS SRE.
- Proficiency in building, managing, and performance-tuning Hadoop, Kafka, or AWS platforms, particularly MSK and EMR.
- Proficiency with the Hortonworks distribution or open-source distributions is preferred.
- Excellent Python programming skills for automating repetitive DevOps tasks.
- Understanding of Linux, networking, CPU, memory, and storage.
- Excellent interpersonal, verbal, and written communication skills.
(*Note: Scoutit is an independent candidate sourcing partner for this hiring. All applications, candidate screening, and hiring-related actions are conducted directly by the respective employer’s authorized recruitment team. Relevant candidates will be contacted via email and asked to fill in additional details as part of the next stage of the application process.)
Skills: Hadoop, AWS, Kafka