Job Summary
We are looking for an experienced AWS Data Engineer with strong expertise in Python and PySpark to design, build, and maintain large-scale data pipelines and cloud-based data platforms. The ideal candidate will have hands-on experience with AWS services, distributed data processing, and implementing scalable solutions for analytics and machine learning use cases.
Key Responsibilities
- Design, develop, and optimize data pipelines using Python, PySpark, and SQL (a representative sketch follows this list)
- Build and manage ETL/ELT workflows for structured and unstructured data
- Leverage AWS services (S3, Glue, EMR, Redshift, Lambda, Athena, Kinesis, Step Functions, RDS) for data engineering solutions
- Implement data lake/data warehouse architectures and ensure data quality, consistency, and security
- Work with large-scale distributed systems for real-time and batch data processing
- Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality, reliable data solutions
- Develop and enforce data governance, monitoring, and best practices for performance optimization
- Deploy and manage CI/CD pipelines for data workflows using AWS developer tools (CodePipeline, CodeBuild) or GitHub Actions
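To give candidates a concrete sense of the pipeline work described above, here is a minimal PySpark sketch of a batch ETL step: reading raw JSON from S3, cleansing and typing it, and writing date-partitioned Parquet for downstream query engines such as Athena. The bucket paths, column names, and app name are illustrative placeholders only, not references to an actual project.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical bucket paths -- placeholders for illustration only.
RAW_PATH = "s3://example-raw-bucket/events/"
CURATED_PATH = "s3://example-curated-bucket/events_daily/"

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Read raw JSON events landed in S3.
raw = spark.read.json(RAW_PATH)

# Basic cleansing and typing: drop malformed rows, deduplicate,
# and derive a date column to partition on.
cleaned = (
    raw
    .dropna(subset=["event_id", "event_ts"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .withColumn("event_date", F.to_date("event_ts"))
    .dropDuplicates(["event_id"])
)

# Write curated data as date-partitioned Parquet for Athena / Redshift Spectrum.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet(CURATED_PATH)
)

spark.stop()
```

In practice a job like this would typically run on Glue or EMR and be triggered by an orchestration layer such as the one sketched at the end of this posting.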
Required Skills & Qualifications
- Strong programming skills in Python and hands-on experience with PySpark
- Proficiency in SQL for complex queries, transformations, and performance tuning
- Solid experience with AWS cloud ecosystem (S3, Glue, EMR, Redshift, Athena, Lambda, etc.)
- Experience working with data lakes, data warehouses, and distributed systems
- Knowledge of ETL frameworks, workflow orchestration (Airflow, Step Functions, or similar), and automation (see the orchestration sketch at the end of this posting)
- Familiarity with Docker, Kubernetes, or containerized deployments
- Strong understanding of data modeling, partitioning, and optimization techniques
- Excellent problem-solving, debugging, and communication skills
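As a companion to the ETL sketch above, the DAG below shows one plausible way a daily pipeline could be orchestrated with Apache Airflow, assuming a recent Airflow 2.x deployment and the AWS CLI available on the workers. The DAG id, Glue job name, database/table, and results bucket are hypothetical placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical defaults -- placeholders for illustration only.
default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # Kick off the Glue ETL job that curates the raw events.
    run_glue_job = BashOperator(
        task_id="run_glue_etl",
        bash_command="aws glue start-job-run --job-name daily-events-etl",
    )

    # Refresh partitions so the new date partition is queryable in Athena.
    repair_partitions = BashOperator(
        task_id="repair_athena_partitions",
        bash_command=(
            "aws athena start-query-execution "
            "--query-string 'MSCK REPAIR TABLE analytics.events_daily' "
            "--result-configuration OutputLocation=s3://example-athena-results/"
        ),
    )

    run_glue_job >> repair_partitions
```

An equivalent workflow could just as well be expressed with Step Functions or with the Amazon provider's Glue operators; the shell-based version is shown here only to keep the example self-contained.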