AWS Cloud Engineer
Seattle, WA / St. Louis, MO / Plano, TX / Dallas, TX / Houston, TX
Full-time
Job Description:
Required Skills:
- Experience with AWS data services (S3, Glue, Redshift, Athena, Lambda, Step Functions, Kinesis, etc.).
- Hands-on experience with Unity Catalog, PySpark, AWS Glue, Lambda, Step Functions, and Apache Airflow.
- Programming skills in Python, Scala, or PySpark for data processing and automation.
- Expertise in SQL and experience with relational and NoSQL databases (e.g., RDS, DynamoDB).
Responsibilities:
- Data Pipeline Development: Design, develop, and optimize ETL/ELT pipelines using AWS and Databricks tooling such as Unity Catalog, PySpark, AWS Glue, Lambda, Step Functions, and Apache Airflow.
- Data Integration: Integrate data from various sources, including relational databases, APIs, and streaming data, ensuring high data quality and consistency.
- Cloud Infrastructure Management: Build and manage scalable, secure, and cost-efficient data infrastructure using AWS services like S3, Redshift, Athena, and RDS.
- Data Modeling: Create and maintain data models to support analytics and reporting requirements, ensuring efficient querying and storage.
- Performance Optimization: Monitor and optimize the performance of data pipelines, databases, and queries to meet SLAs and reduce costs.
- Collaboration: Work closely with data scientists, analysts, and software engineers to understand data needs and deliver solutions that enable business insights.
- Security and Compliance: Implement best practices for data security, encryption, and compliance with regulations such as GDPR, CCPA, or ITAR.
- Automation: Automate repetitive tasks and processes using scripting (Python, Bash) and Infrastructure as Code (e.g., Terraform, AWS CloudFormation).
- Agile Development: Build and optimize continuous integration and continuous deployment (CI/CD) pipelines in GitLab to enable rapid and reliable software releases in an Agile environment.
- Monitoring and Troubleshooting: Set up monitoring and alerting for data pipelines and infrastructure, and troubleshoot issues to ensure high availability.