Job Description
Stefanini Group is hiring!
Stefanini is looking for a Cloud DevOps Engineer for Remote.
For quick apply, please reach out to Rahul Kumar: 248-936-5060 / Rahul.kumar@stefanini.com
W2 candidates only!
Responsibilities
The client is looking for a Full Stack Engineer with extensive technical breadth across PaaS and cloud technologies. This is a key position leading the development teams in cloud strategy integration and execution for the Markets Business Technology organization, in partnership with System IT. In this role, you will use your technological expertise to integrate highly scalable, flexible, and resilient cloud solutions that address customer needs while reflecting cloud principles and best practices. As a trusted advocate, you will help shape cloud strategies and support sizing, scope, and direction for new engagements. You will be a major contributor to innovation, help anchor proof-of-concept activities, and guide technological direction for the cloud. The primary focus of this role is to participate as part of our dynamic, customer-focused team, developing an integration strategy for web applications built by the Markets Business Technology Rapid Solutions Team in the cloud. The position will focus on effective use of leading-edge technologies, creating a foundation for resilient and effective environments.
Key Responsibilities
- Engage with architects, developers and technical support teams to achieve success in deployment of Cloud application services within a managed service environment.
- Work with business partners, architects, and other groups to identify technical and functional needs of systems, and determine priority of needs.
- Collaborate with performing teams to deliver new capabilities in business applications and/or remediate issues.
- Analyze, define, and document requirements for data, workflow, and logical processes.
- Analyze and translate business, information and technical requirements to create patterns for solutions that integrate across applications, systems and platforms to achieve business objectives.
- Assess foundational services, integration services, cloud operations and management capabilities.
- Champion, communicate and rationalize approaches with business leaders, organization management and within teams to develop structured outcome of proof of fit/proof of concept.
- Establish the direction for application development approaches, including tools, process and frameworks.
- Support preparation of documentation, including application designs, assessments, security management, implementation plans, and post-implementation documentation.
- Lead by example, demonstrating high performance in the areas of customer satisfaction, collaboration, teamwork, and reliability.
- Participate in the establishment of local standards, patterns, and practices for cloud integration, while providing input and influence to broader organizational standards and initiatives.
- Spearhead use of innovative new technology and best practices for new product and solution development initiatives.
- Participate in IT strategy development, including environmental analysis, opportunity identification, business cases, business innovation, and portfolio development.
- Responsible for the planning and engineering of the organization's systems infrastructure, including the design and implementation of hardware and software.
- Monitor the performance of systems.
- Typically requires a bachelor's degree in the area of specialty and 6-8 years of experience in the field or in a related area.
- Familiar with concepts, practices, and procedures within a particular field.
- Relies on extensive experience and judgment to plan and accomplish goals.
- Performs a variety of complicated tasks.
- Works under general supervision.
- Leads and directs the work of others.
- A wide degree of creativity and latitude is expected.
- Typically reports to a manager or head of a unit/department.
Top 5 hard skills:
Splunk Architecture & Administration:
- Design and maintain distributed Splunk deployments (search heads, indexers, forwarders, deployers)
- Manage indexer clustering and search head clustering for high availability
- Configure data inputs, parsing, and index management
- Implement role-based access control (RBAC) and authentication integration
- Performance tuning and capacity planning
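As an illustration of the clustering items above, here is a minimal server.conf sketch for an indexer cluster peer (hostname, port, and shared secret are placeholders; Splunk versions before 8.1 use master_uri instead of manager_uri):

```ini
# server.conf on an indexer cluster peer (hypothetical hostname and secret)
[replication_port://9887]

[clustering]
mode = peer
manager_uri = https://cluster-manager.example.com:8089
pass4SymmKey = <shared-secret>
```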
Data Onboarding:
- Design and implement data onboarding strategies for diverse data sources
- Create and maintain props.conf and transforms.conf for data parsing and routing
- Develop source type definitions and field extractions
- Configure input specifications and monitor data quality post-onboarding
- Establish data retention policies and index lifecycle management
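As a sketch of the props.conf / transforms.conf work described above, the fragment below defines a hypothetical acme:app sourcetype and routes DEBUG events to the null queue at parse time (the sourcetype name and regex are illustrative, not from a real deployment):

```ini
# props.conf -- hypothetical "acme:app" sourcetype
[acme:app]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
MAX_TIMESTAMP_LOOKAHEAD = 30
TRANSFORMS-drop_debug = drop_debug_events

# transforms.conf -- discard DEBUG events before they are indexed
[drop_debug_events]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue
```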
Splunk HTTP Event Collector (HEC):
- Configure and manage HEC endpoints for REST API-based data ingestion
- Implement HEC tokens with appropriate permissions and index routing
- Troubleshoot HEC connectivity, authentication, and data formatting issues
- Scale HEC deployments for high-volume event ingestion
- Integrate cloud-native applications and serverless functions with HEC
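As an illustration of HEC-based ingestion, the sketch below builds a batched payload for the standard /services/collector/event endpoint. The host, token, and event fields are hypothetical; the resulting headers and body could be POSTed with urllib.request.

```python
import json

# Hypothetical HEC endpoint; HEC listens on port 8088 by default.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"

def build_hec_batch(events, token, index="main", sourcetype="_json"):
    """Return (headers, body) for a batched HEC POST.

    HEC accepts several JSON event envelopes concatenated in one request
    body, each carrying its own index/sourcetype metadata.
    """
    headers = {
        "Authorization": f"Splunk {token}",  # HEC token auth scheme
        "Content-Type": "application/json",
    }
    body = "".join(
        json.dumps({"event": e, "index": index, "sourcetype": sourcetype})
        for e in events
    )
    return headers, body
```

The same payload shape works for serverless producers (e.g., a Lambda function) pushing events to HEC over HTTPS.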
Splunk DB Connect:
- Install, configure, and maintain DB Connect app across search heads
- Create database connections and manage JDBC drivers for various database types
- Design and schedule database inputs (rising column, batch, and tail inputs)
- Optimize SQL queries for performance and minimize database load
- Configure database identity management and credential security
- Troubleshoot connection issues, query timeouts, and data ingestion gaps
Relevance: Essential for maintaining platform health and scalability, ensuring data availability across the enterprise, and enabling seamless integration of diverse data sources into the Splunk ecosystem.
AWS Infrastructure & Services Core Competencies:
- Deploy and manage EC2 instances for Splunk components with proper sizing
- Configure VPCs, security groups, NACLs, and networking for secure Splunk communication
- Implement EBS storage optimization and snapshot strategies for Splunk data
- Leverage S3 for SmartStore architecture and backup solutions
- Use AWS Systems Manager, CloudWatch, and Auto Scaling for monitoring and automation
Relevance: Critical for cost-effective, secure, and resilient infrastructure supporting enterprise-scale log aggregation
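The S3/SmartStore item above can be sketched as an indexes.conf fragment (bucket name, region, and index name are placeholders):

```ini
# indexes.conf -- SmartStore volume backed by S3 (hypothetical bucket)
[volume:smartstore]
storageType = remote
path = s3://acme-splunk-smartstore/indexes
remote.s3.endpoint = https://s3.us-east-1.amazonaws.com

# Per-index remote storage; warm buckets are uploaded to the volume above
[app_logs]
remotePath = volume:smartstore/$_index_name
homePath = $SPLUNK_DB/app_logs/db
coldPath = $SPLUNK_DB/app_logs/colddb
thawedPath = $SPLUNK_DB/app_logs/thaweddb
```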
Infrastructure as Code (IaC) & Automation Core Competencies:
- Terraform or CloudFormation for provisioning Splunk infrastructure
- Ansible, Puppet, or Chef for Splunk configuration management
- Python/Bash scripting for custom automation tasks
- CI/CD pipeline integration (Jenkins, GitLab CI, GitHub Actions)
- Version control with Git for infrastructure and configuration code
Relevance: Enables repeatable deployments, reduces human error, and accelerates disaster recovery and scaling operations
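As a minimal Terraform sketch of the provisioning item above, the resource below stands up a single indexer instance (the AMI ID, instance type, and sizes are placeholders, not recommendations):

```hcl
# Hypothetical single Splunk indexer; in practice this would live in a
# module alongside VPC, security group, and IAM resources.
resource "aws_instance" "splunk_indexer" {
  ami           = "ami-0123456789abcdef0"   # placeholder AMI
  instance_type = "i3.4xlarge"              # storage-optimized for indexing

  root_block_device {
    volume_size = 100                       # GB, placeholder sizing
    volume_type = "gp3"
  }

  tags = {
    Name = "splunk-indexer-01"
    Role = "indexer"
  }
}
```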
Monitoring, Logging & Troubleshooting Core Competencies:
- Create Splunk monitoring dashboards and alerts for platform health
- Implement log forwarding strategies using universal/heavy forwarders
- Troubleshoot data ingestion issues, search performance, and cluster health
- Integrate AWS CloudWatch metrics with Splunk for unified monitoring
- Analyze Splunk internal logs (_internal, _introspection, _audit indexes)
Relevance: Ensures platform reliability, rapid incident response, and proactive identification of issues before they impact users
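The internal-log analysis above often starts with a search like the following over the _internal index (a standard SPL pattern for surfacing splunkd errors by component):

```
index=_internal sourcetype=splunkd log_level=ERROR earliest=-24h
| stats count by component
| sort - count
```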
Security & Compliance Core Competencies:
- Implement encryption in-transit (SSL/TLS) and at-rest for Splunk data
- Configure AWS IAM roles and policies following least-privilege principles
- Ensure compliance with standards (PCI-DSS, HIPAA, SOC 2) for log data
- Implement backup and disaster recovery procedures
- Secure API access and credential management (AWS Secrets Manager, HashiCorp Vault)
Relevance: Protects sensitive log data, maintains audit trails, and ensures regulatory compliance in enterprise environments
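As a least-privilege sketch for the IAM item above, the policy below grants a Splunk role only the S3 actions SmartStore needs on a single (hypothetical) bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SmartStoreObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::acme-splunk-smartstore/*"
    },
    {
      "Sid": "SmartStoreList",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::acme-splunk-smartstore"
    }
  ]
}
```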
Cribl Stream & Cribl Edge - Data Pipeline Optimization
Cribl Stream (LogStream) Competencies:
- Deploy and manage Cribl Stream architecture (Leader nodes, Worker nodes, Worker groups)
- Configure data sources and destinations for multi-platform routing (Splunk, S3, other SIEMs)
- Design and implement pipelines for data transformation, enrichment, and reduction
- Create routes and filters to optimize data flow and reduce ingestion costs
- Implement data sampling, aggregation, and redaction for compliance and cost savings
- Configure event breakers, parsers, and field extractions within Cribl
- Manage Cribl packs for pre-built data optimization solutions
- Integrate Cribl Stream with Splunk HEC and S3 for hybrid storage strategies
- Monitor pipeline performance and troubleshoot data flow issues
- Implement GitOps workflows for Cribl configuration management
Cribl Edge Competencies:
- Deploy and manage Cribl Edge fleets for distributed edge data collection
- Configure Edge nodes as lightweight agents replacing traditional forwarders
- Implement centralized management of Edge fleets through Cribl Cloud or Stream Leader
- Collect data from edge sources (logs, metrics, Windows events, syslog)
- Perform edge-side data processing to reduce bandwidth and central processing load
- Configure auto-discovery and dynamic data source management
- Manage Edge node updates, configuration versioning, and fleet-wide deployments
- Monitor Edge node health and connectivity across distributed environments
- Implement edge-to-cloud data routing strategies for hybrid architectures
Incident Management & Service Request Support Core Competencies:
Incident Response:
- Triage and respond to platform incidents following ITIL or similar frameworks
- Diagnose and resolve P1/P2 incidents affecting Splunk availability or data ingestion
- Perform root cause analysis (RCA) and create post-incident reports
- Coordinate with cross-functional teams during major incidents
- Implement corrective and preventive actions to reduce incident recurrence
- Maintain on-call rotation and provide 24/7 platform support
Service Request Management:
- Process user access requests (account creation, role assignments, permission changes)
- Handle data onboarding requests for new applications and data sources
- Fulfill infrastructure change requests (index creation, retention policy updates, capacity expansion)
- Coordinate app installations and updates on search heads
- Provision and configure new forwarders, HEC tokens, or DB Connect inputs
- Create custom dashboards and reports based on user requirements
Ticket Management & Communication:
- Utilize ticketing systems (ServiceNow, Jira Service Management, Remedy)
- Document troubleshooting steps and resolution procedures
- Maintain SLA compliance for incident response and service request fulfillment
- Communicate effectively with stakeholders on status updates and timelines
- Create and maintain knowledge base articles for common issues
- Escalate complex issues to vendors (Splunk Support, AWS Support) when necessary
Proactive Support:
- Conduct health checks and performance reviews
- Identify trending issues and implement preventive measures
- Provide user training and guidance on Splunk best practices
- Participate in change advisory board (CAB) meetings for platform changes
Relevance: Ensures rapid resolution of platform issues, maintains high availability and user satisfaction, and provides structured support that aligns with enterprise IT service management practices. Essential for maintaining operational excellence and meeting business-critical SLAs.
Agile Methodology & Project Collaboration Core Competencies:
Scrum Framework Experience:
- Participate in sprint planning, daily stand-ups, sprint reviews, and retrospectives
- Commit to sprint goals and deliver incremental value within 2-week sprint cycles
- Collaborate with Scrum Master to remove impediments and optimize team velocity
- Contribute to backlog refinement and story estimation sessions (story points, planning poker)
- Demonstrate completed work during sprint reviews and incorporate feedback
- Identify process improvements during retrospectives for continuous team enhancement
Kanban Framework Experience:
- Work within continuous flow model with WIP (Work in Progress) limits
- Manage work items through defined workflow stages (To Do, In Progress, Review, Done)
- Prioritize tasks dynamically based on business value and urgency
- Monitor cycle time and lead time metrics for process optimization
- Participate in Kanban board reviews and workflow refinement
- Balance operational support work with project-based initiatives
Story Creation & Management:
- Write clear, concise user stories with acceptance criteria following "As a [user], I want [goal], so that [benefit]" format
- Break down epics into manageable user stories and technical tasks
- Define technical requirements, dependencies, and effort estimates
- Update story status, track progress, and document blockers in real-time
- Create technical debt and bug stories with appropriate prioritization
- Maintain story traceability through completion and closure
Product Owner Collaboration:
- Participate in backlog grooming sessions to clarify requirements and priorities
- Provide technical feasibility input and effort estimates for proposed features
- Communicate constraints, risks, and technical dependencies proactively
- Negotiate scope and timelines based on technical complexity and resource availability
- Seek clarification on ambiguous requirements before implementation
- Provide regular status updates on work progress and potential delivery impacts
- Offer alternative technical solutions to meet business objectives
- Present completed work demonstrations and gather stakeholder feedback
Agile Tools & Practices:
- Utilize project management tools (Jira)
- Maintain transparency through accurate story updates and burndown tracking
- Participate in capacity planning and release planning activities
- Contribute to definition of done (DoD) and team working agreements
- Practice iterative development with continuous integration and delivery
Relevance: Enables effective collaboration in fast-paced development environments, ensures alignment with business priorities, and facilitates continuous delivery of value. Critical for balancing platform operations with strategic initiatives and maintaining transparent communication with stakeholders in modern DevOps organizations.
Top 5 soft skills:
Team Size (Direct Team): The immediate team is 5 engineers; the larger chapter is 15 engineers and 2 data analysts.
Key aspects of the role: Provide engineering and DevOps support for the CoLo migration, with emphasis on infrastructure supporting the Splunk and Cribl product lines.
Day-to-day Expectations: Provide engineering and DevOps support for the CoLo migration, with emphasis on infrastructure supporting the Splunk and Cribl product lines.
Preferred Qualifications
- Master's Degree.
- AWS Certification (Certified Developer or Solutions Architect)
- 6+ years software development experience.
- 2+ years of experience in Agile.
- 2+ years of experience in deploying Red Hat OpenShift Container Platform solutions.
- 2+ years of experience in deploying cloud-based solutions.
- 2+ years of experience in deploying hybrid cloud-based solutions.
- Master's degree from an accredited college or university with specialization in an Information Technology field, or an equivalent combination of related education and work experience.
- Minimum eight years of broad technical experience in one or more phases of information technology and management information systems.
- Experience to include managing highly complex IT efforts and / or operations.
- Eight years of information technology experience directly related to software design and development, of which 2+ years focused on cloud architecture, design, and implementation.
- Excellent oral and written communication, presentation, leadership, interpersonal, collaborative/relationship-building, organizational and planning, and analytical and problem-solving skills.
- Demonstrated experience being part of a high velocity DevOps team
- Practical experience with IaC; thorough knowledge and practical experience implementing container management concepts and practices using OCP, Kubernetes, and Docker.
- Experience with and application of Agile principles.
- Highly proficient in Java full-stack engineering, development, and cloud technology platforms. Cloud-native architecture: Well-Architected Framework, 12-Factor App, cloud reference architectures, cloud service models, microservices architecture, serverless architecture, and decoupled UIs including JavaScript frameworks (AngularJS, React), single-page applications, and modern web applications.
- Cloud Strategy: Business case development, Application assessment and migration planning, Cloud operating model design.
- Cloud Security: Shared Security Model, Cloud Security Architecture, IAM policies/roles, WAF, OWASP Web/API, SAML, vulnerabilities and compensating controls (CSP, CSRF, XSS, SQLI) etc.
- Cloud-native Services: Optimization of applications to scalable cloud designs; design, architecture, and integration of applications to modern cloud patterns.
- Containerization (Docker) and container orchestration (Docker Swarm, Kubernetes), Infrastructure as Code
- Cloud-native monitoring, logging - ELK, Splunk, CloudWatch, amongst others.
- Application Integration Services: REST APIs and API gateways (e.g., MuleSoft).
- Well-versed in Agile software development practices, with the ability to contribute to sprint ceremonies such as refinement, planning, review, and retrospectives.
- Well-versed in Continuous Integration and Continuous Delivery (CI/CD) practices, automated testing, automated code quality scanning, and automated deployments, e.g., XL Deploy, XL Release, Subversion, Jenkins, Jira, Remedy.
- Demonstrated experience in mentorship and serving as technical subject matter expert (SME) for a development team.
- Experience in the finance industry, or experience developing solutions involving financial products, is a plus.
Skill Definition And Sharing
- Responsible for implementing and maintaining Splunk environments across cloud (AWS) and on-premise solutions (both Splunk Enterprise and Splunk Universal Forwarders).
- Work with various stakeholders to ensure data is properly parsed, stored, and accessible.
- Developing and maintaining dashboards, reports, and custom queries, as well as onboarding new data sources into Splunk.
- Experience in optimizing Splunk's performance, troubleshooting issues, and ensuring security and compliance within the Splunk environment.
- Installation, configuration, administration, and maintenance of platform (Splunk Enterprise & Splunk Universal Forwarder) components.
- Experience in operating and maintaining a distributed Linux environment with hands-on experience as a Linux system administrator.
- Experience in operating in an Agile environment (Scrum/Sprint & Kanban).
- Understanding of Jira fundamentals, such as:
- Familiarity with the navigation interface.
- Basic understanding of Jira project structure.
- Working with stories and features.
- Creating and contributing to exceptional operational support documentation.
Listed salary ranges may vary based on experience, qualifications, and local market. Also, some positions may include bonuses or other incentives.
Stefanini takes pride in hiring top talent and developing relationships with our future employees. Our talent acquisition teams will never make an offer of employment without first having a phone conversation with you. Those conversations will cover the job for which you have applied, as well as the process, including interviews and job offers.
About Stefanini Group
The Stefanini Group is a global provider of offshore, onshore, and nearshore outsourcing, IT digital consulting, systems integration, application services, and strategic staffing to Fortune 1000 enterprises around the world. We have a presence across the Americas, Europe, Africa, and Asia, and serve more than four hundred clients across a broad spectrum of markets, including financial services, manufacturing, telecommunications, chemical services, technology, the public sector, and utilities. Stefanini is a CMM Level 5 IT consulting company with a global presence.