Title: DevOps & Site Reliability Lead - Retail DevOps
Duration: Full Time
Location: Deerfield, IL 60015
Job Description
Technology and Programming (Expert Level)
- Strong proficiency in Java full stack development
- Solid grasp of object-oriented programming principles and concepts
- Hands-on experience with Spring Framework (Spring Boot, Spring MVC, Spring Security)
- Knowledge of RESTful API development
- Experience with databases such as Oracle, DB2, and MySQL
- Proficiency in the BASE24 EPS payment switch, C++, AS400, and Python is an added advantage
Domain, Cloud & Platform Engineering
- Must have domain experience in Retail Point of Sale/Payment Systems/Merchandising/Inventory/Logistics
- Expertise in Microsoft Azure, including:
  - Compute (VMs, App Services, Azure Container Apps)
  - Containers & Orchestration (AKS, Docker)
  - Networking (VNETs, Private Endpoints, Application Gateway, Load Balancers)
  - Storage, Azure Key Vault, Azure Monitor, Log Analytics
- Proven experience designing enterprise-grade, highly available cloud platforms
DevOps & Engineering Excellence
- Advanced experience with Azure DevOps and CI/CD pipeline architecture
- Strong scripting skills (PowerShell, Bash)
- Knowledge of GitOps concepts, branching strategies, and release orchestration
Site Reliability Engineering (Leadership Level)
- Ownership of platform reliability, resiliency, and performance
- Definition and governance of:
  - SLIs, SLOs, SLAs
  - Error budgets and reliability metrics
- Advanced observability strategy, design, and implementation:
  - Metrics, logs, traces, alerts, and dashboards using Dynatrace
- Incident response leadership, RCA facilitation, and long-term remediation planning
- Experience operating systems with 99.9% to 99.99% availability
Security, Compliance & Cost
- Secure cloud design using Key Vault, managed identities, RBAC
- Cost optimization (FinOps mindset) across cloud infrastructure
Roles & Responsibilities
- Act as Lead SRE for the client's retail platforms, owning reliability and stability outcomes
- Define and enforce SRE standards, best practices, and operating models
- Architect and govern highly available, scalable cloud platforms
- Lead the design and implementation of CI/CD and IaC strategies
- Establish proactive monitoring, alerting, and incident prevention mechanisms
- Own major incident leadership, RCA execution, and corrective action tracking
- Partner with application, security, and architecture teams to build reliability by design
- Drive automation to reduce toil and improve operational efficiency
- Mentor and coach SRE and DevOps engineers across teams
- Influence roadmap decisions with a reliability, scalability, and cost lens
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.