Build client-critical software at KeY2Moon
At KeY2Moon Solutions, you will work on real client problems that affect revenue, operations, and customer experience. We combine agency speed with engineering discipline, so people who join us get broad ownership and measurable impact.
Direct exposure to product, architecture, and client decision-making
A digital subscription business is scaling quickly, but release windows trigger recurring incidents and rollback-heavy weekends for the internal team.
Their current pipeline was assembled in phases and lacks guardrails. We need a pragmatic engineer who can improve reliability without freezing product delivery.
You will redesign delivery controls, observability, and incident workflows so teams can ship often without breaking production.
Engagement Stack
Terraform
GitHub Actions
Responsibilities
Rework the release flow using GitHub Actions, Terraform, and Kubernetes rollout controls that match real failure patterns
Improve incident readiness through better service ownership, Datadog/Sentry observability, and runbook quality
Set practical reliability KPIs from AWS infrastructure, deployment, and error telemetry that engineering and product can track together
Coach client squads on operational discipline, on-call readiness, and post-incident follow-through
Requirements
You have improved unstable pipelines in high-pressure environments using AWS, Kubernetes, and Infrastructure as Code
You can define reliability controls that teams adopt because they are practical for daily delivery, not just policy-compliant
You are strong at production troubleshooting across infra, application, and CI/CD layers with clear incident communication
You can convert recurring outage patterns into a preventive engineering backlog with measurable reliability outcomes
Nice to have
Experience in subscription or payment-heavy systems where uptime directly affects revenue
Experience running blameless postmortems with cross-functional technical and business teams
Experience mentoring product engineers in reliability fundamentals and release safety practices
Hiring process
Intro call with talent team (30 minutes)
Practical role interview focused on recent project work (60-90 minutes)
Final panel on collaboration, ownership, and client communication