New York, United States | Posted on 11/13/2025
Title: Senior Site Reliability Engineer (SRE)
Location: Remote
About January
At January, we’re transforming the lives of borrowers by bringing humanity to consumer finance. Our data-driven products empower financial institutions to streamline collections and help borrowers regain financial stability and control over their lives. We’re not just expanding access to credit; we’re restoring dignity and paving the way for millions to achieve financial freedom.
About the Role
As a Senior Site Reliability Engineer (SRE), you will establish SRE practices from the ground up, ensuring reliability, scalability, and performance as January scales from thousands to millions of borrowers. You’ll architect resilient infrastructure, design modern observability solutions, and build sustainable on-call processes that evolve with our rapid growth.
Your work will directly address scaling challenges including database optimization, async workflow infrastructure, and data pipeline reliability — enabling the engineering team to ship confidently and efficiently.
Key Responsibilities
Lead incident response and develop sustainable on-call practices, including runbooks, blameless postmortems, and continuous improvement to reduce MTTR.
Build and maintain self-service observability tools (Datadog, Prometheus, ELK) for proactive monitoring and troubleshooting.
Create and maintain Infrastructure as Code (IaC) using Terraform or CloudFormation for consistent, secure AWS environments.
Partner with development teams to architect resilient, scalable infrastructure for critical components like databases, networking, async workflows, and data pipelines.
Design and implement robust CI/CD pipelines (GitHub Actions) with advanced deployment strategies (blue/green, canary).
Drive best practices in reliability and performance early in the design phase to future-proof January’s systems.
Required Skills & Experience
Proven experience leading incident response and postmortem processes for high-availability production systems.
Deep expertise in designing highly available architectures (EC2, Fargate, auto-scaling, health checks, graceful degradation).
Strong experience with AWS cloud infrastructure and IaC tools (Terraform, CloudFormation).
Hands-on experience with CI/CD automation using GitHub Actions or equivalent tools.
Proficiency in observability and monitoring stacks (Datadog, Prometheus, ELK).
Solid scripting/programming skills in Python (for automation, tooling, and debugging).
Excellent communication and documentation skills, with the ability to collaborate across engineering and platform teams.
Requirements
Cloud: AWS
IaC: Terraform, CloudFormation
CI/CD: GitHub Actions
Languages: Python
Infrastructure: EC2, Fargate
Additional Details
Remote role (NYC-based preferred for hybrid collaboration).
Opportunity to build and own the entire SRE practice for a growing FinTech startup.
Fast-paced, innovative environment working on AI-forward consumer finance products.