Manager, Software Engineering (Resilience Engineering)
Affirm · Remote (United States) · Infrastructure Platform Eng
About this role
Affirm is hiring an Engineering Manager, a manager-level role in its software engineering function. The position is remote. The posting calls out experience with Python, Java, Kotlin, and AWS. Compensation is listed at $225,000–$275,000 per year.
- Role: Engineering Manager
- Function: software engineering
- Level: manager
- Track: hybrid
- Employment: Full-time
- Location: Remote (United States)
- Work mode: Remote
- Department: Infrastructure Platform Eng
Job description
From Affirm careers
Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest.
We are seeking a seasoned Engineering Manager to lead our Resilience Engineering team. This role is critical in ensuring the safety and reliability of our production systems through proactive validation techniques, including production load testing and chaos engineering.
You will lead the development of systems and practices that allow engineers to safely test system behavior under stress and failure conditions in production, ensuring issues are discovered and mitigated before they impact real users.
What you’ll do
Leadership & Strategy
- Define and drive the vision for resilience engineering at Affirm, with a focus on production load testing and chaos engineering as first-class engineering practices.
- Lead and mentor a team of engineers building platforms and tooling for safe production experimentation.
- Partner with infrastructure, product, and security leadership to embed resilience validation into the software development lifecycle.
- Establish best practices for safely testing system limits and failure scenarios in production.
Systems & Operations
- Own the design and evolution of platforms that enable safe, controlled production load testing and fault injection.
- Ensure strong safeguards are in place to protect real users, including isolation boundaries, approval workflows, and automated rollback mechanisms.