Selby Jennings
Location
Job Type: Full Time
Date Posted: 25 May 2025
Job Category: Industrial Engineers
Industry: Science, Technology, Engineering & Mathematics
Site Reliability Engineer
Primary Focus Areas: Cloud Infrastructure, System & Network Administration, Monitoring, Governance, Risk, and Compliance
Position Level: P4 - Advanced
Work Location: McLean, VA or Wilmington, NC
We are looking for a skilled and driven cloud-focused Site Reliability Engineer to help ensure the performance, scalability, and reliability of our AWS-based systems. In this role, you'll work closely with engineering and operations teams to improve system stability and developer efficiency through automation, monitoring, and incident management.
Build and manage robust, scalable, and secure AWS-based cloud infrastructure.
Create and support monitoring systems, alert mechanisms, and dashboards to ensure uptime and service health.
Use infrastructure-as-code tools like Terraform (with Terragrunt), CDK, and CloudFormation to automate provisioning and configuration.
Set up and manage CI/CD workflows to facilitate efficient code deployment and enhance development processes.
Take ownership of incident resolution, conduct thorough root cause analysis, and develop long-term solutions to recurring problems.
Partner with engineering teams to fine-tune performance, bolster reliability, and manage cloud costs effectively.
Promote operational excellence and guide architectural decisions for infrastructure enhancements.
Develop and maintain disaster recovery strategies to guarantee system continuity in crisis scenarios.
Required: Bachelor's degree in Computer Science
Preferred: Master's degree in Computer Science or related field
Required: Minimum of 5 years of relevant experience
Preferred: 8 years of experience in a similar role
Required Skills:
Experience with Argo CD and Argo Workflows
Proficiency in infrastructure-as-code: Terraform and Terragrunt
Kubernetes and Linkerd knowledge
In-depth experience with AWS services (EKS, Fargate, Aurora)
Strong background in security and compliance
Containerization tools such as Docker
Monitoring and logging technologies
Scripting or programming language proficiency
Database administration
Source control using Git
Hands-on experience in incident response and management
Preferred Skills:
Familiarity with Datadog
Knowledge of Cloudflare services
Understanding of the mortgage industry
Advanced cloud security tools (e.g., GuardDuty, Security Hub)
Disaster recovery strategy experience
Experience with automation tools like Camunda
Required: AWS Certified Solutions Architect - Associate
Preferred: AWS Certified DevOps Engineer - Professional