Fidelity TalentSource is your destination for discovering your next temporary role at Fidelity Investments. We are currently sourcing for a Site Reliability Engineer to work in Fidelity’s Enterprise Infrastructure Group in Westlake, TX or Merrimack, NH.
This role will provide a truly predictable customer experience. In times of market volatility and high volumes, there is an increased expectation of a consistent service level. At Fidelity, we strive to meet this expectation by building reliability into our ecosystem. This will be achieved through defining & implementing practices in Resiliency Engineering, Automation, Observability & Chaos Testing while also ingraining a proactive culture of reliability-first design. You will solve stack-wide engineering issues related to hardware, software, network, applications, and cloud service providers.
The Role
You will have the opportunity to lead all aspects of production support - readiness, availability, and resiliency of critical applications, batches, and infrastructure representing various business units, while being centrally aligned to the Production Services organization. The role offers many opportunities to broaden your knowledge across multiple dimensions of technology while retaining a key focus on cloud computing (AWS & Azure) and enterprise tools and solutions such as Jenkins, uDeploy, Docker, Kubernetes, Splunk, and Datadog.
Team
We at Production Services (EI&O-PS) are a centralized support services organization in Fidelity's Enterprise Infrastructure & Operations group. EI&O-PS supports 3000+ applications across Fidelity business units, providing a roster-based on-call rotation (follow-the-sun model). Services provided by EI&O-PS include Platform, Application and Batch Support via Incident management, Change management, Environment management, Cloud, UI, Middle Tier & Database Services, Mainframe operations, Release Services & Performance Engineering services.
The Expertise You Have
- Bachelor’s degree or higher, or equivalent experience, in a technology-related field (e.g., Engineering, Computer Science) required; Master’s degree a plus
- 5-8+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale.
- Hands-on experience with Public Cloud environments, preferably AWS and Azure. Certifications a plus
- Exposure to basic OS level scripting languages such as Korn/Bash/Jscript
- Experience with container orchestration, preferably with Kubernetes
- Experience operating and implementing distributed & highly concurrent service-based
The Skills You Bring
- Ability to solve application issues on Unix/Linux with J2EE, WebSphere, Tomcat and SQL
- Familiarity with ITIL processes like Incident management, Change/Problem management
- Balancing delivery with ad hoc workloads and re-evaluating priorities
- Solid understanding of Cloud Computing and DevOps concepts including CI/CD pipelines
- Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
- Use Datadog, Catchpoint, Splunk & Grafana for Application Observability and monitoring of app & infrastructure
- Experience in instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale
- Proven experience in maintaining scalability and resiliency of complex environments
- Proven experience in implementing advanced observability practices and techniques at scale.
- Provide enterprise Cloud and Platform Engineering support for production environments and ability to participate in on-call rotation to provide solutions.
- Experience in cloud development and migration (AWS and Azure); experience building and operating highly resilient platforms in public cloud environments
- Ability to triage, complete root cause analysis, and be decisive under pressure
- Experience managing and interpreting large datasets using query languages and visualization tools
- Proficient communication skills with an ability to reach both technical and non-technical audiences
- Proven experience performing chaos testing to build confidence in the system's capability to withstand turbulent conditions in production
- Strong understanding of API testing tools (SoapUI, Postman)
- Experience managing systems using infrastructure as code tools (IAM, ARM, Terraform, Chef)
- Handle a huge fleet of on-prem servers (including security & patching oversight)
- Handle hundreds of SSL certificates for all applications in scope
- Use Ansible & Python for automating day-to-day activities, Web development with Django, JavaScript

