Stack Digital
Windows SRE
Job Location
London, United Kingdom
Job Description
Job Title: Windows SRE
Work Arrangement: 100% Office
Location: Ropemaker Place, 25 Ropemaker St, London EC2Y 9LY
Duration of Assignment: 6 months
Start Date: 02 Feb 2025
Number of Persons Required: 1

Role Description:
Site Reliability Engineering (SRE) is responsible for delivering continuous improvement, automation, and self-service offerings to operational teams across Bank EMEA and Securities International. Ensure reliability and efficiency of infrastructure through the delivery of common, repeatable tools and processes that significantly reduce operational toil. Act as a member of the L3 Engineering team, providing subject matter expertise and ultimate escalation.

Key Responsibilities:
- Develop software to make infrastructure services self-managing and self-service.
- Deliver continuous service improvement by implementing Infrastructure as Code.
- Eliminate manual, repetitive, automatable, tactical tasks that lack value.
- Develop proactive monitoring solutions that alert on symptoms rather than outages.
- Perform detailed root cause analyses (RCAs) on incidents and outages to prevent recurrence.
- Identify technical debt and collaborate with application teams to create remediation plans.
- Liaise with Infrastructure Control and IT Risk teams to address audit requests.
- Identify cost-saving and optimization opportunities across the group.
- Define SLOs (Service Level Objectives) to meet availability and latency goals.
- Maintain infrastructure to ensure high availability, reliability, security, and performance.

Key Skills/Knowledge/Experience:
- Exceptional expertise in Microsoft Windows Server internals and related technologies.
- Proficiency in managing and maintaining Active Directory, DHCP, DNS, LDAP, and Kerberos.
- Extensive experience in hardware performance monitoring and tuning complex, low-latency systems.
- Knowledge of Agile, Site Reliability Engineering (SRE), and DevOps principles and practices.
- Proficient in scripting and programming languages such as PowerShell, Python, and C#.
- Strong understanding of Backup and Recovery processes and procedures.
- Advanced knowledge of Clustering, High-Availability, Replication, and Disaster Recovery techniques.
- Excellent performance tuning skills and deep knowledge of system internals, performance counters, and measurement tools.
- Familiarity with "Infrastructure as Code" principles and practices.
- Experience with CI/CD principles and tools such as Git, Ansible, Terraform, and TeamCity.

Person Specification:
- Excellent communication and interpersonal skills.
- Ability to handle pressure during outages and systematically resolve issues.
- Strong problem-solving skills.
- Results-driven with a strong sense of accountability.
- Proactive and motivated approach.
- Ability to operate with urgency and prioritize work effectively.
- Structured and logical approach to tasks.
- Attention to detail and accuracy.
- Ability to perform well under pressure.
- Skill in managing constructive conflict effectively.
- Capable of managing large workloads and tight deadlines.
- Ability to explain complex technical concepts to non-technical audiences at all levels.
Location: London, GB
Posted Date: 1/20/2025
Contact Information
Contact: Human Resources, Stack Digital