Overview
The SRE (Site Reliability Engineering) Practitioner course introduces ways to scale services economically and reliably in an organisation. It explores strategies to improve agility, cross-functional collaboration, and transparency of service health, working towards resiliency by design, automation and closed-loop remediation.
The course aims to equip participants with the practices, methods, and tools to engage people across the organisation who are involved in reliability, using real-life scenarios and case stories. Upon completion of the course, participants will have tangible takeaways to apply back in the office, such as implementing SRE models that fit their organisational context, building advanced observability in distributed systems, building resiliency by design, and responding to incidents effectively using SRE practices.
The course was developed by drawing on key SRE sources, engaging with thought leaders in the SRE space, and working with organisations embracing SRE to extract real-life best practices. It is designed to teach the key principles and practices necessary for starting SRE adoption.
Course Description & Learning Outcomes
- SRE Anti-patterns
  ○ SRE in a distributed ecosystem
  ○ Avoiding SRE anti-patterns
- SLO is a proxy for customer happiness
  ○ What has changed with SLOs?
  ○ SLIs and system boundaries
  ○ Error Budgets, velocity and risk
- Building secure and reliable systems
  ○ Non-Abstract Large Scale Design
  ○ Fault-tolerant designs
  ○ Designing for security, resiliency, scalability and changing landscapes
- Full-stack observability
  ○ Pillars of Observability
  ○ Observability MELT
  ○ Using OpenTelemetry
- Platform Engineering and AIOps
  ○ Platform-centric approaches
  ○ Using DataOps and AIOps to improve resiliency
  ○ AIOps Simple Recipe
- SRE & Incident Response Management
  ○ Incident Command Framework
  ○ OODA Loop
  ○ SRE and closed-loop remediation
  ○ AI/ML and Swarming for better incident management
- Chaos Engineering
  ○ Chaos Engineering Defined
  ○ Myths of Chaos
  ○ Chaos Engineering Experiments and Resources
  ○ Game Day Basics and Exercises
- SRE is the purest form of DevOps
  ○ Key Principles of SRE
  ○ Metrics for Success
  ○ SRE Execution Models
  ○ Culture and behavioural skills
  ○ Transformations and SRE
By completing this course, the following Learning Outcomes (LO) will be achieved:
LO1: Curate information for user guides and training materials of infrastructure administrative activities to meet Service Level Objectives.
LO2: Manage infrastructure configuration and support activities for secure and reliable systems.
LO3: Diagnose underlying technical problems causing disruptions guided by Observability.
LO4: Create plans for infrastructure upgrades and propose improvements based on user needs.
LO5: Manage technical issues within an agreed timeframe utilising Site Reliability Engineering, Incident Response Management and Problem Management.
LO6: Implement tests of infrastructure systems to evaluate the impact of potential upgrades and updates using Chaos Engineering.
Recommended Prerequisites
The target audience for this course is professionals including IT Operations, Site Reliability Engineers, Business Managers and Stakeholders, Change Agents, Consultants, DevOps Practitioners, IT Directors/Managers/Team Leaders, Product Owners, Scrum Masters, Software Engineers, System Integrators, and Tool Providers.
Pre-course instructions
Certificate of Attendance from Sapience Consulting: Upon meeting at least 75% attendance and passing the assessment(s), participants will receive a Certificate of Attendance from Sapience Consulting.
The following information is relevant for candidates who are seeking SSG funding support for the course:
Assessments: Candidates must pass all prescribed tests/assessments and attain 100% competency to be eligible for funding support. ○ Mode of Assessment: Written Assessment, Case Study Assessment.
Statement of Attainment (SOA) from SkillsFuture Singapore: After passing the assessment(s), you’ll receive a SkillsFuture Singapore Statement of Attainment (SOA) certifying that you have achieved the following Competency Standard(s): ○ ICT-OUS-3007-1.1 – Infrastructure Support-3.
Schedule
End Date: 25 Nov 2026, Wednesday
3 days, 9am – 6pm
Location: 243 Beach Rd, #02-01, 189754
Agenda
Day/Time | Agenda Activity/Description |
---|---|
Start Date: 23 November | 9am to 6pm |
End Date: 25 November | 9am to 6pm |
Pricing
Course fees: Eligible applicants may receive up to 70% off course fees through SkillsFuture Singapore (SSG) and SkillsFuture Credit (SFC) funding schemes.
Skills Covered
PROFICIENCY LEVEL GUIDE
Beginner: Introduces the subject matter; no prerequisites are needed.
Proficient: Requires learners to have prior knowledge of the subject.
Expert: Involves advanced and more complex understanding of the subject.
- Cybersecurity (Proficiency level: Beginner)
Speakers
Trainer's Profile:
Feisal Ismail, Principal Consultant, Sapience Consulting Pte Ltd
Extensive technical and managerial experience supporting organisations across government, pharmaceutical, banking and financial industries. Well-versed in creating and leading high-performing teams through effective work methodologies that leverage technology and best practices to achieve organisational objectives. Communicative, articulate and a firm believer in sharing both knowledge and experience. Highly organised and relishes delivering superlative results in high-pressure environments. Believes in providing quality advice grounded in a sapient application of experience, best practices and an acute understanding of real-world realities. Clients can count on me: my word is my bond.