Site Reliability Engineer Intern at InterIntel Technologies

- Published 3 hours ago

InterIntel Technologies is offering a Site Reliability Engineer Internship in Nairobi! Assist in building and maintaining reliable, scalable, and efficient systems, ensuring top performance for our life-changing services. Learn more and apply today!

InterIntel Technologies is empowering Africa through smart, scalable technology. We build digital infrastructure—from payments to communications—helping African enterprises scale faster, serve better, and grow smarter.

WE ARE HIRING: SITE RELIABILITY ENGINEER INTERN

Reports to: SENIOR DEVOPS ENGINEER

Job Summary

The Site Reliability Engineer Intern will help apply software engineering principles to IT operations to ensure the company's platforms are reliable, scalable, observable, and efficient. The role focuses on automation, monitoring, incident management, infrastructure as code, and measurable reliability targets (SLIs/SLOs) to guarantee high availability and performance across all products.

Duties and Responsibilities

Assist in designing, implementing, and continuously improving system reliability, availability, and performance by helping to define and monitor SLIs, SLOs, and error budgets across all assigned platforms.
Support the building and management of a robust monitoring and observability framework using Prometheus, Grafana, and Loki to track latency, traffic, errors, system health, and user impact.
Assist in automating infrastructure provisioning, scaling, and configuration management using Infrastructure as Code principles with Terraform and Kubernetes to ensure consistency, scalability, and disaster recovery readiness.
Participate in incident response processes, including detection, escalation, resolution, communication, and conducting blameless postmortems to prevent recurrence.
Assist in reducing manual operational workload through automation, scripting, and process optimization to improve efficiency and release velocity.
Help ensure high availability and performance of business-critical systems.
Collaborate with Engineering, Product, and DevOps teams to assist in improving deployment safety, capacity planning, cost optimization, and system scalability.
Assist in establishing alerting strategies and reliability standards that minimize alert fatigue while ensuring rapid detection and resolution of production issues.

Required Knowledge, Qualifications, and Experience

Bachelor's Degree in Computer Science, Information Technology, or a related field.
Some exposure to Kubernetes and cloud networking.
Some experience with monitoring and observability tools.
Good exposure to managing production systems in cloud environments.
Some exposure to implementing and managing CI/CD pipelines using tools like Jenkins, GitLab CI/CD, or equivalent.
Some exposure to cloud platforms (AWS, Azure, Google Cloud) and containerization tools like Docker and Kubernetes.
Basic hands-on exposure to monitoring and metrics systems such as Prometheus.
Basic familiarity with dashboarding and visualization tools such as Grafana.
Foundational understanding of log aggregation systems such as Loki.
Familiarity with Linux environments and basic system commands.
Exposure to scripting concepts using Python, Bash, or similar languages.
Foundational knowledge of Artificial Intelligence (AI) and good exposure to AI agents; relevant certifications in AI or related disciplines will be an added advantage.

Deadline: Monday, March 9, 2026
