Company introduction
A fast-growing cloud and observability platform is looking to expand its Site Reliability Engineering team in Singapore.
The organisation operates a highly distributed, cloud-native platform across multiple regions globally, supporting mission-critical environments with a strong focus on uptime, automation, and scalability.

You will be joining a lean regional team working closely with a larger engineering function based in China, supporting a modern infrastructure stack built on Kubernetes and multi-cloud architecture.

Job responsibilities

Reporting to the Chief Technology Officer, your role involves:

Owning the reliability, availability, and performance of a globally distributed cloud platform
Designing, building, and maintaining Kubernetes-based infrastructure across multiple cloud environments
Managing infrastructure through code using Terraform, ensuring scalability and consistency
Supporting and optimising multi-cloud environments (AWS and other cloud providers) across regions
Monitoring system health, troubleshooting incidents, and leading production issue resolution
Participating in the on-call rotation and supporting high-availability environments
Working closely with engineering teams across regions to improve system resilience and automation
Performing deep troubleshooting across infrastructure, application, and database layers (including SQL where required)

Job requirements

As a successful candidate, you will have:

Strong hands-on experience with Kubernetes in production environments
Proven experience using Terraform for infrastructure as code in complex environments
Exposure to multi-cloud environments (mostly AWS; experience with Alibaba Cloud, Tencent Cloud, or Huawei Cloud is highly advantageous)
Experience with Cloudflare is also highly advantageous
Solid understanding of cloud infrastructure, networking, and distributed systems
Experience in handling production incidents and working in high-availability environments
Proficiency in scripting (Python or Go preferred)
Ability to troubleshoot across systems, including database-level debugging using SQL
Strong communication skills in both English and Chinese, to work with regional teams
Willingness to do shift work or join the on-call rotation when required

Why you should join them
You will get the opportunity to join a highly technical environment where SRE plays a critical role in keeping a global, multi-cloud platform reliable and scalable. This is a high-ownership role with direct impact on uptime, performance, and platform resilience, while giving you exposure to modern cloud-native technologies, Kubernetes, Terraform and distributed infrastructure across multiple regions. You will also work closely with teams across Singapore and China, making it a strong fit for engineers who enjoy solving complex infrastructure challenges in a fast-moving, international setup.

JL
Reg No. R1766249
BeathChapman Pte Ltd
Licence no. 16S8112
