A fast-growing cloud and observability platform is looking to expand its Site Reliability Engineering team in Singapore.
The organisation operates a highly distributed, cloud-native platform across multiple regions globally, supporting mission-critical environments with a strong focus on uptime, automation, and scalability.
You will be joining a lean regional team working closely with a larger engineering function based in China, supporting a modern infrastructure stack built on Kubernetes and multi-cloud architecture.
Job responsibilities
Reporting to the Chief Technology Officer, you will be responsible for:
- Owning the reliability, availability, and performance of a globally distributed cloud platform
- Designing, building, and maintaining Kubernetes-based infrastructure across multiple cloud environments
- Managing infrastructure through code using Terraform, ensuring scalability and consistency
- Supporting and optimising multi-cloud environments (AWS and other cloud providers) across regions
- Monitoring system health, troubleshooting incidents, and leading production issue resolution
- Participating in the on-call rotation and supporting high-availability environments
- Working closely with engineering teams across regions to improve system resilience and automation
- Performing deep troubleshooting across infrastructure, application, and database layers (including SQL where required)
Job requirements
As a successful candidate, you will have:
- Strong hands-on experience with Kubernetes in production environments
- Proven experience using Terraform for infrastructure as code in complex environments
- Exposure to multi-cloud environments (mostly AWS; experience with Alibaba Cloud, Tencent Cloud or Huawei Cloud is highly advantageous)
- Experience with Cloudflare is also highly advantageous
- Solid understanding of cloud infrastructure, networking, and distributed systems
- Experience in handling production incidents and working in high-availability environments
- Proficiency in scripting (Python or Go preferred)
- Ability to troubleshoot across systems, including database-level debugging using SQL
- Strong communication skills in both English and Chinese, to work with regional teams
- Willingness to do shift work or join an on-call rotation when required
Why you should join them
You will get the opportunity to join a highly technical environment where SRE plays a critical role in keeping a global, multi-cloud platform reliable and scalable. This is a high-ownership role with direct impact on uptime, performance, and platform resilience, while giving you exposure to modern cloud-native technologies, Kubernetes, Terraform and distributed infrastructure across multiple regions. You will also work closely with teams across Singapore and China, making it a strong fit for engineers who enjoy solving complex infrastructure challenges in a fast-moving, international setup.
JL
Reg No. R1766249
BeathChapman Pte Ltd
Licence no. 16S8112





