Senior System Administrator
Petaling Jaya, 10, MY, 47820
Job Summary
We are seeking a skilled and proactive L3 System Administrator to join our support team. This role provides advanced technical support within a secure, air-gapped sovereign cloud environment, bridging the gap between operations and engineering teams. You will be responsible for diagnosing complex issues, maintaining operational stability, and contributing to the continuous improvement of support processes in a restricted infrastructure environment.
Key Responsibilities
Advanced Incident Troubleshooting & Resolution
- Serve as the escalation point for complex infrastructure incidents.
- Apply deep IT knowledge to perform thorough root cause analysis.
- Collaborate with the team to resolve problems.
- Leverage diagnostic tools and scripting (e.g., Bash, Python) to resolve infrastructure and application issues efficiently and in compliance with SOPs and security policies.
Operational Support, Monitoring & Maintenance
- Monitor and analyze system health across platforms using observability tools (e.g., Prometheus, Grafana).
- Perform regular maintenance tasks such as patching, configuration updates, and backup verifications aligned with disaster recovery (DR) protocols.
- Execute deployment activities securely within maintenance windows, applying change management best practices.
Knowledge Base, SOPs & Mentorship
- Maintain and expand documentation for advanced troubleshooting across systems including KMS, IAM, networking, containers, and DR processes.
- Continuously refine SOPs based on incident post-mortems, security reviews, and evolving infrastructure standards.
- Mentor L1 and L2 engineers.
Compliance, Security & Governance
- Operate within stringent security protocols in air-gapped environments, ensuring zero compromise in data handling.
- Support identity and access management practices by enforcing policy compliance and responding to access-related incidents.
- Proactively identify anomalies and raise potential security concerns using defined channels, in alignment with threat management procedures.
Qualifications
- Bachelor's degree in Computer Science, Information Technology, or related field.
- 5 years of experience in infrastructure operations, with at least 2 years in a Level 2 support role.
- Proficiency in virtualization technologies and cloud-based or on-premise virtual machines.
- Strong knowledge of Linux and/or Windows systems administration.
- Proficiency with monitoring and observability tools such as Prometheus, Grafana, or other established alerting platforms.
- Experience with logging and incident response systems (e.g., ServiceNow, Freshworks).
- Practical knowledge of backup and disaster recovery (DR) processes and technologies.
- Ability to troubleshoot hardware, virtual machines, and cloud environments.
- Excellent problem-solving and communication skills.
- Willingness to work outside standard hours or during maintenance windows as needed.
Skills & Abilities
- Experience working in air-gapped, high-security, or restricted environments.
- Familiarity with Kubernetes, containerization, or service orchestration.
- ITIL certification or familiarity with IT service management (ITSM) processes is an added advantage.
- Certification in Linux or Windows Server operating systems, or in any cloud technology.
- Exposure to security auditing and compliance procedures.
Expected Minimum Years of Experience
5 years of experience in infrastructure operations, with at least 2 years in a Level 2 support role.