Job Description
Responsibilities
- Design, implement, and manage CI/CD pipelines for automated builds, testing, and deployments.
- Maintain and optimize infrastructure as code (IaC) using tools like Terraform, Ansible, or CloudFormation.
- Manage cloud infrastructure (AWS, Azure, or GCP) for high availability and scalability.
- Implement and monitor containerized workloads (Docker) and container orchestration platforms (Kubernetes, EKS, AKS, GKE).
- Ensure system reliability through logging, monitoring, and alerting solutions (Prometheus, Grafana, ELK/EFK, CloudWatch).
- Drive automation initiatives to reduce manual effort and improve system efficiency.
- Collaborate with development, QA, and security teams to enable DevSecOps practices.
- Troubleshoot production issues, perform root cause analysis, and apply permanent fixes.
- Contribute to disaster recovery and backup planning.
- Mentor junior engineers and share best practices within the team.
Required Skills & Qualifications
- 8-12 years of proven experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles.
- Strong expertise in cloud platforms: AWS / Azure / GCP.
- Hands-on experience with CI/CD tools such as Jenkins, GitLab CI, GitHub Actions, and Argo CD.
- Proficiency in Infrastructure as Code (IaC) and deployment tooling: Terraform, Ansible, Helm.
- Solid understanding of Kubernetes & containerization.
- Knowledge of networking, load balancing, security, and firewalls in cloud environments.
- Expertise in monitoring, logging, and observability.
- Scripting/programming skills in Python, Bash, or Go.
- Familiarity with DevSecOps practices and security compliance standards.