Observability Lead
Micron · Hyderabad, India
About this role
Micron is hiring a senior-level Site Reliability Engineer in the software engineering function, based in Hyderabad, India. The posting calls for experience with Python, Splunk, ServiceNow, and DevOps, plus 8+ years of relevant work.
- Role: Site Reliability Engineer
- Function: Software engineering
- Level: Senior
- Track: Individual contributor
- Employment: Full-time
- Location: Hyderabad, India
- Experience: 8+ years
- Posted: Apr 20, 2026
Job description
from Micron careers
Our vision is to transform how the world uses information to enrich life for all.
Micron Technology is a world leader in innovating memory and storage solutions that accelerate the transformation of information into intelligence, inspiring the world to learn, communicate and advance faster than ever.
We are seeking a seasoned Observability Lead to drive the strategy, implementation, and evolution of observability and AIOps capabilities across our enterprise IT landscape. This role will be instrumental in shaping our monitoring, automation, and reliability engineering practices, ensuring seamless visibility into infrastructure, applications, and services.
Key Responsibilities:
- Lead Observability Strategy: Define and execute the observability roadmap aligned with business and IT goals, integrating AIOps and SRE principles.
- Tool Ownership & Integration: Manage and optimize observability tools, including OpsRamp, Splunk, AppDynamics, NetBrain, and ThousandEyes, and explore new platforms such as BigPanda and ServiceNow AIOps.
- Automation Leadership: Drive automation of L1/L2 operational tasks using Python and PowerShell, improving efficiency and reducing manual intervention.
- SRE Adoption: Collaborate with cross-functional teams to implement Site Reliability Engineering (SRE) practices, including SLIs/SLOs, error budgets, and incident response automation.
- Monitoring & Dashboarding: Design and maintain comprehensive dashboards and alerting mechanisms for infrastructure, applications, and network performance.
- Incident & Problem Management: Partner with ITSM teams to enhance incident detection, root cause analysis, and resolution workflows.