Sr Specialist App/Prod Support

Chennai, India

Join us to create a new era of connectivity.

"AT&T allows me to work on projects that will be seen by millions of customers."

Megan T. — Sr. Specialist, Software Engineer

"I find it incredibly rewarding to be out and see customers enjoying a product I spent my time perfecting."

Job Description:

Job Summary:

Our Digital Operations team is looking for a Site Reliability Engineer (SRE) who is passionate about the customer experience and has the analytical and multitasking abilities to thrive in a fast-paced environment. The SRE is responsible for ensuring that, as new features and applications are introduced to production, essential aspects of reliability such as availability, resiliency, latency, efficiency, change management, monitoring, emergency response, and capacity planning are addressed alongside development of the new features and applications. The SRE will develop automation code and scripts to proactively address customer issues, reduce mean time to repair, and improve application availability. The position also includes collaborating closely with feature delivery teams as a bridge between development and operations by applying a software engineering mindset to system administration. This position will split time between operations/on-call duties and guiding the development of systems and software that help increase site reliability and performance to deliver business value. The SRE will need intimate knowledge of the current state of datacenter and cloud infrastructure, CI/CD pipeline tools, Kubernetes, and Site Reliability Engineering practices, and the ability to implement the plan for the desired future state. Attention to detail and strong analytical skills are required, along with a “Customer-First” attitude!

The candidate should also be able to take up the duty of Incident Commander, driving the outage call using his/her technical skills.

Responsibilities and Day-to-Day View

Fix support escalation issues: Optimize on-call rotations and processes - Improve system reliability through the optimization of on-call processes. Add automation and context to alerts – leading to better real-time collaborative response from on-call responders. Additionally, update runbooks, tools and documentation to help prepare on-call teams for future incidents.

Document “tribal” knowledge - Gain exposure to systems in both staging and production, and take part in work with software development, support, IT operations and on-call duties to build up historical knowledge over time. Instead of siloing this knowledge, keep documentation and runbooks up to date so that teams get the information they need right when they need it.

Conduct post-incident reviews - Run thorough and transparent post-incident reviews to keep teams honest, and ensure that everyone documents their findings and takes action on their learnings. Take action items for building or optimizing parts of the SDLC or incident lifecycle to bolster reliability of the service.

• Develop automation for mission critical applications using scripts and programs

• Provide customer impact analysis and troubleshoot complex issues using domain knowledge of AT&T Sales & Ordering flows, applications, and downstream interfaces

• Support APIs in K8s environment

• Contribute to design and implementation of new system layers utilizing principles of high-complexity compute environments.

• Provide on-call support for Production customer facing issues

• Work with developers and environment teams to identify necessary resources and remove constraints to increase application availability.

Roles and Responsibilities:

• 16 x 7 Production support and second-level troubleshooting of incidents for mission critical high-performance applications

• 16 x 7 second-level outage response for mission critical high-performance applications

• 1 x 7 Application performance monitoring, troubleshooting and corrective actions for mission critical high-performance applications

Shift timing (if any):  Rotational shifts

Location: Hyderabad & Chennai @ Bangalore

Primary / Mandatory skills:

• Overall experience: 7+ years of experience performing Production Support for Mission Critical, high performance applications
• 4+ years of experience using Docker, Kubernetes and Cloud environments, preferably Azure
• Strong Unix, Networking and troubleshooting knowledge, along with experience in Docker, Kubernetes and Cloud environments
• Experience in Java, Python, Shell Scripts
• Experience in building and leveraging automated CI and CD pipelines using technologies such as Azure DevOps Server, Jenkins, Maven, Ansible, Chef, SonarQube, Puppet, etc
• Experience in Relational & NoSQL databases like Oracle & Cassandra. Excellent knowledge of SQL
• Excellent written and verbal English communication skills to work in a Global team
• Knowledge of Java, ReactJS, Spring & Spring Boot framework, microservices & RESTful API architecture

Secondary / Desired skills:
• Agile, Lean Agile and/or Scaled Agile methodologies
• Familiarity with version control systems (Git, Bitbucket) and modern version control for use in continuous deployments
• Experience with visualization tools like Kibana and Grafana (EFK stack experience preferred)

Additional information (if any): Willingness to work shift duties. Willingness to learn is very important, as AT&T offers an excellent environment to learn Digital Transformation skills such as cloud, Big data, AI, Full stack etc.
Education Qualification: Bachelor’s/ Master’s degree in computer science or related field
Certifications (if any specific): Any Certification related to Primary / Mandatory Skills
• Kubernetes Certified Engineer or equivalent certification
• Azure / AWS certification

Experience:
• 7+ years of experience performing Production Support for Mission Critical, high performance applications (eCommerce experience preferred)
• 4+ years of experience using Docker, Kubernetes, and Cloud environments preferably Azure
• Solid understanding and experience in Application Performance Monitoring tools like Dynatrace, AppDynamics, Introscope, etc.
• 4+ years of strong Unix, Networking and troubleshooting knowledge
• 4+ years of experience in Customer Experience Analytics tools like Quantum Metric or TeaLeaf
• 4+ years of experience in Relational & NoSQL databases like Oracle & Cassandra. Excellent knowledge of SQL.
• 4+ years of experience with J2EE applications and an application server like WebLogic, WebSphere or JBoss
• 2+ years of experience in Java, Python, Shell scripting
• Experience with visualization tools like Kibana and Grafana (EFK stack experience preferred)
• Experience mentoring & training others
• Experience with Site Reliability Engineering preferred
• Experience working in a large-scale, technically diverse organization
• Experience with web-based applications, HTTP, HTTPS, SSL/TLS
• Should have strong understanding of security principles

AT&T is leading the way to the future – for customers, businesses, and the industry. We're developing new technologies to make it easier for our customers to stay connected to their world. Together, we’ve built a premier integrated communications and entertainment company and an amazing place to work and grow. Team up with industry innovators every time you walk into work, creating the world you always imagined. Ready to #transformdigital with us? Apply now!

Weekly Hours: 40

Time Type: Regular

Location: Hyderabad, Andhra Pradesh, India

It is the policy of AT&T to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, AT&T will provide reasonable accommodations for qualified individuals with disabilities.



Job ID R-33578-2 Date posted 08/27/2024

Benefits

Invested in your satisfaction and continued success.

We take care of our own here (hint: that could be you). Our benefits and rewards mean we cover some of your biggest needs with some of the coolest offerings. We already think we’re a pretty great place to work. We’re just trying to rack up some bonus points.

Let’s start with the big one: Your work gets rewarded with competitive compensation and benefits. It really does pay to be on our team.

Compensation

When it comes to priorities, we know family tops the list. For the moments that matter the most, you'll be there for them, and we'll be here for you.

Family Leave

Paid Time Off

Our people have class. Literally. We can help you out on approved education costs with our tuition assistance plan.

Tuition Assistance

Here’s another reason to breathe easy: You and your family get access to excellent medical, dental and vision insurance options.

Insurance Options

Wanna make your friends really jealous? You’ll get discounted access to the latest and greatest AT&T products and services — plus other awesome items, like tickets to live events.

Discounts

You strike us as an over-achiever (don’t worry, it’s a compliment). Our training and development programs are your ticket to expert status in your job.

Training & Development

When the day comes that you get some much needed R&R (not that you’d ever want to leave #LifeAtATT) you’ll know your future is set with the AT&T Retirement Savings Plan (ARSP).

Savings

Give back to your community and connect with colleagues through social and team-building events, and annual paid time off for volunteer efforts of your choice.

Community & Team Events

Wellness resources and incentives to help you prioritize your health and wellbeing and be your best self inside and outside of work.

Total Wellbeing

The Hiring Process

Step 1

Complete a quick application online and check your status often.

Step 2

Virtual or in-person Interviews

Dress professionally and, if you are interviewing virtually, make sure you have a good WiFi connection.

Step 3

Conditional Job Offer

After a background check, you're part of the team.

Step 4

Welcome! Onboarding and Training Begins

Our training and certification programs set you up for success.
