Production Support Engineer

A Production Support Engineer is responsible for maintaining the stability and performance of an organization's production systems. This role involves monitoring systems, responding to incidents, troubleshooting issues, and ensuring that services are restored quickly when problems occur. Production Support Engineers work closely with development and operations teams to identify and resolve any issues that arise in production environments. They are essential in industries such as technology, media, and finance, where uninterrupted service is critical.

Production Support Engineers often use a combination of technical skills and problem-solving abilities to diagnose and fix issues, whether they are related to hardware, software, or network systems. Their role is crucial in maintaining the reliability of systems that support key business functions.

Skills

Incident Management

Troubleshooting

Scripting

Responsibilities

  • Job Title: Production Support Engineer
  • Job Summary: We are seeking a highly skilled Production Support Engineer to join our dynamic IT team. In this role, you will be responsible for maintaining the stability and performance of our production systems, ensuring that our services remain available and reliable. You will monitor systems, respond to incidents, troubleshoot issues, and work closely with our development and operations teams to resolve problems quickly and efficiently. The ideal candidate will have a strong background in production support, experience with incident management, and a proactive approach to problem-solving.
  • Requirements:
    • Bachelor’s degree in Computer Science, Information Technology, or a related field.
    • 3+ years of experience in a production support or similar role.
    • Proficiency in using production support tools such as Nagios, Splunk, or Datadog.
    • Strong understanding of incident management processes and best practices.
    • Experience with scripting languages (e.g., Python, Bash) to automate tasks and improve efficiency.
    • Excellent troubleshooting skills and strong attention to detail.
    • Ability to work in a fast-paced environment and manage multiple priorities effectively.
    • Strong communication skills to collaborate with cross-functional teams and provide clear updates to stakeholders.
  • Responsibilities:
    • Monitor production systems and applications to ensure high availability and performance.
    • Respond to incidents and alerts, diagnosing and resolving issues in a timely manner.
    • Collaborate with development and operations teams to implement fixes and improvements.
    • Conduct root cause analysis for critical incidents and implement preventive measures.
    • Automate repetitive tasks using scripting languages to improve efficiency and reduce manual work.
    • Maintain documentation of production support procedures, incidents, and resolutions.
    • Provide on-call support as needed, including after-hours and weekends, to address critical issues.
  • Must-Have Skills:
    • Production Support Tools: Proficiency in tools like Nagios, Splunk, Datadog, or similar for monitoring and managing production environments.
    • Incident Management: Experience in managing and resolving incidents, with a focus on minimizing downtime and impact on business operations.
    • Troubleshooting: Strong problem-solving abilities to diagnose and fix issues in production systems.
    • Scripting: Ability to write and use scripts in languages such as Python, Bash, or PowerShell to automate tasks and enhance system performance.
  • Soft Skills:
    • Problem-Solving: Ability to quickly analyze problems, identify root causes, and implement effective solutions.
    • Communication Skills: Strong verbal and written communication skills to effectively collaborate with technical and non-technical stakeholders.
    • Organizational Skills: Ability to manage multiple tasks and priorities in a fast-paced environment.
    • Time Management: Ability to manage time effectively so that incidents are resolved quickly and efficiently.
    • Attention to Detail: Strong attention to detail to ensure that systems are functioning optimally and incidents are thoroughly investigated.
  • Hard Skills:
    • Production Support Tools: Expert knowledge of monitoring and support tools.
    • Incident Management: Deep understanding of incident management frameworks and best practices.
    • Troubleshooting: Proven ability to troubleshoot complex technical issues.
    • Scripting: Proficiency in scripting for automation and system management.
