- Production Support Engineer:
Production support engineers handle technical requests filed by a company’s end users. Their primary responsibility is troubleshooting and resolving errors, and throughout their work they must log details for reports and keep customers updated.
- Production Support Importance:
One of the most important goals for any enterprise is keeping its systems running 24×7, also known as High Availability. The business sees it as a crucial link in ensuring efficiency, which makes it all the more important to run Production Support efficiently.
- 5 Keys to Unlocking Value From Production Support
1. Proactive Monitoring & Alerting
Proactive monitoring ensures that corrective steps are taken before issues such as missed SLAs (Service Level Agreements) occur. More focus in this area reduces the volume of downstream issue-resolution tickets. Monitoring does not necessarily mean keeping an eye on each and every job constantly. Look for automation opportunities instead, such as a script that periodically sends alerts to the support team. By building checkpoints into the job stream, you’ll minimize your monitoring effort.
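As a rough illustration of checkpoint-based alerting, the sketch below scans a list of checkpoints and reports those that are overdue or about to become overdue. The `Checkpoint` fields, the 30-minute warning window and the message wording are all assumptions made for the example, not part of any particular scheduler; a real script would read checkpoint status from the scheduling tool and send the messages by email or chat.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Checkpoint:
    job_stream: str                          # hypothetical job stream name
    name: str                                # checkpoint label inside the stream
    due_by: datetime                         # latest acceptable completion time
    completed_at: Optional[datetime] = None  # None while still pending


def find_alerts(checkpoints: List[Checkpoint], now: datetime,
                warn_window: timedelta = timedelta(minutes=30)) -> List[str]:
    """Return alert messages for pending checkpoints that are late or nearly late."""
    alerts = []
    for cp in checkpoints:
        if cp.completed_at is not None:
            continue  # checkpoint already reached, nothing to report
        if now > cp.due_by:
            alerts.append(f"LATE: {cp.job_stream}/{cp.name} was due {cp.due_by:%H:%M}")
        elif cp.due_by - now <= warn_window:
            alerts.append(f"AT RISK: {cp.job_stream}/{cp.name} is due {cp.due_by:%H:%M}")
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 2, 0)
    sample = [
        Checkpoint("billing", "load", datetime(2024, 1, 1, 1, 45)),
        Checkpoint("ledger", "extract", datetime(2024, 1, 1, 2, 20)),
    ]
    for message in find_alerts(sample, now):
        print(message)
```

Run on a schedule (for example every few minutes), a check like this replaces constant manual watching with alerts that fire only when a checkpoint is at risk.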
2. Prioritize and Permanently Fix Issues
Applying a “permanent fix” is the mantra the team needs to adopt while fixing abends (abnormal job terminations). Temporary fixes create a system that is “patched together” and susceptible to inefficiencies. Another challenge in implementing fixes is prioritizing failed job streams. Priority should be based on the criticality of the streams, and upstream applications typically need to be taken up before downstream ones. An experienced team also extrapolates the issues it encounters to other applications and fixes those as a preventive measure, before an abend even happens.
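The prioritization rule above can be sketched as a simple sort: most critical streams first, and within the same criticality, upstream before downstream. The `criticality` scale (1 is most critical) and the `depth` field (0 is the most upstream) are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailedStream:
    name: str
    criticality: int  # assumed scale: 1 = most critical
    depth: int        # assumed: 0 = most upstream, larger = further downstream


def triage_order(failed: List[FailedStream]) -> List[FailedStream]:
    """Order failed streams by criticality, then upstream before downstream."""
    return sorted(failed, key=lambda s: (s.criticality, s.depth, s.name))


if __name__ == "__main__":
    failed = [
        FailedStream("reporting", criticality=2, depth=3),
        FailedStream("ingest", criticality=1, depth=0),
        FailedStream("billing", criticality=1, depth=2),
    ]
    for stream in triage_order(failed):
        print(stream.name)
```

The stream name is used only as a final tie-breaker so that the ordering is deterministic.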
3. Provide Timely Status Updates
Effective communication involves all of the stakeholders and is key to successfully managing production support. Open lines of communication reduce pressure and leave more time to fix issues instead of sitting in explanatory meetings that can last hours.
A best practice in providing updates on critical abend situations is:
- Acknowledge the issue
- Analyze and communicate ETA
- Send status updates every 15-20 minutes in case the fix takes longer. Using standard templates for these updates is recommended so that the message is conveyed clearly.
- Notify all stakeholders on the resolution
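A standard template for these updates could look something like the sketch below. The field names, the stage labels and the message layout are illustrative assumptions; the 15-minute default interval follows the cadence suggested above.

```python
from datetime import datetime, timedelta
from string import Template

# Hypothetical message layout; adapt the fields to your own template.
STATUS_TEMPLATE = Template(
    "[$stage] $job_stream abend\n"
    "Status: $summary\n"
    "ETA: $eta\n"
    "Next update by: $next_update"
)


def status_update(stage: str, job_stream: str, summary: str, eta: str,
                  sent_at: datetime, interval_minutes: int = 15) -> str:
    """Render one status message; resolved issues get no follow-up time."""
    if stage == "RESOLVED":
        next_update = "none (issue closed)"
    else:
        next_update = (sent_at + timedelta(minutes=interval_minutes)).strftime("%H:%M")
    return STATUS_TEMPLATE.substitute(
        stage=stage, job_stream=job_stream, summary=summary,
        eta=eta, next_update=next_update,
    )


if __name__ == "__main__":
    print(status_update("IN PROGRESS", "billing", "Root cause identified",
                        "03:30", datetime(2024, 1, 1, 2, 0)))
```

Stating the time of the next update in every message sets expectations and tends to reduce ad hoc status requests.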
4. Reporting
Reporting needs to be broken into two parts: operational reporting and performance reporting.
Operational reporting would typically include end-of-job-stream statistics, all issues encountered during the run along with their status, and finally the SLA status (Met/Missed). It is recommended to automate these daily and weekly operational reports for efficiency.
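A minimal sketch of an automated operational report follows. It assumes each job stream run records an SLA deadline, a finish time and a list of issues with their status; the `StreamRun` structure and the report layout are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class StreamRun:
    job_stream: str
    sla_deadline: datetime
    finished_at: datetime
    issues: List[str] = field(default_factory=list)  # e.g. "description (status)"


def sla_status(run: StreamRun) -> str:
    """Met if the stream finished on or before its SLA deadline, else Missed."""
    return "Met" if run.finished_at <= run.sla_deadline else "Missed"


def operational_report(runs: List[StreamRun]) -> str:
    """Build a plain-text report: one line per stream, issues indented below."""
    lines = []
    for run in runs:
        lines.append(
            f"{run.job_stream}: finished {run.finished_at:%Y-%m-%d %H:%M}, "
            f"SLA {sla_status(run)}"
        )
        for issue in run.issues:
            lines.append(f"  - {issue}")
    return "\n".join(lines)


if __name__ == "__main__":
    runs = [
        StreamRun("billing", datetime(2024, 1, 1, 6, 0), datetime(2024, 1, 1, 5, 40)),
        StreamRun("ledger", datetime(2024, 1, 1, 6, 0), datetime(2024, 1, 1, 6, 25),
                  ["load step abend (resolved)"]),
    ]
    print(operational_report(runs))
```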
Performance reporting would include metrics such as “quality of resolution” (first time right), job stream stability improvement suggestions/recommendations, and number of jobs vs. abend percentage vs. number of resources (see the samples below). Such a report helps quantify the value and ROI of the support service.
Production Support Performance Measurement – Sample Metric Reports
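Independently of the sample reports, the performance metrics named above reduce to simple ratios. The sketch below shows one possible set of definitions; in particular, treating “first time right” as the share of resolved issues that were not reopened is an assumption, and teams may define it differently.

```python
def abend_percentage(total_jobs: int, abends: int) -> float:
    """Percentage of job runs that ended in an abend."""
    if total_jobs <= 0:
        raise ValueError("total_jobs must be positive")
    return 100.0 * abends / total_jobs


def first_time_right_rate(resolved: int, reopened: int) -> float:
    """Assumed definition: percentage of resolved issues that were never reopened."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * (resolved - reopened) / resolved


def jobs_per_resource(total_jobs: int, resources: int) -> float:
    """Average number of jobs supported per team member."""
    if resources <= 0:
        raise ValueError("resources must be positive")
    return total_jobs / resources


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    print(abend_percentage(2000, 30))     # 1.5
    print(first_time_right_rate(40, 2))   # 95.0
    print(jobs_per_resource(2000, 4))     # 500.0
```

Tracking these figures period over period is what lets the report show a trend rather than a single snapshot.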
5. Standard Operating Procedure (SOP) document
Having a detailed standard operating procedure ensures that support is carried out in an expected manner. It standardizes the processes and provides step by step instructions that enable each support team member to perform tasks consistently across the workforce. The SOP document must include:
- Escalation procedures
- A communication framework consisting of communication modes with other support teams and stakeholders
- A listing of critical jobs and job streams with associated SLAs
- The team composition information with associated contact information
- Key infrastructure teams that are needed such as DBAs, Administrators, and Development team leads
- References to restore/recovery procedures, which come in handy especially when major installs are under way
- Anything else that could come in handy during a crisis
As part of the SOP, it is important to document the exact handover process conducted between shifts and to allocate adequate time for these handovers. The recommendation is to follow a standard handover template so that information is not lost in transit.
It’s also a good idea to build a knowledge base of the known issues and how those were handled. Using the SOP document and appropriate training becomes a critical aspect of overall production support management. It is the key to a motivated team ready to take on the rigors of production support.
A quick tip: It is very important to revisit the SOP on a regular basis to keep it current with the practices the team follows.
Using the Five Keys to Achieve Return on Investment (ROI)
With experience and expertise in supporting mid-size to large production environments, Bitwise has laid down a production support efficiency framework that ensures definite ROI across various aspects, with the focus on generating value by “Doing More With Less”.
Bitwise Production Support Framework
1. Optimal Staffing Model:
A tailor-made staffing model that meets the specific needs of the client organization ensures cost reductions of about 50% to 75%.
2. Application Availability:
Innovations, optimizations and automations built by monitoring and altering processes, job run times, schedules, etc. can ensure that SLAs are met up to 99% of the time.
3. Application Stability:
A continuous focus on fixing jobs permanently through corrective and preventive measures, together with proactive planning for system outages, limits abends and can increase the stability of the application from 98% to 99.5% over a period of time.
4. Team Performance:
Team performance is achieved through a persistent focus on building a culture of first time right, a collaborative approach, continual improvement and rewarding contributions. This increases the team’s productivity and its capacity to take on more responsibilities.
5. Proactive Risk Identification and Mitigation:
Insight into the environment and a proactive approach by the production support team ensure that risks are identified and mitigation recommendations made in terms of application availability, stability and performance.