- ITIL Processes
IT service providers implement ITIL processes to ensure their services are delivered in a customer-focused, quality-driven and economical way.
ITIL Processes according to ITIL V3
ITIL V3 (ITIL 2011) organizes the ITIL processes around the five service lifecycle stages: Service Strategy, Service Design, Service Transition, Service Operation, and Continual Service Improvement. Each of the five stages is focused on a specific phase of the service lifecycle.
- Service Strategy
Process Objective: To decide on a strategy to serve customers. Starting from an assessment of customer needs and the market place, the Service Strategy process determines which services the IT organization is to offer and what capabilities need to be developed. Its ultimate goal is to make the IT organization think and act in a strategic manner.
- Service Design
Process Objective: To design new IT services. The scope of the process includes the design of new services, as well as changes and improvements to existing ones.
- Service Transition
Process Objective: To build and deploy IT services. Service Transition also makes sure that changes to services and Service Management processes are carried out in a coordinated way.
- Service Operation
Process Objective: To make sure that IT services are delivered effectively and efficiently. The Service Operation process includes fulfilling user requests, resolving service failures, fixing problems, as well as carrying out routine operational tasks.
- Continual Service Improvement – CSI
Process Objective: To use methods from quality management in order to learn from past successes and failures. The Continual Service Improvement process aims to continually improve the effectiveness and efficiency of IT processes and services, in line with the concept of continual improvement adopted in ISO 20000.
What is ITIL Incident Management?
- ITIL incident management (IM) is the practice of restoring services as quickly as possible after an incident. It is a main component of ITIL service support.
ITIL incident management is a reactive process, not a proactive measure: you use IM to diagnose an incident and follow escalation procedures to restore service.
Common ITIL incident management activities include:
- Detecting and recording incident details
- Matching incidents against known problems
- Resolving incidents as quickly as possible
- Prioritizing incidents in terms of impact and urgency
- Escalating incidents to other teams to ensure timely resolution
- Why Should I Implement ITIL Incident Management?
1. Maintaining Service Levels
2. Meeting Service Availability Requirements
3. Increasing Staff Efficiency and Productivity
4. Improving User Satisfaction
|Priority score|Priority|Response|Action/resolution target|
|---|---|---|---|
|12|Critical|An immediate and sustained effort using all available resources until resolved. On-call procedures activated, vendor support invoked.|Immediate action/resolution as soon as possible.|
|9-11|High|Technicians respond immediately, assess the situation, may interrupt other staff working low or medium priority jobs for assistance.|Action within 1 hour/resolution within 1 business day.|
|5-8|Medium|Respond using standard procedures and operating within normal supervisory management structures.|Action within 2 hours/resolution within 2 business days.|
|0-4|Low|Respond using standard operating procedures as time allows.|Action within 2 business days/resolution within 10 business days.|
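The score ranges in the table above can be sketched as a small lookup. This is only an illustration: the function name `priority_from_score` is hypothetical, and the source does not say how the score itself is derived from impact and urgency, so the sketch starts from an already computed score.

```python
def priority_from_score(score: int) -> str:
    """Map a priority score (0-12) to the priority label used in the table."""
    if not 0 <= score <= 12:
        raise ValueError(f"score must be between 0 and 12, got {score}")
    if score == 12:
        return "Critical"
    if score >= 9:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"


# Print the label for one sample score from each band.
for sample in (12, 10, 6, 3):
    print(sample, priority_from_score(sample))
```

Scores outside the 0-12 range raise an error rather than being silently mapped to the nearest band.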
- ITIL Problem management (PM) goes one step beyond Incident management: it performs Root Cause Analysis (RCA) to identify, track and permanently resolve recurring incidents. Problem management prevents incidents from occurring and ultimately aims for no incidents. It can be proactive as well as reactive, and businesses favor proactive Problem management to prevent incidents. The ITIL Problem management process follows specific steps such as:
- Problem detection
- Problem logging
- Investigation & diagnosis
- Resolution – workaround or permanent
RCA (Root Cause Analysis) timelines by priority:
P1 = 5-8 hrs
P2 = 8-12 hrs
P3 = 48 hrs
P4 = 7 days
Problem – Network outage
Stakeholder – Employees
Impact – Productivity loss for an hour
Evidence – No internet connection
- ITIL Change Management (CM)
IT and technology innovation drives change within the organization. To remain competitive, businesses must adapt quickly to changing trends, but without interrupting the current working state while implementing these changes. ITIL Change management helps businesses deploy changes without disruption or downtime. It follows a standard operating procedure to eliminate unintended interruptions and includes change assessment, planning and approval.
The Change management process is a gatekeeper that ensures minimum risk and impact to ongoing Infrastructure & Operations. It includes pre-release activities such as rollout planning, back-out planning and scheduling of changes, and it performs quality control checks to ensure change and release activities go as planned.
The primary objective of ITIL Change management is to mitigate risk and impact. Change management authorizes every change before it is deployed and protects the production environment while a change is executed. The objectives of the ITIL Change management process are:
- Reduction of risk and impact
- Maintenance of current working state
- Communication and approval management
- Effective change planning with optimized resources
- Reduction in number of incidents due to change execution
Scenarios where ITIL Change Management is used:
- Implementing a new data center
- Deploying a bug fix to production environment
- Windows patch
- Replacing ERP service provider
- OS upgrade
There are different types of change requests, or change classes, that are typically managed in different ways:
- Standard changes are changes to a service or to the IT infrastructure where the implementation process and the risks are known upfront. These changes are managed according to policies that the IT organization already has in place. Since these changes are subject to established policies and procedures, they are the easiest to prioritize and implement, and often don’t require approval from a risk management perspective.
- Normal changes are those that must go through the change process before being approved and implemented. If they are determined to be high-risk, a change advisory board must decide whether they will be implemented.
- Emergency changes arise when an unexpected error or threat occurs, such as when a flaw in the infrastructure related to services needs to be addressed immediately. A security threat is another example of an emergency situation that requires changes to be made immediately.
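As a rough illustration of how the three change classes above route differently, here is a minimal Python sketch. The names `ChangeType` and `approval_route` and the returned route descriptions are assumptions made for this example; they are not ITIL terminology or any tool's API.

```python
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"
    NORMAL = "normal"
    EMERGENCY = "emergency"


def approval_route(change_type: ChangeType, high_risk: bool = False) -> str:
    """Return a short description of how a change request is handled."""
    if change_type is ChangeType.STANDARD:
        # Risks are known upfront, so established policies apply.
        return "follow established policy, no separate risk approval"
    if change_type is ChangeType.EMERGENCY:
        # Errors or threats that must be addressed immediately.
        return "implement immediately"
    if high_risk:
        # High-risk normal changes go to the change advisory board.
        return "change advisory board decision"
    return "standard change process approval"


print(approval_route(ChangeType.NORMAL, high_risk=True))
```

The `high_risk` flag only matters for normal changes, which matches the description above: only those are referred to a change advisory board.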
- Unplanned Downtime
It occurs when problems arise that we have not planned or prepared for.
e.g. a college or government site going down when everyone accesses it at the same time.
- Scheduled Downtime
It is scheduled for backups or data processing because of a high data load.
e.g. IRCTC from 12:00 am to 12:30 am.
- Emergency Change:
The RCA is raised as a problem, and the problem is passed on to Change management.
e.g. critical situations.
- 4 Major parts of Production support Engineer
- Error Code and Message Type:
- Error 404
- Error 500
- Error 502
- Error 503
- Error 504
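The codes listed above are standard HTTP status codes. A short Python sketch using the standard library's `http.HTTPStatus` shows their reason phrases and whether each is a client-side (4xx) or server-side (5xx) error. The helper name `describe_status` is hypothetical.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code, its standard reason phrase and its error class."""
    status = HTTPStatus(code)
    kind = "client error" if 400 <= code < 500 else "server error"
    return f"{code} {status.phrase} ({kind})"


# Describe each code from the list above.
for code in (404, 500, 502, 503, 504):
    print(describe_status(code))
```

For a support engineer the split matters: 404 usually points at the request (a wrong URL or missing resource), while the 5xx codes point at the application, a gateway or an upstream service.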
- service-level agreement (SLA)
A service-level agreement (SLA) is a contract between a service provider and its internal or external customers that documents what services the provider will furnish and defines the service standards the provider is obligated to meet.
- Why are SLAs important?
Service providers need SLAs to help them manage customer expectations and define the circumstances under which they are not liable for outages or performance issues. Customers can also benefit from SLAs in that they describe the performance characteristics of the service, which can be compared with other vendors’ SLAs, and also set forth the means for redressing service issues — via service credits, for example.
For a service provider, the SLA is typically one of two foundational agreements it has with customers. Many service providers establish a master services agreement to establish the general terms and conditions in which they will work with customers. The SLA is often incorporated by reference into the service provider’s master services agreement. Between the two service contracts, the SLA adds greater specificity regarding the services provided and the metrics that will be used to measure their performance.
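One metric an SLA commonly defines is service availability. The sketch below shows how measured downtime could be checked against an availability target. The function names and the 99.9% target are assumptions chosen for illustration; a real SLA defines its own metrics and measurement windows.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Availability as a percentage of the measurement window."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be within the window")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def meets_sla(total_minutes: float, downtime_minutes: float,
              target_percent: float) -> bool:
    """True when measured availability reaches the agreed target."""
    return availability_percent(total_minutes, downtime_minutes) >= target_percent


# A 30-day month has 43,200 minutes; assume 60 minutes of downtime.
month = 30 * 24 * 60
print(round(availability_percent(month, 60), 3))
print(meets_sla(month, 60, target_percent=99.9))
```

With a 99.9% target, a 30-day month leaves room for about 43 minutes of downtime, so 60 minutes would miss the target in this example.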
Levels of Support:
Production support is the stream that supports the IT systems, applications and software currently being used by end users.
Its main objective is to ensure that the application or software is up and running as expected.
The support person or team is responsible for receiving incidents and requests from end users, analyzing them, and either responding to the end user with a solution or escalating to other IT teams.
There are various levels of production support:
Level 1 Support – The initial helpdesk, which handles user issues with already scripted solutions and creates an incident to assign to other teams. L1 staff are good at handling customers politely.
Level 2 Support – Technical support for the application or software. They know the flow of the application, dive deep into the issue and fix it if they can; if not, they escalate further. L2 engineers generally have 4 or more years of experience on a specific technology platform.
Level 3 Support – They work on defects and enhancements, bug fixes, break-fix work, etc. These support leaders have a specific, deep understanding of and expertise in one or two technology platforms (for example, an Oracle database administrator or a Windows admin), with sound experience (8+ years).
Level 4 Support – Product or vendor support
L4 support refers to product or vendor support and often involves vendor product architects, engineers, software developers and hardware designers. The L3 team initiates the call and engages L4.
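The escalation order described above can be sketched as a simple chain. The names `ESCALATION_CHAIN`, `next_level` and `escalation_path` are hypothetical, and real ticketing tools model escalation with their own queues and rules.

```python
from typing import List, Optional

# Support levels in escalation order, as described above.
ESCALATION_CHAIN = ("L1", "L2", "L3", "L4")


def next_level(current: str) -> Optional[str]:
    """Return the level a ticket escalates to, or None at the last level."""
    index = ESCALATION_CHAIN.index(current)  # ValueError for unknown levels
    if index + 1 < len(ESCALATION_CHAIN):
        return ESCALATION_CHAIN[index + 1]
    return None


def escalation_path(start: str = "L1") -> List[str]:
    """Return every level a ticket can pass through from the start level."""
    return list(ESCALATION_CHAIN[ESCALATION_CHAIN.index(start):])


print(next_level("L3"))
print(escalation_path("L2"))
```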
- Monitoring Tools:
- New Relic:
- Instantly understand app performance, dependencies, and bottlenecks
- A complete view of your applications and operating environment
- Start seeing hidden errors in minutes
- Faster incident handling, less finger pointing
- Stay ahead of difficult-to-find issues
“New Relic APM has unmatched capabilities for monitoring cloud-based applications, providing a rock-solid website safety net for our customers.”
- Nagios XI
Enterprise Server and Network Monitoring Software
Comprehensive application, service, and network monitoring in a central solution.
Comprehensive IT Infrastructure Monitoring
Proactive Planning & Awareness
Ease of Use
- BMC Control-M
Transform your business with workflow orchestration
BMC Control-M simplifies application workflow orchestration. It makes it easy to define, schedule, manage and monitor workflows, ensuring visibility and reliability, and improving SLAs.
View completed, in-process, and predictive job run times on any device.
Complex application workflows? Problem solved.
- Streamline the orchestration of business applications—delivering better apps faster—by embedding workflow orchestration into your CI/CD pipeline
- Extend Dev and Ops collaboration with a Jobs-as-Code approach
- Simplify workflows across hybrid and multi-cloud environments with native AWS and Azure integrations
- Deliver data-driven outcomes faster, managing big data workflows in a scalable way
- Take control of your file transfer operations with intelligent file movement and enhanced visibility.
- What is ticketing? What is a ticketing tool?
Although it can have several meanings, when it comes to customer service, ticketing or ticketing tools are those computer programs that are used for incident management and are ticket-based. … After receiving the call, you could open a “ticket” in the system in order to manage the incident.
Zendesk is a cloud-based help desk management solution offering customizable tools to build customer service portals, knowledge bases and online communities. The solution offers a customizable front-end portal, live chat features and integration with applications like Salesforce and Google Analytics. Zendesk is used across a wide range of vertical markets including technology, government, media and retail, by organizations from small to large.
- Jira Service Desk ticketing software: deliver a better service experience
Customers or employees can submit requests with an easy-to-use help center and add Confluence to Jira Service Desk to get an integrated knowledge base. Machine learning intelligently recommends the right service and learns from every interaction, so answers are easy to find.
- BMC Remedy
What is BMC Remedy Incident Management?
The Incident Management module is designed to support this goal. When dealing with incident requests, Incident Management is typically initiated in response to a customer call, a service request, or an automated event. … Integration with BMC Atrium Configuration Management Database (BMC Atrium CMDB).
BMC Helix ITSM is a powerful, people-centric solution that exploits emerging technologies such as AI and machine learning. When you move up from Remedy on-premises to BMC Helix ITSM you gain:
- Predictive service management through auto-classification, assignment, and routing of incidents
- Embedded multi-cloud capabilities to broker incidents, changes, and releases across cloud providers
- Integrations with leading agile DevOps tools such as Jira
- Cognitive email analysis and automated actions on behalf of the user
- Operational and deployment efficiencies via containerization
Incident & Problem Management
Create and resolve incidents faster with intelligent, context-aware, and proactive incident matching.
- Integrate all IT service support functions including change, asset, service level, service request, identity, and knowledge management
- Gain direct visibility into business priorities through integration with a single CMDB
- Achieve lower call volumes with intelligent, omni-channel self-service via BMC Helix Digital Workplace
- Align to ITIL® best practices with expert services, comprehensive training, and out-of-the-box ITIL processes
- Project Architecture
2 Tier Architecture:
|Web||Client – Server N/W|
- The L1 team checks the status of the deployed applications and calls product users to get clarifications or convey resolutions on production issues. They also create the weekly reports needed by different teams.
- The L2 team is responsible for identifying the root cause of issues, working on the tickets created and classifying them (code fix required, deployment issue, etc.).
- The L3 team, made up of the application's developers, fixes issues that need code changes and supports critical issues when needed.
A production support team member gets to learn a lot about the entire flow of the application. One can learn the “domain” in depth because the work involves a lot of exploration of Functional Solution Documents.
1. What is the process of PAN generation?
2. What is the Hotlist/De-Hotlist process?
3. New user creation, deletion and updating
4. PIN generation and the Customer Authentication File (CAF) process
5. Monitoring and analyzing server-side and client-side performance
6. CPU and memory utilization and their alerts
7. Troubleshooting issues that customers face
8. Service delivery within SLA
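For the CPU and memory utilization alerts mentioned in the list above, a threshold check could look like the sketch below. The 80% warning and 90% critical thresholds and the function name `utilization_alerts` are assumptions for illustration; monitoring tools such as those named earlier configure their own thresholds.

```python
from typing import List


def utilization_alerts(cpu_percent: float, memory_percent: float,
                       warning: float = 80.0,
                       critical: float = 90.0) -> List[str]:
    """Return alert messages for readings at or above the thresholds."""
    alerts = []
    for name, value in (("CPU", cpu_percent), ("Memory", memory_percent)):
        if value >= critical:
            alerts.append(f"CRITICAL: {name} utilization at {value:.0f}%")
        elif value >= warning:
            alerts.append(f"WARNING: {name} utilization at {value:.0f}%")
    return alerts


# Sample readings; in practice these come from a monitoring agent.
for message in utilization_alerts(cpu_percent=85, memory_percent=92):
    print(message)
```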
Hotlisting is of two types
- Temporary nature
- Permanent nature
Hotlisting is done as a temporary measure when, for example, the card is not traceable or the PIN is missing. Such hotlisting can be revoked and the card made active once again.
Hotlisting is done as a permanent measure when, for example, somebody is withdrawing money from your account using a skimmed card.
However, bankers now opt for the permanent option only, for obvious reasons.
Once the card has been blocked and hotlisted, you have to get a duplicate card. Nowadays this is very simple: a non-personalised card is available within five to ten minutes.
Some banks charge for issuing such new cards.
Can anyone share the live issues of application and production support on an e-banking project?
The term e-Banking is very broad. It covers a wide range of processes such as CORE banking (Retail/Corporate), Cards/Loans, Internet Banking, alerting systems, Forex and many more. From my experience, I would like to quote a few issues from the Internet banking perspective.
> User profile and access related
- User asking for access to particular transactions, but a different access level is provisioned
- Invalid limits for transactions or for the number of beneficiaries that can be added
- For corporate accounts, issues/inquiries about the multi-level workflow (approval process)
- Where a user has multiple accounts across different locations, mismatched access on specific accounts
> Transaction related
- Transaction failures, such as the amount being debited but not received by the beneficiary, or the amount being stuck in parking accounts
- Failures in NEFT/RTGS batch messaging
- Transactions failing in core interfacing application
- Issues in reversal of transactions that failed
- Transaction failing/being stuck due to invalid details from end user or invalid application configuration
- Errors in processing the flat files from interfacing applications
- Batches going to zombie state or not scaling up for the incoming data
- Incoming files getting corrupted during data transmission
- Invalid data coming in through incoming data
- Issues in reading/loading the incoming data
- Compute power not scaling with the incoming load
- Improper Load balancing/scaling configurations that affect application performance
- Disk usage/Disk IO/DB response time going to alarming state and affecting application availability
- Application nodes going out of sync
- Infra power/cooling system outage
Application Support service delivery process:
Stability of the application
This can be determined based on following:
- No. of issues reported daily
- No. of on-going enhancements
- No. of pending critical issues
- No. of downtimes reported over a given period
Once the above factors are understood and discussed, one can determine the following for Application Support project initiation:
- No. of Support Analysts required
- Level of Support Analysts
- Skill sets of Support Analysts
- Support coverage hours
- SLA – TAT (turnaround time) for the different service types in the Application Support service