What is triage in IT?
Triage is a term referring to the assignment of priority levels to tasks or individuals to determine the most effective order in which to deal with them.
Triage originated in a military medical context and is now widely used in information technology (IT) and business environments, where it is an integral part of business process management (BPM). An IT operations department constantly triages issues to decide which problems are most urgent.
How triage in IT works
The exact steps for triaging an IT incident vary from one organization to another, but the basic process typically includes the following:
- Assessment. The initial assessment of the incident identifies the problem.
- Categorization. The incident is categorized in terms of the type of incident and the severity.
- Prioritization. It’s placed in the remediation lineup according to its severity rating and overall importance to the business operations and functionality.
- Assignment. The appropriate person is assigned to fix the issue.
- Closure. Once the issue is resolved, the incident is closed and a report is filed on it.
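The five steps above can be sketched as a small pipeline. This is a minimal illustration, not how any particular ticketing product works: the keyword rules, the team names and the one-level priority bump for business-critical incidents are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword -> (category, severity) rules; severity 1 is the most severe.
RULES = [
    ("outage", "availability", 1),
    ("error", "application", 2),
    ("slow", "performance", 3),
    ("password", "access", 4),
]

# Hypothetical mapping of incident categories to the team that handles them.
TEAMS = {
    "availability": "infrastructure",
    "application": "app-support",
    "performance": "infrastructure",
    "access": "service-desk",
}


@dataclass
class Incident:
    id: int
    description: str
    business_critical: bool = False
    category: Optional[str] = None
    severity: Optional[int] = None
    priority: Optional[int] = None
    assignee: Optional[str] = None
    report: Optional[str] = None


def assess(incident: Incident) -> str:
    """Assessment: normalize the description so the problem can be identified."""
    return incident.description.lower()


def categorize(incident: Incident, text: str) -> None:
    """Categorization: record the incident type and severity (first matching rule wins)."""
    for keyword, category, severity in RULES:
        if keyword in text:
            incident.category, incident.severity = category, severity
            return
    incident.category, incident.severity = "general", 4  # default: least severe


def prioritize(incident: Incident) -> None:
    """Prioritization: raise priority one level if the incident is business-critical."""
    bump = 1 if incident.business_critical else 0
    incident.priority = max(1, incident.severity - bump)


def assign(incident: Incident) -> None:
    """Assignment: route the incident to the team responsible for its category."""
    incident.assignee = TEAMS.get(incident.category, "service-desk")


def close(incident: Incident, resolution: str) -> None:
    """Closure: file a short report on the resolved incident."""
    incident.report = (
        f"#{incident.id} [{incident.category}, P{incident.priority}] "
        f"handled by {incident.assignee}: {resolution}"
    )


def triage(incident: Incident) -> Incident:
    """Run assessment, categorization, prioritization and assignment in order."""
    text = assess(incident)
    categorize(incident, text)
    prioritize(incident)
    assign(incident)
    return incident
```

For example, `triage(Incident(1, "Email outage in the London office", business_critical=True))` yields category `availability`, priority 1 and assignee `infrastructure`; calling `close` on it afterward fills in the report. Real systems would replace the keyword matching with richer rules or classifiers.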
Triage can be handled manually when the volume of incidents is relatively small. In larger organizations and environments with many incidents, specialized trouble-ticket systems automate the triage process. Systems such as Atlassian's Jira can be configured to assign severity levels and route tickets to specific IT staff.
Why triage is important
IT departments face many problems each day and cannot fix them all at once. As a result, management must prioritize the issues that present the greatest threat to the organization's ability to conduct business and serve customers.
Defining severity levels in terms of business impact and assigning each item a severity level early in the process ensure that the most important problems are solved first. Routing each issue to employees who have the skills needed to address it also speeds resolution.
Typically, IT departments have a multilevel triage arrangement. For example, tier 1 issues are the simplest, least critical and easiest to fix. As such, they can be assigned to anyone on the IT staff. They can also wait longer to be acted on.
Incidents rated tier 2 or tier 3 are more complex and significant in their impact on IT and business operations. They must be addressed faster and require a higher level of experience and expertise from the technicians and engineers assigned to them.
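The tiered arrangement described above can be sketched as an assignment rule: a ticket goes to the least-busy technician qualified for its tier, and higher tiers get shorter response targets. This is a minimal illustration under stated assumptions; the `Technician` fields, the `max_tier` qualification model and the response targets in hours are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical response targets per tier, in hours: higher tiers must be addressed faster.
RESPONSE_TARGET_HOURS = {1: 24, 2: 4, 3: 1}


@dataclass
class Technician:
    name: str
    max_tier: int      # highest tier this person is qualified to handle
    open_tickets: int  # current workload


def assign(tier: int, staff: list[Technician]) -> tuple[Technician, int]:
    """Pick the least-loaded technician qualified for the tier; return them and the target."""
    # Tier 1 work can go to anyone; tiers 2 and 3 need more experienced staff.
    qualified = [t for t in staff if t.max_tier >= tier]
    if not qualified:
        raise LookupError(f"no one on staff can handle tier {tier}")
    chosen = min(qualified, key=lambda t: t.open_tickets)
    chosen.open_tickets += 1  # record the new assignment in their workload
    return chosen, RESPONSE_TARGET_HOURS[tier]
```

With `staff = [Technician("ana", 1, 2), Technician("ben", 2, 5), Technician("chen", 3, 3)]`, a tier 1 ticket goes to ana (the least busy overall), while a tier 3 ticket can only go to chen, with a one-hour target. Balancing by open-ticket count is just one possible policy; an organization might weigh specialization or on-call schedules instead.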
Steps to establishing a triage capability
Establishing a triage capability requires defining the roles involved in triage, how the process will operate and the levels of severity, as well as deciding whether automated triage technology is needed and how the capability will be funded. Additional characteristics to consider for a triage process include the following:
- Types of events. Clearly identifying what events are covered by the system is critical. These can range from simple password resets to enlisting software engineering teams to troubleshoot complex system issues.
- Skill levels. IT help desks need staff at several levels of expertise, from employees with limited skills to those with years of experience.
- Channels of support. It's important to identify who, beyond IT personnel, can help with certain issues. These can include vendors, carriers, consultants and others with the expertise and resources to support remediation.
- Communications. Standard ways of communicating must be established among team members, employees, vendors and other relevant entities.
- Service-level agreements (SLAs). The requirements and performance metrics of all existing SLAs with both customers and vendors must be factored into the triage process.
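One way to factor SLAs into triage is to order open tickets by how close each is to its SLA deadline, not by priority alone. The sketch below assumes hypothetical resolution targets per priority level; real targets come from the organization's actual agreements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical SLA resolution targets by priority (1 = most urgent).
SLA_TARGETS = {1: timedelta(hours=4), 2: timedelta(hours=8), 3: timedelta(days=3)}


@dataclass
class Ticket:
    id: str
    priority: int
    opened: datetime

    def deadline(self) -> datetime:
        """The time by which the SLA requires this ticket to be resolved."""
        return self.opened + SLA_TARGETS[self.priority]


def sla_queue(tickets: list[Ticket], now: datetime) -> list[tuple[str, timedelta, bool]]:
    """Order tickets by SLA deadline; report time remaining and whether it is breached."""
    ordered = sorted(tickets, key=lambda t: t.deadline())
    return [(t.id, t.deadline() - now, t.deadline() <= now) for t in ordered]


now = datetime(2024, 5, 1, 12, 0)
tickets = [
    Ticket("A", 3, now - timedelta(days=2, hours=23)),  # low priority, SLA due in 1 hour
    Ticket("B", 1, now - timedelta(hours=1)),           # top priority, SLA due in 3 hours
    Ticket("C", 2, now - timedelta(hours=9)),           # SLA already breached by 1 hour
]
```

Here `sla_queue(tickets, now)` returns the tickets in the order C, A, B: the low-priority ticket A moves ahead of the top-priority ticket B because its SLA window is about to close. Whether deadline order should override priority is a policy decision each organization has to make.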
Examples of triage in IT
Triage is used in various ways in IT, including the following:
- IT operations. Top-priority issues must be dealt with as they arise, and less important issues are attended to only when no top-priority issues remain. However, that time might never come, so the least urgent problems might never be dealt with unless they are reassessed and given a higher priority.
- Email. Messages are designated as urgent when they need an immediate response, less important messages are deferred to a specified future time and others are simply deleted. Email triage applications and mobile apps are available to facilitate the process.
- Agile software development (ASD). Development team requirements are typically triaged at the start of each iteration. Because an ASD iteration is a short development cycle, it's crucial to deal with high-priority requirements quickly so that they are addressed in time for the next iteration.
- Software testing. Bug triage identifies the code errors that need immediate attention and those that can wait. Bugs found during software testing that are assessed as low priority might be tolerated indefinitely.
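The IT operations example above describes a starvation problem: if urgent work keeps arriving, low-priority work never reaches the front of the queue. A common scheduling remedy is aging, in which an item's priority improves the longer it waits, which automates the "reassess at a higher priority" step. This is a minimal sketch; the `AgingQueue` class, its `aging_rate` parameter and the simulated workload are hypothetical.

```python
import heapq
import itertools


class AgingQueue:
    """Priority queue with aging; a lower number means more urgent.

    An item's effective priority is base - aging_rate * (clock - enqueued_at),
    so it improves the longer the item waits. Every waiting item ages at the
    same rate, so ordering by base + aging_rate * enqueued_at gives the same
    result and a plain binary heap can be used without re-sorting.
    """

    def __init__(self, aging_rate: float = 0.5):
        self.aging_rate = aging_rate
        self.clock = 0                 # advances by one on every pop
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: earlier arrivals first

    def push(self, base_priority: int, item: str) -> None:
        key = base_priority + self.aging_rate * self.clock
        heapq.heappush(self._heap, (key, next(self._seq), item))

    def pop(self) -> str:
        self.clock += 1
        return heapq.heappop(self._heap)[2]


def rounds_until_handled(aging_rate: float, max_rounds: int = 50):
    """Simulate one new urgent issue per round while a low-priority issue waits."""
    q = AgingQueue(aging_rate)
    q.push(5, "low: update printer drivers")
    for n in range(max_rounds):
        q.push(1, f"urgent #{n}")  # a fresh top-priority issue every round
        if q.pop().startswith("low"):
            return n
    return None  # the low-priority issue starved
```

With `aging_rate=0.0` (no aging), `rounds_until_handled` returns `None`: the low-priority issue is never handled. With the default rate of 0.5, it returns 8, because by then each newly arriving urgent issue no longer outranks the one that has been waiting. The rate controls how long low-priority work may wait.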
Why automated triage is preferred
In a busy data center, help desks handle hundreds of trouble tickets daily, addressing a number of defects and issues. Automated systems streamline the ticket process and increase help desk efficiency.
These automation systems collect data on the root cause of an event and assign testers or a testing team to analyze it. They also provide useful performance data, such as how long it takes to process a ticket to completion and how many triage team members were needed to resolve specific issues. Business analysts and project managers can use this data to project staffing needs and secure resources to remediate incidents in a timely fashion.
By contrast, a manual ticketing triage system is prone to human error. Tickets can easily be assigned to someone not trained in the issue or forgotten altogether.
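The performance data that automated systems collect can be summarized with a few lines of code. The sketch below computes average and worst-case time to resolution and the average number of people involved per ticket; the record layout and the sample tickets are made up for illustration and do not reflect any real system's export format.

```python
from datetime import datetime
from statistics import mean

# Hypothetical closed-ticket records: (ticket id, opened, resolved, people involved).
closed_tickets = [
    ("T-101", datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 11, 30), {"ana"}),
    ("T-102", datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 2, 10, 0), {"ana", "ben", "chen"}),
    ("T-103", datetime(2024, 5, 2, 8, 0), datetime(2024, 5, 2, 9, 0), {"ben"}),
]


def triage_metrics(tickets):
    """Summarize resolution times (in hours) and team size per ticket."""
    hours = [(resolved - opened).total_seconds() / 3600
             for _, opened, resolved, _ in tickets]
    team_sizes = [len(people) for *_, people in tickets]
    return {
        "mean_hours_to_resolve": mean(hours),
        "max_hours_to_resolve": max(hours),
        "mean_people_per_ticket": mean(team_sizes),
    }
```

Metrics like these could be broken down further by category or tier to show where staffing is tight.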
The history of triage
The term triage comes from the French verb trier, meaning to separate out or to sort.
It was first used for medical purposes during World War I, when it referred to the way medics prioritized the treatment of soldiers wounded in battle. Triage is still used in healthcare today, particularly in emergency rooms, to determine the order of treatment for patients.
The term was adopted for business purposes in the 1990s as a way to allocate limited budgetary and other resources to competing needs. IT managers and administrators use the term both as a way to allocate limited general resources and as a methodology for deciding what issues to fix first when systems have multiple problems at the same time.