Clustering of Similar Incident Tickets Using Natural Language Processing

Bibliographic Details
Main Author: Chen, Jackie
Other Authors: Lykouris, Thodoris
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/155983
Description
Summary: As businesses increasingly rely on digital tools for operational efficiency and value creation, Software Asset Management (SAM) becomes an important business practice. This thesis explores the use of natural language processing (NLP) and clustering algorithms to identify recurring issues affecting software applications, with two objectives: to assess the technical health of applications and to identify opportunities to address software issues that repeatedly plague users. Using a dataset of incident tickets from a business unit of a pharmaceutical company, various machine learning models were designed and tested to identify recurring issues affecting the business's applications. A dashboard that visualizes the outputs of the models provides the business with insights into recurring issues affecting its digital tools. As validated through user feedback and visual inspection, the model outputs show promising results in clustering incident tickets and help users understand and address recurrent software problems. However, it is important to acknowledge the inherent challenges of unsupervised machine learning. While the results can help enhance business operations, caution is advised regarding the implications for users and the business when models produce unexpected results. This project illustrates the balance between leveraging machine learning for problem-solving and understanding the limitations of the models.
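Illustrative sketch: The record does not say which NLP representations or clustering algorithms the thesis uses, so the following is only a minimal, assumed example of the general idea of grouping similar ticket texts. It substitutes TF-IDF vectors and a greedy cosine-similarity threshold for whatever models the author actually tested. The sample tickets, the function names (tokenize, tfidf_vectors, cluster_tickets), and the threshold of 0.3 are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build L2-normalized TF-IDF vectors as sparse dicts (term -> weight).
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency (an assumed weighting choice).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Vectors are already unit length, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster_tickets(docs, threshold=0.3):
    # Greedy single pass: join the first cluster whose seed ticket is
    # similar enough, otherwise start a new cluster.
    vectors = tfidf_vectors(docs)
    seeds = []   # index of the first ticket in each cluster
    labels = []
    for i, vec in enumerate(vectors):
        for label, seed in enumerate(seeds):
            if cosine(vec, vectors[seed]) >= threshold:
                labels.append(label)
                break
        else:
            seeds.append(i)
            labels.append(len(seeds) - 1)
    return labels


# Invented example tickets, not taken from the thesis dataset.
tickets = [
    "Cannot log in to the lab system, password reset fails",
    "Password reset fails when logging in to lab system",
    "Report export to PDF crashes the application",
    "Application crashes during PDF report export",
]
labels = cluster_tickets(tickets)
print(labels)
```

In this toy input the two login tickets share one label and the two export tickets share another. As the summary cautions for unsupervised methods in general, the grouping depends heavily on choices such as the threshold, so results of this kind would need inspection by users before being acted on.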