Text Analytics to Inform Deviation Root Cause Analysis in Biomanufacturing

Bibliographic Details
Main Author: Nersesian, Lois E.
Other Authors: Levi, Retsef
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/144849
Description
Summary: In biomanufacturing, product quality and safety are critical, and many controls are in place to ensure that processes stay within their prescribed operating limits. However, deviations from these processes inevitably occur, sometimes requiring in-depth investigations to determine the cause and prevent recurrence. Understanding quality trends on the manufacturing line is also critical to preventing quality issues. At Amgen, a leading biotechnology company, the results of such investigations are stored long-term but only in a partially structured manner, making it hard to leverage this historical data to improve the efficiency of deviation investigations and to study long-term quality trends. The goal of this project is to use these historical records to draw insights into the investigation process and to help increase the efficiency and accuracy of future deviation investigations and of overall quality assurance. To achieve this, we use natural language processing tools to derive information from text describing deviations and causal factors. Three methods are explored: unsupervised clustering, which uses machine learning and natural language processing to identify and group similar causal factors; explicit text extraction, which identifies known key terms, such as equipment, mentioned in the text; and process-dependent step classification, which leverages reference documents describing the manufacturing process to assign records to process steps. The outputs of these methods are presented in a proof-of-concept tool that can be used to assist investigators. Our results indicate that each method has benefits and drawbacks, but that the methods can be used together for maximal insight. Based on the status of each method, we suggest that Amgen work to create a tool that presents potential causal factors to investigators immediately, incorporating the clustering and text extraction methods after minor refinement, and continue to explore the potential of process-driven methodologies.
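
The three approaches named in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the summary does not specify models or data, so the equipment vocabulary, the process-step reference texts, the bag-of-words cosine similarity, the greedy clustering rule, and the 0.3 threshold are all assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical vocabulary and reference texts, invented for illustration only.
EQUIPMENT_TERMS = ["bioreactor", "centrifuge", "chromatography column", "filter"]
PROCESS_STEPS = {
    "cell culture": "cells grow in the bioreactor with media feed and temperature control",
    "harvest": "centrifuge and depth filter separate cells from the product",
    "purification": "chromatography column binds and elutes the product with buffer",
}

def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def extract_terms(text, terms=EQUIPMENT_TERMS):
    """Explicit text extraction: return the known terms that appear in the text."""
    lowered = text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)]

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify_step(text, steps=PROCESS_STEPS):
    """Step classification: pick the step whose reference text is most similar."""
    vec = Counter(tokenize(text))
    return max(steps, key=lambda name: cosine(vec, Counter(tokenize(steps[name]))))

def cluster(texts, threshold=0.3):
    """Greedy one-pass clustering: join the first cluster whose seed is similar enough."""
    vecs = [Counter(tokenize(t)) for t in texts]
    clusters = []  # each cluster is a list of indices; the first index is its seed
    for i, v in enumerate(vecs):
        for c in clusters:
            if cosine(v, vecs[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])  # no similar seed found, so start a new cluster
    return clusters

record = "Temperature excursion observed in the bioreactor during media feed"
causes = [
    "operator skipped a step in the procedure",
    "operator skipped the procedure step",
    "gasket leak on the filter housing",
]
found_terms = extract_terms(record)
assigned_step = classify_step(record)
cause_clusters = cluster(causes)
```

With these made-up inputs, the record mentions one known equipment term and overlaps most with the "cell culture" reference text, and the two operator-related causes fall into one cluster while the gasket leak forms its own. A production version would likely replace raw token counts with TF-IDF or embedding vectors and a standard clustering algorithm.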