Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study

Objectives: Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value represe...

Bibliographic Details
Main Authors: Mujtaba, Ghulam, Shuib, Liyana, Raj, Ram Gopal, Rajandram, Retnagowri, Shaikh, Khairunisa
Format: Article
Published: Elsevier 2018
Collection: UM
Description:

Objectives: Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value representation, text classification, and feature reduction.

Methods: For the experiments, autopsy reports belonging to eight different causes of death were collected, preprocessed, and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six different text classification techniques were applied to these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures, i.e., overall accuracy, macro precision, macro recall, and macro F-measure.

Results: The experiments showed that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Among the feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar results, and both performed better than binary frequency and normalized term frequency with inverse document frequency. The chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among the text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and the ensemble-voted classifier.

Conclusion: Our results and comparisons hold practical importance and serve as references for future work. Moreover, the comparison outputs will serve as state-of-the-art baselines for comparing future proposals with existing automated text classification techniques.
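The four performance measures named in the Methods (overall accuracy, macro precision, macro recall, and macro F-measure) can be illustrated with a minimal sketch. This is not the authors' code. The function name `macro_scores`, the placeholder class labels, and the toy predictions are all hypothetical, and the sketch assumes that the macro F-measure is the unweighted mean of the per-class F1 scores, which is one common definition; the record does not say which definition the study used.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Overall accuracy plus macro-averaged precision, recall, and F-measure.

    Assumption: macro F-measure is the unweighted mean of per-class F1.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f": sum(f1s) / n,
    }


# Hypothetical toy labels standing in for cause-of-death classes.
y_true = ["class_a", "class_a", "class_b", "class_b", "class_c", "class_c"]
y_pred = ["class_a", "class_b", "class_b", "class_b", "class_c", "class_a"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Macro averaging gives every class the same weight regardless of how many reports it contains, so it is a natural companion to overall accuracy when the classes are of unequal size.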
Institution: Universiti Malaya
Citation: Mujtaba, Ghulam and Shuib, Liyana and Raj, Ram Gopal and Rajandram, Retnagowri and Shaikh, Khairunisa (2018) Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study. Journal of Forensic and Legal Medicine, 57. pp. 41-50. ISSN 1752-928X. DOI: https://doi.org/10.1016/j.jflm.2017.07.001
Status: Peer reviewed
Repository record: um.eprints-21197, 2019-05-09T07:12:01Z, http://eprints.um.edu.my/21197/
Subjects: QA75 Electronic computers. Computer science; R Medicine