Is automatic detection of hidden knowledge an anomaly?

Abstract Background The quantity of documents being published requires researchers to specialize to a narrower field, meaning that inferable connections between publications (particularly from different domains) can be missed. This has given rise to automatic literature based discovery (LBD). Howeve...

Bibliographic Details
Main Author:	Judita Preiss
Format:	Article
Language:	English
Published:	BMC 2019-05-01
Series:	BMC Bioinformatics
Subjects:	Literature based discovery; Anomaly detection; Unified medical language system
Online Access:	http://link.springer.com/article/10.1186/s12859-019-2815-4

_version_	1818503286186049536
author	Judita Preiss
author_facet	Judita Preiss
author_sort	Judita Preiss
collection	DOAJ
description	Abstract. Background: The quantity of documents being published requires researchers to specialize to a narrower field, meaning that inferable connections between publications (particularly from different domains) can be missed. This has given rise to automatic literature based discovery (LBD). However, unless heavily filtered, LBD generates more potential new knowledge than can be manually verified and another form of selection is required before the results can be passed onto a user. Since a large proportion of the automatically generated hidden knowledge is valid but generally known, we investigate the hypothesis that non trivial, interesting, hidden knowledge can be treated as an anomaly and identified using anomaly detection approaches. Results: Two experiments are conducted: (1) to avoid errors arising from incorrect extraction of relations, the hypothesis is validated using manually annotated relations appearing in a thesaurus, and (2) automatically extracted relations are used to investigate the hypothesis on publication abstracts. These allow an investigation of a potential upper bound and the detection of limitations yielded by automatic relation extraction. Conclusion: We apply one-class SVM and isolation forest anomaly detection algorithms to a set of hidden connections to rank connections by identifying outlying (interesting) ones and show that the approach increases the F1 measure by a factor of 10 while greatly reducing the quantity of hidden knowledge to manually verify. We also demonstrate the statistical significance of this result.
first_indexed	2024-12-10T21:22:01Z
format	Article
id	doaj.art-58618d737e5641a3892fd8f544b67620
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-12-10T21:22:01Z
publishDate	2019-05-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-58618d737e5641a3892fd8f544b67620; 2022-12-22T01:33:06Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-05-01; 20S107580; 10.1186/s12859-019-2815-4; Is automatic detection of hidden knowledge an anomaly?; Judita Preiss (University of Salford, The School of Computing, Science & Engineering); http://link.springer.com/article/10.1186/s12859-019-2815-4; Literature based discovery; Anomaly detection; Unified medical language system
title	Is automatic detection of hidden knowledge an anomaly?
title_full	Is automatic detection of hidden knowledge an anomaly?
title_fullStr	Is automatic detection of hidden knowledge an anomaly?
title_full_unstemmed	Is automatic detection of hidden knowledge an anomaly?
title_short	Is automatic detection of hidden knowledge an anomaly?
title_sort	is automatic detection of hidden knowledge an anomaly
topic	Literature based discovery; Anomaly detection; Unified medical language system
url	http://link.springer.com/article/10.1186/s12859-019-2815-4
work_keys_str_mv	AT juditapreiss isautomaticdetectionofhiddenknowledgeananomaly

Similar Items