Positive and unlabeled learning for anomaly detection
Main Author:
Other Authors:
Format: Thesis
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10356/75883
Summary: Anomaly detection is of great interest to big data applications but remains a challenging problem for machine learning-based methods. Unsupervised methods may not perform satisfactorily because they lack label information. Supervised methods need labeled anomaly data for training, and such data are difficult to acquire because anomalies are usually rare and diversely distributed.

To address this challenge, we propose a hybrid solution that applies Positive and Unlabeled (PU) learning to the anomaly detection problem. As a semi-supervised method, the proposed approach requires only normal (positive) data and unlabeled data, which may be either positive or negative. We first use a linear model to extract the most reliable negative instances. We then run an iterative self-learning process that updates the classifier at different speeds based on the estimated positive class prior. The proposed method is evaluated on several benchmark datasets and outperforms existing methods under different experimental settings.
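The two-stage procedure in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation and rests on several assumptions. Logistic regression stands in for the linear model. The positive class prior is taken as already estimated. The reliable-negative set grows linearly until it reaches (1 - prior) times the number of unlabeled points, which is only one possible reading of "different speeds based on the estimated positive class prior". All function names (`train_logreg`, `fit`, `lowest`, `pu_anomaly_detector`) and parameters (`n_iter`, `init_frac`) are hypothetical. The sketch uses only the standard library so that it stays self-contained.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip the logit so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def score_one(w, x):
    """Model's estimated probability that x is normal (positive)."""
    d = len(x)
    return sigmoid(sum(w[j] * x[j] for j in range(d)) + w[d])


def train_logreg(X, y, lr=0.1, epochs=300, l2=1e-3):
    """Linear model: L2-regularised logistic regression fit by batch gradient descent."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)  # w[:d] are feature weights, w[d] is the bias
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = score_one(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        for j in range(d):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
        w[d] -= lr * grad[d] / n  # the bias is not regularised
    return w


def fit(X_pos, X_neg):
    """Train positive (label 1) vs. negative (label 0) examples."""
    return train_logreg(X_pos + X_neg, [1] * len(X_pos) + [0] * len(X_neg))


def lowest(w, X, k):
    """Indices of the k points the model considers least likely to be normal."""
    return sorted(range(len(X)), key=lambda i: score_one(w, X[i]))[:k]


def pu_anomaly_detector(X_pos, X_unl, prior, n_iter=4, init_frac=0.2):
    """prior: estimated fraction of normal points in X_unl (assumed given)."""
    n_target = max(1, round((1.0 - prior) * len(X_unl)))  # expected anomalies in U
    # Stage 1: positives vs. all unlabeled points; the lowest-scoring
    # unlabeled points are taken as the most reliable negatives.
    w = fit(X_pos, X_unl)
    n_neg = max(1, int(init_frac * n_target))
    neg = lowest(w, X_unl, n_neg)
    # Stage 2: self-training. Assumed schedule: the negative set grows by a
    # prior-dependent step until it reaches (1 - prior) * |U|.
    step = max(1, math.ceil((n_target - n_neg) / n_iter))
    for _ in range(n_iter):
        w = fit(X_pos, [X_unl[i] for i in neg])
        n_neg = min(n_target, n_neg + step)
        neg = lowest(w, X_unl, n_neg)
    return fit(X_pos, [X_unl[i] for i in neg])


# Usage on synthetic 2-D data: normals ~ N(0, 1), anomalies ~ N(5, 1).
rng = random.Random(0)


def point(mu):
    return [rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)]


X_pos = [point(0.0) for _ in range(200)]
X_unl = [point(0.0) for _ in range(180)] + [point(5.0) for _ in range(20)]
w = pu_anomaly_detector(X_pos, X_unl, prior=0.9)
flagged = lowest(w, X_unl, 20)  # indices 180..199 are the true anomalies
```

The sketch ranks unlabeled points by the final model's score and flags the lowest (1 - prior) share of them as anomalies. This is one simple way to turn the PU classifier into a detector. A fixed probability threshold would be an equally plausible alternative.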