HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes
Additive key performance indicators (KPIs) (such as page view (PV), revenue, and error count) with multi-dimensional attributes (such as ISP, Province, and DataCenter) are common and important in monitoring metrics in Internet companies. When an anomaly happens to an overall KPI, it is critical but...
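Main Authors: | Yongqian Sun, Youjian Zhao, Ya Su, Dapeng Liu, Xiaohui Nie, Yuan Meng, Shiwen Cheng, Dan Pei, Shenglin Zhang, Xianping Qu, Xuanyou Guo |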
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2018-01-01 |
Series: | IEEE Access |
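Subjects: | Anomaly localization; multi-dimensional attributes; huge search space; potential score; Monte Carlo Tree Search (MCTS); hierarchical pruning |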
Online Access: | https://ieeexplore.ieee.org/document/8288614/ |
_version_ | 1818603454844633088 |
---|---|
author | Yongqian Sun; Youjian Zhao; Ya Su; Dapeng Liu; Xiaohui Nie; Yuan Meng; Shiwen Cheng; Dan Pei; Shenglin Zhang; Xianping Qu; Xuanyou Guo |
author_facet | Yongqian Sun; Youjian Zhao; Ya Su; Dapeng Liu; Xiaohui Nie; Yuan Meng; Shiwen Cheng; Dan Pei; Shenglin Zhang; Xianping Qu; Xuanyou Guo |
author_sort | Yongqian Sun |
collection | DOAJ |
description | Additive key performance indicators (KPIs) (such as page view (PV), revenue, and error count) with multi-dimensional attributes (such as ISP, Province, and DataCenter) are common and important monitoring metrics in Internet companies. When an anomaly happens to an overall KPI, it is critical but challenging to localize the root cause, which is one or more combinations of attribute values in multiple dimensions. For example, is the total PV decrease caused by the PV decrease from “Beijing”, or “China Mobile in Beijing”, or “Beijing and Shanghai”? This task is challenging for two major reasons. First, the PVs of different combinations are interdependent; thus, PV anomalies at the root cause can change many other PVs at different aggregation levels. Second, there could be tens of thousands of combinations to investigate in the multi-dimensional attribute space, and it is difficult to find the root cause in such a huge search space. To address the first challenge, our approach HotSpot uses a novel potential score based on the ripple effect for anomaly propagation that we reveal. To address the second challenge, HotSpot adopts the Monte Carlo Tree Search algorithm and a hierarchical pruning strategy. Using real-world data from a top global search engine, we show that HotSpot achieves a great improvement in effectiveness and robustness, i.e., 95% of all types of root cause cases using HotSpot (compared with only 15% using existing approaches) achieve an F-score over 90%. Operational experiences show that HotSpot can reduce the localization time from more than 1 h of manual effort to less than 20 s. |
first_indexed | 2024-12-16T13:23:26Z |
format | Article |
id | doaj.art-5b0cf8b9121e42cdbba61b7d560cf337 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-16T13:23:26Z |
publishDate | 2018-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-5b0cf8b9121e42cdbba61b7d560cf337; 2022-12-21T22:30:17Z; eng; IEEE; IEEE Access; 2169-3536; 2018-01-01; 6; 10909; 10923; 10.1109/ACCESS.2018.2804764; 8288614; HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes; Yongqian Sun (0) https://orcid.org/0000-0003-0266-7899; Youjian Zhao (1); Ya Su (2); Dapeng Liu (3); Xiaohui Nie (4); Yuan Meng (5); Shiwen Cheng (6); Dan Pei (7); Shenglin Zhang (8); Xianping Qu (9); Xuanyou Guo (10); Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China; Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China; Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China; Department of Intelligent Operation, Baidu, Inc., Beijing, China; Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China; Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China; Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China; Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China; School of Software, Nankai University, Tianjin, China; Department of Intelligent Operation, Baidu, Inc., Beijing, China; Department of Intelligent Operation, Baidu, Inc., Beijing, China; Additive key performance indicators (KPIs) (such as page view (PV), revenue, and error count) with multi-dimensional attributes (such as ISP, Province, and DataCenter) are common and important monitoring metrics in Internet companies. When an anomaly happens to an overall KPI, it is critical but challenging to localize the root cause, which is one or more combinations of attribute values in multiple dimensions. For example, is the total PV decrease caused by the PV decrease from “Beijing”, or “China Mobile in Beijing”, or “Beijing and Shanghai”? This task is challenging for two major reasons. First, the PVs of different combinations are interdependent; thus, PV anomalies at the root cause can change many other PVs at different aggregation levels. Second, there could be tens of thousands of combinations to investigate in the multi-dimensional attribute space, and it is difficult to find the root cause in such a huge search space. To address the first challenge, our approach HotSpot uses a novel potential score based on the ripple effect for anomaly propagation that we reveal. To address the second challenge, HotSpot adopts the Monte Carlo Tree Search algorithm and a hierarchical pruning strategy. Using real-world data from a top global search engine, we show that HotSpot achieves a great improvement in effectiveness and robustness, i.e., 95% of all types of root cause cases using HotSpot (compared with only 15% using existing approaches) achieve an F-score over 90%. Operational experiences show that HotSpot can reduce the localization time from more than 1 h of manual effort to less than 20 s.; https://ieeexplore.ieee.org/document/8288614/; Anomaly localization; multi-dimensional attributes; huge search space; potential score; Monte Carlo Tree Search (MCTS); hierarchical pruning |
spellingShingle | Yongqian Sun; Youjian Zhao; Ya Su; Dapeng Liu; Xiaohui Nie; Yuan Meng; Shiwen Cheng; Dan Pei; Shenglin Zhang; Xianping Qu; Xuanyou Guo; HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes; IEEE Access; Anomaly localization; multi-dimensional attributes; huge search space; potential score; Monte Carlo Tree Search (MCTS); hierarchical pruning |
title | HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes |
title_full | HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes |
title_fullStr | HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes |
title_full_unstemmed | HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes |
title_short | HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes |
title_sort | hotspot anomaly localization for additive kpis with multi dimensional attributes |
topic | Anomaly localization; multi-dimensional attributes; huge search space; potential score; Monte Carlo Tree Search (MCTS); hierarchical pruning |
url | https://ieeexplore.ieee.org/document/8288614/ |
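
The description above notes that the PVs of different attribute combinations are interdependent, so an anomaly at one combination changes the values at several aggregation levels. The following is a minimal sketch of that additive propagation only; it is not HotSpot's potential score or its search procedure. The leaf values, the two dimensions, the `*` wildcard notation, and the function names `aggregate` and `changed` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical leaf-level page views: (Province, ISP) -> (forecast, actual).
# Only ("Beijing", "China Mobile") deviates from its forecast.
LEAVES = {
    ("Beijing", "China Mobile"): (100, 40),
    ("Beijing", "China Unicom"): (80, 80),
    ("Shanghai", "China Mobile"): (120, 120),
    ("Shanghai", "China Unicom"): (60, 60),
}
DIMENSIONS = ("Province", "ISP")


def aggregate(leaves, dims):
    """Sum leaf values into every coarser combination (additive KPI)."""
    totals = defaultdict(lambda: [0, 0])
    n = len(dims)
    for key, (forecast, actual) in leaves.items():
        for r in range(n + 1):
            for kept in combinations(range(n), r):
                # "*" marks a dimension that has been summed out.
                combo = tuple(key[i] if i in kept else "*" for i in range(n))
                totals[combo][0] += forecast
                totals[combo][1] += actual
    return {combo: tuple(v) for combo, v in totals.items()}


def changed(totals):
    """Return the combinations whose actual value differs from the forecast."""
    return sorted(c for c, (f, a) in totals.items() if f != a)


totals = aggregate(LEAVES, DIMENSIONS)
for combo in changed(totals):
    forecast, actual = totals[combo]
    print(combo, "forecast:", forecast, "actual:", actual)
```

In this toy data a single anomalous leaf makes four of the nine combinations deviate from their forecasts (the leaf itself, "Beijing", "China Mobile", and the overall total), which is why inspecting deviating aggregates one by one does not directly identify the root cause.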