Opprentice: Towards Practical and Automatic Anomaly Detection Through Machine Learning

Cited by: 159
Authors
Liu, Dapeng [1, 4]
Zhao, Youjian [1, 4]
Xu, Haowen [1, 4]
Sun, Yongqian [1, 4]
Pei, Dan [1, 4]
Luo, Jiao [2]
Jing, Xiaowei [3]
Feng, Mei [3]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Baidu, Beijing, Peoples R China
[3] PetroChina, Beijing, Peoples R China
[4] Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China
Source
IMC'15: PROCEEDINGS OF THE 2015 ACM CONFERENCE ON INTERNET MEASUREMENT CONFERENCE | 2015
Funding
National Natural Science Foundation of China;
Keywords
Anomaly Detection; Tuning Detectors; Machine Learning; CRITERIA;
DOI
10.1145/2815675.2815679
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Closely monitoring service performance and detecting anomalies are critical for Internet-based services. However, even though dozens of anomaly detectors have been proposed over the years, deploying them to a given service remains a great challenge, requiring manually and iteratively tuning detector parameters and thresholds. This paper tackles this challenge through a novel approach based on supervised machine learning. With our proposed system, Opprentice (Operators' apprentice), operators' only manual work is to periodically label the anomalies in the performance data with a convenient tool. Multiple existing detectors are applied to the performance data in parallel to extract anomaly features. Then the features and the labels are used to train a random forest classifier to automatically select the appropriate detector-parameter combinations and the thresholds. For three different service KPIs in a top global search engine, Opprentice can automatically satisfy or approximate a reasonable accuracy preference (recall >= 0.66 and precision >= 0.66). More importantly, Opprentice allows operators to label data in only tens of minutes, while operators traditionally have to spend more than ten days selecting and tuning detectors, which may still turn out not to work in the end.
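The feature-extraction step described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the two toy detectors (`moving_average_severity`, `diff_severity`), their parameters, the `CONFIGS` list, and the sample KPI values are hypothetical and are not taken from the paper. The sketch only shows how running several detector-parameter combinations side by side produces one feature column each, and how a recall and precision preference such as the 0.66 figures above can be checked against operator labels. The random forest training itself is left out; the resulting feature rows and labels could be handed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.

```python
from statistics import mean


def moving_average_severity(series, window):
    """Severity = |value - mean of the previous `window` points|; 0.0 until enough history."""
    return [
        0.0 if i < window else abs(x - mean(series[i - window:i]))
        for i, x in enumerate(series)
    ]


def diff_severity(series, lag):
    """Severity = |value - value `lag` points earlier|; 0.0 until enough history."""
    return [
        0.0 if i < lag else abs(x - series[i - lag])
        for i, x in enumerate(series)
    ]


def extract_features(series, configs):
    """Return one row per data point and one column per detector-parameter combination."""
    columns = [detector(series, param) for detector, param in configs]
    return [list(row) for row in zip(*columns)]


def precision_recall(predicted, labels):
    """Compute (precision, recall) for binary predictions against binary labels."""
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def meets_preference(precision, recall, min_precision=0.66, min_recall=0.66):
    """Check an accuracy preference of the form 'recall >= r and precision >= p'."""
    return precision >= min_precision and recall >= min_recall


# Hypothetical detector-parameter combinations (assumption, not the paper's list).
CONFIGS = [
    (moving_average_severity, 3),
    (moving_average_severity, 5),
    (diff_severity, 1),
]

# Toy KPI series with one operator-labeled anomaly at index 5 (made-up data).
kpi = [10, 11, 10, 12, 11, 30, 11, 10, 12, 11]
labels = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

features = extract_features(kpi, CONFIGS)

# Baseline for comparison: a single detector with a hand-picked threshold.
single_detector_alarms = [row[2] > 10 for row in features]
precision, recall = precision_recall(single_detector_alarms, labels)
```

In this toy run the single hand-thresholded detector also fires on the point after the spike, so it misses the precision target. That is the kind of manual tuning problem that the abstract says a classifier trained on all feature columns is meant to replace.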
Pages: 211-224
Page count: 14