Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data

Cited by: 0
Authors
Hongqin Fan
Osmar R. Zaïane
Andrew Foss
Junfeng Wu
Affiliations
[1] Missouri Western State University, Department of Engineering Technology
[2] University of Alberta, Department of Computing Science
Source
Knowledge and Information Systems | 2009 / Vol. 19
Keywords
Outlier Detection; Mining Algorithm; Synthetic Dataset; Engineering Data; Local Outlier
DOI
Not available
Abstract
Outlier detection, which aims to identify inconsistent records in large amounts of data, is a common task in engineering applications. Although outlier detection schemes from the data mining discipline are acknowledged as a more viable way to identify anomalies efficiently in such data repositories, current outlier mining algorithms require domain parameters as input. These parameters are often unknown, difficult to determine, and vary across datasets with different cluster features. This paper presents a novel resolution-based outlier notion and a nonparametric outlier-mining algorithm that can efficiently identify and rank the top-n outliers in a wide variety of datasets. The algorithm generates reasonable outlier results by taking both local and global features of a dataset into account. Experiments are conducted using both synthetic datasets and a real-life construction equipment dataset from a large road-building contractor. Comparison with current outlier mining algorithms indicates that the proposed algorithm is more effective and can be integrated into a decision support system to serve as a universal detector of potentially inconsistent records.
Pages: 31-51
Page count: 20