Extending Statistical Data Quality Improvement with Explicit Domain Models

Cited by: 0
Authors
Solomakhina, Nina [1]
Hubauer, Thomas [2]
Lamparter, Steffen [2]
Roshchin, Mikhail [2]
Grimm, Stephan [2]
Affiliations
[1] Vienna Univ Technol, A-1060 Vienna, Austria
[2] Siemens AG, Munich, Germany
Source
2014 12TH IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN) | 2014
Keywords
data quality; industrial control; knowledge-based methods; time series; statistics
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Automatic processing of data to determine operating states and identify faults has become essential for many modern industrial systems. Typical sources of this data include hundreds of sensors mounted on industrial machinery, measuring quantities such as temperature, vibration, and pressure. However, sensors are complex technical devices: they can fail, and their readings may contain noise or imprecise values. Such low-quality data makes it hard to solve the original task of assessing system and process status. We present an approach that brings together several well-known techniques from computer science and statistics and enhances the monitoring of technical systems by improving the detection and correction of data quality issues in sensor data. The application domain and the dependencies between its objects are represented as a knowledge-based model, while statistical methods identify data anomalies, such as outlying or missing values, in sensor measurement data. Combining information from the knowledge-based model with the statistical computations makes it possible to validate and improve data analysis results. We demonstrate the proposed approach on a real-world industrial use case from the power generation domain. Our evaluation shows that the combined solution improves precision while maintaining high accuracy and recall.
Pages: 720 / +
Page count: 2