Toward Formal Data Set Verification for Building Effective Machine Learning Models

被引:1
作者
Lopez, Jorge [1 ]
Labonne, Maxime [1 ]
Poletti, Claude [1 ]
机构
[1] Airbus Def & Space, Issy Les Moulineaux, France
来源
PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (KDIR), VOL 1: | 2021年
关键词
Machine Learning; Data Set Collection; Formal Verification; Trusted Artificial Intelligence;
D O I
10.5220/0010676500003064
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In order to properly train a machine learning model, data must be properly collected. To guarantee a proper data collection, verifying that the collected data set holds certain properties is a possible solution. For example, guaranteeing that that data set contains samples across the whole input space, or that the data set is balanced w.r.t. different classes. We present a formal approach for verifying a set of arbitrarily stated properties over a data set. The proposed approach relies on the transformation of the data set into a first order logic formula, which can be later verified w.r.t. the different properties also stated in the same logic. A prototype tool, which uses the z3 solver, has been developed; the prototype can take as an input a set of properties stated in a formal language and formally verify a given data set w.r.t. to the given set of properties. Preliminary experimental results show the feasibility and performance of the proposed approach, and furthermore the flexibility for expressing properties of interest.
引用
收藏
页码:249 / 256
页数:8
相关论文
共 12 条
[1]  
Barrett C., 2018, Handbook of model checking, P305, DOI DOI 10.1007/978-3-319-10575-8_11
[2]   Satisfiability modulo theories [J].
Barrett, Clark ;
Sebastiani, Roberto ;
Seshia, Sanjit A. ;
Tinelli, Cesare .
Frontiers in Artificial Intelligence and Applications, 2009, 185 (01) :825-885
[3]   Julia: A Fresh Approach to Numerical Computing [J].
Bezanson, Jeff ;
Edelman, Alan ;
Karpinski, Stefan ;
Shah, Viral B. .
SIAM REVIEW, 2017, 59 (01) :65-98
[4]   Automatic Derivation and Validation of a Cloud Dataset for Insider Threat Detection [J].
Carvallo, Pamela ;
Cavalli, Ana R. ;
Kushik, Natalia .
ICSOFT: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON SOFTWARE TECHNOLOGIES, 2017, :480-487
[5]   Z3: An efficient SMT solver [J].
de Moura, Leonardo ;
Bjorner, Nikolaj .
TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS, 2008, 4963 :337-340
[6]  
Finkbeiner B., 2006, MANY SORTED LOGIC
[7]  
Lemaître G, 2017, J MACH LEARN RES, V18
[8]  
Lopez J., 2021, DSVERIF A FORMAL DAT
[9]  
Manna Z, 2003, LECT NOTES COMPUT SC, V2757, P381
[10]  
Manzano M., 1993, Many-sorted logic and its applications, P3