Good edit similarity learning by loss minimization

被引:24
作者
Bellet, Aurelien [1 ]
Habrard, Amaury [1 ]
Sebban, Marc [1 ]
机构
[1] Lab Hubert Curien, UMR 5516, F-42000 St Etienne, France
关键词
Similarity learning; Edit distance; Good similarity function; Loss minimization; DISTANCE;
D O I
10.1007/s10994-012-5293-8
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Similarity functions are a fundamental component of many learning algorithms. When dealing with string or tree-structured data, measures based on the edit distance are widely used, and there exist a few methods for learning them from data. However, these methods offer no theoretical guarantee as to the generalization ability and discriminative power of the learned similarities. In this paper, we propose an approach to edit similarity learning based on loss minimization, called GESL. It is driven by the notion of (I mu,gamma,tau)-goodness, a theory that bridges the gap between the properties of a similarity function and its performance in classification. Using the notion of uniform stability, we derive generalization guarantees that hold for a large class of loss functions. We also provide experimental results on two real-world datasets which show that edit similarities learned with GESL induce more accurate and sparser classifiers than other (standard or learned) edit similarities.
引用
收藏
页码:5 / 35
页数:31
相关论文
共 37 条
[1]  
[Anonymous], 2006, DISTANCE METRIC LEAR
[2]  
[Anonymous], 2003, 9 ACM SIGKDD INTCONF, DOI DOI 10.1145/956750.956759
[3]  
[Anonymous], 2006, P 23 INT C MACHINE L
[4]  
Bach F., 2010, SPARSE METHODS MACHI
[5]  
Balcan M., 2008, COLT, P287
[6]  
Bellet A, 2011, LECT NOTES ARTIF INT, V6911, P188, DOI 10.1007/978-3-642-23780-5_22
[7]   An Experimental Study on Learning with Good Edit Similarity Functions [J].
Bellet, Aurelien ;
Sebban, Marc ;
Habrard, Amaury .
2011 23RD IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2011), 2011, :126-133
[8]   Learning state machine-based string edit kernels [J].
Bellet, Aurelien ;
Bernard, Marc ;
Murgue, Thierry ;
Sebban, Marc .
PATTERN RECOGNITION, 2010, 43 (06) :2330-2339
[9]   Learning probabilistic models of tree edit distance [J].
Bernard, Marc ;
Boyer, Laurent ;
Habrard, Amaury ;
Sebban, Marc .
PATTERN RECOGNITION, 2008, 41 (08) :2611-2629
[10]  
Bian Wei, 2011, IJCAI 2011 P 22 INT, P1186, DOI [DOI 10.5591/978-1-57735-516-8/IJCAI11-202, 10.5591/978-1-57735, DOI 10.5591/978-1-57735]