Improving relevance in a content pipeline via syntactic generalization

Cited by: 5
Authors
Galitsky, Boris [1]
Affiliation
[1] Knowledge Trail Inc, San Jose, CA 95127 USA
Keywords
Content pipeline; Relevance of text classification; Machine learning of syntactic parse trees; Personalized recommendation; TEXT; GRAPHS
DOI
10.1016/j.engappai.2016.11.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This is a report from the field on a linguistics-based relevance technology that learns from parse trees to process, classify and deliver a stream of texts. We describe the content pipeline for the eBay entertainment domain, which employs this technology, and show that text processing relevance is the main bottleneck for its performance. A number of components of the content pipeline, such as content mining, aggregation, deduplication, opinion mining and integrity enforcement, need to rely on domain-independent, efficient text classification, entity extraction and relevance assessment operations. Text relevance assessment is based on the operation of syntactic generalization (SG), which finds a maximum common sub-tree for a pair of parse trees for sentences. The relevance of two portions of text is then defined as the cardinality of this sub-tree. SG is intended to replace keyword-based analysis with a more accurate assessment of relevance that takes phrase-level and sentence-level information into account. In the special case where short expressions are commonly used terms such as Facebook likes, SG ascends to the level of categories, and a reasoning technique is required to map these categories in the course of relevance assessment. A number of content pipeline components employ web mining, which needs SG to compare web search results. We describe how SG works in a number of components of the content pipeline, including personalization and recommendation, and provide the evaluation results for the eBay deployment. Content pipeline support is implemented as an open source contribution, OpenNLP.Similarity, and is available at https://github.com/bgalitsky/relevance-based-on-pars-trees.
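The abstract defines relevance as the cardinality of a maximum common sub-tree of two parse trees. A minimal Python sketch of that idea follows, under simplifying assumptions that are not taken from the paper: trees are plain nested tuples of labels, two nodes match only when their labels are identical, and the common sub-tree must preserve parent-child links and child order. The names `node`, `rooted_match`, `subtrees` and `sg_score` are hypothetical and are not part of OpenNLP.Similarity.

```python
from functools import lru_cache


def node(label, *children):
    """Build a tree node as a hashable (label, children) tuple."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def rooted_match(a, b):
    """Node count of the largest common ordered sub-tree rooted at a and b."""
    if a[0] != b[0]:
        return 0  # assumption: nodes match only on identical labels
    xs, ys = a[1], b[1]
    # LCS-style alignment of the two child sequences; each aligned pair
    # contributes the size of its own best common sub-tree.
    table = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            table[i][j] = max(
                table[i - 1][j],
                table[i][j - 1],
                table[i - 1][j - 1] + rooted_match(xs[i - 1], ys[j - 1]),
            )
    return 1 + table[-1][-1]


def subtrees(tree):
    """Yield the tree itself and every sub-tree below it."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)


def sg_score(t1, t2):
    """Relevance score: size of the largest common sub-tree of t1 and t2."""
    return max(rooted_match(a, b) for a in subtrees(t1) for b in subtrees(t2))


# Two toy constituency trees that use part-of-speech labels only.
s1 = node("S",
          node("NP", node("DT"), node("NN")),
          node("VP", node("VB"), node("NP", node("DT"), node("NN"))))
s2 = node("S",
          node("NP", node("NN")),
          node("VP", node("VB"), node("NP", node("JJ"), node("NN"))))

score = sg_score(s1, s2)
self_score = sg_score(s1, s1)
print(score, self_score)
```

For these two toy trees the score is 7, while comparing `s1` with itself gives its full node count of 9. A keyword-overlap measure would ignore this shared phrase structure entirely, which is the contrast the abstract draws between SG and keyword-based analysis.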
Pages: 1-26
Number of pages: 26