Understanding Query Interfaces by Statistical Parsing

Cited by: 5
Authors
Su, Weifeng [1]
Wu, Hejun [2]
Li, Yafei
Zhao, Jing
Lochovsky, Frederick H. [3]
Cai, Hongmin [4]
Huang, Tianqiang [5]
Affiliations
[1] PKU HKUST Shenzhen Hong Kong Inst, Shenzhen Key Lab Intelligent Media & Speech, Hong Kong, Hong Kong, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
[4] S China Univ Technol, Guangzhou, Guangdong, Peoples R China
[5] Fujian Normal Univ, Fuzhou, Peoples R China
Keywords
Algorithms; Performance; Experimentation; Query interface; maximum entropy; SEARCH INTERFACES; WEB; CLASSIFICATION
DOI
10.1145/2460383.2460387
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Users submit queries to an online database via its query interface. Query interface parsing, which is important for many applications, understands the query capabilities of a query interface. Since most query interfaces are organized hierarchically, we present a novel query interface parsing method, StatParser (Statistical Parser), to automatically extract the hierarchical query capabilities of query interfaces. StatParser automatically learns from a set of parsed query interfaces and parses new query interfaces. StatParser starts from a small grammar and enhances the grammar with a set of probabilities learned from parsed query interfaces under the maximum-entropy principle. Given a new query interface, the probability-enhanced grammar identifies the parse tree with the largest global probability to be the query capabilities of the query interface. Experimental results show that StatParser very accurately extracts the query capabilities and can effectively overcome the problems of existing query interface parsers.
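The idea of choosing the parse tree with the largest global probability can be illustrated with a minimal sketch. This is not StatParser's grammar or algorithm: the rule set, the symbol names (`QI`, `Attr`, `Field`, `Label`, `Hint`, `Box`), the token types (`text`, `box`), and the function `best_parse` are all hypothetical, and the rule probabilities are set by hand as stand-ins for values that the paper learns under the maximum-entropy principle. The sketch runs a standard Viterbi-style CKY search over a toy grammar in Chomsky normal form.

```python
from collections import defaultdict

# Hypothetical toy grammar; the probabilities are hand-set assumptions,
# not values learned by maximum entropy as in the paper.
# Lexical rules: token type -> [(nonterminal, probability)]
LEXICAL = {
    "text": [("Label", 1.0), ("Hint", 1.0)],
    "box": [("Box", 1.0), ("Field", 0.8), ("Attr", 0.1)],
}
# Binary rules: (parent, left child, right child, probability)
BINARY = [
    ("QI", "Attr", "Attr", 1.0),      # an interface made of two attributes
    ("Attr", "Label", "Field", 0.9),  # a labeled field
    ("Field", "Box", "Hint", 0.2),    # a box followed by a hint text
]


def best_parse(tokens, root="QI"):
    """Return (probability, tree) of the most probable parse, or None."""
    n = len(tokens)
    # chart[(i, j)] maps a nonterminal to (probability, tree) for tokens[i:j]
    chart = defaultdict(dict)
    for i, tok in enumerate(tokens):
        for sym, p in LEXICAL.get(tok, []):
            chart[(i, i + 1)][sym] = (p, (sym, tok))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[(i, j)]
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right, p in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        pl, tl = chart[(i, k)][left]
                        pr, tr = chart[(k, j)][right]
                        score = p * pl * pr
                        # keep only the highest-scoring derivation per symbol
                        if score > cell.get(parent, (0.0, None))[0]:
                            cell[parent] = (score, (parent, tl, tr))
    return chart[(0, n)].get(root)


if __name__ == "__main__":
    print(best_parse(["text", "box", "text", "box"]))
```

For the token sequence `text box text box`, two derivations compete under this toy grammar: two labeled fields, or one labeled field with a trailing hint followed by an unlabeled box. The search keeps whichever derivation has the larger product of rule probabilities, which here is the two-labeled-fields reading.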
Pages: 1-22
Number of pages: 22