Can Back-of-the-Book Indexes be Automatically Created?
Cited by: 13
Authors:
Wu, Zhaohui [1]
Li, Zhenhui [2]
Mitra, Prasenjit [1,2]
Giles, C. Lee [1,2]
Affiliations:
[1] Penn State Univ, Comp Sci & Engn, University Pk, PA 16802 USA
[2] Penn State Univ, Informat Sci & Technol, University Pk, PA 16802 USA
Source:
PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013
Keywords:
Back-of-the-Book Index;
Book Index;
Term Informativeness;
DOI:
10.1145/2505515.2505627
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
Creating back-of-the-book indexes remains one of the few tasks in publishing that is still done manually. Inspired by how human indexers create back-of-the-book indexes, we present a new domain-independent, corpus-free, and training-free automated approach. Given a book, index terms are selected sequentially according to an indexability score derived from the structural information in the book together with a novel context-aware term informativeness measure that draws on a web knowledge base such as Wikipedia. Through extensive experiments on books from various domains, we show that our approach is more effective and practical than previous approaches based on keyword extraction and supervised learning.