The automatic construction of large-scale corpora for summarization research

Cited by: 35
Author
Marcu, D [1]
Affiliation
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
Source
SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 1999
Keywords
DOI
10.1145/312624.312668
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Summarization research is notorious for its lack of adequate corpora: today, there exist only a few small collections of texts whose units have been manually annotated for textual importance. Given the cost and tediousness of the annotation process, it is very unlikely that we will ever manually annotate for textual importance sufficiently large corpora of texts. To circumvent this problem, we have developed an algorithm that constructs such corpora automatically. Our algorithm takes as input an (Abstract, Text) tuple and generates the corresponding Extract, i.e., the set of clauses (sentences) in the Text that were used to write the Abstract. The performance of the algorithm is shown to be close to that of humans by means of an empirical experiment. The experiment also suggests extraction strategies that could improve the performance of automatic summarization systems.
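The abstract specifies only the interface of the algorithm: an (Abstract, Text) pair goes in, and the Extract (the subset of Text units that best accounts for the Abstract) comes out. It does not describe the procedure itself, so the following is a minimal illustrative sketch and not necessarily the paper's method. It assumes a bag-of-words cosine similarity and a greedy strategy that drops sentences from the Text for as long as the remainder becomes more similar to the Abstract. The names `bag_of_words`, `cosine`, and `build_extract` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a deliberately crude tokenizer (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_extract(abstract, sentences):
    """Greedily drop sentences while similarity to the abstract improves.

    Illustrative heuristic only; the source abstract does not state how
    the Extract is actually computed.
    """
    target = bag_of_words(abstract)
    kept = list(range(len(sentences)))

    def score(indices):
        joined = " ".join(sentences[i] for i in indices)
        return cosine(bag_of_words(joined), target)

    best = score(kept)
    while len(kept) > 1:
        # Evaluate the removal of each remaining sentence in turn.
        candidates = [(score([i for i in kept if i != j]), j) for j in kept]
        new_best, drop = max(candidates)
        if new_best <= best:
            break  # No single removal improves similarity; stop.
        best = new_best
        kept.remove(drop)
    return [sentences[i] for i in kept]


if __name__ == "__main__":
    text = [
        "The cat sat on the mat in the sun.",
        "Stock prices fell sharply on Monday.",
        "Investors blamed rising interest rates.",
    ]
    print(build_extract("The cat sat on the mat.", text))
```

Sentences are used as the extraction unit here for simplicity; the abstract speaks of clauses, which would require a clause segmenter that this sketch does not attempt.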
Pages: 137-144
Number of pages: 8