The automatic construction of large-scale corpora for summarization research

Cited by: 35
Author
Marcu, D [1]
Affiliation
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
Source
SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 1999
DOI
10.1145/312624.312668
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Summarization research is notorious for its lack of adequate corpora: today, there exist only a few small collections of texts whose units have been manually annotated for textual importance. Given the cost and tediousness of the annotation process, it is very unlikely that we will ever manually annotate for textual importance sufficiently large corpora of texts. To circumvent this problem, we have developed an algorithm that constructs such corpora automatically. Our algorithm takes as input an (Abstract, Text) tuple and generates the corresponding Extract, i.e., the set of clauses (sentences) in the Text that were used to write the Abstract. The performance of the algorithm is shown to be close to that of humans by means of an empirical experiment. The experiment also suggests extraction strategies that could improve the performance of automatic summarization systems.
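The abstract specifies only the algorithm's interface: an (Abstract, Text) pair goes in, and the subset of Text sentences that supports the Abstract comes out. A minimal sketch of that interface follows, under loudly stated assumptions: the bag-of-words cosine similarity, the greedy deletion loop, and the names `bag_of_words`, `cosine`, and `build_extract` are illustrative choices that do not come from this record and should not be read as the paper's actual procedure.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a deliberately crude text representation (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_extract(abstract, sentences):
    """Hypothetical helper: greedily drop sentences while that raises
    the similarity between the remaining text and the abstract.
    Returns the surviving sentences in their original order."""
    target = bag_of_words(abstract)
    kept = list(range(len(sentences)))

    def score(indices):
        return cosine(target, bag_of_words(" ".join(sentences[i] for i in indices)))

    best = score(kept)
    while len(kept) > 1:
        # Try removing each remaining sentence; pick the most helpful removal.
        candidates = [(score([i for i in kept if i != drop]), drop) for drop in kept]
        new_best, drop = max(candidates)
        if new_best <= best:
            break  # no single deletion improves similarity any further
        best = new_best
        kept.remove(drop)
    return [sentences[i] for i in kept]
```

Calling `build_extract(abstract, sentences)` with a list of sentence strings returns the retained sentences. The greedy loop is quadratic in the number of sentences per pass, which is acceptable for a sketch but would need a better similarity measure and a clause-level segmenter to approach what the abstract describes.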
Pages: 137-144
Page count: 8