Deep-Learning-Based Pre-Training and Refined Tuning for Web Summarization Software

Times Cited: 0
Authors
Liu, Mingyue [1 ]
Ma, Zhe [2 ]
Li, Jiale [3 ]
Wu, Ying Cheng [4 ]
Wang, Xukang [5 ]
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
[2] Univ Southern Calif, Ming Hsieh Dept Elect & Comp Engn, Los Angeles, CA 90007 USA
[3] NYU, Tandon Sch Engn, New York, NY 10012 USA
[4] Univ Washington, Sch Law, Seattle, WA 98195 USA
[5] Sage IT Consulting Grp, Shanghai 200060, Peoples R China
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Pre-training; deep learning; web information extraction
DOI
10.1109/ACCESS.2024.3423662
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
In the digital age, the rapid growth of web information has made it increasingly challenging for individuals and organizations to explore and extract valuable insights from the vast amount of content available. This paper presents a novel approach to automated web text summarization that combines advanced natural language processing techniques with recent breakthroughs in deep learning. We propose a dual-faceted technique that leverages extensive pre-training on a broad out-of-domain dataset, followed by a unique refined tuning process. We introduce a carefully curated dataset that captures the heterogeneous nature of web articles and propose an innovative pre-training and tuning approach that establishes a new state of the art in news summarization. Through extensive experiments and rigorous comparisons against existing models, we demonstrate the superiority of our method, particularly highlighting the crucial role of the refined tuning process in achieving these results.
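This record does not include implementation details, so as a rough illustration of the general pre-train-then-tune pattern the abstract describes, the following is a minimal sketch that fine-tunes a broadly pre-trained sequence-to-sequence model for summarization using the Hugging Face transformers library. The choice of backbone (facebook/bart-base), corpus (CNN/DailyMail), and all hyperparameters are assumptions for illustration only; the paper's own curated web-article dataset and refined tuning procedure are not specified here.

    # Minimal sketch: fine-tune a pre-trained seq2seq model for summarization.
    # Assumptions (not from the paper): BART backbone, CNN/DailyMail corpus,
    # and illustrative hyperparameters.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    checkpoint = "facebook/bart-base"  # broadly pre-trained backbone
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    # Small training slice to keep the sketch cheap to run.
    dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")

    def preprocess(batch):
        # Tokenize source articles as inputs and reference summaries as labels.
        inputs = tokenizer(batch["article"], max_length=1024, truncation=True)
        labels = tokenizer(text_target=batch["highlights"],
                           max_length=128, truncation=True)
        inputs["labels"] = labels["input_ids"]
        return inputs

    tokenized = dataset.map(preprocess, batched=True,
                            remove_columns=dataset.column_names)

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(
            output_dir="web-summarizer",
            per_device_train_batch_size=4,
            learning_rate=3e-5,
            num_train_epochs=1,
        ),
        train_dataset=tokenized,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()

After tuning, summaries would be produced with model.generate on tokenized articles; summarization work of this kind is typically scored with ROUGE against reference summaries.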
Pages: 92120-92129
Number of pages: 10