Title generation of knowledge points for classroom teaching

Cited by: 0
Authors
Xiao S. [1]
Zhao H. [2]
Affiliations
[1] School of Software, Xinjiang University, Urumqi
[2] School of Information Science and Engineering, Xinjiang University, Urumqi
Source
Qinghua Daxue Xuebao/Journal of Tsinghua University | 2024 / Vol. 64 / No. 05
Keywords
classroom teaching; TextRank; title generation; topic information; UniLM;
DOI
10.16511/j.cnki.qhdxxb.2023.26.059
Abstract
[Objective] In the digital age, brief titles are critical for efficient reading. However, headline-generation technology is mostly applied to news rather than to other domains. Generating titles for knowledge points in classroom scenarios can enhance comprehension and improve learning efficiency. Traditional extractive algorithms, such as Lead-3 and the original TextRank algorithm, fail to capture the critical information of an article effectively: they merely rank sentences by factors such as position or text similarity, overlooking keywords. To address this issue, an improved TextRank algorithm, text ranking combining keywords and sentence positions (TKSP), is proposed herein. Moreover, extractive models extract information without expanding on the original text, whereas generative models produce brief and coherent headlines but sometimes misunderstand the source text, resulting in inaccurate and repetitive headings. To address this, TKSP is combined with the UniLM generative model (the UniLM-TK model) to incorporate text topic information.

[Methods] Courses are collected from a MOOC platform, and audio is extracted from the teaching videos. Speech-to-text conversion is performed with an audio transcription tool, and the resulting classroom teaching texts are organized, segmented by knowledge point, and manually titled to build a dataset. Thereafter, the proposed TKSP algorithm is used to automatically generate titles for the knowledge points. First, the algorithm applies the Word2Vec word-vector model to TextRank. TKSP then considers four factors that influence sentence importance: (1) Sentence position: the opening of a knowledge point serves as a general introduction and therefore carries higher weight; succeeding sentences receive decreasing weights based on their position. (2) Keyword count: sentences containing keywords carry valuable information, and their importance increases with the number of keywords present; the TextRank algorithm generates a keyword list from the knowledge content, and sentences containing more keywords are assigned higher weights. (3) Keyword importance: keyword weights reflect keyword importance and are arranged in descending order; sentence weights are adjusted accordingly, so a sentence containing the top-ranked keyword receives the highest weight, while sentences containing only the second- or third-ranked keywords receive lower weights. (4) Sentence importance: the first sentence in which a keyword appears serves as a general introduction and is more relevant to the knowledge point; that sentence receives the highest weight, which decreases with subsequent occurrences of the keyword. These four factors are integrated into a sentence-weight calculation formula, and the top-ranked sentences are selected to form the text title, as sketched below.
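To make the scoring scheme concrete, the following Python sketch combines a Word2Vec-based TextRank score with the four factors above. It is an illustration only: the multiplicative combination, the coefficients, and the helper names are assumptions, not the paper's actual weight formula.

```python
# Minimal sketch of a TKSP-style scorer, assuming:
#   - `w2v` is a trained gensim Word2Vec model,
#   - `sentences` is a list of token lists in document order,
#   - `keywords` is a TextRank keyword list ordered by descending importance.
# The multiplicative combination and coefficients below are illustrative,
# not the paper's actual weight formula.
import numpy as np
import networkx as nx

def textrank_scores(sentences, w2v):
    """Base TextRank over Word2Vec sentence vectors (cosine-similarity graph)."""
    def sent_vec(tokens):
        vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

    mat = np.array([sent_vec(s) for s in sentences])
    mat /= np.linalg.norm(mat, axis=1, keepdims=True) + 1e-8
    sim = np.clip(mat @ mat.T, 0.0, None)   # keep edge weights non-negative
    np.fill_diagonal(sim, 0.0)
    ranks = nx.pagerank(nx.from_numpy_array(sim))
    return np.array([ranks[i] for i in range(len(sentences))])

def tksp_scores(sentences, keywords, w2v):
    base = textrank_scores(sentences, w2v)
    kw_rank = {k: r for r, k in enumerate(keywords)}   # 0 = most important
    first_occurrence = {}                              # keyword -> first sentence index
    for i, sent in enumerate(sentences):
        for tok in sent:
            if tok in kw_rank:
                first_occurrence.setdefault(tok, i)

    scores = np.empty(len(sentences))
    for i, sent in enumerate(sentences):
        kws = [t for t in set(sent) if t in kw_rank]
        position = 1.0 / (1.0 + i)                                     # factor (1)
        count = 1.0 + 0.1 * len(kws)                                   # factor (2)
        importance = 1.0 + sum(1.0 / (1.0 + kw_rank[k]) for k in kws)  # factor (3)
        first = 1.5 if any(first_occurrence[k] == i for k in kws) else 1.0  # factor (4)
        scores[i] = base[i] * position * count * importance * first
    return scores

# The top-ranked sentence(s) then form the title candidate:
# title = sentences[int(np.argmax(tksp_scores(sentences, keywords, w2v)))]
```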
On this basis, the TKSP algorithm is combined with the UniLM model into the proposed UniLM-TK model. The TKSP algorithm is employed to extract critical sentences, and the TextRank algorithm is employed to extract a topic word from the knowledge text. These are separately embedded into the model input sequence, which is processed by Transformer blocks: the critical sentences capture the textual context through self-attention, while the topic word injects topic information through cross-attention, and the final attention output is formed by weighting and summing these two representations (see the sketch after this paragraph). The attention output is further processed by a feedforward network to extract high-level features. Because TKSP supplies only the focused sentences, the extent of model computation and the difficulty of data processing are reduced, allowing the model to concentrate on extracting and generating the key information.
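As an illustration of the fusion step, the following PyTorch sketch mixes self-attention over the input with cross-attention over a topic-word embedding through a weighted sum. The single-head form, the shared projection matrices, and the mixing coefficient `lam` are assumptions made for readability, not UniLM-TK's actual parameterization.

```python
# Minimal single-head sketch of the attention fusion: self-attention over the
# input sequence (context) is mixed with cross-attention over the topic-word
# embedding by a weighted sum. The shared projections `w_q`, `w_k`, `w_v` and
# the mixing coefficient `lam` are assumptions, not UniLM-TK's parameters.
import torch

def scaled_dot_attention(q, k, v):
    """Standard scaled dot-product attention."""
    weights = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
    return weights @ v

def fused_attention(x, topic, w_q, w_k, w_v, lam=0.5):
    """x: (seq_len, d) token states; topic: (n_topic, d) topic-word states."""
    q = x @ w_q
    # Self-attention: keys and values come from the input sequence itself.
    self_out = scaled_dot_attention(q, x @ w_k, x @ w_v)
    # Cross-attention: keys and values come from the topic representation.
    cross_out = scaled_dot_attention(q, topic @ w_k, topic @ w_v)
    # Weighted sum injects topic information into the contextual representation;
    # the result then passes through the feedforward sublayer.
    return lam * self_out + (1.0 - lam) * cross_out
```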
[Results] The TKSP algorithm outperformed classical extractive algorithms (namely, maximal marginal relevance, latent Dirichlet allocation, Lead-3, and TextRank) on the ROUGE-1, ROUGE-2, and ROUGE-L metrics, achieving optimal scores of 51.20%, 33.42%, and 50.48%, respectively. In the ablation experiments on the UniLM-TK model, the best performance was obtained when seven key sentences were extracted, with scores of 73.29%, 58.12%, and 72.87% on the same metrics. Compared with the headings generated through the GPT-3.5 API, the headings generated by UniLM-TK were brief, clear, accurate, and more readable in summarizing the text topic. Experiments on real headings were also performed on a large-scale Chinese scientific literature dataset to compare the UniLM-TK and ALBERT models; UniLM-TK improved the ROUGE-1, ROUGE-2, and ROUGE-L metrics by 6.45%, 3.96%, and 9.34%, respectively.

[Conclusions] The effectiveness of the TKSP algorithm is demonstrated by comparison with other extractive methods, and the headings generated by UniLM-TK are shown to exhibit better accuracy and readability. © 2024 Tsinghua University. All rights reserved.
Pages: 770-779
Page count: 9