Analysis of the Mutual Relevance of Topical Corpus Documents in the Problem of Assessing the Proximity of Text to the Semantic Standard

Cited by: 3
Authors
Mikhaylov, D. V. [1]
Emelyanov, G. M. [1]
Affiliations
[1] Yaroslav Wise Novgorod State Univ, Veliky Novgorod 173003, Russia
Funding
Russian Foundation for Basic Research
Keywords
pattern recognition; data mining; information theory; linguistic representation of expert knowledge; lossless-in-sense text compression; vector model;
DOI
10.1134/S1054661821030172
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203 ; 0835 ;
Abstract
This article addresses the unity and integrity of the image of a semantic standard that is singled out phrase by phrase for a topical text. The proximity of the text to the standard is assessed without searching for paraphrases; the assessment is based on dividing the words of each phrase of the text into classes according to their TF-IDF values relative to the texts of a corpus previously formed by an expert. The analyzed texts are abstracts of scientific articles together with their titles. The core of the problem is as follows: each phrase attains its maximum proximity to the standard with respect to its own corpus document, so the mutual relevance of such documents must be assessed across the different phrases of the analyzed text. This study solves the problem by introducing distances between the vectors of TF-IDF values of the words of a single phrase, computed with respect to different documents in the corpus. The distance between the documents for which the phrases of the analyzed text came closest to the standard should be minimal. Using the Euclidean and Manhattan distances as examples, the study illustrates how the proposed approach applies to choosing, for a given text, a higher-level text in a hierarchy being formed in terms of semantic standard complementarity.
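The distance computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the TF-IDF variant (term frequency normalized by document length, times log(N/df)), the toy corpus, and the helper names `tf_idf`, `phrase_vector`, `euclidean`, and `manhattan` are all assumptions made for illustration. The sketch builds, for one phrase, a vector of TF-IDF values relative to each corpus document and then compares documents pairwise under both metrics.

```python
import math
from itertools import combinations


def tf_idf(word, doc, corpus):
    """TF-IDF of `word` in tokenized `doc` relative to `corpus`.

    Assumed variant: tf = count / document length, idf = log(N / df).
    The paper may use a different formula.
    """
    tf = doc.count(word) / len(doc) if doc else 0.0
    df = sum(1 for d in corpus if word in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


def phrase_vector(phrase, doc, corpus):
    """TF-IDF values of the phrase's words with respect to one document."""
    return [tf_idf(w, doc, corpus) for w in phrase]


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical toy corpus; in the paper the corpus is formed by an expert.
corpus = [
    "tf idf measure of words in a text corpus".split(),
    "distance between vectors of tf idf values".split(),
    "image analysis and pattern recognition".split(),
]
phrase = "tf idf distance".split()

# One vector per corpus document, all for the same phrase.
vectors = [phrase_vector(phrase, doc, corpus) for doc in corpus]

# Pairwise distances between documents as seen through this phrase.
for i, j in combinations(range(len(corpus)), 2):
    print(i, j, euclidean(vectors[i], vectors[j]), manhattan(vectors[i], vectors[j]))
```

Under the criterion stated in the abstract, a small distance between two such vectors would suggest that the corresponding documents are mutually relevant for this phrase; how the distances are aggregated over several phrases is left open here.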
Pages: 588 - 594
Page count: 7
Related papers
28 in total
  • [1] Analysis of the Mutual Relevance of Topical Corpus Documents in the Problem of Assessing the Proximity of Text to the Semantic Standard
    D. V. Mikhaylov
    G. M. Emelyanov
    Pattern Recognition and Image Analysis, 2021, 31 : 588 - 594
  • [2] Reference-Corpus Formation for Estimating the Closeness of Topical Texts to the Semantic Standard
    D. V. Mikhaylov
    G. M. Emelyanov
    Pattern Recognition and Image Analysis, 2022, 32 : 755 - 762
  • [3] Reference-Corpus Formation for Estimating the Closeness of Topical Texts to the Semantic Standard
    Mikhaylov, D. V.
    Emelyanov, G. M.
    PATTERN RECOGNITION AND IMAGE ANALYSIS, 2022, 32 (04) : 755 - 762
  • [4] THE SEMANTIC RELATIONS IN THE BILINGUAL DICTIONARY BASED ON THE TEXT CORPUS: THE PROBLEM OF THE TYPOLOGY
    Bralewski, Dariusz
    ROCZNIKI HUMANISTYCZNE, 2014, 62 (08) : 149 - 181
  • [5] A taxonomy generation tool for semantic visual analysis of large corpus of documents
    Belen Carrion
    Teresa Onorati
    Paloma Díaz
    Vasiliki Triga
    Multimedia Tools and Applications, 2019, 78 : 32919 - 32937
  • [6] A taxonomy generation tool for semantic visual analysis of large corpus of documents
    Carrion, Belen
    Onorati, Teresa
    Diaz, Paloma
    Triga, Vasiliki
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (23) : 32919 - 32937
  • [7] WordNet-based lexical semantic classification for text corpus analysis
    Jun Long
    Lu-da Wang
    Zu-de Li
    Zu-ping Zhang
    Liu Yang
    Journal of Central South University, 2015, 22 : 1833 - 1840
  • [8] WordNet-based lexical semantic classification for text corpus analysis
    Long Jun
    Wang Lu-da
    Li Zu-de
    Zhang Zu-ping
    Yang Liu
    JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2015, 22 (05) : 1833 - 1840
  • [9] Semantic Network Analysis Pipeline-Interactive Text Mining Framework for Exploration of Semantic Flows in Large Corpus of Text
    Cenek, Martin
    Bulkow, Rowan
    Pak, Eric
    Oyster, Levi
    Ching, Boyd
    Mulagada, Ashika
    APPLIED SCIENCES-BASEL, 2019, 9 (24):
  • [10] Assessing sentiment of text by semantic dependency and contextual valence analysis
    Shaikh, Mostafa Al Masum
    Prendinger, Helmut
    Mitsuru, Ishizuka
    AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION, PROCEEDINGS, 2007, 4738 : 191 - +