Semantic Network Analysis Pipeline: Interactive Text Mining Framework for Exploration of Semantic Flows in Large Corpus of Text

Cited by: 1
Authors
Cenek, Martin [1, 4]
Bulkow, Rowan [2]
Pak, Eric [3]
Oyster, Levi [3]
Ching, Boyd [3]
Mulagada, Ashika [1]
Affiliations
[1] Univ Portland, Comp Sci, Portland, OR 97203 USA
[2] Resource Data Inc, Anchorage, AK 99503 USA
[3] Univ Alaska Anchorage, Comp Sci, Anchorage, AK 99508 USA
[4] 5000 N Willamette Blvd, Portland, OR 97203 USA
Source
APPLIED SCIENCES-BASEL | 2019 / Vol. 9 / Issue 24
Keywords
semantic concept; text mining; computational linguistics; language processing; natural language processing; interactive visualization; MODEL;
DOI
10.3390/app9245302
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Historical topic modeling and semantic concept exploration in a large corpus of unstructured text remain a hard, open problem. Despite advances in natural language processing tools, statistical linguistic models, graph theory, and visualization, there is no framework that combines these piecewise tools under one roof. We designed and built the Semantic Network Analysis Pipeline (SNAP), an open-source web service that implements the workflow a data scientist needs to explore historical semantic concepts in a text corpus. We define a graph-theoretic notion of a semantic concept as a flow of closely related tokens through the corpus. The modular workflow pipeline processes text with natural language processing tools, applies statistical content narrowing, creates semantic networks by lexical token chaining, performs social network analysis of the token networks, and creates a 3D visualization of the semantic concept flows through the corpus for interactive concept exploration. Finally, we illustrate the framework's utility by extracting information from three corpora: Herman Melville's novel Moby Dick, the transcript of the 2015-2016 United States (U.S.) Senate Hearings on Environment and Public Works, and the Australian Broadcasting Corporation's short news articles on rural and science topics.
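
The stages named in the abstract (text processing, content narrowing, token chaining, network analysis) can be illustrated with a small sketch. This is not SNAP's code: the function names, the stopword list, the frequency threshold standing in for "statistical content narrowing", the sliding-window rule standing in for "lexical token chaining", and degree centrality standing in for "social network analysis" are all assumptions chosen for illustration, and the visualization stage is omitted.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real pipeline would use an NLP toolkit.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def narrow(tokens, min_count=2):
    """Assumed narrowing rule: keep tokens seen at least min_count times."""
    counts = Counter(tokens)
    return [t for t in tokens if counts[t] >= min_count]


def chain(tokens, window=2):
    """Assumed chaining rule: link each token to the next `window` tokens.

    Returns a Counter mapping an undirected edge (sorted token pair)
    to the number of times the pair co-occurred within the window.
    """
    edges = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 1 + window]:
            if a != b:
                edges[tuple(sorted((a, b)))] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other nodes each token is directly linked to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {t: len(ns) / (n - 1) for t, ns in neighbors.items()}


def run_pipeline(text, min_count=2, window=2):
    """Run the sketch end to end and return (edges, centrality)."""
    edges = chain(narrow(tokenize(text), min_count), window)
    return edges, degree_centrality(edges)


sample = "The whale chased the ship. The ship fled the whale. Ahab watched the whale."
edges, centrality = run_pipeline(sample)
```

In this sketch the edge weights play the role of flow strength between tokens; tracking how those weights change across consecutive slices of a corpus would be one way to approximate the "flow of closely related tokens" the abstract describes.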
Pages: 15