Semantic rule-based information extraction for meteorological reports

Cited by: 3
Authors
Cui, Mengmeng [1]
Huang, Ruibin [1]
Hu, Zhichen [1]
Xia, Fan [2]
Xu, Xiaolong [1]
Qi, Lianyong [3]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Software, Nanjing, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Reading Acad, Nanjing, Peoples R China
[3] Qufu Normal Univ, Sch Informat Sci & Engn, Qufu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Information extraction; Named entity recognition; Domain text; Meteorological reports;
DOI
10.1007/s13042-023-01885-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Meteorological reports are one of the most important means of recording the weather conditions of a place over a period of time, and the large number of such reports creates a huge demand for text processing and information extraction. However, valuable data and information remain buried in this mass of reports, and there is an urgent need for automated information extraction techniques that help people integrate data from multiple meteorological reports and analyze it for a more comprehensive understanding of a specific meteorological topic or domain. Named entity recognition (NER) can extract useful entity information from meteorological reports. Based on an analysis of the characteristics of nested entities in meteorological reports, this paper further proposes introducing Multi-Conditional Random Fields (Multi-CRF), in which each CRF layer outputs the recognition results for one entity type, which helps solve the problem of identifying nested entities in meteorological reports. The experimental results show that the model achieves state-of-the-art results. The final recognition results provide effective data support for automatic text verification recognition in the meteorological domain and are of practical value for constructing knowledge graphs from related meteorological reports.
Pages: 177-188
Page count: 12