DAT-MT Accelerated Graph Fusion Dependency Parsing Model for Small Samples in Professional Fields

Cited by: 0
Authors
Li, Rui [1]
Shu, Shili [1]
Wang, Shunli [1]
Liu, Yang [1]
Li, Yanhao [1]
Peng, Mingjun [2]
Affiliations
[1] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Re, Wuhan 430072, Peoples R China
[2] Wuhan Geomat Inst, Wuhan 430079, Peoples R China
Keywords
small sample; specific professional fields; graph fusion; double-array trie; multi-threading; dependency parsing
DOI
10.3390/e25101444
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
The rapid development of information technology has pushed the volume of information in massive text collections far beyond what humans can grasp intuitively, and dependency parsing is an effective way to cope with this information overload. In the context of domain specialization, migrating syntactic treebanks to new domains and speeding up syntactic analysis models are key to efficient syntactic analysis. To achieve domain migration of syntactic treebanks and improve text parsing speed, this paper proposes a novel approach: the Double-Array Trie and Multi-threading (DAT-MT) accelerated graph fusion dependency parsing model. The graph fusion model combines the specialized syntactic features of a small-scale professional field corpus with the generalized syntactic features of a large-scale news corpus, which improves the accuracy of syntactic relation recognition. To address the high space and time complexity introduced by the graph fusion model, the DAT-MT method is proposed. It rapidly maps massive Chinese character features to the model's prior parameters and processes the calculations in parallel, thereby improving parsing speed. Experimental results show that the unlabeled attachment score (UAS) and the labeled attachment score (LAS) of the model improve by 13.34% and 14.82% over the model that uses only the professional field corpus, and by 3.14% and 3.40% over the model that uses only the news corpus; both indicators are also better than those of the deep-learning-based DDParser and LTP 4 methods. In addition, the method achieves a speedup of about 3.7 times over a method that uses a red-black tree index and a single thread. Efficient and accurate syntactic analysis will benefit the real-time processing of massive texts in professional fields, including multi-dimensional semantic correlation, professional feature extraction, and domain knowledge graph construction.
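The abstract does not show how the double-array trie is built, so the following is only a minimal sketch of the general technique, assuming a generic base/check layout in which the child of state `s` on symbol code `c` sits at `base[s] + c` and is valid when `check` at that slot equals `s`. The class name `DoubleArrayTrie`, the `END` terminator, and the sample feature strings are hypothetical and are not taken from the paper; the multi-threaded part of DAT-MT is not covered.

```python
from collections import deque

END = "\0"  # hypothetical terminator symbol appended to every key


class DoubleArrayTrie:
    """Maps feature strings to integer parameter indices (illustrative only)."""

    def __init__(self, entries):
        # Give every symbol a positive integer code.
        symbols = sorted({ch for key in entries for ch in key} | {END})
        self.code = {ch: i + 1 for i, ch in enumerate(symbols)}

        # Step 1: build an ordinary pointer-based trie.
        root = {"children": {}}
        for key, value in entries.items():
            node = root
            for ch in key + END:
                node = node["children"].setdefault(ch, {"children": {}})
            node["value"] = value  # stored on the leaf reached via END

        # Step 2: pack the trie into the base/check arrays, breadth first.
        self.base = [0]
        self.check = [0]   # check[0] = 0 marks the root slot as occupied
        self.value = {}    # leaf state -> stored value
        queue = deque([(0, root)])
        while queue:
            state, node = queue.popleft()
            if "value" in node:
                self.value[state] = node["value"]
            codes = [self.code[ch] for ch in node["children"]]
            if not codes:
                continue
            # Find the smallest offset where every child slot is free.
            b = 1
            while not all(self._is_free(b + c) for c in codes):
                b += 1
            self.base[state] = b
            for ch, child in node["children"].items():
                t = b + self.code[ch]
                self.check[t] = state  # slot t now belongs to this parent
                queue.append((t, child))

    def _is_free(self, index):
        # Grow both arrays on demand; -1 in check means "unused slot".
        while index >= len(self.check):
            self.base.append(0)
            self.check.append(-1)
        return self.check[index] == -1

    def lookup(self, key):
        # One array access per symbol: t = base[s] + code, valid if check[t] == s.
        state = 0
        for ch in key + END:
            c = self.code.get(ch)
            if c is None:
                return None
            t = self.base[state] + c
            if t >= len(self.check) or self.check[t] != state:
                return None
            state = t
        return self.value.get(state)


# Made-up feature strings mapped to made-up parameter indices.
dat = DoubleArrayTrie({"依存": 0, "依存句法": 1, "句法": 2})
print(dat.lookup("依存句法"), dat.lookup("依"))
```

Each lookup costs one array access and one comparison per character, independent of how many features are stored. A balanced-tree index such as a red-black tree instead needs a number of key comparisons that grows logarithmically with the number of stored features.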
Pages: 19