Neural Architecture Comparison for Bibliographic Reference Segmentation: An Empirical Study

Cited by: 0
Authors
Cuellar Hidalgo, Rodrigo [1]
Pinto Elias, Raul [2]
Torres-Moreno, Juan-Manuel [3]
Vergara Villegas, Osslan Osiris [4]
Reyes Salgado, Gerardo [5]
Magadan Salazar, Andrea [2]
Affiliations
[1] Biblioteca Daniel Cosio Villegas, Colegio Mexico, Carretera Picacho Ajusco 20, Mexico City 14110, Mexico
[2] Tecnol Nacl Mexico CENIDET, Cuernavaca 62490, Mexico
[3] Univ Avignon, Lab Informat Avignon, 339 Chemin Meinajaries, F-84911 Avignon 9, France
[4] Univ Autonoma Ciudad Juarez, Ind & Mfg Engn Dept, Ciudad Juarez 32310, Mexico
[5] Univ Rey Juan Carlos, Dept Informat & Estadist, Ave Alcalde de Mostoles, Madrid 28933, Spain
Keywords
reference mining; BiLSTM; transformers; byte-pair encoding; Conditional Random Fields; EXTRACTION
DOI
10.3390/data9050071
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In the realm of digital libraries, efficiently managing and accessing scientific publications necessitates automated bibliographic reference segmentation. This study addresses the challenge of accurately segmenting bibliographic references, a task complicated by the varied formats and styles of references. Focusing on the empirical evaluation of Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory with CRF (BiLSTM + CRF), and Transformer Encoder with CRF (Transformer + CRF) architectures, this research employs Byte Pair Encoding and Character Embeddings for vector representation. The models underwent training on the extensive Giant corpus and subsequent evaluation on the Cora Corpus to ensure a balanced and rigorous comparison, maintaining uniformity across embedding layers, normalization techniques, and Dropout strategies. Results indicate that the BiLSTM + CRF architecture outperforms its counterparts by adeptly handling the syntactic structures prevalent in bibliographic data, achieving an F1-Score of 0.96. This outcome highlights the necessity of aligning model architecture with the specific syntactic demands of bibliographic reference segmentation tasks. Consequently, the study establishes the BiLSTM + CRF model as a superior approach within the current state-of-the-art, offering a robust solution for the challenges faced in digital library management and scholarly communication.
Pages: 24