An Efficient Framework for Algorithmic Metadata Extraction over Scholarly Documents Using Deep Neural Networks

Cited by: 0

Authors:

Raghavendra Nayaka P. [1]

Ranjan R. [2]

Affiliations:

[1] School of C&IT, REVA University, Karnataka, Bengaluru

[2] School of CSA, REVA University, Karnataka, Bengaluru

Source:

SN Computer Science, Vol. 4, Issue 4

Keywords:

Algorithm extraction; Deep neural networks; Machine learning; Metadata extraction; Scholarly data

DOI:

10.1007/s42979-023-01776-3

Abstract:

With the development of various search engines, large amounts of text data in conventional documents can now be retrieved efficiently. However, these traditional search approaches often have lower retrieval accuracy, particularly when documents have characteristics that call for deeper semantic extraction. A search engine for algorithms called Algorithm Seer has recently been developed. It collects deep textual metadata and pseudo-codes from research papers. However, such a system cannot accommodate user searches for algorithm-specific information, such as the data sets on which algorithms operate, their effectiveness, and their runtime complexity. This study presents a number of improvements to the previously proposed algorithm search engine. We provide several ways to automatically identify and extract pseudo-codes and the phrases that convey metadata, using various machine learning methods. Experiments were conducted on about 89,000 text lines, and we introduce new features for extracting algorithmic pseudo-codes. These features fall into groups that focus on content, font style, and structure. Our proposed pseudo-code extraction method outperforms current strategies by 28% and achieves a classification accuracy of 94.23%. Additionally, we propose a technique for extracting algorithm-related phrases using deep neural networks, which achieves 82% accuracy, against 23.5% for a recent rule-based method and 21.5% for a support vector machine. © 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
