Natural language processing for the Turkish Academic texts in the engineering field and development of a decision support system: the case of TUBITAK project proposals

Cited by: 3
Authors
Kat, Bora [1]
Affiliations
[1] Sci & Technol Res Council Turkiye TUBITAK, TR-06530 Ankara, Turkiye
Source
JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY | 2023 / Vol. 38 / Issue 03
Keywords
Key term extraction; Feature extraction; Natural language processing; Supervised machine learning; Naïve Bayes classifier; Conceptual similarity; Decision support system; IMPACT
DOI
10.17341/gazimmfd.1132053
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Purpose: This study proposes a decision support system (as illustrated in Figure A) based on NLP applications and a machine learning algorithm. Three modules (key term extraction, similarity detection and subfield assignment) are developed to automatically index academic engineering documents, calculate their conceptual similarities and assign each document to the most appropriate of 31 subfields.
Theory and Methods: Tailored preprocessing procedures are applied to the texts and the initial key terms are extracted. After a post-processing step, the final term-frequency vectors are obtained. These vectors are used in the proposed similarity detection algorithm and as input to the Naive Bayes classifiers.
Results: The proposals submitted to the TUBITAK Academic Research Funding Program Directorate (ARDEB) are analyzed as a case study. The results indicate that the proposed similarity algorithm correctly detects almost all of the revised proposals, while the accuracy of the Naive Bayes classifier exceeds 80% on a sample of 1255 proposals. Accuracy exceeds 95% when the best three predictions are considered.
Conclusion: The NLP work conducted in this study and the proposed algorithms are the first attempt to classify Turkish academic texts. The current study focuses on engineering; further studies on classifying other disciplines are needed. Moreover, the success of machine learning in classification would pave the way for other applications such as reviewer identification.
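The pipeline described in the abstract (term-frequency vectors feeding a similarity check and a Naive Bayes classifier, scored on the best three predictions) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the abstract does not specify the proposed similarity algorithm, so cosine similarity is used as a stand-in; the classifier is assumed to be multinomial with Laplace smoothing; and the key terms, subfield labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two term-frequency Counters.

    Stand-in for the paper's similarity algorithm, which the abstract
    does not describe.
    """
    dot = sum(tf_a[t] * tf_b[t] for t in set(tf_a) & set(tf_b))
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing (assumed variant)."""

    def fit(self, documents, labels):
        self.doc_counts = Counter(labels)            # documents per subfield
        self.term_counts = defaultdict(Counter)      # term counts per subfield
        for tf, label in zip(documents, labels):
            self.term_counts[label].update(tf)
        self.vocabulary = {t for c in self.term_counts.values() for t in c}
        self.total_docs = len(labels)
        return self

    def log_score(self, tf, label):
        """Log prior plus smoothed log likelihood of the document's terms."""
        counts = self.term_counts[label]
        denominator = sum(counts.values()) + len(self.vocabulary)
        score = math.log(self.doc_counts[label] / self.total_docs)
        for term, freq in tf.items():
            if term in self.vocabulary:              # ignore terms never seen in training
                score += freq * math.log((counts[term] + 1) / denominator)
        return score

    def top_k(self, tf, k=3):
        """Return the k most probable subfields, best first."""
        ranked = sorted(self.doc_counts, key=lambda c: self.log_score(tf, c), reverse=True)
        return ranked[:k]


# Hypothetical key terms and subfield labels (the paper works on Turkish texts).
training = [
    (Counter(["concrete", "beam", "earthquake", "concrete"]), "civil"),
    (Counter(["circuit", "antenna", "signal"]), "electrical"),
    (Counter(["turbine", "engine", "vibration"]), "mechanical"),
    (Counter(["earthquake", "soil", "bridge"]), "civil"),
]
model = MultinomialNaiveBayes().fit([tf for tf, _ in training], [lbl for _, lbl in training])

query = Counter(["earthquake", "concrete", "bridge"])
predictions = model.top_k(query, k=3)

# A lightly revised proposal shares most terms with its original version.
original = Counter(["concrete", "beam", "earthquake", "concrete"])
revised = Counter(["concrete", "beam", "earthquake", "soil"])
similarity = cosine_similarity(original, revised)
```

Under this sketch, a top-three evaluation counts a proposal as correctly classified when its true subfield appears anywhere in `model.top_k(tf, k=3)`, which is one plausible reading of the "best three predictions" accuracy reported above.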
Pages: 1879-1892
Number of pages: 14