Reference Metadata Extraction from Korean Research Papers

Citations: 0
Authors
Seol, Jae-Wook [1]
Choi, Won-Jun [1]
Jeong, Hee-Seok [1]
Hwang, Hye-Kyong [1]
Yoon, Hwa-Mook [1]
Affiliations
[1] Korea Inst Sci & Technol Informat, Seoul, South Korea
Source
MINING INTELLIGENCE AND KNOWLEDGE EXPLORATION, MIKE 2018 | 2018 / Vol. 11308
Keywords
Reference extraction; Metadata extraction; Conditional random fields
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A large number of research papers are published across many fields, and the ability to accurately extract metadata from reference lists is becoming increasingly important. Such metadata is also crucial for measuring the influence of a particular study or researcher. However, most reference lists are difficult to process automatically because they consist of unstructured strings whose bibliographic formats vary with the publication venue. This paper therefore presents a method that uses a conditional random fields model to extract metadata such as author name, title, publication year, volume, issue, page numbers, and journal name from heterogeneous references. In an experiment on 1,415 references drawn from 93 academic papers published in Korea, the proposed model achieved an accuracy of 97.10%.
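As an illustration of the kind of input a conditional random fields tagger works on, the following minimal sketch turns one reference string into a sequence of per-token feature dictionaries. It is not the authors' implementation: the tokenizer, the feature names (`is_year`, `has_hangul`, and so on), and the helper functions are all assumptions chosen for illustration, and the abstract does not state which features the paper actually uses.

```python
import re


def tokenize(reference):
    """Split a reference string into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", reference)


def token_features(tokens, i):
    """Build a hypothetical feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        # Four-digit number starting with 19 or 20: a plausible publication year.
        "is_year": bool(re.fullmatch(r"(19|20)\d{2}", tok)),
        "is_punct": not any(ch.isalnum() for ch in tok),
        "is_capitalized": tok[:1].isupper(),
        # Hangul syllables block, relevant for Korean-language references.
        "has_hangul": any("\uac00" <= ch <= "\ud7a3" for ch in tok),
        # Neighbouring tokens give the sequence model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def reference_to_features(reference):
    """Return the tokens of a reference and one feature dictionary per token."""
    tokens = tokenize(reference)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    tokens, feats = reference_to_features("Seol, J. (2018). Title. 42-52.")
    for tok, f in zip(tokens, feats):
        print(tok, f["is_year"], f["is_punct"])
```

In a full pipeline, each token would additionally carry a gold label (for example author, title, year, or pages) and the feature sequences would be passed to a CRF trainer; that step is omitted here because the abstract does not describe the training setup.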
Pages: 42-52
Page count: 11