Information Extraction Based on Multi-turn Question Answering for Analyzing Korean Research Trends

Cited by: 0

Authors:

Jo, Seongung [1]

Oh, Heung-Seon [1]

Im, Sanghun [1]

Kim, Gibaeg [1]

Kim, Seonho [2]

Affiliations:

[1] KOREATECH, Sch Comp Sci & Engn, Cheonan 31253, South Korea

[2] Korea Inst Sci & Technol Informat KISTI, Daejeon 34141, South Korea

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2023 / Vol. 74 / No. 02

Funding:

National Research Foundation of Singapore;

Keywords:

Natural language processing; information extraction; question answering; multi-turn; Korean research trends; INTELLIGENCE;

DOI:

10.32604/cmc.2023.031983

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Analyzing Research and Development (R&D) trends is important because it can influence future decisions regarding R&D direction. In typical trend analysis, topic or technology taxonomies are employed to compute the popularity of topics or codes over time. Although this approach is simple and effective, the taxonomies are difficult to maintain because new technologies are introduced rapidly. Therefore, recent studies exploit deep learning to extract pre-defined targets such as problems and solutions. Based on recent advances in question answering (QA) using deep learning, we adopt a multi-turn QA model to extract problems and solutions from Korean R&D reports. With the previous research, we use the reports directly and analyze the difficulties of handling them with QA-style Information Extraction (IE) developed for sentence-level benchmark datasets. After investigating the characteristics of Korean R&D reports, we propose a model that deals with multiple and repeated appearances of targets in the reports. The model includes an algorithm with two novel modules and a prompt. The proposed methodology focuses on reformulating a question without a static template or pre-defined knowledge. We show the effectiveness of the proposed model on a Korean R&D report dataset that we constructed, and we present an in-depth analysis of the benefits of the multi-turn QA model.


Pages: 2967-2980

Page count: 14

