A Machine Learning Based Natural Language Question and Answering System for Healthcare Data Search using Complex Queries

Times cited: 0
Author
Yeo, Hangu [1]
Affiliation
[1] IBM TJ Watson Res, Dept Next Generat Applicat, Yorktown Hts, NY 10598 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Keywords
Big Data; machine learning; healthcare; natural language processing; complex query; query decomposition; multiclass classification
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A number of use cases in healthcare are well suited to Big Data applications. In healthcare, large volumes of data arrive continuously and are stored either as unstructured big data or as structured data in relational databases. In either case, Big Data is coming to embrace SQL as a common querying tool. Building a question-answering tool for users who lack specialized skill sets and who pose complex queries in natural language is challenging: such a tool must identify significant details, draw inferences, and evaluate hypotheses the way domain experts do. Although natural language interfaces to databases (NLIDB) have been developed to translate natural language queries into a database language for non-technical end users, most of the questions these systems address are factoid questions, and answering complex queries remains an open research problem. The proposed auxiliary system is machine-learning based and extends an existing NLIDB system to help it answer complex queries. The auxiliary system mimics the way human experts arrive at answers to complex queries. Instead of building a set of simple conditional statements as rules and invoking them as a sequence of chained actions, the proposed system decomposes a complex query into multiple simple factoid sub-queries, with the goal of answering each sub-query with the existing NLIDB system from the data explicitly stored in the database. The underlying NLIDB system takes the sub-queries as input in parallel and produces query results from the data stored in the relational database. The answers to the sub-queries, together with the desired output labels, are used to train the model, and the resulting multiclass classifier is used to predict answers to valid input queries.
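The pipeline described in the abstract (decompose a complex query into factoid sub-queries, answer them in parallel against a relational database, then train a multiclass classifier on the sub-query answers plus desired labels) can be sketched roughly as follows. Every concrete detail here is an assumption for illustration only: the `patients` table and its columns, the cardiovascular-risk question and its labels, the regex-based `decompose` function, and the `SUBQUERY_SQL` dictionary that stands in for the existing NLIDB translator are all hypothetical. A nearest-centroid classifier substitutes for whatever multiclass model the paper actually trains.

```python
import os
import re
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing

# HYPOTHETICAL: factoid sub-query templates mapped to parameterized SQL.
# This dictionary stands in for the existing NLIDB translator.
SUBQUERY_SQL = {
    "What is the age of patient {pid}?": "SELECT age FROM patients WHERE patient_id = ?",
    "What is the systolic blood pressure of patient {pid}?": "SELECT systolic_bp FROM patients WHERE patient_id = ?",
    "What is the HbA1c of patient {pid}?": "SELECT hba1c FROM patients WHERE patient_id = ?",
    "Does patient {pid} smoke?": "SELECT smoker FROM patients WHERE patient_id = ?",
}


def build_demo_db():
    """Create a throwaway SQLite file holding a HYPOTHETICAL patients table."""
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    rows = [  # patient_id, age, systolic_bp, hba1c, smoker
        (1, 35, 115, 5.2, 0), (2, 42, 120, 5.5, 0), (3, 55, 138, 6.1, 0),
        (4, 58, 142, 6.4, 1), (5, 68, 165, 8.0, 1), (6, 72, 170, 8.5, 1),
        (7, 38, 118, 5.3, 0), (8, 70, 168, 8.2, 1), (9, 56, 140, 6.2, 1),
    ]
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, "
                     "age REAL, systolic_bp REAL, hba1c REAL, smoker INTEGER)")
        conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?, ?)", rows)
        conn.commit()
    return db_path


def decompose(complex_query):
    """Rule-based stand-in: split one complex query into simple factoid sub-queries."""
    match = re.search(r"patient (\d+)", complex_query)
    if match is None:
        raise ValueError("no patient id found in query")
    pid = int(match.group(1))
    return [(template.format(pid=pid), sql, pid) for template, sql in SUBQUERY_SQL.items()]


def answer_subquery(db_path, sql, pid):
    """Stand-in NLIDB call: run one translated sub-query on its own connection."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(sql, (pid,)).fetchone()
    if row is None:
        raise KeyError(f"patient {pid} not found")
    return float(row[0])


def features_for(db_path, complex_query):
    """Answer all sub-queries in parallel; the ordered answers form a feature vector."""
    subqueries = decompose(complex_query)
    with ThreadPoolExecutor(max_workers=len(subqueries)) as pool:
        futures = [pool.submit(answer_subquery, db_path, sql, pid)
                   for _question, sql, pid in subqueries]
        return [future.result() for future in futures]


class NearestCentroid:
    """Minimal multiclass classifier; a stand-in for the paper's trained model."""

    def fit(self, X, y):
        columns = list(zip(*X))
        self.lows = [min(col) for col in columns]
        self.spans = [(max(col) - min(col)) or 1.0 for col in columns]  # avoid divide-by-zero
        grouped = {}
        for row, label in zip(X, y):
            grouped.setdefault(label, []).append(self._scale(row))
        # One centroid (mean of min-max-scaled feature vectors) per output label.
        self.centroids = {label: [sum(col) / len(col) for col in zip(*rows)]
                          for label, rows in grouped.items()}
        return self

    def _scale(self, row):
        return [(v - lo) / span for v, lo, span in zip(row, self.lows, self.spans)]

    def predict(self, row):
        x = self._scale(row)
        return min(self.centroids,
                   key=lambda label: sum((a - b) ** 2 for a, b in zip(x, self.centroids[label])))


# Training: sub-query answers for known patients plus HYPOTHETICAL desired labels.
DB_PATH = build_demo_db()
TRAIN_LABELS = {1: "low", 2: "low", 3: "medium", 4: "medium", 5: "high", 6: "high"}
X_train = [features_for(DB_PATH, f"Is patient {pid} at high cardiovascular risk?")
           for pid in TRAIN_LABELS]
model = NearestCentroid().fit(X_train, list(TRAIN_LABELS.values()))


def answer(complex_query):
    """Decompose, answer the sub-queries, then let the classifier predict the answer."""
    return model.predict(features_for(DB_PATH, complex_query))


print(answer("Is patient 7 at high cardiovascular risk?"))  # low
print(answer("Is patient 9 at high cardiovascular risk?"))  # medium
```

Each worker thread opens its own SQLite connection because a Python `sqlite3` connection is, by default, restricted to the thread that created it. The nearest-centroid model was picked only because it fits in a few lines of the standard library; any multiclass classifier trained on the same sub-query answers would fill the same role.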
Pages: 2467-2474
Number of pages: 8