Accelerating Natural Language Processing for Applications in Pharmaceutical Research

Cited by: 0

Authors:

Torfs, Bert [1]

Affiliations:

[1] Johnson & Johnson PRD, RnDIT, Syst Engn Technol Off, B-2340 Beerse, Belgium

Source:

WMSCI 2008: 12TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL I, PROCEEDINGS | 2008

Keywords:

NLP; Natural Language Processing; GRID; Indexing; Fact Extraction;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Extracting facts from text documents using NLP (Natural Language Processing) techniques is common practice in enterprises all over the world. This study describes a technique for accelerating NLP operations using a computer GRID. An attempt was made to tune this acceleration for a specific use case, namely enabling pharmaceutical scientists to perform ad hoc fact extraction without the need for additional IT resources. The NLP operations, POS (Part-of-Speech) tagging and NP (Noun Phrase) chunking, are followed by an indexing step and a final search step that takes a query as input. The NLP operations for a given (limited) set of input documents are executed on a single node of the GRID. The index is created locally on that node and the search is performed over that index, in contrast to other approaches that index the complete corpus first and only then start a search step. The main challenges were to chain the different NLP operations together using open source packages and to submit the NLP operations to the nodes of the GRID. The approach is found to accelerate the NLP operations and to enable fact extraction from a continuously growing corpus.
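The pipeline outlined in the abstract (tagging, chunking, node-local indexing, then search, run per batch of documents) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the toy lexicon `LEXICON` stands in for a real POS tagger, a thread pool stands in for the GRID nodes, and the names `pos_tag`, `np_chunk`, `process_batch`, `grid_search` and `CORPUS` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy lexicon standing in for a real POS tagger (illustrative assumption).
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "inhibits": "VB", "binds": "VB", "reduces": "VB",
    "potent": "JJ", "human": "JJ",
}

def pos_tag(text):
    """Tag each token; words missing from the lexicon default to noun (NN)."""
    tokens = text.lower().replace(".", " ").split()
    return [(tok, LEXICON.get(tok, "NN")) for tok in tokens]

def np_chunk(tagged):
    """Collect maximal runs of adjective/noun tokens as noun phrases."""
    chunks, current = [], []
    for tok, tag in tagged:
        if tag in ("JJ", "NN"):
            current.append(tok)
        else:
            # A determiner or verb closes the phrase being built.
            if current:
                chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

def process_batch(batch, query):
    """Work done on one node: tag, chunk, build a local index, search it."""
    index = defaultdict(set)  # noun phrase -> ids of documents containing it
    for doc_id, text in batch:
        for phrase in np_chunk(pos_tag(text)):
            index[phrase].add(doc_id)
    q = query.lower()
    return sorted(
        (doc_id, phrase)
        for phrase, doc_ids in index.items()
        if q in phrase
        for doc_id in doc_ids
    )

def grid_search(corpus, query, batch_size=2, workers=2):
    """Split the corpus into batches and merge the per-batch search hits."""
    batches = [corpus[i:i + batch_size] for i in range(0, len(corpus), batch_size)]
    # Threads stand in for GRID nodes here; no global index is ever built.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_hits = pool.map(lambda b: process_batch(b, query), batches)
        return sorted(hit for hits in partial_hits for hit in hits)

CORPUS = [
    ("d1", "Aspirin inhibits the human enzyme."),
    ("d2", "The potent compound binds a receptor."),
    ("d3", "The enzyme reduces inflammation."),
]

if __name__ == "__main__":
    print(grid_search(CORPUS, "enzyme"))
```

Because each batch carries its own index, documents added to the corpus later only require new batches to be processed, which is one plausible reading of how the approach copes with a continuously growing corpus.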

Citation

Pages: 145-149

Page count: 5