Text segmentation based on document understanding for information retrieval

Cited by: 0
Authors
Prince, Violaine [1 ]
Labadie, Alexandre [1 ]
Affiliation
[1] LIRMM, 161 Rue Ada, F-34392 Montpellier 5, France
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS | 2007 / Vol. 4592
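Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract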
Information retrieval needs to match relevant texts with a given query. Selecting appropriate parts is useful when documents are long and only portions are interesting to the user. In this paper, we describe a method that extensively uses natural language techniques for text segmentation based on topic change detection. The method requires an NLP parser and a semantic representation in Roget-based vectors. We have run the experiment on French documents, for which we have the appropriate tools, but the method could be transposed to any other language with the same requirements. The article sketches an overview of the NL understanding environment functionalities and the algorithms related to our text segmentation method. An experiment in text segmentation is also presented, and its result in an information retrieval task is shown.
Pages: 295 / +
Page count: 3
Related papers
50 in total
[31]   Text Retrieval from Document Images Based on Word Shape Analysis [J].
Chew Lim Tan ;
Weihua Huang ;
Sam Yuan Sung ;
Zhaohui Yu ;
Yi Xu .
Applied Intelligence, 2003, 18 :257-270
[32]   Text retrieval from document images based on word shape analysis [J].
Tan, CL ;
Huang, WH ;
Sung, SY ;
Yu, ZH ;
Xu, Y .
APPLIED INTELLIGENCE, 2003, 18 (03) :257-270
[33]   Research on Media Text Translation Based on Information Retrieval [J].
Zhang, Jiuquan ;
Meng, Yan .
INTERNATIONAL JOURNAL OF E-COLLABORATION, 2025, 21 (01)
[34]   CHINESE TEXT SEGMENTATION FOR TEXT RETRIEVAL - ACHIEVEMENTS AND PROBLEMS [J].
WU, ZM ;
TSENG, G .
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1993, 44 (09) :532-542
[35]   Text Information Retrieval Based on Concept Semantic Similarity [J].
Lv, Gang ;
Zheng, Cheng ;
Zhang, Li .
2009 FIFTH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRID (SKG 2009), 2009, :356-+
[36]   Document Expansion for Text-Based Image Retrieval at CLEF 2009 [J].
Min, Jinming ;
Wilkins, Peter ;
Leveling, Johannes ;
Jones, Gareth J. F. .
MULTILINGUAL INFORMATION ACCESS EVALUATION II: MULTIMEDIA EXPERIMENTS, PT II, 2010, 6242 :172-176
[37]   TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [J].
Zhang, Peng ;
Xu, Yunlu ;
Cheng, Zhanzhan ;
Pu, Shiliang ;
Lu, Jing ;
Qiao, Liang ;
Niu, Yi ;
Wu, Fei .
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, :1413-1422
[38]   Integrating text retrieval and image retrieval in XML document searching [J].
Tjondronegoro, D. ;
Zhang, J. ;
Gu, J. ;
Nguyen, A. ;
Geva, S. .
ADVANCES IN XML INFORMATION RETRIEVAL AND EVALUATION, 2006, 3977 :511-524
[39]   Syllable-based Chinese text/spoken document retrieval using text/speech queries [J].
Bai, BR ;
Chen, BL ;
Wang, HM .
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2000, 14 (05) :603-616
[40]   Research of Information Retrieval Based on Web Page Segmentation [J].
Yu, Yangxin .
PROGRESS IN INDUSTRIAL AND CIVIL ENGINEERING, PTS. 1-5, 2012, 204-208 :4928-4931