News Media Analysis Using Focused Crawl and Natural Language Processing: Case of Lithuanian News Websites

Cited by: 0
Authors
Krilavicius, Tomas [1]
Medelis, Zygimantas [2]
Kapociute-Dzikiene, Jurgita [1]
Zalandauskas, Tomas [1]
Affiliations
[1] Balt Inst Adv Technol, Saultekio 15, Vilnius, Lithuania
[2] UAB Tokenmill, Kaunas, Lithuania
Source
INFORMATION AND SOFTWARE TECHNOLOGIES | 2012 / Vol. 319
Keywords
Information Retrieval; Natural Language Processing; stemming; focused crawl; Lithuanian language; MODELS;
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
The amount of information that is created, used, or stored is growing exponentially, and the types of data sources are diverse. Most of this information is available as unstructured text, and a considerable part of it is available online, usually as Internet resources. It is too expensive, or even impossible, for humans to analyze all of these resources for the required information. Classical Information Technology techniques are not sufficient to process such amounts of information and render it in a form convenient for further analysis. Information Retrieval (IR) and Natural Language Processing (NLP) provide a number of instruments for information analysis and retrieval. In this paper we present a combined application of NLP and IR to Lithuanian media analysis. We demonstrate that a combination of IR and NLP tools, with appropriate changes, can be successfully applied to Lithuanian media texts.
Pages: 48 / +
Number of pages: 4