Web Mining for Open Source Intelligence

Cited by: 1
Author
Best, Clive
Source
PROCEEDINGS OF THE 12TH INTERNATIONAL INFORMATION VISUALISATION | 2008
Keywords
Web Mining; information extraction; multilinguality; media monitoring; visualisation
DOI
10.1109/IV.2008.86
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Web Mining for Open Source Intelligence is the retrieval, extraction and analysis of information from online Internet sites. This paper reviews two separate application areas: live news monitoring and targeted, topic-based data mining. Most newspapers and news agencies have web sites with live updates on unfolding events, as well as opinions and perspectives on world events. Most governments monitor news reports to gauge public opinion and to obtain early warning of emerging crises. The Joint Research Centre has built up significant experience in Internet content monitoring through its work on the Europe Media Monitor (EMM) for the European Commission. EMM forms the core of the Commission's daily press monitoring service. Intelligence services and law enforcement agencies also require monitoring of specific sites and topics, and EMM technology has been applied to the wider Internet for this purpose. The software downloads and extracts all the textual content from monitored sites and then applies information extraction techniques to it. These tools help analysts process large numbers of documents to derive structured data. Lastly, visualisation of the extracted data is important for helping analysts identify patterns and trends in both news reports and web-mining results.
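The abstract describes a pipeline that first downloads and extracts the textual content of monitored sites and then analyses that text. The following is only a minimal sketch of those first stages, not the EMM implementation: it assumes the HTML has already been fetched, and it stands in for topic monitoring with simple case-insensitive keyword lists (a hypothetical simplification). The names `extract_text` and `match_topics` are illustrative and do not come from the paper.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._chunks = []     # non-empty text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text outside script/style blocks, trimmed of whitespace.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Hypothetical helper: strip markup and return the page's plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


def match_topics(text: str, topics: dict) -> dict:
    """Hypothetical helper: for each topic, list the keywords found in text
    (whole-word, case-insensitive). Topics with no hits are omitted."""
    hits = {}
    for topic, keywords in topics.items():
        found = [
            kw for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)
        ]
        if found:
            hits[topic] = found
    return hits


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style></head><body>"
        "<h1>Flood warning</h1><p>Heavy rain in the region.</p>"
        "<script>var x=1;</script></body></html>"
    )
    text = extract_text(page)
    print(text)
    print(match_topics(text, {"natural_disasters": ["flood", "earthquake"],
                              "elections": ["ballot"]}))
```

Run on the sample page, this prints `Flood warning Heavy rain in the region.` followed by `{'natural_disasters': ['flood']}`. A production monitoring system would need far more than keyword matching (for example, multilingual processing and entity extraction, as the keywords suggest), so this block only illustrates the shape of the extract-then-analyse step.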
Pages: 321-325
Page count: 5