Toward Scalable Internet Traffic Measurement and Analysis with Hadoop

Cited by: 1
Authors
Lee, Yeonhee [1]
Lee, Youngseok [1]
Affiliation
[1] Chungnam Natl Univ, Dept Comp Engn, Daejon, South Korea
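Keywords
Hadoop; Hive; MapReduce; NetFlow; pcap; packet; traffic measurement; analysis
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract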
Internet traffic measurement and analysis has long been used to characterize network usage and user behavior, but it faces a scalability problem under the explosive growth of Internet traffic and high-speed access. Scalable Internet traffic measurement and analysis is difficult because a large data set requires matching computing and storage resources. Hadoop, an open-source computing platform combining MapReduce and a distributed file system, has become a popular infrastructure for massive data analytics because it facilitates scalable data processing and storage services on a distributed computing system consisting of commodity hardware. In this paper, we present a Hadoop-based traffic monitoring system that performs IP, TCP, HTTP, and NetFlow analysis of multiple terabytes of Internet traffic in a scalable manner. In experiments on a 200-node testbed, we achieved 14 Gbps throughput for 5 TB files with IP and HTTP-layer analysis MapReduce jobs. We also explain the performance issues related to traffic analysis MapReduce jobs.
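As a rough illustration of the MapReduce pattern the abstract refers to, the sketch below counts bytes per source IP in a single process and then works out how long a 5 TB input would take at the reported 14 Gbps. It is not the authors' implementation: the packet records, the function names, and the use of decimal terabytes (10^12 bytes) are all assumptions made for this example, and a real job would read pcap or NetFlow data from the distributed file system.

```python
from collections import defaultdict

# Hypothetical packet records: (source IP, packet length in bytes).
PACKETS = [
    ("10.0.0.1", 1500),
    ("10.0.0.2", 40),
    ("10.0.0.1", 60),
]


def map_packet(packet):
    """Map step: emit one (key, value) pair per packet."""
    src_ip, length = packet
    yield src_ip, length


def reduce_bytes(key, values):
    """Reduce step: sum all byte counts seen for one key."""
    return key, sum(values)


def run_job(packets):
    """Run map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for packet in packets:
        for key, value in map_packet(packet):
            groups[key].append(value)
    return dict(reduce_bytes(k, v) for k, v in groups.items())


def seconds_to_process(size_bytes, throughput_bps):
    """Time needed to process size_bytes at a given bit rate."""
    return size_bytes * 8 / throughput_bps


BYTES_PER_IP = run_job(PACKETS)

# Assumption: 1 TB = 10**12 bytes and 1 Gbps = 10**9 bits per second.
JOB_SECONDS = seconds_to_process(5 * 10**12, 14 * 10**9)
```

Under these assumptions the 5 TB input corresponds to roughly 2,857 seconds, or a little under 48 minutes, of processing at 14 Gbps.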
Pages: 6-13
Page count: 8