Information Extraction from Spam Emails using Stylistic and Semantic Features to Identify Spammers

Cited by: 0
Authors
Halder, Soma [1]
Tiwari, Richa [1]
Sprague, Alan [1]
Affiliation
[1] Univ Alabama Birmingham, Birmingham, AL 35229 USA
Source
2011 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI) | 2011
Keywords
Spam; semantics; stylistics; natural language processing; IP address;
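DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract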
Traditional anti-spam methods filter spam emails and keep them out of the inbox, but take no measures to trace and penalize spammers. We use natural language processing techniques to cluster spam emails from the same spammer based on the content and style of each email. Spam emails from different sources are studied using stylistic features, semantic features, and a combination of both. Three clusterings are performed: one based on stylistic features, one based on semantic features, and one based on the combined features. These clusterings are then compared and evaluated. We observe that spam emails from the same source share similarities and cluster together. These emails contain URLs of the web pages that the spammer is trying to promote. Clusters are mapped to the Internet Protocol (IP) addresses of these URLs, and the whois information for those IP addresses helps identify the source of the spam.
Pages: 104-107
Page count: 4
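
The pipeline outlined in the abstract (featurize each email, cluster, then collect the promoted URLs per cluster) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the record does not give the feature set or the clustering algorithm, so the specific style features, the bag-of-words semantic features, the greedy cosine-similarity clustering, the 0.5 threshold, and every function name here are assumptions. Resolving hosts to IP addresses and querying whois need network access and are left out.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>\"']+")
WORD_RE = re.compile(r"[a-z]+")


def stylistic_features(text):
    """Hypothetical style features (not the paper's actual feature set)."""
    words = text.split()
    n_chars = max(len(text), 1)
    return {
        "style:avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "style:upper_ratio": sum(c.isupper() for c in text) / n_chars,
        "style:punct_ratio": sum(c in "!?.,;:$" for c in text) / n_chars,
        "style:digit_ratio": sum(c.isdigit() for c in text) / n_chars,
    }


def semantic_features(text):
    """Bag-of-words term frequencies, computed after stripping URLs."""
    tokens = WORD_RE.findall(URL_RE.sub(" ", text).lower())
    total = max(len(tokens), 1)
    return {"sem:" + tok: n / total for tok, n in Counter(tokens).items()}


def combined_features(text):
    """Union of both feature sets (unscaled; see the note after the code)."""
    return {**stylistic_features(text), **semantic_features(text)}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, featurize, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, member_indices)
    for i, text in enumerate(texts):
        vec = featurize(text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


def hosts_by_cluster(texts, clusters):
    """Collect the URL hostnames that appear in each cluster's emails."""
    result = []
    for members in clusters:
        hosts = set()
        for i in members:
            for url in URL_RE.findall(texts[i]):
                host = urlparse(url).hostname
                if host:
                    hosts.add(host)
        result.append(hosts)
    return result
```

Calling `cluster(emails, semantic_features)`, `cluster(emails, stylistic_features)`, and `cluster(emails, combined_features)` gives the three clusterings to compare. The features are left unscaled in this sketch, so the style values with larger ranges dominate the cosine similarity; a real experiment would standardize them first. Each hostname returned by `hosts_by_cluster` could then be resolved with `socket.gethostbyname` and looked up in whois.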