E-mail Address Categorization based on Semantics of Surnames

Cited by: 0
Authors
Veluru, Suresh [1]
Rahulamathavan, Yogachandran [1]
Viswanath, P.
Longley, Paul [2]
Rajarajan, Muttukrishnan [1]
Affiliations
[1] City Univ London, Sch Engn & Math Sci, Informat Secur Grp, London EC1V 0HB, England
[2] UCL, Dept Geog, London, England
Source
2013 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING (CIDM) | 2013
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Vector space model; latent semantic analysis; surnames; average link clustering method; suffix tree;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Surname (family name) analysis is used in geography to understand population origins, migration, identity, social norms and cultural customs, some of which have supposedly evolved over generations. Surnames exhibit good statistical properties that can be used to extract information from names data sets, such as the automatic detection of ethnic or community groups. An e-mail address often contains a surname as a substring, and this containment may be full or partial. The objective of this paper is to categorize e-mail addresses based on the semantics of surnames. This is achieved in two phases. The first phase deals with surname representation and clustering: a vector space model is proposed on which latent semantic analysis is performed, and clustering is done using the average-linkage method. In the second phase, an e-mail address is categorized as belonging to one of the categories discovered in the first phase. This requires substring matching, which is done efficiently using a suffix tree data structure. We perform an experimental evaluation on the 500 most frequently occurring surnames in India and the United Kingdom, and we categorize the e-mail addresses that contain these surnames as substrings.
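The second phase described in the abstract (matching known surnames against an e-mail address) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an uncompressed suffix trie in place of a compact suffix tree, handles only full containment (the abstract does not say how partial containment is scored), and resolves multiple hits by taking the longest surname, which is also an assumption. The surnames, cluster labels, and function names are hypothetical placeholders for the output of the first phase.

```python
from typing import Dict, Optional


def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of `text` into a nested-dict trie."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie: dict, pattern: str) -> bool:
    """A pattern is a substring of the text iff it labels a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def categorize(email: str, surname_to_category: Dict[str, str]) -> Optional[str]:
    """Return the category of the longest surname fully contained in the local part.

    Assumption: only the part before '@' is searched, case-insensitively.
    """
    local_part = email.split("@", 1)[0].lower()
    trie = build_suffix_trie(local_part)
    matches = [s for s in surname_to_category if contains(trie, s.lower())]
    if not matches:
        return None
    return surname_to_category[max(matches, key=len)]


# Hypothetical phase-one output: surname -> cluster label (illustrative only).
clusters = {"patel": "cluster_A", "smith": "cluster_B", "pat": "cluster_C"}

print(categorize("r.patel1985@example.com", clusters))
print(categorize("jsmith@example.com", clusters))
print(categorize("xyz@example.com", clusters))
```

Building the trie once per address lets each surname be checked in time proportional to the surname's length. A compact suffix tree gives the same query cost with linear rather than quadratic space, which matters little for short e-mail local parts.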
Pages: 222-229
Number of pages: 8