DENDROID: A text mining approach to analyzing and classifying code structures in Android malware families

Cited by: 155
Authors
Suarez-Tangil, Guillermo [1]
Tapiador, Juan E. [1]
Peris-Lopez, Pedro [1]
Blasco, Jorge [1]
Affiliation
[1] Univ Carlos III Madrid, Dept Comp Sci, Comp Secur COSEC Lab, Madrid 28911, Spain
Keywords
Malware analysis; Software similarity and classification; Text mining; Information retrieval; Smartphones; Android OS
DOI
10.1016/j.eswa.2013.07.106
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The rapid proliferation of smartphones over the last few years has come hand in hand with an impressive growth in the number and sophistication of malicious apps targeting smartphone users. The availability of reuse-oriented development methodologies and automated malware production tools makes it exceedingly easy to produce new specimens. As a result, market operators and malware analysts are increasingly overwhelmed by the number of newly discovered samples that must be analyzed. This situation has stimulated research in intelligent instruments to automate parts of the malware analysis process. In this paper, we introduce DENDROID, a system based on text mining and information retrieval techniques for this task. Our approach is motivated by a statistical analysis of the code structures found in a dataset of ANDROID OS malware families, which reveals some parallels with classical problems in those domains. We then adapt the standard Vector Space Model and reformulate the modelling process followed in text mining applications. This enables us to measure similarity between malware samples, which is then used to automatically classify them into families. We also investigate the application of hierarchical clustering over the feature vectors obtained for each malware family. The resulting dendrograms resemble the so-called phylogenetic trees for biological species, allowing us to conjecture about evolutionary relationships among families. Our experimental results suggest that the approach is remarkably accurate and deals efficiently with large databases of malware instances. (C) 2013 Elsevier Ltd. All rights reserved.
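The abstract outlines three steps: represent samples in a Vector Space Model, measure similarity between samples to classify them into families, and hierarchically cluster per-family vectors into a dendrogram. The minimal Python sketch below illustrates those generic steps and is not DENDROID's actual pipeline. It assumes each sample has already been reduced to a bag of code-structure identifiers. The identifiers (`c1`, `c2`, ...), the family names, and all helper names are hypothetical. The sketch uses textbook tf-idf weighting, cosine similarity, 1-nearest-neighbour classification, and single-linkage clustering, any of which may differ from the authors' formulation.

```python
import math
from collections import Counter

# Hypothetical toy data: each sample is a list of "terms", i.e. identifiers
# standing in for normalized code structures (assumed, not from the paper).
TRAIN = {
    "s1": (["c1", "c2", "c3", "c3"], "FamA"),
    "s2": (["c1", "c3", "c4"], "FamA"),
    "s3": (["c5", "c6", "c7"], "FamB"),
    "s4": (["c5", "c6", "c6", "c8"], "FamB"),
    "s5": (["c1", "c2", "c9"], "FamC"),
}


def idf_table(docs):
    """Textbook inverse document frequency: log(N / df) per term."""
    n = len(docs)
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    return {t: math.log(n / d) for t, d in df.items()}


def tfidf(terms, idf):
    """Sparse tf-idf vector; terms unseen during training get weight 0."""
    tf = Counter(terms)
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(terms, train, idf):
    """1-nearest neighbour: family of the most similar training sample
    (ties go to the earliest sample in insertion order)."""
    q = tfidf(terms, idf)
    best = max(train.values(), key=lambda s: cosine(q, tfidf(s[0], idf)))
    return best[1]


def family_vectors(train, idf):
    """One vector per family: the sum of its samples' tf-idf vectors."""
    fams = {}
    for terms, fam in train.values():
        acc = fams.setdefault(fam, {})
        for t, w in tfidf(terms, idf).items():
            acc[t] = acc.get(t, 0.0) + w
    return fams


def single_linkage(vectors):
    """Agglomerative clustering with single linkage on cosine distance.
    Returns the dendrogram as nested tuples of family names."""
    clusters = [([name], name) for name in sorted(vectors)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(1.0 - cosine(vectors[a], vectors[b])
                        for a in clusters[i][0] for b in clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0],
                  (clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][1]


if __name__ == "__main__":
    idf = idf_table([terms for terms, _ in TRAIN.values()])
    print(classify(["c3", "c4"], TRAIN, idf))  # FamA
    print(single_linkage(family_vectors(TRAIN, idf)))  # ('FamB', ('FamA', 'FamC'))
```

In this toy data, FamA and FamC share code structures (`c1`, `c2`) while FamB shares none with either. The dendrogram therefore joins FamA and FamC first, which is the kind of "related families" signal the abstract reads from the resulting trees.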
Pages: 1104-1117
Number of pages: 14