Automatic Detection of Sources and Sinks in Arbitrary Java']Java Libraries

被引:4
作者
Sas, Darius [1 ]
Bessi, Marco [2 ]
Fontana, Francesca Arcelli [1 ]
机构
[1] Univ Milano Bicocca, Dipartimento Informat Sistemist & Comunicaz, Milan, Italy
[2] CAST Software Italia, Milan, Italy
来源
2018 IEEE 18TH INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM) | 2018年
关键词
!text type='Java']Java[!/text; Static Analysis; Sources; Sink; Machine Learning;
D O I
10.1109/SCAM.2018.00019
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
In the last decade, data security has become a primary concern for an increasing amount of companies around the world. Protecting the customer's privacy is now at the core of many businesses operating in any kind of market. Thus, the demand for new technologies to safeguard user data and prevent data breaches has increased accordingly. In this work, we investigate a machine learning-based approach to automatically extract sources and sinks from arbitrary Java libraries. Our method exploits several different features based on semantic, syntactic, intra-procedural dataflow and class-hierarchy traits embedded into the bytecode to distinguish sources and sinks. The performed experiments show that, under certain conditions and after some preprocessing, sources and sinks across different libraries share common characteristics that allow a machine learning model to distinguish them from the other library methods. The prototype model achieved remarkable results of 86% accuracy and 81% F-measure on our validation set of roughly 600 methods.
引用
收藏
页码:103 / 112
页数:10
相关论文
共 19 条
  • [1] AN INTRODUCTION TO KERNEL AND NEAREST-NEIGHBOR NONPARAMETRIC REGRESSION
    ALTMAN, NS
    [J]. AMERICAN STATISTICIAN, 1992, 46 (03) : 175 - 185
  • [2] [Anonymous], 2009, SIGKDD Explorations, DOI DOI 10.1145/1656274.1656278
  • [3] [Anonymous], 2012, TRUST TRUSTWORTHY CO
  • [4] Arzt S, 2014, ACM SIGPLAN NOTICES, V49, P259, DOI [10.1145/2594291.2594299, 10.1145/2666356.2594299]
  • [5] SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation
    Blewitt, Marnie E.
    Gendrel, Anne-Valerie
    Pang, Zhenyi
    Sparrow, Duncan B.
    Whitelaw, Nadia
    Craig, Jeffrey M.
    Apedaile, Anwyn
    Hilton, Douglas J.
    Dunwoodie, Sally L.
    Brockdorff, Neil
    Kay, Graham F.
    Whitelaw, Emma
    [J]. NATURE GENETICS, 2008, 40 (05) : 663 - 669
  • [6] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [7] CORTES C, 1995, MACH LEARN, V20, P273, DOI 10.1023/A:1022627411411
  • [8] LATTICE MODEL OF SECURE INFORMATION-FLOW
    DENNING, DE
    [J]. COMMUNICATIONS OF THE ACM, 1976, 19 (05) : 236 - 243
  • [9] Enck W., 2010, Communications of the ACM, V10, P1, DOI DOI 10.1145/2494522
  • [10] Greedy function approximation: A gradient boosting machine
    Friedman, JH
    [J]. ANNALS OF STATISTICS, 2001, 29 (05) : 1189 - 1232