Is Encrypted ClientHello a Challenge for Traffic Classification?

Citations: 9
Authors
Shamsimukhametov, Danil [1, 2]
Kurapov, Anton [1, 2]
Liubogoshchev, Mikhail [1, 2]
Khorov, Evgeny [1]
Affiliations
[1] Russian Acad Sci, Inst Informat Transmiss Problems, Wireless Networks Lab, Moscow 127051, Russia
[2] Moscow Inst Phys & Technol, Sch Radio Engn & Comp Technol, Dolgoprudnyi 141701, Russia
Funding
Russian Science Foundation
Keywords
Protocols; Servers; Cryptography; Classification algorithms; Security; Quality of service; Encryption; TLS; encrypted ClientHello; encrypted SNI; encrypted traffic classification; neural networks; Random Forest
DOI
10.1109/ACCESS.2022.3191431
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Although the widely used Transport Layer Security (TLS) protocol hides application data, an unencrypted part of the TLS handshake, specifically the server name indication (SNI), is a backdoor for encrypted traffic classification frameworks. The recently developed Encrypted ClientHello (ECH) amendment to the TLS protocol aims to protect the privacy-sensitive content of the ClientHello message, including SNI. Consequently, ECH can be a game-changer in the early detection of encrypted traffic. The paper shows that the performance of state-of-the-art traffic classification algorithms degrades significantly with the introduction of ECH. Hence, novel approaches to real-time traffic classification are required. The paper develops two novel traffic classification algorithms to address this challenge. The first one uses unencrypted bytes of the TLS Hello messages as independent features of the Random Forest algorithm. It is extremely lightweight and suits throughput-focused traffic classification. It is three times faster than state-of-the-art algorithms and achieves higher classification quality. The second algorithm augments the approach of the first one by focusing on particular metadata of the handshake. This way, it efficiently extracts data from the exchange and achieves the highest classification quality in all the considered scenarios. Its error rate is three times lower than that of state-of-the-art algorithms, and it provides a reliable classification of ECH traffic.
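The first algorithm treats raw unencrypted bytes of the TLS Hello messages as independent features. A minimal sketch of that feature-construction step is given below. It is an illustration under stated assumptions, not the paper's implementation: the function names, the 64-byte window, and the padding value of -1 are all hypothetical, and the abstract does not specify how many bytes are used or how short messages are handled. The resulting fixed-length vector is the kind of input a Random Forest classifier could be trained on; the classifier itself is left out.

```python
from typing import List

# Hypothetical padding value for messages shorter than the window.
# It lies outside the 0..255 byte range, so it cannot collide with real data.
PAD = -1


def bytes_to_features(payload: bytes, n_bytes: int = 64) -> List[int]:
    """Map the first n_bytes of a message to a fixed-length integer vector.

    Each byte position becomes one independent feature; missing
    positions are filled with PAD.
    """
    head = list(payload[:n_bytes])
    return head + [PAD] * (n_bytes - len(head))


def flow_features(client_hello: bytes, server_hello: bytes,
                  n_bytes: int = 64) -> List[int]:
    """Concatenate per-byte features of both Hello messages for one flow."""
    return (bytes_to_features(client_hello, n_bytes)
            + bytes_to_features(server_hello, n_bytes))


# Truncated toy inputs, not valid handshakes: a TLS record starts with
# content type 0x16 (handshake) followed by a two-byte version field.
toy_client = bytes([0x16, 0x03, 0x01])
toy_server = bytes([0x16, 0x03, 0x03, 0x00])
vector = flow_features(toy_client, toy_server, n_bytes=4)
```

Each flow then yields a vector of `2 * n_bytes` integers, one row of a training matrix whose labels would be the traffic classes.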
Pages: 77883-77897
Number of pages: 15