NetClone: Fast, Scalable, and Dynamic Request Cloning for Microsecond-Scale RPCs

Times cited: 0
Authors
Kim, Gyuyeong [1]
Affiliation
[1] Sungshin Women's University, Seoul, South Korea
Source
Proceedings of the ACM SIGCOMM 2023 Conference (SIGCOMM 2023), 2023
Funding
National Research Foundation of Singapore
Keywords
Programmable switches; in-network computing; microsecond-scale RPCs; tail latency; choices; power; time
DOI
10.1145/3603269.3604820
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Spawning duplicate requests, called cloning, is a powerful technique to reduce tail latency by masking service-time variability. However, traditional client-based cloning is static and harmful to performance under high load, while a recent coordinator-based approach is slow and not scalable. Neither approach is sufficient to serve modern microsecond-scale Remote Procedure Calls (RPCs). To this end, we present NetClone, a request cloning system that makes cloning decisions dynamically, within nanoseconds, and at scale. Rather than the client or a coordinator, NetClone performs request cloning in the network switch by leveraging the capabilities of programmable switch ASICs. Specifically, NetClone replicates requests based on server states and blocks redundant responses using request fingerprints in the switch data plane. To realize this idea while satisfying strict hardware constraints, we address several technical challenges in designing a custom switch data plane. NetClone can be integrated with emerging in-network request schedulers such as RackSched. We implement a NetClone prototype with an Intel Tofino switch and a cluster of commodity servers. Our experimental results show that NetClone improves the tail latency of microsecond-scale RPCs for synthetic and real-world application workloads and is robust to various system conditions.
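The abstract names two switch-side mechanisms: cloning a request only when server state allows it, and dropping the redundant response by matching a request fingerprint. The Python sketch below is a toy model of those two ideas under stated assumptions, not NetClone's actual Tofino data plane. Everything in it is hypothetical and introduced for illustration: the class ToyCloningSwitch, the fixed server pairing (primary = request_id mod N, secondary = the next server), the "clone only if both candidates are idle" policy, CRC32 as the fingerprint hash, and the 1024-slot table. A second helper compares the 99th-percentile latency of one exponential service time against the faster of two, which is the variability-masking effect cloning relies on. It models no queueing, so it says nothing about the high-load behavior the abstract warns about.

```python
"""Toy model of in-switch request cloning (hypothetical; not NetClone's P4 code)."""
import random
import zlib
from typing import List, Optional, Tuple


class ToyCloningSwitch:
    """Hypothetical switch model: clone to idle servers, filter duplicate responses."""

    def __init__(self, num_servers: int, table_size: int = 1024) -> None:
        if num_servers < 2:
            raise ValueError("cloning needs at least two servers")
        self.num_servers = num_servers
        self.table_size = table_size
        # Assumed state signal: idle[i] is True while server i reports an empty queue.
        self.idle = [True] * num_servers
        # Fingerprint register array; None marks an empty slot.
        self.slots: List[Optional[int]] = [None] * table_size

    def _fingerprint(self, request_id: int) -> int:
        # CRC32 of the request ID stands in for the switch's hash unit.
        return zlib.crc32(request_id.to_bytes(8, "big"))

    def route_request(self, request_id: int) -> List[int]:
        """Return the server index (or indices, if cloned) that receive this request."""
        primary = request_id % self.num_servers
        secondary = (primary + 1) % self.num_servers
        # Assumed policy: clone only when both candidates are idle, so a clone
        # never queues behind other work on a busy server.
        if self.idle[primary] and self.idle[secondary]:
            return [primary, secondary]
        # Otherwise send one copy, preferring an idle candidate, else the primary.
        if self.idle[secondary]:
            return [secondary]
        return [primary]

    def filter_response(self, request_id: int, cloned: bool) -> bool:
        """Return True to forward the response to the client, False to drop it."""
        if not cloned:
            return True  # only cloned requests can produce duplicate responses
        fp = self._fingerprint(request_id)
        index = fp % self.table_size
        if self.slots[index] == fp:
            # Second (redundant) response: drop it and free the slot.
            self.slots[index] = None
            return False
        # First response: record its fingerprint and forward it.
        self.slots[index] = fp
        return True


def p99(samples: List[float]) -> float:
    """Simple index-based 99th percentile of a sample list."""
    ordered = sorted(samples)
    return ordered[int(0.99 * (len(ordered) - 1))]


def tail_with_and_without_cloning(n: int = 20000, seed: int = 1) -> Tuple[float, float]:
    """Compare p99 of one service time vs. the faster of two (no queueing modeled)."""
    rng = random.Random(seed)
    mean = 10.0  # assumed mean service time, arbitrary units
    single = [rng.expovariate(1 / mean) for _ in range(n)]
    cloned = [min(rng.expovariate(1 / mean), rng.expovariate(1 / mean)) for _ in range(n)]
    return p99(single), p99(cloned)
```

For example, with four servers and both candidates idle, route_request(5) sends copies to servers 1 and 2; the first response for request 5 is forwarded and the second is dropped. Because slots are indexed by a hash, two in-flight cloned requests that collide can overwrite each other's fingerprint and let a duplicate response through, so a real design has to size the table to keep such collisions rare.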
Pages: 195-207
Number of pages: 13