A Wasserstein distance-based spectral clustering method for transaction data analysis

Times cited: 0
Authors
Zhu, Yingqiu [1]
Huang, Danyang [2,3,4]
Zhang, Bo [2,3]
Affiliations
[1] Univ Int Business & Econ, Sch Stat, Huixin Dong St 10, Beijing 100029, Peoples R China
[2] Renmin Univ China, Ctr Appl Stat, Zhongguancun St 59, Beijing 100872, Peoples R China
[3] Renmin Univ China, Sch Stat, Zhongguancun St 59, Beijing 100872, Peoples R China
[4] Renmin Univ China, Innovat Platform, Zhongguancun St 59, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Spectral clustering; Wasserstein distance; Empirical cumulative distribution function; Transaction data; Segmentation; Inequalities; Algorithms
DOI
10.1016/j.eswa.2024.125418
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of online payment platforms, it is now possible to record massive transaction data. Clustering transaction data contributes significantly to analyzing merchants' behavior patterns, which enables payment platforms to provide differentiated services or implement risk management strategies. However, traditional methods exploit transactions by summarizing them as low-dimensional features, which inevitably loses information. In this study, we use the empirical cumulative distribution of transactions to characterize merchants. We adopt the Wasserstein distance to measure the dissimilarity between any two merchants and propose the Wasserstein-distance-based spectral clustering (WSC) approach. A graph of merchants is constructed from the similarities between their transaction distributions. We therefore treat the clustering of merchants as a graph-cut problem and solve it within the framework of spectral clustering. To ensure the feasibility of the proposed method on large-scale datasets with limited computational resources, we further propose a subsampling method for WSC (SubWSC). The associated theoretical properties are investigated to verify the efficiency of the proposed approach. Simulations and an empirical study demonstrate that the proposed method outperforms feature-based methods in finding behavior patterns of merchants.
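To make the pipeline described in the abstract concrete, here is a minimal Python/NumPy sketch. It is not the authors' implementation. The function names (`wasserstein1_ecdf`, `wasserstein_spectral_clustering`, `_kmeans`) are hypothetical. The following choices are our own assumptions: the 1-Wasserstein distance W1 (the abstract does not state the order), a Gaussian kernel with a median-distance bandwidth, a normalized (Ng-Jordan-Weiss style) spectral embedding, and a plain k-means step. The SubWSC subsampling step is not shown. The sketch relies on a standard one-dimensional fact: W1 equals the area between the two CDFs, so for empirical CDFs it can be computed exactly.

```python
import numpy as np


def wasserstein1_ecdf(x, y):
    """Exact 1-Wasserstein distance between the empirical distributions of x and y.

    In one dimension W1 equals the area between the two CDFs,
    i.e. the integral of |F_x(t) - F_y(t)| dt. Empirical CDFs are step functions
    that only jump at observed values, so the integral reduces to a finite sum.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.sort(np.concatenate([x, y]))  # all jump points of both ECDFs
    widths = np.diff(grid)                  # length of each interval between jump points
    # ECDF values on each interval [grid[i], grid[i + 1])
    fx = np.searchsorted(x, grid[:-1], side="right") / x.size
    fy = np.searchsorted(y, grid[:-1], side="right") / y.size
    return float(np.sum(np.abs(fx - fy) * widths))


def _kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance from every row to its nearest already-chosen center
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def wasserstein_spectral_clustering(samples, k, sigma=None):
    """Cluster merchants, each given as a 1-D sequence of transaction amounts."""
    n = len(samples)
    # 1) pairwise Wasserstein distances between the merchants' ECDFs
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = wasserstein1_ecdf(samples[i], samples[j])
    # 2) Gaussian-kernel similarity graph (assumed bandwidth: median pairwise distance)
    if sigma is None:
        med = np.median(D[np.triu_indices(n, 1)])
        sigma = med if med > 0 else 1.0
    A = np.exp(-D ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # 3) normalized affinity Deg^{-1/2} A Deg^{-1/2} (Deg = diagonal degree matrix);
    #    its top-k eigenvectors are the bottom-k eigenvectors of the normalized Laplacian
    deg = np.maximum(A.sum(axis=1), 1e-12)
    inv_sqrt = 1.0 / np.sqrt(deg)
    M = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)             # eigenvalues returned in ascending order
    U = vecs[:, -k:]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # 4) k-means on the row-normalized spectral embedding
    return _kmeans(U, k)
```

For example, `wasserstein_spectral_clustering([[1, 2, 3], [1, 2, 3, 4], [2, 3, 4], [100, 110, 120], [105, 115, 125], [100, 120, 140]], k=2)` should place the three small-amount merchants in one cluster and the three large-amount merchants in the other. With many merchants, the O(n^2) distance matrix and the dense eigendecomposition dominate the cost. These are the resource constraints that motivate SubWSC in the abstract.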
Pages: 21