An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data

Times cited: 14
Authors
Huang, Junsheng [1,2]
Mao, Baohua [1,2,3]
Bai, Yun [1,2]
Zhang, Tong [1,2]
Miao, Changjun [4]
Affiliations
[1] Beijing Jiaotong Univ, Sch Traff & Transportat, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Transport Ind Big Data Applicat Technol C, Beijing 100044, Peoples R China
[3] Beijing Jiaotong Univ, Integrated Transportat Res Ctr China, Beijing 100044, Peoples R China
[4] China Acad Railway Sci Corp Ltd, Signal & Commun Res Inst, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Intelligent Transportation System; missing values imputation; fuzzy C-means; genetic algorithm; expectation-maximization algorithm; regression; selection; prediction; values
DOI
10.3390/s20071992
Chinese Library Classification (CLC) number
O65 [Analytical chemistry]
Discipline classification codes
070302; 081704
Abstract
Various traffic-sensing technologies have been employed to facilitate traffic control. Because of factors such as malfunctioning devices and human error, missing values commonly occur in Intelligent Transportation System (ITS) sensing datasets, which reduces data quality. In this study, an integrated imputation algorithm based on fuzzy C-means (FCM) and the genetic algorithm (GA) is proposed to improve the accuracy of the estimated values. The GA is applied to optimize the membership-degree parameter and the number of cluster centroids in the FCM model. An experiment on taxi global positioning system (GPS) data from Manhattan, New York City, demonstrates the effectiveness of the integrated imputation approach. Three evaluation criteria, the root mean squared error (RMSE), the correlation coefficient (R), and the relative accuracy (RA), are used to assess the experimental results. Under the ±5% and ±10% thresholds, the average RAs obtained by the integrated imputation method are 0.576 and 0.785, respectively, the highest among the compared methods, indicating that the integrated imputation method outperforms both the historical imputation method and the conventional FCM method. In addition, clustering imputation performs better with the Euclidean distance than with the Manhattan distance. Thus, the proposed integrated imputation method can be used to estimate missing values in daily traffic management.
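The abstract names FCM-based imputation and three evaluation criteria without giving formulas. The Python/NumPy sketch below shows one common way FCM imputation is done: cluster the complete records, then fill each missing attribute with the membership-weighted average of the centroids, where memberships are computed from the record's observed attributes only. This is an illustrative assumption, not the authors' code. The GA step is omitted, so `c` and `m` are fixed by hand (`m` is assumed to be the "membership-degree parameter"). Distances are Euclidean. `relative_accuracy` assumes RA is the share of estimates whose relative error falls within the ±5% or ±10% threshold. All function names are hypothetical, and the demo data are synthetic, not the Manhattan taxi data.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Plain fuzzy C-means on complete rows X (n x d), Euclidean distance.

    Returns centroids V (c x d) and the membership matrix U (n x c).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each row sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted centroids
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)  # (n x c)
        D = np.fmax(D, 1e-12)  # guard against division by zero
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        U_new = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def memberships(d, m):
    """FCM memberships of one record, given its distances d (length c) to the centroids."""
    d = np.fmax(d, 1e-12)
    return 1.0 / ((d[:, None] / d[None, :]) ** (2.0 / (m - 1))).sum(axis=1)


def fcm_impute(X, c=3, m=2.0, seed=0):
    """Fill NaNs with membership-weighted centroid values (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)
    V, _ = fcm(X[complete], c, m, seed=seed)  # cluster complete rows only
    X_filled = X.copy()
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        # distances use only the attributes this record actually has
        u = memberships(np.linalg.norm(V[:, obs] - X[i, obs], axis=1), m)
        X_filled[i, ~obs] = u @ V[:, ~obs]
    return X_filled


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def corr(y_true, y_pred):
    # Pearson correlation coefficient R
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def relative_accuracy(y_true, y_pred, threshold=0.05):
    # ASSUMED definition: fraction of estimates with |error| / |truth| <= threshold
    rel_err = np.abs(y_pred - y_true) / np.abs(y_true)
    return float(np.mean(rel_err <= threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    centers = np.array([[10.0, 12.0, 11.0], [25.0, 24.0, 26.0], [40.0, 38.0, 41.0]])
    X_true = np.vstack([ctr + rng.normal(0.0, 1.0, size=(50, 3)) for ctr in centers])
    mask = rng.random(X_true.shape) < 0.1
    mask[:, 0] = False  # keep column 0 observed so no row is entirely missing
    X_miss = X_true.copy()
    X_miss[mask] = np.nan
    X_hat = fcm_impute(X_miss, c=3, m=2.0)
    y, y_hat = X_true[mask], X_hat[mask]
    print(f"RMSE={rmse(y, y_hat):.3f}  R={corr(y, y_hat):.3f}  "
          f"RA(5%)={relative_accuracy(y, y_hat, 0.05):.3f}  "
          f"RA(10%)={relative_accuracy(y, y_hat, 0.10):.3f}")
```

In the integrated method described above, a GA would search over `c` and `m` and keep the pair that gives the best imputation score. Here they are simply fixed. Replacing the Euclidean norm with an L1 distance would give a Manhattan-distance variant of the kind the abstract compares against, although the paper's exact formulation is not stated in this record.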
Pages: 19