An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data

Cited by: 14
Authors
Huang, Junsheng [1, 2]
Mao, Baohua [1, 2, 3]
Bai, Yun [1, 2]
Zhang, Tong [1, 2]
Miao, Changjun [4]
Affiliations
[1] Beijing Jiaotong Univ, Sch Traff & Transportat, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Transport Ind Big Data Applicat Technol C, Beijing 100044, Peoples R China
[3] Beijing Jiaotong Univ, Integrated Transportat Res Ctr China, Beijing 100044, Peoples R China
[4] China Acad Railway Sci Corp Ltd, Signal & Commun Res Inst, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Intelligent Transportation System; missing values imputation; fuzzy C-means; genetic algorithm; EXPECTATION-MAXIMIZATION ALGORITHM; GENETIC ALGORITHM; REGRESSION; SELECTION; PREDICTION; VALUES;
DOI
10.3390/s20071992
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Various traffic-sensing technologies have been employed to facilitate traffic control. Because of factors such as malfunctioning devices and human error, missing values commonly occur in Intelligent Transportation System (ITS) sensing datasets, which lowers data quality. In this study, an integrated imputation algorithm based on fuzzy C-means (FCM) and the genetic algorithm (GA) is proposed to improve the accuracy of the estimated values. The GA is applied to optimize the membership-degree parameter and the number of cluster centroids in the FCM model. An experimental test on taxi global positioning system (GPS) data from Manhattan, New York City, is used to demonstrate the effectiveness of the integrated imputation approach. Three evaluation criteria, the root mean squared error (RMSE), correlation coefficient (R), and relative accuracy (RA), are used to verify the experimental results. Under the ±5% and ±10% thresholds, the average RAs obtained by the integrated imputation method are 0.576 and 0.785, respectively, the highest among the compared methods, indicating that the integrated imputation method outperforms the history imputation method and the conventional FCM method. In addition, the clustering imputation performs better with the Euclidean distance than with the Manhattan distance. Thus, the proposed integrated imputation method can be employed to estimate missing values in daily traffic management.
Pages: 19
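
The three evaluation criteria named in the abstract (RMSE, R, and RA) can be illustrated with a short sketch. This is not the authors' code: the function names and the sample speed values are hypothetical, and the definition of RA used here (the share of imputed values whose relative error falls within the threshold) is an assumption, since the abstract does not spell out the formula.

```python
import math


def rmse(actual, estimated):
    """Root mean squared error between true and imputed values."""
    n = len(actual)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n)


def pearson_r(actual, estimated):
    """Pearson correlation coefficient between true and imputed values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_a * var_e)


def relative_accuracy(actual, estimated, threshold):
    """Assumed definition: fraction of imputed values whose relative
    error |e - a| / |a| is no larger than `threshold` (e.g. 0.05)."""
    hits = sum(
        1 for a, e in zip(actual, estimated)
        if abs(e - a) <= threshold * abs(a)
    )
    return hits / len(actual)


# Hypothetical true and imputed values (illustrative only, not from the paper).
actual = [20.0, 30.0, 40.0, 50.0]
estimated = [20.5, 32.0, 39.0, 56.0]

print(rmse(actual, estimated))
print(pearson_r(actual, estimated))
print(relative_accuracy(actual, estimated, 0.05))
print(relative_accuracy(actual, estimated, 0.10))
```

Under this assumed definition, widening the threshold from ±5% to ±10% can only keep RA the same or raise it, which is consistent with the reported averages of 0.576 and 0.785.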