High-Performance Spatial Query Processing on Big Taxi Trip Data using GPGPUs

Cited by: 10
Authors
Zhang, Jianting [1]
You, Simin [2]
Gruenwald, Le [3]
Affiliations
[1] CUNY, Dept Comp Sci, New York, NY 10021 USA
[2] CUNY, Grad Ctr, Dept Comp Sci, New York, NY USA
[3] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
Source
2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS) | 2014
Keywords
High Performance; Spatial Query; Big Data; Taxi Trip; GPGPU
DOI
10.1109/BigData.Congress.2014.20
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
City-wide GPS-recorded taxi trip data contains rich information for traffic and travel analysis to facilitate transportation planning and urban studies. However, traditional data management techniques are largely incapable of processing big taxi trip data at the scale of hundreds of millions. In this study, we aim to use General-Purpose computing on Graphics Processing Units (GPGPU) technologies to speed up the processing of complex spatial queries on big taxi data using inexpensive commodity GPUs. By using the land use types of tax lot polygons as a proxy for trip purposes at the pickup and drop-off locations, we formulate a taxi trip data analysis problem as a large-scale nearest neighbor spatial query problem based on point-to-polygon distance. Experiments on nearly 170 million taxi trips in New York City (NYC) in 2009 and 735,488 tax lot polygons with 4,698,986 vertices have demonstrated the efficiency of the proposed techniques: the GPU implementation is about 10-20X faster than the host system and completes the spatial query in about a minute on a low-end workstation equipped with an Nvidia GTX Titan GPU device, with a total equipment cost below $2,000. We further discuss several interesting patterns discovered from the query results which warrant further study. The proposed approach can be an interesting alternative to traditional MapReduce/Hadoop-based approaches to processing big data with respect to performance and cost.
Pages: 72-79
Page count: 8
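
The abstract frames the analysis as a nearest neighbor query based on point-to-polygon distance. The following is a minimal CPU sketch of that formulation in Python, not the authors' GPU implementation: it scans every polygon by brute force, whereas the paper's techniques for scaling to 170 million trips are not described in this record. The function names, the land-use labels, and the convention that a point inside a polygon has distance zero are all assumptions made for illustration.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment's line and clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(px, py, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, not closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        ax, ay = ring[i]
        bx, by = ring[(i + 1) % n]
        # Count edges that a horizontal ray from the point crosses.
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def point_polygon_distance(px, py, ring):
    """Assumed convention: 0.0 inside the polygon, else distance to boundary."""
    if point_in_polygon(px, py, ring):
        return 0.0
    n = len(ring)
    return min(
        point_segment_distance(px, py, *ring[i], *ring[(i + 1) % n])
        for i in range(n)
    )


def nearest_polygon(px, py, polygons):
    """Brute-force nearest neighbor over a dict of {label: ring}."""
    best_label, best_dist = None, math.inf
    for label, ring in polygons.items():
        d = point_polygon_distance(px, py, ring)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


# Hypothetical tax lots keyed by an illustrative land-use label.
lots = {
    "residential": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "commercial": [(5, 0), (7, 0), (7, 2), (5, 2)],
}
pickup = nearest_polygon(3.0, 1.0, lots)
dropoff = nearest_polygon(6.0, 1.0, lots)
```

Each trip endpoint is handled independently of every other, which is the property that makes this kind of query a natural fit for data-parallel hardware such as GPUs; a practical implementation would also need a spatial index or grid to avoid comparing each point against all polygons.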