Model-Driven Development of Web APIs to Access Integrated Tabular Open Data

Cited by: 4
Authors
Gonzalez-Mora, Cesar [1]
Tomas, David [1]
Garrigos, Irene [1]
Zubcoff, Jose Jacobo [2]
Mazon, Jose-Norberto [1]
Affiliations
[1] Univ Alicante, Dept Software & Comp Syst, Alicante 03690, Spain
[2] Univ Alicante, Dept Sea Sci & Appl Biol, Alicante 03690, Spain
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Data integration; join; union; open data; data access; Web APIs; word embeddings
DOI
10.1109/ACCESS.2020.3036462
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
More and more governments around the world are publishing tabular open data, mainly in formats such as CSV or XLS(X). These datasets are mostly published individually, i.e., each publisher exposes its data on the Web without considering potential relationships with other datasets (its own or those of other publishers). As a result, reusing several open datasets together is not a trivial task, and it requires mechanisms that allow data consumers (such as software developers or data scientists) to integrate and access tabular open data published on the Web. In this paper, we propose a model-driven approach to automatically generate Web APIs that homogeneously access multiple integrated tabular open datasets. This work focuses on data that can be integrated by means of join and union operations. As a first step, our approach detects unionable and joinable tabular open data by using a table similarity measure based on word embeddings. Then, an APIfication process is developed to create APIs that access the previously integrated datasets through a single endpoint. A running example is presented throughout the article, together with a set of performance experiments that show the feasibility of our approach.
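As a rough illustration of the join and union operations mentioned in the abstract, the following minimal sketch compares column headers with cosine similarity over word vectors and then integrates small in-memory tables. It is not the authors' measure or implementation: the `TOY_EMBEDDINGS` vectors, the table contents, and the function names `header_similarity`, `union_rows`, and `join_rows` are all hypothetical, chosen only for this example.

```python
import math

# Hypothetical toy vectors; a real system would load pretrained word embeddings.
TOY_EMBEDDINGS = {
    "city": (0.9, 0.1, 0.0),
    "town": (0.85, 0.15, 0.05),
    "population": (0.1, 0.9, 0.1),
    "inhabitants": (0.15, 0.85, 0.1),
    "year": (0.0, 0.1, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def header_similarity(headers_a, headers_b):
    """Average, over the headers of table A, of the best cosine match in table B."""
    scores = [
        max((cosine(TOY_EMBEDDINGS[a], TOY_EMBEDDINGS[b]) for b in headers_b), default=0.0)
        for a in headers_a
    ]
    return sum(scores) / len(scores) if scores else 0.0


def union_rows(rows_a, rows_b, mapping):
    """Union: rename the columns of table B to those of table A, then concatenate."""
    renamed = [{mapping[k]: v for k, v in row.items()} for row in rows_b]
    return rows_a + renamed


def join_rows(rows_a, rows_b, key_a, key_b):
    """Inner join of table A and table B where A[key_a] equals B[key_b]."""
    index = {}
    for row in rows_b:
        index.setdefault(row[key_b], []).append(row)
    joined = []
    for row in rows_a:
        for match in index.get(row[key_a], []):
            merged = dict(row)
            merged.update({k: v for k, v in match.items() if k != key_b})
            joined.append(merged)
    return joined


# Made-up tables, used only to exercise the functions above.
table_a = [{"city": "Alpha", "population": 100}]
table_b = [{"town": "Beta", "inhabitants": 200}]
table_c = [{"town": "Alpha", "year": 2020}]

similarity = header_similarity(["city", "population"], ["town", "inhabitants"])
unioned = union_rows(table_a, table_b, {"town": "city", "inhabitants": "population"})
joined = join_rows(table_a, table_c, "city", "town")
```

In this sketch, tables whose headers score highly against each other are treated as union candidates, while a shared key column makes two tables join candidates. Any threshold applied to `similarity` would be a tuning choice that the abstract does not specify.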
Pages: 202669-202686
Page count: 18