XML Structural Similarity Search Using MapReduce

Authors
Yuan, Peisen [1, 2]
Sha, Chaofeng [1, 2]
Wang, Xiaoling [3]
Yang, Bin [1, 2]
Zhou, Aoying [2, 3]
Yang, Su [1, 2]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] East China Normal Univ, Shanghai Key Lab Trustworthy Comp, Software Engn Inst, Shanghai, Peoples R China
Source
WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS | 2010 / Vol. 6184
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
XML is a de facto standard for web data exchange and information representation. Managing large volumes of XML data efficiently poses challenges for conventional techniques. To cope with large-scale data, the MapReduce computing framework has recently attracted increasing attention in the database community. This paper proposes an efficient and scalable framework for XML structural similarity search on a large cluster with MapReduce. First, sub-structures are extracted in parallel from a large XML corpus distributed across the cluster. Then, Min-Hashing and locality-sensitive hashing techniques are developed on this distributed, parallel computing framework for efficient structural similarity search. An empirical study on a cluster with large real-world datasets demonstrates the effectiveness and efficiency of the approach.
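The pipeline outlined in the abstract (extract sub-structures, compute Min-Hash signatures, bucket them with locality-sensitive hashing) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which sub-structures are used, so root-to-node tag paths are assumed here, and the names `tag_paths`, `map_phase`, `reduce_phase`, the hash family, and all parameter values are hypothetical. The map and reduce steps run in a single process rather than on a real MapReduce cluster.

```python
import hashlib
import random
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths (an assumed sub-structure choice)."""
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def stable_hash(s):
    """Deterministic 64-bit hash of a string (the built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions of the form (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Min-Hash signature: the minimum hashed value of the set under each function."""
    hashed = [stable_hash(x) for x in items]
    return [min((a * h + b) % PRIME for h in hashed) for a, b in params]


def map_phase(doc_id, xml_text, params, bands):
    """Map step: emit ((band index, band values), doc_id) for each signature band."""
    sig = minhash_signature(tag_paths(xml_text), params)
    rows = len(sig) // bands
    for i in range(bands):
        yield (i, tuple(sig[i * rows:(i + 1) * rows])), doc_id


def reduce_phase(pairs):
    """Reduce step: group documents by bucket key and return candidate pairs."""
    buckets = defaultdict(set)
    for key, doc_id in pairs:
        buckets[key].add(doc_id)
    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))
    return candidates


def jaccard(a, b):
    """Exact Jaccard similarity, usable to verify candidate pairs."""
    return len(a & b) / len(a | b)


# Toy corpus: "a" and "b" share the same structure, "c" is unrelated.
docs = {
    "a": "<book><title/><author><name/></author></book>",
    "b": "<book><title/><author><name/></author></book>",
    "c": "<order><item><price/></item></order>",
}
params = make_hash_params(num_hashes=20)
emitted = [kv for d, x in docs.items() for kv in map_phase(d, x, params, bands=5)]
candidates = reduce_phase(emitted)
```

In this sketch, documents whose signatures agree on every row of at least one band land in the same bucket and become candidate pairs; raising the number of bands (with fewer rows per band) makes the filter more permissive, while fewer bands make it stricter. Candidates can then be checked with the exact `jaccard` measure on their path sets.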
Pages: 169 / +
Page count: 3