Data Mining from NoSQL Document-Append Style Storages

Cited by: 1
Authors
Lomotey, Richard K. [1]
Deters, Ralph [1]
Affiliations
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
Source
2014 IEEE 21ST INTERNATIONAL CONFERENCE ON WEB SERVICES (ICWS 2014) | 2014
Keywords
Data mining; NoSQL; Bayesian Rule; Unstructured data; Apriori; Big Data;
DOI
10.1109/ICWS.2014.62
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The modern data economy, which has been described as "Big Data", has changed the status quo of digital content creation and storage. While data storage has followed a schema-dictated approach for decades, today's digital content is largely unstructured, which creates the need to adopt different storage techniques. NoSQL database systems have therefore been proposed to accommodate most of the content being generated today. One type of NoSQL database that has received significant enterprise adoption is document-append style storage. The emerging concern and challenge, however, is that research and tools that can aid data mining from such NoSQL databases are generally lacking. Even though document-append style storages make data accessible as Web services and over URL/URI, building a corresponding data mining tool deviates from the underlying techniques governing web crawlers. Also, existing data mining tools that were designed for schema-based storages (e.g., RDBMS) are misfits. Hence, our goal in this work is to design a unique data analytics tool that enables knowledge discovery through information retrieval from document-append style storage. The tool is algorithmically built on the inference-based Apriori, which helps us optimize the search duration. Preliminary test results of the proposed tool also show high accuracy in comparison with previously proposed approaches.
Pages: 385-392
Page count: 8