Terms Mining in Document-Based NoSQL: Response to Unstructured Data

Cited by: 3
Authors
Lomotey, Richard K. [1 ]
Deters, Ralph [1 ]
Affiliations
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
Keywords
Unstructured Data Mining; Big Data; Viterbi algorithm; Terms; NoSQL; Association Rules; classification; clustering
DOI
10.1109/BigData.Congress.2014.99
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Unstructured data mining has become topical recently due to the availability of high-dimensional and voluminous digital content (known as "Big Data") across the enterprise spectrum. Relational Database Management Systems (RDBMS) have been employed over the past decades for content storage and management, but the ever-growing heterogeneity in today's data calls for a new storage approach. Thus, the NoSQL database has emerged as the preferred storage facility because it supports unstructured data storage. This creates the need to explore efficient techniques for mining data from such NoSQL systems, since the available tools and frameworks, which are designed for RDBMS, are often not directly applicable. In this paper, we focus on topics and terms mining, based on clustering, in document-based NoSQL. This is achieved by adapting the architectural design of an analytics-as-a-service framework and by proposing the Viterbi algorithm to enhance the accuracy of the terms classification in the system. The results from the pilot testing of our work show higher accuracy in comparison to some previously proposed techniques such as the parallel search.
Pages: 661-668
Page count: 8
Related papers
50 in total
  • [21] DOCUMENT-BASED DIRICHLET CLASS LANGUAGE MODEL FOR SPEECH RECOGNITION USING DOCUMENT-BASED N-GRAM EVENTS
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 42 - 47
  • [22] Flexible Access Control and Confidentiality over Encrypted Data for Document-based Database
    Almarwani, Maryam
    Konev, Boris
    Lisitsa, Alexei
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY (ICISSP), 2019, : 606 - 614
  • [23] Evaluating NoSQL document oriented data model
    Hashem, Hadi
    Ranc, Daniel
    2016 IEEE 4TH INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD WORKSHOPS (FICLOUDW), 2016, : 51 - 56
  • [24] A Document-based Data Model for Large Scale Computational Maritime Situational Awareness
    Cazzanti, Luca
    Millefiori, Leonardo M.
    Arcieri, Gianfranco
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 1350 - 1356
  • [25] Document-based service platform for telemedicine applications
    Lahteenmaki, Jaakko
    Leppanen, Juha
    Kaijanranta, Hannu
    Nikus, Kjell
    Veijonen, Teppo
    Laakko, Timo
    Nummiaho, Antti
    VTT SYMPOSIUM ON SERVICE SCIENCE, TECHNOLOGY AND BUSINESS, 2008, 253 : 178 - +
  • [26] A CRM Model Based on Mining Unstructured Customers' Data
    Deng Shaoling
    Li Yan
    2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 11277 - 11279
  • [27] Performance Evaluation of Unstructured NoSQL data over distributed framework
    Nyati, Suyog S.
    Pawar, Shivanand
    Ingle, Rajesh
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 1623 - 1627
  • [28] A document-based framework for internet application control
    Hodes, TD
    Katz, RH
    USENIX ASSOCIATION PROCEEDINGS OF THE 2ND USENIX SYMPOSIUM ON INTERNET TECHNOLOGIES AND SYSTEMS (USITS'99), 1999, : 59 - 70
  • [29] A similarity query system for road traffic data based on a NoSQL document store
    Damaiyanti, Titus Irma
    Imawan, Ardi
    Indikawati, Fitri Indra
    Choi, Yoon-Ho
    Kwon, Joonho
    JOURNAL OF SYSTEMS AND SOFTWARE, 2017, 127 : 28 - 51
  • [30] The Acquisition of Structured Clinical Data from a Document-Based Electronic Medical Record System
    Takeda, Toshihiro
    Zhang, Dongyao
    Wada, Shoya
    Nakagawa, Akito
    Sugimoto, Kento
    Manabe, Shirou
    Matsumura, Yasushi
    MEDINFO 2019: HEALTH AND WELLBEING E-NETWORKS FOR ALL, 2019, 264 : 1600 - 1601