Terms Mining in Document-Based NoSQL: Response to Unstructured Data

Cited by: 3
Authors
Lomotey, Richard K. [1]
Deters, Ralph [1]
Affiliation
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
Source
2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS) | 2014
Keywords
Unstructured Data Mining; Big Data; Viterbi algorithm; Terms; NoSQL; Association Rules; classification; clustering
DOI
10.1109/BigData.Congress.2014.99
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Unstructured data mining has become topical recently due to the availability of high-dimensional and voluminous digital content (known as "Big Data") across the enterprise spectrum. Relational Database Management Systems (RDBMS) have been employed over the past decades for content storage and management, but the ever-growing heterogeneity of today's data calls for a new storage approach. Thus, NoSQL databases have emerged as the preferred storage facility because they support unstructured data storage. This creates the need to explore efficient techniques for mining data from such NoSQL systems, since the available tools and frameworks, which are designed for RDBMS, are often not directly applicable. In this paper, we focus on topics and terms mining, based on clustering, in document-based NoSQL. This is achieved by adapting the architectural design of an analytics-as-a-service framework and by proposing the Viterbi algorithm to enhance the accuracy of terms classification in the system. The results from pilot testing of our work show higher accuracy in comparison to some previously proposed techniques such as parallel search.
Pages: 661-668
Page count: 8
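
The abstract names the Viterbi algorithm as the means of improving terms classification but does not give the underlying model. As a rough illustration only, the following is a generic textbook Viterbi decoder over a toy hidden Markov model. The state names ("term", "filler"), the probabilities, and the sample words are hypothetical and are not taken from the paper; the authors' actual formulation may differ.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_probability) for a discrete HMM.

    Generic textbook decoder; not the paper's specific formulation.
    """
    if not observations:
        return [], 0.0

    def log(p):
        # Log-space avoids underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Trace back from the most likely final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: label each word as a domain "term" or as "filler".
states = ["term", "filler"]
start_p = {"term": 0.5, "filler": 0.5}
trans_p = {
    "term": {"term": 0.6, "filler": 0.4},
    "filler": {"term": 0.4, "filler": 0.6},
}
emit_p = {
    "term": {"nosql": 0.5, "viterbi": 0.4, "the": 0.1},
    "filler": {"nosql": 0.1, "viterbi": 0.1, "the": 0.8},
}

path, log_prob = viterbi(["the", "nosql", "viterbi"], states, start_p, trans_p, emit_p)
print(path)  # ['filler', 'term', 'term']
```

With these made-up numbers the decoder labels "the" as filler and the two remaining words as terms. In a real setting the transition and emission probabilities would have to be estimated from the document collection.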