Context-aware automated quality assessment of textual data

Authors
Mylavarapu G. [1]
Viswanathan K.A. [2 ]
Thomas J. [2 ]
Affiliations
[1] Department of Computer Science and Information Systems, Murray State University, Murray
[2] Department of Computer Science, Oklahoma State University, Stillwater, OK
Keywords
automated data quality assessment; context-aware; data accuracy; data consistency; data context; Doc2Vec; lexicon; sentiment analysis; textual data
DOI
10.1504/IJBIDM.2023.130588
Abstract
Data analysis is a crucial process in data science that extracts useful information from data in any form. With the rapid growth of technology, unstructured data such as text and images are being produced in ever larger amounts. Apart from the analytical techniques used, the quality of the data plays a prominent role in accurate analysis. Data quality degrades because of poor maintenance and mediocre data generation strategies employed by amateur users, and the problem escalates with the advent of big data. In this paper, we propose a quality assessment model for the textual form of unstructured data (TDQA). Because the context of data plays an important role in determining its quality, we automate context extraction in textual data using natural language processing to identify data errors and assess quality. Copyright 2023 Inderscience Enterprises Ltd.
Pages: 451-469
Number of pages: 18