Feature Extraction of Dialogue Text Based on Big Data and Machine Learning

Cited by: 0
Authors
Liu X. [1]
Zhang H. [1]
Cheng Y. [2]
Affiliations
[1] Weifang University, China
[2] Adamson University, Philippines
Keywords
Big data; Dialogue text; Feature extraction; Machine learning; Text features
DOI
10.4018/IJWLTT.337602
Abstract
This article constructs a dialogue text feature extraction model based on big data and machine learning. The model maps the high-dimensional space of text features into a low-dimensional space that is easier to process, so that the best feature words can be selected to represent the document set. Tests show that in most cases the model's classification accuracy exceeds 88% and its recall exceeds 85%, achieving higher classification accuracy with less computation. Extracting features from dialogue texts requires no preprocessing: the method counts properties of the target text such as lexical composition, sentence length, and sentence-to-sentence relationships, then applies linear analysis to obtain key indicators and their weights. A classification model built on these indicators achieves good results, which reduces the workload and computation of text classification. © 2024 IGI Global. All rights reserved.
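The counting step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify which statistics are used, so the three features here (type-token ratio for lexical composition, mean sentence length, and word overlap between adjacent sentences for the sentence-to-sentence relationship) are hypothetical choices, as are the function names `surface_features` and `linear_score`. The weights passed to `linear_score` stand in for whatever the paper's linear analysis would produce.

```python
import re


def surface_features(text):
    """Count simple surface statistics of a raw dialogue text.

    All three features are assumed stand-ins for the quantities the
    abstract names; the paper may define them differently.
    """
    # Split into sentences on ., ! and ?; drop empty pieces.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens per sentence (letters and apostrophes only).
    tokenized = [re.findall(r"[A-Za-z']+", s.lower()) for s in sentences]
    tokens = [t for sent in tokenized for t in sent]

    n_sent = len(tokenized)
    n_tok = len(tokens)

    # Lexical composition: share of distinct words among all words.
    type_token_ratio = len(set(tokens)) / n_tok if n_tok else 0.0
    # Sentence length: mean number of words per sentence.
    avg_sentence_length = n_tok / n_sent if n_sent else 0.0

    # Sentence-to-sentence relationship: mean Jaccard overlap of the
    # word sets of each pair of adjacent sentences.
    overlaps = []
    for a, b in zip(tokenized, tokenized[1:]):
        union = set(a) | set(b)
        overlaps.append(len(set(a) & set(b)) / len(union) if union else 0.0)
    adjacent_overlap = sum(overlaps) / len(overlaps) if overlaps else 0.0

    return {
        "type_token_ratio": type_token_ratio,
        "avg_sentence_length": avg_sentence_length,
        "adjacent_overlap": adjacent_overlap,
    }


def linear_score(features, weights, bias=0.0):
    """Combine features into one score using per-feature weights."""
    return bias + sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    feats = surface_features("Are you coming? Yes I am coming. Good.")
    # Hypothetical weights; in practice they would be fitted to labeled data.
    weights = {
        "type_token_ratio": 1.0,
        "avg_sentence_length": 0.1,
        "adjacent_overlap": 2.0,
    }
    print(feats)
    print(linear_score(feats, weights))
```

Each text is reduced to a short fixed-length vector regardless of its vocabulary size, which is one way to obtain the low-dimensional representation the abstract refers to.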