Document Relevance Filtering by Natural Language Processing and Machine Learning: A Multidisciplinary Case Study of Patents

Cited by: 0
Author
Bridgelall, Raj [1]
Affiliation
[1] North Dakota State Univ, Coll Business, Dept Transportat & Supply Chain, POB 6050, Fargo, ND 58108 USA
Source
APPLIED SCIENCES-BASEL | 2025 / Volume 15 / Issue 05
Keywords
document search; supervised machine learning; unsupervised machine learning; natural language processing; latent Dirichlet allocation; non-negative matrix factorization; manifold learning; t-distributed stochastic neighbor embedding; term co-occurrence networks; RETRIEVAL
DOI
10.3390/app15052357
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The exponential growth of patent datasets poses a significant challenge in filtering relevant documents for research and innovation. Traditional keyword-based semantic search methods often fail to capture the complexity and variability of multidisciplinary terminology, leading to inefficiencies. This study addresses the problem by systematically evaluating supervised and unsupervised machine learning (ML) techniques for document relevance filtering across five technology domains: solid-state batteries, electric vehicle chargers, connected vehicles, electric vertical takeoff and landing aircraft, and light detection and ranging (LiDAR) sensors. The contributions include benchmarking the performance of 10 classical models, including extreme gradient boosting, random forest, and support vector machines; a deep artificial neural network; and three natural language processing methods: latent Dirichlet allocation, non-negative matrix factorization, and k-means clustering of a manifold-learned reduced feature dimension. Applying these methods to more than 4200 patents filtered from a database of 9.6 million patents revealed that most supervised ML models outperformed the unsupervised methods. On average, seven supervised ML models achieved significantly higher precision, recall, and F1-scores across all technology domains, while the unsupervised methods showed variability depending on domain characteristics. These results offer a practical framework for optimizing document relevance filtering, enabling researchers and practitioners to efficiently manage large datasets and enhance innovation.
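The abstract compares models by precision, recall, and F1-score. As a minimal sketch of how those three metrics could be computed for a binary relevance filter, the following Python example uses a hypothetical helper, `precision_recall_f1`, and made-up toy labels (1 = relevant, 0 = not relevant). Neither the helper nor the labels come from the paper, and the paper's own evaluation procedure may differ.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary relevance labels (1 = relevant)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels for eight documents (illustration only, not data from the paper).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]  # manually assigned relevance
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]  # relevance predicted by some filter

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision is the share of documents flagged as relevant that truly are relevant, recall is the share of truly relevant documents that the filter finds, and F1 is their harmonic mean, so a filter scores well on F1 only when it does well on both.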
Pages: 25