Document Relevance Filtering by Natural Language Processing and Machine Learning: A Multidisciplinary Case Study of Patents

Cited by: 0
Authors
Bridgelall, Raj [1]
Affiliation
[1] North Dakota State Univ, Coll Business, Dept Transportat & Supply Chain, POB 6050, Fargo, ND 58108 USA
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, No. 5
Keywords
document search; supervised machine learning; unsupervised machine learning; natural language processing; latent Dirichlet allocation; non-negative matrix factorization; manifold learning; t-distributed stochastic neighbor embedding; term co-occurrence networks; RETRIEVAL
DOI
10.3390/app15052357
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
The exponential growth of patent datasets poses a significant challenge in filtering relevant documents for research and innovation. Traditional semantic search methods based on keywords often fail to capture the complexity and variability of multidisciplinary terminology, leading to inefficiencies. This study addresses the problem by systematically evaluating supervised and unsupervised machine learning (ML) techniques for document relevance filtering across five technology domains: solid-state batteries, electric vehicle chargers, connected vehicles, electric vertical takeoff and landing aircraft, and light detection and ranging (LiDAR) sensors. The contributions include benchmarking the performance of 10 classical models, including extreme gradient boosting, random forest, and support vector machines; a deep artificial neural network; and three natural language processing methods: latent Dirichlet allocation, non-negative matrix factorization, and k-means clustering of a manifold-learned reduced feature dimension. Applying these methods to more than 4200 patents filtered from a database of 9.6 million patents revealed that most supervised ML models outperform the unsupervised methods. An average of seven supervised ML models achieved significantly higher precision, recall, and F1-scores across all technology domains, while the unsupervised methods showed variability depending on domain characteristics. These results offer a practical framework for optimizing document relevance filtering, enabling researchers and practitioners to efficiently manage large datasets and enhance innovation.
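The abstract compares models by precision, recall, and F1-score. As a minimal sketch of how those three metrics are computed for a binary relevance filter (1 = relevant, 0 = not relevant), the snippet below uses only the Python standard library. The function name and the eight example labels are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary relevance labels (1 = relevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, kept
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, dropped
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ground-truth and predicted relevance labels for eight patents.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Precision is the share of retained documents that are relevant, recall is the share of relevant documents that are retained, and F1 is their harmonic mean, so a filter cannot score well on F1 by favoring only one of the two.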
Pages: 25