Data science in light of natural language processing: An overview

Cited by: 13
Authors
Zeroual, Imad [1 ]
Lakhouaja, Abdelhak [1 ]
Affiliations
[1] Mohamed First Univ, Fac Sci, Av Med 6 BP 717, Oujda 60000, Morocco
Source
PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING IN DATA SCIENCES (ICDS2017) | 2018, Vol. 127
Keywords
Data science; Natural language processing; Data driven approaches; Corpora; Machine learning;
DOI
10.1016/j.procs.2018.01.101
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The focus of data scientists is essentially divided into three areas: collecting data, analyzing data, and inferring information from data. Each of these tasks requires specialized personnel, takes time, and costs money. Yet the next, and most demanding, step is turning data into products. This field has therefore attracted the attention of many research groups in academia as well as industry. In recent decades, data-driven approaches have emerged and gained popularity because they require much less human effort. Natural Language Processing (NLP) is among the fields most strongly influenced by data. The growth of data is behind the performance improvement of most NLP applications, such as machine translation and automatic speech recognition. Consequently, many NLP applications are moving from rule-based systems and knowledge-based methods to data-driven approaches. However, data collected according to undefined design criteria, or stored in technically unsuitable forms, will be useless. They will also be neglected if their size is insufficient to perform the required analysis and to infer accurate information. The chief purpose of this overview is to shed some light on the vital role of data in various fields and to give a better understanding of data in light of NLP. Specifically, it describes what happens to data during its life cycle: the building, processing, analyzing, and exploring phases. (C) 2018 The Authors. Published by Elsevier B.V.
Pages: 82 - 91
Number of pages: 10