Data science in light of natural language processing: An overview

Cited by: 13
Authors
Zeroual, Imad [1]
Lakhouaja, Abdelhak [1]
Affiliation
[1] Mohamed First Univ, Fac Sci, Av Med 6 BP 717, Oujda 60000, Morocco
Source
PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING IN DATA SCIENCES (ICDS2017) | 2018 / Vol. 127
Keywords
Data science; Natural language processing; Data-driven approaches; Corpora; Machine learning
DOI
10.1016/j.procs.2018.01.101
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The work of data scientists is essentially divided into three areas: collecting data, analyzing data, and inferring information from data. Each of these tasks requires specialized personnel, takes time, and costs money. The next and most demanding step is turning data into products. This field has therefore attracted the attention of many research groups in academia as well as industry. In recent decades, data-driven approaches have emerged and gained popularity because they require much less human effort. Natural Language Processing (NLP) is among the fields most strongly influenced by data. The growth of data is behind the performance improvement of most NLP applications, such as machine translation and automatic speech recognition. Consequently, many NLP applications are moving from rule-based systems and knowledge-based methods to data-driven approaches. However, data collected according to undefined design criteria or stored in technically unsuitable forms will be useless. They will also be neglected if their size is insufficient to perform the required analysis and infer accurate information. The chief purpose of this overview is to shed some light on the vital role of data in various fields and to give a better understanding of data in light of NLP. Specifically, it describes what happens to data during its life cycle: the building, processing, analyzing, and exploring phases. © 2018 The Authors. Published by Elsevier B.V.
Pages: 82 - 91
Number of pages: 10
Related papers
50 in total
  • [31] Natural language processing systems for pathology parsing in limited data environments with uncertainty estimation
    Odisho, Anobel Y.
    Park, Briton
    Altieri, Nicholas
    DeNero, John
    Cooperberg, Matthew R.
    Carroll, Peter R.
    Yu, Bin
    JAMIA OPEN, 2020, 3 (03) : 431 - 438
  • [32] A methodology for the resolution of cashtag collisions on Twitter - A natural language processing & data fusion approach
    Evans, Lewis
    Owda, Majdi
    Crockett, Keeley
    Fernandez Vilas, Ana
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 127 : 353 - 369
  • [33] UMLS-based data augmentation for natural language processing of clinical research literature
    Kang, Tian
    Perotte, Adler
    Tang, Youlan
    Ta, Casey
    Weng, Chunhua
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2021, 28 (04) : 812 - 823
  • [34] A New Data Structure for Processing Natural Language Database Queries
    Frost, Richard A.
    Peelar, Shane
    WEBIST: PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES, 2019 : 80 - 87
  • [35] Solutions of Creating Large Data Resources in Natural Language Processing
    Huynh Cong Phap
    RECENT DEVELOPMENTS IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, 2016, 642 : 243 - 253
  • [36] Natural language processing approach for distributed health data management
    Forestiero, Agostino
    Papuzzo, Giuseppe
    2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020), 2020 : 360 - 363
  • [37] Teanga: A Linked Data based platform for Natural Language Processing
    Ziad, Housam
    McCrae, John P.
    Buitelaar, Paul
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018 : 2410 - 2415
  • [38] The big data era: The usefulness of folksonomy for natural language processing
    Sans, Laia
    Vallve, Ismael
    Teixido, Joan
    Picas, Josep Manel
    Martinez-Roldan, Jordi
    Pascual, Julio
    NEFROLOGIA, 2022, 42 (06) : 680 - 687
  • [39] A multi-dimensional data organization for natural language processing
    Cheng, Kam-Hoi
    Faris, Waleed
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2009, 9 (SUPPL.1) : 81 - 90
  • [40] Using natural language processing technology for qualitative data analysis
    Crowston, Kevin
    Allen, Eileen E.
    Heckman, Robert
    INTERNATIONAL JOURNAL OF SOCIAL RESEARCH METHODOLOGY, 2012, 15 (06) : 523 - 543