Learning Biological Sequence Types Using the Literature

Times cited: 1
Authors
Bouadjenek, Mohamed Reda [1]
Verspoor, Karin [1]
Zobel, Justin [1]
Affiliation
[1] Univ Melbourne, Sch Comp & Informat Syst, Parkville, Vic 3010, Australia
Source
CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017
Funding
Australian Research Council
Keywords
Data Analysis; Data Quality; Biological Databases; Data Cleansing
DOI
10.1145/3132847.3133051
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we explore automatic classification of biological sequence types for records in biological sequence databases. The sequence type attribute provides important information about the nature of the sequence represented in a record, and is often used in search to filter out irrelevant sequences. However, the sequence type attribute is generally a non-mandatory free-text field, and is thus subject to many errors, including typos, mis-assignment, and non-assignment. In GenBank, this problem affects roughly 18% of records, an alarming proportion that should concern the biocuration community. To address automatic sequence type classification, we propose using the literature associated with sequence records as an external source of knowledge for the classification task. We define a set of literature-based features and train a machine learning algorithm to classify a record into one of six primary sequence types. The main intuition behind using the literature is that sequences appear to be discussed differently in scientific articles depending on their type. Our experiments on the PubMed Central collection show that the literature is indeed an effective resource for sequence type classification. Our classification method reached an accuracy of 92.7% and substantially outperformed the two baseline approaches used for comparison.
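The general idea in the abstract (features drawn from the articles linked to a record, fed to a trained classifier) can be illustrated with a minimal sketch. The abstract does not name the features or the learning algorithm, so this sketch swaps in a plain multinomial Naive Bayes over bag-of-words tokens from article text. The class `NaiveBayesTypeClassifier`, the three labels, and all text snippets are hypothetical; they do not reproduce the authors' method, their six sequence types, or their PubMed Central data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; a simple stand-in for real feature extraction."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTypeClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per label (for priors)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(label)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip tokens never seen in training
                # Smoothed log likelihood log P(tok | label)
                score += math.log(
                    (self.word_counts[label][tok] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(clf, texts, labels):
    """Fraction of records whose predicted type matches the gold label."""
    correct = sum(clf.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical article snippets linked to sequence records (toy data, not from the paper).
train_texts = [
    "the mRNA transcript was reverse transcribed and the cDNA cloned",
    "poly-A mRNA expression measured by RT-PCR of the transcript",
    "the 16S rRNA gene was amplified for phylogenetic analysis of ribosomal RNA",
    "ribosomal rRNA sequences from bacterial isolates",
    "genomic DNA was extracted and the locus sequenced",
    "whole genome DNA assembly of the chromosome",
]
train_labels = ["mRNA", "mRNA", "rRNA", "rRNA", "genomic DNA", "genomic DNA"]

test_texts = [
    "ribosomal rRNA from a bacterial isolate",
    "mRNA transcript detected by RT-PCR",
    "genomic DNA of the chromosome",
]
test_labels = ["rRNA", "mRNA", "genomic DNA"]

if __name__ == "__main__":
    clf = NaiveBayesTypeClassifier().fit(train_texts, train_labels)
    for text in test_texts:
        print(text, "->", clf.predict(text))
    print("accuracy:", accuracy(clf, test_texts, test_labels))
```

Naive Bayes is used here only because it fits in a few dozen lines of standard-library Python; it captures the abstract's intuition that articles use different vocabulary depending on the type of sequence they discuss.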
Pages: 1991–1994
Number of pages: 4