Information extraction from semi-structured data in the protein data bank by induction of a data description pattern

Cited by: 0
Authors
Kawaguchi, Y [1]
Kaneta, Y [1]
Ohkawa, T [1]
Nakamura, H [1]
Ito, N [1]
Affiliation
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka, Japan
Source
METMBS'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MATHEMATICS AND ENGINEERING TECHNIQUES IN MEDICINE AND BIOLOGICAL SCIENCES | 2003
Keywords
Protein Data Bank; XML; information extraction; description pattern; induction
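DOI
Not available
Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Subject classification code
Abstract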
The Protein Data Bank (PDB) is a primary database that stores three-dimensional structural data for proteins. This paper proposes a system, the PDB REMARK transcoder, that semi-automatically extracts significant data from REMARK lines, a part of the PDB data, and transcodes them into XML (eXtensible Markup Language) format. The system induces a description pattern from a sample of protein entries so that it can accommodate gradual variations in REMARK lines. Tokens (words and phrases) are clustered by evaluating their similarity using token attributes, and their contents are recognized by cluster labels. Description patterns are induced using finite-state automata, and iterative structures are nested correspondingly in the XML output. Confidence in the output XML data is confirmed through log files. Applying the system to the REMARK lines of 8,906 protein entries demonstrated the effectiveness of the method.
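The pipeline outlined in the abstract (label tokens, induce a shared pattern, nest repeated structures in XML) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the three labels (`num`, `word`, `other`), and the sample lines are hypothetical. A fixed rule stands in for attribute-based clustering, and collapsing runs of equal labels stands in for finite-state automaton induction.

```python
import re
from itertools import groupby
import xml.etree.ElementTree as ET

NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def label(token):
    """Assign a coarse label from surface attributes.

    Hypothetical stand-in for the token clustering step.
    """
    if NUMBER.fullmatch(token):
        return "num"
    if token.isupper():
        return "word"
    return "other"


def pattern(line):
    """Collapse runs of equal labels into a single symbol.

    Lines that differ only in how many tokens a run contains
    therefore share one description pattern.
    """
    labels = [label(t) for t in line.split()]
    return tuple(k for k, _ in groupby(labels))


def to_xml(line):
    """Emit one element per run; runs of several tokens are
    nested as repeated <item> children (the iterative structure)."""
    root = ET.Element("remark")
    for lab, run in groupby(line.split(), key=label):
        tokens = list(run)
        node = ET.SubElement(root, lab)
        if len(tokens) == 1:
            node.text = tokens[0]
        else:
            for t in tokens:
                ET.SubElement(node, "item").text = t
    return ET.tostring(root, encoding="unicode")


# Illustrative lines only; not taken from the paper's data set.
print(pattern("RESOLUTION RANGE HIGH 2.00"))
print(pattern("R VALUE 0.195"))
print(to_xml("R VALUE 0.195"))
```

Both sample lines reduce to the same pattern even though the number of leading words differs, which is the kind of gradual variation the induced pattern is meant to absorb.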
Pages: 94 - 99
Page count: 6
Related papers
50 in total
  • [41] Announcing the launch of Protein Data Bank China as an Associate Member of the Worldwide Protein Data Bank Partnership
    Xu, Wenqing
    Velankar, Sameer
    Patwardhan, Ardan
    Hoch, Jeffrey C.
    Burley, Stephen K.
    Kurisu, Genji
    ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY, 2023, 79 : 792 - 795
  • [42] Silver and gold in the Protein Data Bank
    Carugo, Oliviero
    JOURNAL OF INORGANIC BIOCHEMISTRY, 2017, 175 : 244 - 247
  • [43] Waterless structures in the Protein Data Bank
    Wlodawer, Alexander
    Dauter, Zbigniew
    Rubach, Pawel
    Minor, Wladek
    Loch, Joanna, I
    Brzezinski, Dariusz
    Gilski, Miroslaw
    Jaskolski, Mariusz
    IUCRJ, 2024, 11 : 966 - 976
  • [44] Intrinsic disorder in the protein data bank
    Le Gall, Tanguy
    Romero, Pedro R.
    Cortese, Marc S.
    Uversky, Vladimir N.
    Dunker, A. Keith
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2007, 24 (04) : 325 - 341
  • [45] Traffic Condition Information Extraction From Twitter Data
    Herwanto, Guntur Budi
    Dewantara, Deny Prasetya
    2018 2ND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICELTICS): INTELLIGENT DEVICES AND COMPUTING FOR ACCELERATING INDUSTRY 4.0 AND ENRICHING SMART SOCIETIES, 2018, : 95 - 100
  • [46] On the dynamical incompleteness of the Protein Data Bank
    Marino-Buslje, Cristina
    Miguel Monzon, Alexander
    Zea, Diego Javier
    Silvina Fornasari, Maria
    Parisi, Gustavo
    BRIEFINGS IN BIOINFORMATICS, 2019, 20 (01) : 356 - 359
  • [47] Information Extraction from Unstructured Data using RDF
    Gandhi, Kalgi
    Madia, Nidhi
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON ICT IN BUSINESS INDUSTRY & GOVERNMENT (ICTBIG), 2016,
  • [48] Bio2X: a rule-based approach for semi-automatic transformation of semi-structured biological data to XML
    Yang, S
    Bhowmick, SS
    Madria, S
    DATA & KNOWLEDGE ENGINEERING, 2005, 52 (02) : 249 - 271
  • [49] Data independent induction over structured networks
    Creese, SJ
    Roscoe, AW
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V, 2000, : 2995 - 3001
  • [50] A Rule-based Information Extraction System for Human-readable Semi-structured Scientific Documents
    Chen, Gang
    An, Baoran
    Zeng, Sifeng
    PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015), 2015, : 75 - 84