Information extraction from semi-structured data in the protein data bank by induction of a data description pattern

Authors
Kawaguchi, Y [1]
Kaneta, Y [1]
Ohkawa, T [1]
Nakamura, H [1]
Ito, N [1]
Affiliation
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka, Japan
Source
METMBS'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MATHEMATICS AND ENGINEERING TECHNIQUES IN MEDICINE AND BIOLOGICAL SCIENCES | 2003
Keywords
Protein Data Bank; XML; information extraction; description pattern; induction
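DOI
Not available
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]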
Abstract
PDB (Protein Data Bank) is a primary database that stores three-dimensional structural data of proteins. This paper proposes a system, the PDB REMARK transcoder, that semi-automatically extracts significant data from REMARK lines, a part of the PDB data, and transcodes them to XML (eXtensible Markup Language) format. The system induces a description pattern from a set of protein entries so that it can accept gradual variations in REMARK lines. Tokens (words and phrases) are clustered by evaluating their similarity using token attributes, and their contents are recognized by cluster labels. Description patterns are induced using finite state automata, and iterative structures are nested correspondingly in the XML output. The reliability of the output XML data is confirmed by log files. Applying the system to the REMARK lines of 8,906 protein entries demonstrated the effectiveness of the method.
Pages: 94-99
Number of pages: 6
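
The abstract describes the approach only at a high level, so the following is a minimal Python sketch of the general idea, not the authors' PDB REMARK transcoder. It assumes REMARK bodies of the form `KEY : VALUE`, uses a fixed three-way token classification (`NUM`, `SEP`, `WORD`) in place of the paper's similarity-based clustering, and collapses repeated token classes with a `+` suffix as a crude stand-in for automaton-based induction of iterative structures. The sample lines, function names, and XML element names are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from itertools import groupby

# Made-up sample in the general style of PDB REMARK records (illustrative only).
SAMPLE = """\
REMARK   2 RESOLUTION.    2.00 ANGSTROMS.
REMARK   3   PROGRAM     : X-PLOR
REMARK   3   R VALUE     : 0.195
HEADER    TOY ENTRY
"""


def remark_lines(text):
    """Yield (remark_number, body) for each REMARK line that has a body."""
    for line in text.splitlines():
        m = re.match(r"REMARK\s+(\d+)\s+(\S.*)$", line)
        if m:
            yield int(m.group(1)), m.group(2).rstrip()


def token_class(token):
    """Assign a coarse class to a token (assumed stand-in for clustering)."""
    if re.fullmatch(r"[-+]?\d+(\.\d+)?", token):
        return "NUM"
    if token == ":":
        return "SEP"
    return "WORD"


def pattern(body):
    """Describe a line as a sequence of token classes, collapsing repeats."""
    classes = [token_class(t) for t in body.split()]
    return " ".join(
        c + "+" if len(list(group)) > 1 else c for c, group in groupby(classes)
    )


def to_xml(text):
    """Convert 'KEY : VALUE' REMARK bodies into a flat XML tree."""
    root = ET.Element("remarks")
    for number, body in remark_lines(text):
        key, sep, value = body.partition(":")
        if not sep:
            continue  # lines without a separator are skipped in this sketch
        item = ET.SubElement(root, "item", remark=str(number), name=key.strip())
        item.text = value.strip()
    return root


if __name__ == "__main__":
    for number, body in remark_lines(SAMPLE):
        print(number, "|", pattern(body))
    print(ET.tostring(to_xml(SAMPLE), encoding="unicode"))
```

Grouping lines by the string returned from `pattern` gives a rough view of how many distinct line shapes occur across entries, which is the kind of variation the induced description pattern is meant to absorb.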