Content Extraction of Biological Datasets Using Soft Computing Techniques

Times cited: 4
Authors
Prakash, Kolla Bhanu [1, 2]
Rangaswamy, M. A. Dorai [1]
Affiliations
[1] Sathyabama Univ, Fac Comp Sci Engn, Madras 600119, Tamil Nadu, India
[2] Chirala Engn Coll, Fac Comp, Chirala 523157, India
Keywords
Content Extraction; Biology; Attribute; Multilingual; Pattern
DOI
10.1166/jmihi.2016.1931
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Content extraction and identification of biological datasets are gaining prominence. Because many biological datasets are available online, identifying or extracting content from similar datasets is difficult, and the problem is harder still for multilingual web documents. Content extraction is the process of identifying the main content of a web page, which may contain different forms of data in an unstructured, non-homogeneous manner. This study develops a pixel-based approach, which offers the flexibility to handle any language or media, and works from the generic text level up to a hybrid unstructured level. The proposed technique is purely data-driven: it uses no domain-dependent background information and relies neither on predefined document categories nor on a given list of topics. The model was tested with different attribute inputs, and a minimum attribute size of 2 x 2 was found to be required to assess the content. However, after testing with several biological datasets, a 3 x 3 attribute was found to give better results for analysis and content extraction. This configuration was later tested with words from other languages to form a more elaborate base set.
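The abstract does not define what a "k x k attribute" is. One plausible reading, assumed here purely for illustration, is a zoning feature: a binarized image of a rendered word or glyph is divided into a k x k grid, and the fraction of ink pixels in each cell becomes one feature, so k = 2 yields 4 features and k = 3 yields 9. The function name `grid_features`, the 0/1 bitmap input, and the ink-density feature are all hypothetical; this is a minimal sketch of that assumed idea, not the authors' method.

```python
from typing import List

# Hypothetical input: a binarized word/glyph image, 1 = ink pixel, 0 = background.
Bitmap = List[List[int]]


def grid_features(bitmap: Bitmap, k: int) -> List[float]:
    """Hypothetical zoning feature (an assumption, not the paper's definition):
    split `bitmap` into a k x k grid of cells and return the fraction of ink
    pixels in each cell, listed row by row."""
    if k < 1 or len(bitmap) < k or len(bitmap[0]) < k:
        raise ValueError("k must be >= 1 and the bitmap at least k x k pixels")
    rows, cols = len(bitmap), len(bitmap[0])
    features: List[float] = []
    for gi in range(k):
        # Integer boundaries: every pixel falls in exactly one cell, and no
        # cell is empty because rows >= k and cols >= k.
        r0, r1 = gi * rows // k, (gi + 1) * rows // k
        for gj in range(k):
            c0, c1 = gj * cols // k, (gj + 1) * cols // k
            ink = sum(bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
            features.append(ink / ((r1 - r0) * (c1 - c0)))
    return features


if __name__ == "__main__":
    # Toy 6 x 6 bitmap with three solid blocks along the diagonal.
    glyph = [
        [1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0],
        [1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 1, 1],
    ]
    print(grid_features(glyph, 2))  # four blended densities
    print(grid_features(glyph, 3))  # [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0]
```

On this toy input, the 2 x 2 grid blends the strokes into four mixed densities (7/9, 1/9, 3/9, 5/9), while the 3 x 3 grid separates each block cleanly. That is one way to picture why a finer grid might carry more discriminative detail; the paper's actual reason for preferring 3 x 3 may differ.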
Pages: 932-936
Page count: 5