Causal gene identification using combinatorial V-structure search

Times cited: 29
Authors
Cai, Ruichu [1, 2]
Zhang, Zhenjie [3]
Hao, Zhifeng [1]
Affiliations
[1] Guangdong University of Technology, Faculty of Computer Science, Guangzhou, People's Republic of China
[2] Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing, People's Republic of China
[3] Illinois at Singapore Pte Ltd, Advanced Digital Sciences Center, Singapore, Singapore
Keywords
Causal gene; V-Structure; Gene expression data; Causality; Markov blanket induction; Feature selection; Bayesian network; Local causal; Expression; Discovery; Algorithm
DOI
10.1016/j.neunet.2013.01.025
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With advances in biomedical techniques over the last decade, the costs of human genome sequencing and genomic activity monitoring have been falling rapidly. To support the huge genome-based business anticipated in the near future, researchers are eager to find killer applications based on human genome information. Causal gene identification is one of the most promising of these applications: it may help potential patients estimate their risk of certain genetic diseases and locate target genes for further gene therapy. Unfortunately, existing pattern recognition techniques, such as Bayesian networks, cannot be applied directly to accurately identify the causal relationships between genes and diseases, mainly because of the insufficient number of samples and the extremely high dimensionality of the gene space. In this paper, we present the first practical solution to causal gene identification. It is based on a new combinatorial formulation over V-Structures (patterns X → Z ← Y in which two non-adjacent nodes share a common child), which are commonly used in conventional Bayesian networks, and it works by exploring combinations of significant V-Structures. We prove that the combinatorial search problem is NP-hard under general settings of the significance measure on V-Structures, and we present a greedy algorithm that finds suboptimal solutions. Extensive experiments show that our proposal is both scalable and effective, and it yields interesting findings about causal genes in real human genome data. © 2013 Elsevier Ltd. All rights reserved.
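The abstract does not spell out the significance measure or the exact combinatorial objective, so the following is only an illustrative sketch under stated assumptions. It assumes (a) that a V-Structure g_i → D ← g_j between two genes and the disease variable D is scored by how much the partial correlation of the two genes given D exceeds their marginal correlation (the signature of a collider: marginally independent, dependent once D is conditioned on), and (b) that the search picks k genes maximizing the sum of pairwise V-Structure scores, optimized greedily. Both choices and every name in the code (`v_structure_score`, `greedy_select`, `make_synthetic`) are hypothetical stand-ins, not the authors' method.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def v_structure_score(x, y, d):
    """Hypothetical significance of the V-Structure x -> d <- y.

    High when x and y are (nearly) uncorrelated on their own but become
    strongly correlated once the disease variable d is conditioned on.
    """
    return abs(partial_corr(x, y, d)) - abs(pearson(x, y))


def greedy_select(genes, d, k):
    """Greedily choose k genes maximizing the summed pairwise V-Structure score.

    `genes` maps gene name -> expression values; `d` holds disease values.
    The objective is an illustrative assumption, not the paper's formulation.
    """
    names = list(genes)
    if not 2 <= k <= len(names):
        raise ValueError("k must lie between 2 and the number of genes")
    score = {
        (a, b): v_structure_score(genes[a], genes[b], d)
        for a, b in itertools.combinations(names, 2)
    }

    def pair(a, b):
        # Scores are symmetric; look up whichever key order was stored.
        return score[(a, b)] if (a, b) in score else score[(b, a)]

    # Seed with the single most significant V-Structure (a pair of genes).
    chosen = list(max(score, key=score.get))
    # Repeatedly add the gene whose summed score with the chosen set is largest.
    while len(chosen) < k:
        rest = [g for g in names if g not in chosen]
        chosen.append(max(rest, key=lambda g: sum(pair(g, c) for c in chosen)))
    return chosen


def make_synthetic(n=500, seed=0):
    """Toy data: g1 and g2 independently drive d; g3 and g4 are pure noise."""
    rng = random.Random(seed)
    genes = {name: [rng.gauss(0, 1) for _ in range(n)]
             for name in ("g1", "g2", "g3", "g4")}
    d = [a + b + 0.3 * rng.gauss(0, 1) for a, b in zip(genes["g1"], genes["g2"])]
    return genes, d


if __name__ == "__main__":
    genes, d = make_synthetic()
    print(greedy_select(genes, d, 2))
```

On the toy data, the pair (g1, g2) forms a clear collider at d and should be selected first. Summing pairwise weights over a k-subset is a weighted form of the densest-k-subgraph problem, which is NP-hard in general, so a greedy heuristic is a natural fit here; whether it matches the paper's own greedy algorithm is not something the abstract confirms. With a binary disease label, linear partial correlation is only a rough proxy, and a conditional-independence test suited to discrete data may be more appropriate.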
Pages: 63-71
Number of pages: 9